TMPRSS2 Lab Notebook

From OpenWetWare

Purpose

TMPRSS2 primes the SARS-CoV-2 spike protein by cleaving it at the S1/S2 sites, a step necessary for virus-cell membrane fusion and cell entry. The purpose of this study is to examine TMPRSS2 polymorphisms and their potential impact on SARS-CoV-2 infection.

Combined Method and Results

  • Performed a literature search of possible TMPRSS2 SNPs and compiled data into a table
  • Examined databases for information and data on relevant SNPs

Regulation of TMPRSS2

  • TMPRSS2 is mainly expressed in the luminal cells of the prostate epithelium and is positively regulated by androgens
  • An enhancer located 13 kb upstream of the transcription start site is crucial for androgen regulation (Clinckemalie et al. 2013)
  • Chromatin looping links the E1 and E2 enhancers to the PRCAT38 (prostate cancer transcript) and TMPRSS2 promoters (Chen et al.)
  • Loop formation and enhancer activity were mediated by AR/FOXA1 binding and the activity of the acetyltransferase p300 (Chen et al.)
  • Loss of TMPRSS2 has no overt phenotype: TMPRSS2 knockout mice show no apparent abnormalities, indicating that there may be some functional redundancy with other members of the type II transmembrane serine protease (TTSP) family (Thunders and Delahunt 2020)

Exon SNPs

  • The TMPRSS2 gene has 14 exons. Each exon was classified by viewing the possible SNPs in the NCBI dbSNP database and the NCBI Genome Data Viewer. rs12329760 was used as a search term to obtain SNP results. TMPRSS2 was searched in NCBI Gene and viewed with the Genome Data Viewer.
  • To see missense and frameshift variants, go to Tracks, click Configure tracks, and search for "variation". Then select the variation tracks needed (we chose "missense" and "frameshift") and click Configure to add the tracks to the gene.

Exon 1

AD Exon 1.png

Exon 2

ADExon2.png

  • There are 8 missense variations in exon 2.
rs1201213753, rs1331618004, rs1248965862, rs375223866, rs373389042, rs1334212309, rs1365419816, rs1316902409

Exon 3

ADExon3.png

  • There are 5 frameshift variations in Exon 3.
rs781272749, rs753733271, rs754945404, rs747959268, rs758190152
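The exon-by-exon variant lists above can be kept in a small lookup structure so that counts and rsID formats are checked before they are copied into the SNP table. This is a minimal Python sketch, assuming the rsIDs are transcribed exactly as listed; the names `EXON_VARIANTS` and `summarize` are hypothetical and not part of any NCBI tool.

```python
import re

# rsIDs transcribed from the Genome Data Viewer notes above (exon -> consequence -> rsIDs)
EXON_VARIANTS = {
    2: {"missense": ["rs1201213753", "rs1331618004", "rs1248965862", "rs375223866",
                     "rs373389042", "rs1334212309", "rs1365419816", "rs1316902409"]},
    3: {"frameshift": ["rs781272749", "rs753733271", "rs754945404", "rs747959268",
                       "rs758190152"]},
}

# dbSNP reference SNP IDs are "rs" followed by digits
RSID_PATTERN = re.compile(r"^rs\d+$")


def summarize(catalog):
    """Return {(exon, consequence): count}, raising if an rsID is malformed or repeated."""
    counts = {}
    seen = set()
    for exon, by_class in catalog.items():
        for consequence, rsids in by_class.items():
            for rsid in rsids:
                if not RSID_PATTERN.match(rsid):
                    raise ValueError(f"malformed rsID in exon {exon}: {rsid!r}")
                if rsid in seen:
                    raise ValueError(f"rsID listed twice: {rsid}")
                seen.add(rsid)
            counts[(exon, consequence)] = len(rsids)
    return counts


print(summarize(EXON_VARIANTS))  # {(2, 'missense'): 8, (3, 'frameshift'): 5}
```

Further exons can be added as new keys as they are classified.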

Exon 4

ADExon4.png

Exon 5

Exon 5 EWalton.png

Exon 6

Exon 6 EWalton.png

Exon 7

Exon 7 EWalton.png

Exon 8

Exon8 DaMota.png

Exon 9

Exon9a DaMota.png (a)

Exon9b DaMota.png (b)

Exon9c DaMota.png (c)

Exon 10

Exon10 DaMota.png

Exon 11

Exon11 DaMota.png

  • Exon 11 as viewed in the NCBI Genome Data Viewer. SNP variants include 2 frameshift variants and many missense variants. According to the Ensembl database, neither frameshift variant (rs771443342, rs1274705894) differs significantly in frequency among populations.
  • Viewed by clicking “Exon 11” in the blue Exon Navigator bar above the sequence viewer
  • Screenshot captured on 10/26/20 from https://www.ncbi.nlm.nih.gov/genome/gdv/browser/gene/?id=7113

Exon 12

Exon 12 TMPRSS2 Madeleine King.PNG

Exon 13

Exon 13 TMPRSS2 Madeleine King updated.PNG

Exon 14

Exon 14 TMPRSS2 Madeleine King.PNG

  • Exon 14 was viewed using the NCBI Genome Data Viewer. One SNP is observed: rs456298. This SNP differs significantly in frequency between Asian and other populations and is located in a miRNA target site.
  • Viewed by clicking “Exon 14” in the blue Exon Navigator bar above the sequence viewer
  • Screenshot captured on 10/26/20 from https://www.ncbi.nlm.nih.gov/genome/gdv/browser/gene/?id=7113

Gene Map

  • We created a gene map (TMPRSS2 Gene Diagram) using Google Slides.
    • Text boxes and arrows were used to display regulatory regions and exons.
  • Information about exons and regulatory regions was taken from the NCBI gene viewer.
  • Isoforms were added to the map.

ADTMPRSS2GeneMap.png

Isoforms

  • TMPRSS2 was searched on Uniprot (UniProtKB - O15393)
    • Two isoforms are shown under Sequences
    • Sequences were compared by clicking the Align button (Alignment)
    • The alignment shows that isoform 1 lacks the first 37 amino acids of isoform 2
  • TMPRSS2 was also searched on NCBI Gene (Gene ID: 7113)
    • Navigated to “Genomic regions, transcripts, and products” section and clicked “Switch ON mode ‘show All’ for Gene tracks” in the toolbar
    • 3 isoforms were seen

TMPRSS2 Summary

  • Full Name: Transmembrane Serine Protease 2
    • Chromosome Region: 21q22.3 Location: 41,464,305-41,508,158
    • Exons: 14
    • Promoter Location: 41,507,255-41,508,648
    • Regulatory Regions: 41,508,250-41,508,276
    • Enhancers: 41,508,128-41,509,135
    • Terminator: location unknown
  • Has 3 isoforms as a result of alternative splicing
  1. Isoform 1 is believed to be the most relevant due to its expression in viral target cells.
    1. DUF3824: Domain of unknown function (aa 44-91)
    2. LDLa: “Low Density Lipoprotein Receptor Class A” domain: a cysteine-rich repeat that plays a central role in mammalian cholesterol metabolism (aa 150-185)
    3. SRCR_2: “Scavenger Receptor Cysteine-Rich” domain (aa 190-283)
    4. Tryp_SPc: “Trypsin-like serine protease”: active site found within this region (aa 293-524)
  2. Isoform 2 is a 492 amino acid protein that is not yet characterized.
  3. Isoform 3 is a 498 amino acid protein that is not yet characterized.
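The domain ranges listed for isoform 1 are residue positions on the 529-residue sequence (the trypsin-like domain ends at 524, which is longer than the 492-residue isoform), so a variant can be placed in a domain with a simple interval lookup. This Python sketch assumes the ranges are inclusive and 1-based, and that a position given in 492-residue numbering maps to the longer sequence by adding 37 (the N-terminal extension seen in the UniProt alignment). Both are assumptions to double-check against NCBI; `domain_of` is a hypothetical helper.

```python
# Domain boundaries (inclusive, 1-based residue positions) as listed for isoform 1 above
DOMAINS = [
    ("DUF3824", 44, 91),
    ("LDLa", 150, 185),
    ("SRCR_2", 190, 283),
    ("Tryp_SPc", 293, 524),
]

# Assumed offset: the 492-residue isoform lacks the first 37 residues of the 529-residue one
N_TERMINAL_OFFSET = 37


def domain_of(position, short_isoform=False):
    """Return the domain containing `position`, or None if it falls between domains.

    Set short_isoform=True when the position uses 492-residue numbering
    (e.g. the V160M description of rs12329760).
    """
    if short_isoform:
        position += N_TERMINAL_OFFSET
    for name, start, end in DOMAINS:
        if start <= position <= end:
            return name
    return None


# V160 in 492-residue numbering -> residue 197 of the 529-residue isoform
print(domain_of(160, short_isoform=True))  # SRCR_2
```

Looking up 160 without the offset would return LDLa instead, which is why the numbering convention behind each variant name matters.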

ClinVar SNPs

Previous Research on TMPRSS2 Structure

Structure

ADTMPRSS2Structure.png

Figure 1. Molecular structure of TMPRSS2. (A) Scaled schematic representation of the functional domains of TMPRSS2 protein (B) XtalPred analysis where blue line represents the least probability of the crystallization (C) Ramachandran plot of the model (D) ProSA analysis of the model where the Z score of the model is indicated by black dot, whereas Z scores of resolved structures are shown with dark blue (NMR) and light blue (X-ray) shades. Superimposition of the full length TMPRSS2 model with template in (E) ribbon (F) Cα backbone conformations. Ribbon conformation (G) and surface topology (H) of TMPRSS2 structure where domains are coloured differently and labelled at the corresponding positions. (I) Functionally important residues are shown in green, blue, red and yellow sticks representing calcium binding sites, substrate binding sites, catalytic sites and proteolytic cleavage site, respectively. (J) Molecular dynamic simulation of TMPRSS2 model showing reasonable stability of the molecule after 10000 picoseconds of the simulation run.

ADTMPRSS2andSpike.png

Figure 2. TMPRSS2 and SARS-CoV-2 spike protein Molecular Complex. Ribbon diagram of complexes between TMPRSS2 (magenta) and SARS-CoV-2 spike protein (gold) for (A) site1 (Arg685/Ser686) and (B) site2 (Arg815/Ser816), residues of TMPRSS2 (magenta sticks) and spike protein (gold sticks) involved in the intermolecular interactions are shown in the respective boxes. PDB files of the complexes are made available in supplementary materials Model and Complexes.

I-TASSER

  • I-TASSER is a protein modeling resource created by the Zhang Lab. It uses threading to predict protein secondary structure and 3D models
    • submit protein sequence in FASTA format
    • program predicts secondary structure
    • predicts whether residues are hydrophobic or hydrophilic
    • predicts 3D models
    • takes about 20 to 60 hours to complete; users need to register with the Zhang Lab.
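I-TASSER (like RaptorX and HHpred below) takes the query as FASTA text: a `>` header line followed by the sequence. This minimal Python sketch writes a wild-type or mutant sequence in that shape, wrapping at 60 residues as the UniProt records do. The function name `to_fasta`, the header text, and the residue check are illustrative choices, not requirements stated by any of these servers.

```python
def to_fasta(header, sequence, width=60):
    """Format one protein sequence as a FASTA record wrapped at `width` residues per line."""
    # Drop spaces/newlines left over from copying the sequence out of a web page
    sequence = "".join(sequence.split()).upper()
    # One-letter amino acid codes plus common ambiguity/stop symbols
    allowed = set("ACDEFGHIKLMNPQRSTVWYXBZUO*")
    bad = set(sequence) - allowed
    if bad:
        raise ValueError(f"unexpected characters in sequence: {sorted(bad)}")
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


# First 30 residues of isoform 2, wrapped at 10 for display
print(to_fasta("TMPRSS2_fragment", "MPPAPPGGESGCEERGAAGH IEHSRYLSLL", width=10))
```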

Swiss-Model

  • Protein structure homology-modelling
    • Has TMPRSS2 modeled already for isoform 1 ID:O15393

Swiss-Model Run for TMPRSS2 & rs12329760

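Input FASTA data for TMPRSS2 with and without the rs12329760 mutation (V160 --> M). V160 follows the numbering of the 492-residue isoform; in the 529-residue isoform 2 sequence below, the same valine is residue 197 (160 + 37).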

  • FASTA format for TMPRSS2 was obtained from Uniprot:O15393
>sp|O15393-2|TMPS2_HUMAN Isoform 2 of Transmembrane protease serine 2 OS=Homo sapiens OX=9606 GN=TMPRSS2
MPPAPPGGESGCEERGAAGHIEHSRYLSLLDAVDNSKMALNSGSPPAIGPYYENHGYQPE
NPYPAQPTVVPTVYEVHPAQYYPSPVPQYAPRVLTQASNPVVCTQPKSPSGTVCTSKTKK
ALCITLTLGTFLVGAALAAGLLWKFMGSKCSNSGIECDSSGTCINPSNWCDGVSHCPGGE
DENRCVRLYGPNFILQVYSSQRKSWHPVCQDDWNENYGRAACRDMGYKNNFYSSQGIVDD
SGSTSFMKLNTSAGNVDIYKKLYHSDACSSKAVVSLRCIACGVNLNSSRQSRIVGGESAL
PGAWPWQVSLHVQNVHVCGGSIITPEWIVTAAHCVEKPLNNPWHWTAFAGILRQSFMFYG
AGYQVEKVISHPNYDSKTKNNDIALMKLQKPLTFNDLVKPVCLPNPGMMLQPEQLCWISG
WGATEEKGKTSEVLNAAKVLLIETQRCNSRYVYDNLITPAMICAGFLQGNVDSCQGDSGG
PLVTSKNNIWWLIGDTSWGSGCAKAYRPGVYGNVMVFTDWIYRQMRADG
  • The rs12329760 substitution was introduced into the amino acid sequence (V160 --> M)
>sp|O15393-2|TMPS2_HUMAN Isoform 2 of Transmembrane protease serine 2 OS=Homo sapiens OX=9606 GN=TMPRSS2
MPPAPPGGESGCEERGAAGHIEHSRYLSLLDAVDNSKMALNSGSPPAIGPYYENHGYQPE
NPYPAQPTVVPTVYEVHPAQYYPSPVPQYAPRVLTQASNPVVCTQPKSPSGTVCTSKTKK
ALCITLTLGTFLVGAALAAGLLWKFMGSKCSNSGIECDSSGTCINPSNWCDGVSHCPGGE
DENRCVRLYGPNFILQMYSSQRKSWHPVCQDDWNENYGRAACRDMGYKNNFYSSQGIVDD
SGSTSFMKLNTSAGNVDIYKKLYHSDACSSKAVVSLRCIACGVNLNSSRQSRIVGGESAL
PGAWPWQVSLHVQNVHVCGGSIITPEWIVTAAHCVEKPLNNPWHWTAFAGILRQSFMFYG
AGYQVEKVISHPNYDSKTKNNDIALMKLQKPLTFNDLVKPVCLPNPGMMLQPEQLCWISG
WGATEEKGKTSEVLNAAKVLLIETQRCNSRYVYDNLITPAMICAGFLQGNVDSCQGDSGG
PLVTSKNNIWWLIGDTSWGSGCAKAYRPGVYGNVMVFTDWIYRQMRADG
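The substitution above can also be made programmatically, which guards against editing the wrong residue. In the 529-residue isoform 2 sequence the valine described as V160 sits at position 197 (160 + 37), matching the V-to-M change in the fourth line of the mutant FASTA above. This is a minimal Python sketch; `substitute` is a hypothetical helper, and the offset of 37 is taken from the UniProt alignment noted under Isoforms.

```python
# Wild-type TMPRSS2 isoform 2 (UniProt O15393-2, 529 aa), copied from the FASTA above
WT_ISOFORM2 = (
    "MPPAPPGGESGCEERGAAGHIEHSRYLSLLDAVDNSKMALNSGSPPAIGPYYENHGYQPE"
    "NPYPAQPTVVPTVYEVHPAQYYPSPVPQYAPRVLTQASNPVVCTQPKSPSGTVCTSKTKK"
    "ALCITLTLGTFLVGAALAAGLLWKFMGSKCSNSGIECDSSGTCINPSNWCDGVSHCPGGE"
    "DENRCVRLYGPNFILQVYSSQRKSWHPVCQDDWNENYGRAACRDMGYKNNFYSSQGIVDD"
    "SGSTSFMKLNTSAGNVDIYKKLYHSDACSSKAVVSLRCIACGVNLNSSRQSRIVGGESAL"
    "PGAWPWQVSLHVQNVHVCGGSIITPEWIVTAAHCVEKPLNNPWHWTAFAGILRQSFMFYG"
    "AGYQVEKVISHPNYDSKTKNNDIALMKLQKPLTFNDLVKPVCLPNPGMMLQPEQLCWISG"
    "WGATEEKGKTSEVLNAAKVLLIETQRCNSRYVYDNLITPAMICAGFLQGNVDSCQGDSGG"
    "PLVTSKNNIWWLIGDTSWGSGCAKAYRPGVYGNVMVFTDWIYRQMRADG"
)

# Residues present in isoform 2 but absent from the 492-aa isoform (assumed from the alignment)
N_TERMINAL_OFFSET = 37


def substitute(seq, position, ref, alt):
    """Return `seq` with residue `position` (1-based) changed from `ref` to `alt`.

    Raises ValueError if the residue at that position is not `ref`, which catches
    numbering mix-ups between isoforms.
    """
    index = position - 1
    if seq[index] != ref:
        raise ValueError(f"expected {ref} at {position}, found {seq[index]}")
    return seq[:index] + alt + seq[index + 1:]


# rs12329760: V160M in 492-aa numbering -> position 197 in isoform 2
mutant = substitute(WT_ISOFORM2, 160 + N_TERMINAL_OFFSET, "V", "M")
print(len(WT_ISOFORM2), mutant[190:200])  # 529 PNFILQMYSS
```

Calling `substitute(WT_ISOFORM2, 160, "V", "M")` directly would raise, because residue 160 of isoform 2 is not a valine.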

HADDOCK 2.4

  • Docking software from the Bonvin Lab that models the interaction between two molecular structures and how they fit together.
  • Users must register on the website, submit a new job, fill out the required fields (job name, type of structures), click Next, and generate results.
  • ClusPro is another website for protein-protein docking; it requires PDB structures and chain IDs.
    • If interested in the binding affinity of specific interactions, the Bonvin Lab's PRODIGY software can provide this information.
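PRODIGY predicts binding affinity as a free energy (ΔG) along with a corresponding dissociation constant (Kd). The two are related by ΔG = RT ln Kd (Kd in molar units), so a more negative ΔG means tighter binding. This is a minimal Python sketch of that conversion, assuming ΔG in kcal/mol and a default temperature of 25 °C; the function names are hypothetical, and the code does not call PRODIGY itself.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3


def kd_from_dg(dg_kcal_per_mol, temp_c=25.0):
    """Dissociation constant (M) from binding free energy via dG = RT ln(Kd)."""
    temp_k = temp_c + 273.15
    return math.exp(dg_kcal_per_mol / (R_KCAL * temp_k))


def dg_from_kd(kd_molar, temp_c=25.0):
    """Binding free energy (kcal/mol) from a dissociation constant in molar units."""
    temp_k = temp_c + 273.15
    return R_KCAL * temp_k * math.log(kd_molar)


# Hypothetical input, not a result from this notebook
print(f"{kd_from_dg(-12.3):.2e} M")
```

Comparing ΔG values for wild-type and mutant complexes at the same temperature avoids mixing temperature assumptions between runs.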

RaptorX Structure Prediction

  • RaptorX is a protein structure prediction server developed by the Xu group. When a sequence is input, RaptorX can predict secondary and tertiary protein structures, contacts, solvent accessibility, disordered regions and binding sites.
  • To submit a job to RaptorX, users should:
    • Register with an email address for quick retrieval of results
    • Input a protein sequence or upload a FASTA file. Wait time is 2-3 days.
    • Retrieve results with job ID, email, or sequence
    • Results will include a predicted contact map, a contact result file, and five predicted 3D models assisted by the predicted contacts

RaptorX prediction of TMPRSS2 (Isoform 2) and rs12329760

  • FASTA format for TMPRSS2 was obtained from Uniprot: O15393
>sp|O15393-2|TMPS2_HUMAN Isoform 2 of Transmembrane protease serine 2 OS=Homo sapiens OX=9606 GN=TMPRSS2
MPPAPPGGESGCEERGAAGHIEHSRYLSLLDAVDNSKMALNSGSPPAIGPYYENHGYQPE
NPYPAQPTVVPTVYEVHPAQYYPSPVPQYAPRVLTQASNPVVCTQPKSPSGTVCTSKTKK
ALCITLTLGTFLVGAALAAGLLWKFMGSKCSNSGIECDSSGTCINPSNWCDGVSHCPGGE
DENRCVRLYGPNFILQVYSSQRKSWHPVCQDDWNENYGRAACRDMGYKNNFYSSQGIVDD
SGSTSFMKLNTSAGNVDIYKKLYHSDACSSKAVVSLRCIACGVNLNSSRQSRIVGGESAL
PGAWPWQVSLHVQNVHVCGGSIITPEWIVTAAHCVEKPLNNPWHWTAFAGILRQSFMFYG
AGYQVEKVISHPNYDSKTKNNDIALMKLQKPLTFNDLVKPVCLPNPGMMLQPEQLCWISG
WGATEEKGKTSEVLNAAKVLLIETQRCNSRYVYDNLITPAMICAGFLQGNVDSCQGDSGG
PLVTSKNNIWWLIGDTSWGSGCAKAYRPGVYGNVMVFTDWIYRQMRADG
  • The rs12329760 substitution was introduced into the amino acid sequence (V160 --> M)
>sp|O15393-2|TMPS2_HUMAN Isoform 2 of Transmembrane protease serine 2 OS=Homo sapiens OX=9606 GN=TMPRSS2
MPPAPPGGESGCEERGAAGHIEHSRYLSLLDAVDNSKMALNSGSPPAIGPYYENHGYQPE
NPYPAQPTVVPTVYEVHPAQYYPSPVPQYAPRVLTQASNPVVCTQPKSPSGTVCTSKTKK
ALCITLTLGTFLVGAALAAGLLWKFMGSKCSNSGIECDSSGTCINPSNWCDGVSHCPGGE
DENRCVRLYGPNFILQMYSSQRKSWHPVCQDDWNENYGRAACRDMGYKNNFYSSQGIVDD
SGSTSFMKLNTSAGNVDIYKKLYHSDACSSKAVVSLRCIACGVNLNSSRQSRIVGGESAL
PGAWPWQVSLHVQNVHVCGGSIITPEWIVTAAHCVEKPLNNPWHWTAFAGILRQSFMFYG
AGYQVEKVISHPNYDSKTKNNDIALMKLQKPLTFNDLVKPVCLPNPGMMLQPEQLCWISG
WGATEEKGKTSEVLNAAKVLLIETQRCNSRYVYDNLITPAMICAGFLQGNVDSCQGDSGG
PLVTSKNNIWWLIGDTSWGSGCAKAYRPGVYGNVMVFTDWIYRQMRADG
  • FASTA sequences were separately input into RaptorX
  • Predicted contact maps and 3D models were generated

TMPRSS2 via RaptorX.png

Rs12329760 via RaptorX.png

HHpred

  • HHpred can be utilized for detecting remote protein homology and structure prediction, including secondary and tertiary structure.
  • It draws on information from databases including PDB, SCOP, Pfam, SMART, COGs, and CDD.
  • To visualize a protein structure, users can:
    • Input the protein sequence in A3M/FASTA/CLUSTAL/STOCKHOLM format into the Input field and select submit.
    • Using the results generated, select the templates you would like to visualize and then select Create Model Using Selection.
    • This will generate a PIR file, which can be pasted into the MODELLER software under 3ary structure.
      • To download and run the MODELLER software, users need to register for a license key
    • Entering this license key into Custom Job ID and clicking Submit will generate results.

HHpred for TMPRSS2 in the absence and presence of rs12329760

  • FASTA format for TMPRSS2 was obtained from Uniprot:O15393
>sp|O15393-2|TMPS2_HUMAN Isoform 2 of Transmembrane protease serine 2 OS=Homo sapiens OX=9606 GN=TMPRSS2
MPPAPPGGESGCEERGAAGHIEHSRYLSLLDAVDNSKMALNSGSPPAIGPYYENHGYQPE
NPYPAQPTVVPTVYEVHPAQYYPSPVPQYAPRVLTQASNPVVCTQPKSPSGTVCTSKTKK
ALCITLTLGTFLVGAALAAGLLWKFMGSKCSNSGIECDSSGTCINPSNWCDGVSHCPGGE
DENRCVRLYGPNFILQVYSSQRKSWHPVCQDDWNENYGRAACRDMGYKNNFYSSQGIVDD
SGSTSFMKLNTSAGNVDIYKKLYHSDACSSKAVVSLRCIACGVNLNSSRQSRIVGGESAL
PGAWPWQVSLHVQNVHVCGGSIITPEWIVTAAHCVEKPLNNPWHWTAFAGILRQSFMFYG
AGYQVEKVISHPNYDSKTKNNDIALMKLQKPLTFNDLVKPVCLPNPGMMLQPEQLCWISG
WGATEEKGKTSEVLNAAKVLLIETQRCNSRYVYDNLITPAMICAGFLQGNVDSCQGDSGG
PLVTSKNNIWWLIGDTSWGSGCAKAYRPGVYGNVMVFTDWIYRQMRADG
  • The rs12329760 substitution was introduced into the amino acid sequence (V160 --> M)
>sp|O15393-2|TMPS2_HUMAN Isoform 2 of Transmembrane protease serine 2 OS=Homo sapiens OX=9606 GN=TMPRSS2
MPPAPPGGESGCEERGAAGHIEHSRYLSLLDAVDNSKMALNSGSPPAIGPYYENHGYQPE
NPYPAQPTVVPTVYEVHPAQYYPSPVPQYAPRVLTQASNPVVCTQPKSPSGTVCTSKTKK
ALCITLTLGTFLVGAALAAGLLWKFMGSKCSNSGIECDSSGTCINPSNWCDGVSHCPGGE
DENRCVRLYGPNFILQMYSSQRKSWHPVCQDDWNENYGRAACRDMGYKNNFYSSQGIVDD
SGSTSFMKLNTSAGNVDIYKKLYHSDACSSKAVVSLRCIACGVNLNSSRQSRIVGGESAL
PGAWPWQVSLHVQNVHVCGGSIITPEWIVTAAHCVEKPLNNPWHWTAFAGILRQSFMFYG
AGYQVEKVISHPNYDSKTKNNDIALMKLQKPLTFNDLVKPVCLPNPGMMLQPEQLCWISG
WGATEEKGKTSEVLNAAKVLLIETQRCNSRYVYDNLITPAMICAGFLQGNVDSCQGDSGG
PLVTSKNNIWWLIGDTSWGSGCAKAYRPGVYGNVMVFTDWIYRQMRADG
  • FASTA sequences were input into HHpred in separate runs and submitted.
  • The protein templates generated from these runs were visualized using MODELLER.

TMPRSS2 HHpred Protein.png

TMPRSS2 rs12329760 HHpred.png

  • HHpred showed no effect of rs12329760 on protein structure; however, neither model included the substituted residue (V160).

PredMP

  • PredMP performs de novo prediction and visualization of membrane protein structures
    • TMPRSS2 is a transmembrane serine protease
  • The FASTA sequence for TMPRSS2 was input into the server

TMPRSS2denovomodel1mk.PNG

  • First de novo prediction model of TMPRSS2

SNPs of interest

  • Possible SNPs of interest include ones listed in ClinVar
    • Synonymous: rs142750000, rs2298658, rs141788162, rs61735789, rs199824558
    • Missense: rs61735793, rs201679623, rs61735790
    • Frameshift: rs193920966
  • SNP rs12329760, a V --> M missense variant, has been reported in multiple papers
  • SNPs rs2070788 (G/A) and rs383510 (T/C) have been cited in multiple papers as conferring higher risk for severe influenza A virus infection
  • SNPs rs456142 and rs462574 are 3' UTR variants located in miRNA target sites.
  • SNP rs456298 is a 3' UTR variant whose frequency differs in Asian populations; it may be correlated with immune response to the rubella vaccine.
  • SNP rs75603675 has been cited in multiple papers as possibly associated with SARS-CoV-2 entry
  • SNP rs139010197 is a missense variant that has been cited as possibly increasing infection risk [1]
  • SNP rs977728 is an initiator codon missense variant and may increase infection risk [2]
  • SNP rs353163 is a missense variant that has been cited by multiple papers as increasing the risk of esophageal cancer.
  • Russo et al. identified the SNP rs1475908, whose alternative allele (A) is associated with low TMPRSS2 expression, and two variants, rs74659079 (allele T) and rs2838057 (allele A), both associated with high TMPRSS2 expression. Interestingly, the eQTL rs1475908 shows the highest allele frequency (AF) among East Asians (EAS, A: 0.38) and Europeans (EUR, A: 0.35) and the lowest among Latinos (0.17). These findings agree with a previous study that demonstrated the association of two high-expression TMPRSS2 variants, rs2070788 (allele G) and rs383510 (allele T), with increased susceptibility to influenza A (H7N9) virus infection.

SNP Population Frequencies

Allele frequencies for each SNP above were found by searching the SNP on dbSNP and looking under “Frequency”.

  • For consistency across SNPs, data were collected from the Allele Frequency Aggregator (ALFA).

Frameshift

Intron Variants

Intron SNP Population Frequency
SNP | European | African | Asian | Latin American 2 | Total
rs2070788 | G=0.461512 A=0.538488 | G=0.3100 A=0.6900 | G=0.325 A=0.675 | G=0.5271 A=0.4729 | G=0.459889 A=0.540111
rs383510 | T=0.4817 C=0.5183 | T=0.33 C=0.67 | T=0.5 C=0.5 | T=0 C=0 | T=0.4771 C=0.5229

Missense

Missense SNP Population Frequency
SNP | European | African | Asian | Latin American 2 | Total
rs61735793 | G=0.9901 A=0.0099 | G=0.999 A=0.001 | G=1.000 A=0.000 | G=1.00 A=0.00 | G=0.99025 A=0.00975
rs201679623 | A=0.9999 C=0.00001 | A=1.000 C=0.000 | A=1.000 C=0.000 | A=1.000 C=0.000 | A=1.000 C=0.000
rs61735790 | T=0.99996 C=0.00004 | T=0.993 C=0.007 | T=1.000 C=0.000 | T=1.00 C=0.00 | T=0.99989 C=0.00011
rs12329760 | C=0.777704 T=0.2222 | C=0.7093 T=0.2907 | C=0.620 T=0.380 | C=0.8536 T=0.1464 | C=0.777066 T=0.2229
rs75603675 | C=0.6017 A=0.3983 | C=0.68 A=0.32 | C=1.0 A=0.0 | C=0 A=0 | C=0.6059 A=0.3941
rs139010197 | T=0.97531 C=0.02469 | T=0.982 C=0.018 | T=1.000 C=0.000 | T=1.00 C=0.00 | T=0.97552 C=0.02448
rs977728 | C=0.823458 T=0.176542 | C=0.8676 T=0.1324 | C=0.824 T=0.176 | C=0.65 T=0.35 | C=0.823073 T=0.176927
rs353163 | T=0.33089 C=0.66911 | T=0.1788 C=0.8212 | T=0.173 C=0.827 | T=0.4638 C=0.5362 | T=0.329175 C=0.670825

3' UTR Variant

3' UTR Variant SNP Population Frequency
SNP | European | African | Asian | Latin American 2 | Total
rs456142 | T=0.1481 C=0.8300 | T=0.406 C=0.594 | T=0.67 C=0.33 | T=0.33 C=0.67 | T=0.1700 C=0.8300
rs462574 | A=0.02352 G=0.97648 | A=0.1818 G=0.8182 | A=0.516 G=0.484 | A=0.2661 G=0.7339 | A=0.05383 G=0.94617
rs456298 | T=0.1515 A=0.8485 | T=0.43 A=0.57 | T=0.8 A=0.2 | T=0 A=0 | T=0.1654 A=0.8346

Synonymous

Synonymous SNP Population Frequency
SNP | European | African | Asian | Latin American 2 | Total
rs142750000 | C=0.99555 T=0.00445 | C=0.999 T=0.001 | C=1.000 T=0.000 | C=1.00 T=0.00 | C=0.99574 T=0.00426
rs2298658 | C=1.0000 T=0.0000 | C=1.00 T=1.00 | C=1.0 T=0.0 | C=0 T=0 | C=1.000 T=0.000
rs141788162 | G=0.9951 A=0.0049 | G=0.988 A=0.012 | G=0.98 A=0.02 | no data | G=0.99436 A=0.00564
rs61735789 | G=0.98298 A=0.01702 | G=0.987 A=0.013 | G=1.00 A=0.00 | no data | G=0.98327 A=0.01673
rs199824558 | G=0.9980 A=0.0020 | G=0.999 A=0.001 | G=1.00 A=0.00 | no data | G=0.99821 A=0.00179
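For a biallelic SNP, the allele frequencies within one population should sum to about 1, so a quick consistency check can catch transcription slips before the tables are used. This minimal Python sketch parses cells written as in these tables (for example `G=0.461512 A=0.538488`) and flags any that do not sum to 1 within a tolerance. The function names and the 0.01 tolerance are illustrative assumptions, and cells recorded as `T=0 C=0` or `no data` are treated as missing.

```python
def parse_cell(cell):
    """Parse 'G=0.46 A=0.54' into {'G': 0.46, 'A': 0.54}; return None for missing data."""
    if cell.strip().lower() == "no data":
        return None
    freqs = {}
    for token in cell.split():
        allele, value = token.split("=")
        freqs[allele.upper()] = float(value)
    if all(v == 0 for v in freqs.values()):
        return None  # e.g. 'T=0 C=0': treated as no observations for this population
    return freqs


def inconsistent_cells(rows, tol=0.01):
    """Yield (snp, population, total) for cells whose allele frequencies do not sum to ~1."""
    for snp, cells in rows.items():
        for population, cell in cells.items():
            freqs = parse_cell(cell)
            if freqs is None:
                continue
            total = sum(freqs.values())
            if abs(total - 1.0) > tol:
                yield snp, population, round(total, 4)


# A few cells copied from the tables above
ROWS = {
    "rs2070788": {"European": "G=0.461512 A=0.538488", "Latin American 2": "G=0.5271 A=0.4729"},
    "rs383510": {"Latin American 2": "T=0 C=0"},
    "rs456142": {"European": "T=0.1481 C=0.8300", "Total": "T=0.1700 C=0.8300"},
    "rs2298658": {"African": "C=1.00 T=1.00"},
}

print(list(inconsistent_cells(ROWS)))
```

On these sample cells the check flags the European rs456142 entry (sum 0.9781) and the African rs2298658 entry (sum 2.0), which may be worth re-checking against ALFA.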

Data and Files

Link to SNP Table
TMPRSS2 Structure
TMPRSS2 and SARS-CoV-2 Interactions
TMPRSS2 Gene Map
Link to Fall 2020 Research Summary
Link to Abstract