BIOL478/S20:SARS-CoV-2 Spike Protein Structure Exercise
From OpenWetWareJump to navigationJump to search
This assignment is due at the beginning of class on Monday, April 13. E-mail a Word document/PDF to Dr. Dahlquist.
This exercise is based on:
- Wan, Y., Shang, J., Graham, R., Baric, R. S., & Li, F. (2020). Receptor recognition by the novel coronavirus from Wuhan: an analysis based on decade-long structural studies of SARS coronavirus. Journal of virology, 94(7). doi: 10.1128/JVI.00127-20
Data & Resources
- NCBI Open Reading Frame Finder
- ExPASY Translate tool
- Predict Protein Server
- Protein DataBank structure 2AJF
- QHD43416.1: spike protein (Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1)
- spike protein (Bat coronavirus RaTG13)
- spike protein (SARS coronavirus Urbani)
- spike protein (Bat SARS-like coronavirus isolate Rs4231)
- spike protein (Coronavirus BtRs-BetaCoV/YN2018B)
>spike protein (Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1) DNA sequence ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCA ATTACCCCCTGCATACACTAATTCTTTCACACGTGGTGTTTATTACCCTGACAAAGTTTTCAGATCCTCA GTTTTACATTCAACTCAGGACTTGTTCTTACCTTTCTTTTCCAATGTTACTTGGTTCCATGCTATACATG TCTCTGGGACCAATGGTACTAAGAGGTTTGATAACCCTGTCCTACCATTTAATGATGGTGTTTATTTTGC TTCCACTGAGAAGTCTAACATAATAAGAGGCTGGATTTTTGGTACTACTTTAGATTCGAAGACCCAGTCC CTACTTATTGTTAATAACGCTACTAATGTTGTTATTAAAGTCTGTGAATTTCAATTTTGTAATGATCCAT TTTTGGGTGTTTATTACCACAAAAACAACAAAAGTTGGATGGAAAGTGAGTTCAGAGTTTATTCTAGTGC GAATAATTGCACTTTTGAATATGTCTCTCAGCCTTTTCTTATGGACCTTGAAGGAAAACAGGGTAATTTC AAAAATCTTAGGGAATTTGTGTTTAAGAATATTGATGGTTATTTTAAAATATATTCTAAGCACACGCCTA TTAATTTAGTGCGTGATCTCCCTCAGGGTTTTTCGGCTTTAGAACCATTGGTAGATTTGCCAATAGGTAT TAACATCACTAGGTTTCAAACTTTACTTGCTTTACATAGAAGTTATTTGACTCCTGGTGATTCTTCTTCA GGTTGGACAGCTGGTGCTGCAGCTTATTATGTGGGTTATCTTCAACCTAGGACTTTTCTATTAAAATATA ATGAAAATGGAACCATTACAGATGCTGTAGACTGTGCACTTGACCCTCTCTCAGAAACAAAGTGTACGTT GAAATCCTTCACTGTAGAAAAAGGAATCTATCAAACTTCTAACTTTAGAGTCCAACCAACAGAATCTATT GTTAGATTTCCTAATATTACAAACTTGTGCCCTTTTGGTGAAGTTTTTAACGCCACCAGATTTGCATCTG TTTATGCTTGGAACAGGAAGAGAATCAGCAACTGTGTTGCTGATTATTCTGTCCTATATAATTCCGCATC ATTTTCCACTTTTAAGTGTTATGGAGTGTCTCCTACTAAATTAAATGATCTCTGCTTTACTAATGTCTAT GCAGATTCATTTGTAATTAGAGGTGATGAAGTCAGACAAATCGCTCCAGGGCAAACTGGAAAGATTGCTG ATTATAATTATAAATTACCAGATGATTTTACAGGCTGCGTTATAGCTTGGAATTCTAACAATCTTGATTC TAAGGTTGGTGGTAATTATAATTACCTGTATAGATTGTTTAGGAAGTCTAATCTCAAACCTTTTGAGAGA GATATTTCAACTGAAATCTATCAGGCCGGTAGCACACCTTGTAATGGTGTTGAAGGTTTTAATTGTTACT TTCCTTTACAATCATATGGTTTCCAACCCACTAATGGTGTTGGTTACCAACCATACAGAGTAGTAGTACT TTCTTTTGAACTTCTACATGCACCAGCAACTGTTTGTGGACCTAAAAAGTCTACTAATTTGGTTAAAAAC AAATGTGTCAATTTCAACTTCAATGGTTTAACAGGCACAGGTGTTCTTACTGAGTCTAACAAAAAGTTTC TGCCTTTCCAACAATTTGGCAGAGACATTGCTGACACTACTGATGCTGTCCGTGATCCACAGACACTTGA GATTCTTGACATTACACCATGTTCTTTTGGTGGTGTCAGTGTTATAACACCAGGAACAAATACTTCTAAC CAGGTTGCTGTTCTTTATCAGGATGTTAACTGCACAGAAGTCCCTGTTGCTATTCATGCAGATCAACTTA CTCCTACTTGGCGTGTTTATTCTACAGGTTCTAATGTTTTTCAAACACGTGCAGGCTGTTTAATAGGGGC TGAACATGTCAACAACTCATATGAGTGTGACATACCCATTGGTGCAGGTATATGCGCTAGTTATCAGACT CAGACTAATTCTCCTCGGCGGGCACGTAGTGTAGCTAGTCAATCCATCATTGCCTACACTATGTCACTTG GTGCAGAAAATTCAGTTGCTTACTCTAATAACTCTATTGCCATACCCACAAATTTTACTATTAGTGTTAC CACAGAAATTCTACCAGTGTCTATGACCAAGACATCAGTAGATTGTACAATGTACATTTGTGGTGATTCA ACTGAATGCAGCAATCTTTTGTTGCAATATGGCAGTTTTTGTACACAATTAAACCGTGCTTTAACTGGAA TAGCTGTTGAACAAGACAAAAACACCCAAGAAGTTTTTGCACAAGTCAAACAAATTTACAAAACACCACC AATTAAAGATTTTGGTGGTTTTAATTTTTCACAAATATTACCAGATCCATCAAAACCAAGCAAGAGGTCA TTTATTGAAGATCTACTTTTCAACAAAGTGACACTTGCAGATGCTGGCTTCATCAAACAATATGGTGATT GCCTTGGTGATATTGCTGCTAGAGACCTCATTTGTGCACAAAAGTTTAACGGCCTTACTGTTTTGCCACC TTTGCTCACAGATGAAATGATTGCTCAATACACTTCTGCACTGTTAGCGGGTACAATCACTTCTGGTTGG ACCTTTGGTGCAGGTGCTGCATTACAAATACCATTTGCTATGCAAATGGCTTATAGGTTTAATGGTATTG GAGTTACACAGAATGTTCTCTATGAGAACCAAAAATTGATTGCCAACCAATTTAATAGTGCTATTGGCAA AATTCAAGACTCACTTTCTTCCACAGCAAGTGCACTTGGAAAACTTCAAGATGTGGTCAACCAAAATGCA CAAGCTTTAAACACGCTTGTTAAACAACTTAGCTCCAATTTTGGTGCAATTTCAAGTGTTTTAAATGATA TCCTTTCACGTCTTGACAAAGTTGAGGCTGAAGTGCAAATTGATAGGTTGATCACAGGCAGACTTCAAAG TTTGCAGACATATGTGACTCAACAATTAATTAGAGCTGCAGAAATCAGAGCTTCTGCTAATCTTGCTGCT ACTAAAATGTCAGAGTGTGTACTTGGACAATCAAAAAGAGTTGATTTTTGTGGAAAGGGCTATCATCTTA TGTCCTTCCCTCAGTCAGCACCTCATGGTGTAGTCTTCTTGCATGTGACTTATGTCCCTGCACAAGAAAA GAACTTCACAACTGCTCCTGCCATTTGTCATGATGGAAAAGCACACTTTCCTCGTGAAGGTGTCTTTGTT TCAAATGGCACACACTGGTTTGTAACACAAAGGAATTTTTATGAACCACAAATCATTACTACAGACAACA CATTTGTGTCTGGTAACTGTGATGTTGTAATAGGAATTGTCAACAACACAGTTTATGATCCTTTGCAACC TGAATTAGACTCATTCAAGGAGGAGTTAGATAAATATTTTAAGAATCATACATCACCAGATGTTGATTTA GGTGACATCTCTGGCATTAATGCTTCAGTTGTAAACATTCAAAAAGAAATTGACCGCCTCAATGAGGTTG CCAAGAATTTAAATGAATCTCTCATCGATCTCCAAGAACTTGGAAAGTATGAGCAGTATATAAAATGGCC ATGGTACATTTGGCTAGGTTTTATAGCTGGCTTGATTGCCATAGTAATGGTGACAATTATGCTTTGCTGT ATGACCAGTTGCTGTAGTTGTCTCAAGGGCTGTTGTTCTTGTGGATCCTGCTGCAAATTTGATGAAGACG ACTCTGAGCCAGTGCTCAAAGGAGTCAAATTACATTACACATAA
Exploring the Spike protein Structure
- Convert the spike protein DNA sequences into a protein sequence using either the NCBI Open Reading Frame Finder or the ExPASY Translate tool. Paste a screenshot of your results into your Word document (Google Doc).
- How do you know which of the six frames is the correct reading frame (without looking up the answer)?
- Once you answered the question above, you can check your answer with the NCBI protein record.
- Find out what is already known about the spike protein in the UniProt Knowledgebase (UniProt KB). UniProt KB has two parts to it, Swis-Prot, which contains entries for proteins that have been manually reviewed, and TrEMBL (which stands for "Translated EMBL"), which are automated translations of all DNA sequences in the EMBL/GenBank/DDBJ databases. SARS-CoV-2 is so new that it has not yet been added to the UniProt database.
- If you search on the keywords "SARS-CoV" (which refers to the first SARS virus), in the main UniProt search field, how many results do you get?
- Use the entry with accession number "P59594" which corresponds to the reference entry for the SARS-CoV spike protein.
- What types of information are provided about this protein in this database entry?
- We are going to use the PredictProtein server to analyze the SARS-CoV-2 spike protein.
- Paste SARS-CoV-2 spike protein amino acid sequence into the input field and submit.
- Paste a screenshot of the results into your Word/Google Doc. Note that you can zoom in on different parts of the protein by using the slider at the top. Explore the types of information provided (in the menu options at the left). How does this information relate to what is stored in the UniProt database for the SARS-CoV spike protein?
- View the structure of the SARS-CoV spike protein in complex with ACE2 (2AJF) at the Protein DataBank. Click on the "Structure" link underneath the image of the structure on the upper left side of the page. This will open a window where you will be able to interact with the structure image using the menu options on the right and bottom. When answering the following questions, provide a screenshot pasted into your document:
- Locate ACE2 and the spike protein. Create a view that looks like Figure 1A of the Wan et al. (2020) article. You won't be able to color the different parts of the spike protein differently, but you can show the two polypeptides in different colors and rotate them to be the same view as the article.
- Find the N-terminus and C-terminus of each polypeptide tertiary structure.
- Locate all the secondary structure elements. Do these match the predictions made by the PredictProtein server?
- Locate the amino acids on the spike protein from Figure 1B that are deemed critical for the spike-ACE2 interaction. (When you mouse over the structure, it will tell you which amino acid you are pointing to.)
- Locate the amino acids on the ACE2 protein from Figure 4A that are important for the spike-ACE2 interaction.
- What was the most interesting part of doing this exercise?
- What was the most difficult part of doing this exercise?
- What do you want to know more about after doing this exercise?