Mking44 Week 13

From OpenWetWare



Purpose

  • The purpose of this week's assignment was to investigate the SARS-CoV-2 spike (S) protein further, using databases and servers such as UniProt and PredictProtein to examine its amino acid sequence and see how that sequence affects the structure and function of the protein. 3D viewers were then used to visualize the spike glycoprotein and to locate specific amino acids or sequences on the structure, relating sequence positions to structural features. This lays the groundwork for an independent research project on structure-function relationships of the S protein.

Combined Methods/Results

Exploring the Spike Protein Structure

  • The DNA sequence of the SARS-CoV-2 S protein from the Wuhan-Hu-1 isolate was obtained from the Week 13 assignment page.
  1. The DNA sequence was converted to a protein sequence using the ExPASy Translate tool.
    • First, the sequence was pasted into the DNA box on the page. The tool then returned translations in all six reading frames (three forward, three reverse), and the first forward frame was selected.
    • The first forward frame is the correct one because it begins with M (methionine, encoded by the start codon AUG, or ATG in the DNA) and is read 5' to 3' through a long open reading frame.
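The reading-frame check described above can be reproduced in a few lines of Python. This is a minimal sketch, not the ExPASy implementation: it assumes the standard genetic code, labels the frames +1 to +3 (forward) and -1 to -3 (reverse complement) as an illustrative convention, and picks the frame containing the longest stretch that starts with M. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). The amino-acid string lists
# codons in TCAG order: first base varies slowest, third base fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def clean(dna: str) -> str:
    """Remove whitespace, uppercase, and convert RNA (U) to DNA (T)."""
    return "".join(dna.split()).upper().replace("U", "T")


def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate codon by codon; '*' marks a stop, 'X' an ambiguous codon."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict:
    dna = clean(dna)
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def longest_orf(protein: str) -> str:
    """Longest stretch that starts at an M and runs to the next stop (or the end)."""
    best = ""
    for segment in protein.split("*"):
        start = segment.find("M")
        if start != -1 and len(segment) - start > len(best):
            best = segment[start:]
    return best


def best_frame(dna: str) -> tuple:
    frames = six_frames(dna)
    name = max(frames, key=lambda k: len(longest_orf(frames[k])))
    return name, longest_orf(frames[name])


if __name__ == "__main__":
    # Test fragment coding for MFVFLVLLPLVSSQCV (the first 16 residues of S),
    # with a TAA stop codon appended for the example.
    fragment = "ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTTAA"
    print(best_frame(fragment))
```

Running it prints `('+1', 'MFVFLVLLPLVSSQCV')`: only the first forward frame contains an M-initiated stretch, which mirrors the reasoning used to pick the frame by hand.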

  • To double-check this translation, the corresponding NCBI protein record was consulted.
  • Opening the record and clicking the FASTA link to display the protein sequence showed that the two sequences match, confirming that the correct reading frame was selected.
  2. The S protein was investigated using the UniProt Knowledgebase (UniProtKB).
    • First, "SARS-CoV" was entered in the search bar (SARS-CoV-2 was too recent to have its own entry), and 818 results came up.
    • The entry with accession number P59594 was then used to analyze the S protein.
      • The entry contains information about the S1 and S2 subunits of the S protein. It also links to Gene Ontology terms describing its function and biological process, such as 'host cell surface receptor binding' and 'receptor-mediated virion attachment to host cell'. It also includes taxonomy, virus host, structure, and proteome information, and it lists mutations of the S protein and how they affect its function. Overall, it contains a wealth of information about the S protein!
  3. Next, the PredictProtein server was used to analyze the SARS-CoV-2 spike protein.
    • First, the sequence from Walls et al. (2020) was submitted to the PredictProtein server. The results are summarized below.
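The check against the NCBI record, and retrieval of a UniProt entry such as P59594, can also be scripted. A minimal sketch follows; the URL pattern in `fetch_uniprot_fasta` is UniProt's current REST endpoint (which postdates this entry and should be confirmed against UniProt's API documentation), the helper names are hypothetical, and the two FASTA records in the example are toy excerpts.

```python
import urllib.request


def parse_fasta(text: str) -> list:
    """Return (header, sequence) pairs; lines before the first '>' header are ignored."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def first_mismatch(a: str, b: str):
    """1-based position of the first difference, or None if the sequences match.

    A trailing stop symbol '*' (as printed by translation tools) is ignored.
    """
    a, b = a.rstrip("*").upper(), b.rstrip("*").upper()
    for position, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            return position
    if len(a) != len(b):
        return min(len(a), len(b)) + 1  # one sequence is a prefix of the other
    return None


def fetch_uniprot_fasta(accession: str) -> str:
    """Download an entry in FASTA format (requires network access)."""
    url = f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    text = ">translated\nMFVFLVLLPL\nVSSQCV*\n>ncbi_record\nMFVFLVLLPLVSSQCV\n"
    (_, translated), (_, reference) = parse_fasta(text)
    print(first_mismatch(translated, reference))  # None means the sequences match
```

Reporting the first mismatching position, rather than just True/False, makes it easy to see whether a disagreement comes from a frame shift (mismatch near the start) or from a different isolate (isolated substitutions).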

  • PredictProtein's analysis tools predicted that 50% of the residues form loops, 27% beta-sheets, and 20% helices. Unlike alpha-helices and beta-sheets, loops are irregular secondary structures and are often associated with binding and active sites. The results also predicted that 57% of the residues are buried, which may reflect the different conformational states of the S protein. The prediction also shows substantial protein disorder across the sequence, which may relate to the unfavorable reactions described in some of the papers discussed in class. Lastly, it identifies many positions where point mutations are predicted to have a major effect, as well as 17 predicted binding sites, which is close to the 14 residues the paper reports as essential for binding.
    • Some of these predictions agree with the UniProt entry, such as the locations of the beta-sheets and alpha-helices and some effects of point mutations. However, because UniProt did not yet contain the SARS-CoV-2 sequence, the residue positions are somewhat offset: the paper reports a four-residue insertion in SARS-CoV-2 S that the other SARS-related coronaviruses lack.
  4. Next, the 3D structure of the S glycoprotein was examined in the Protein Data Bank, using the closed-state structure of the S protein (PDB ID 6VXX).
    • The 'Structure link' (below the 3D image) was opened, and the JSmol viewer was used to view the structure (the viewer can be changed at the bottom right).
    • iCn3D can also be used to identify the N and C termini more easily and to compare the two viewers.
  • The S protein structure determined by cryo-EM in the paper was compared with the structure shown in the Protein Data Bank 3D viewer:
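Locating the N and C termini can also be done directly from a downloaded coordinate file. The sketch below assumes a legacy PDB-format file with fixed-column ATOM records (not mmCIF) and reports, for each chain, the lowest and highest residue numbers that have a CA atom. These are the first and last modeled residues, which can differ from the true termini when the chain ends are disordered and not resolved in the cryo-EM map. The function name is hypothetical and the ATOM lines in the example are synthetic.

```python
def chain_termini(pdb_text: str) -> dict:
    """Map chain ID -> (first, last) residue number that has a CA atom.

    Uses the fixed PDB columns: atom name 13-16, chain ID 22, residue number
    23-26 (1-based), i.e. slices [12:16], [21] and [22:26] in Python.
    HETATM records (e.g. glycans) are skipped.
    """
    termini = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        chain = line[21]
        resnum = int(line[22:26])
        first, last = termini.get(chain, (resnum, resnum))
        termini[chain] = (min(first, resnum), max(last, resnum))
    return termini


if __name__ == "__main__":
    # Synthetic ATOM lines, truncated after the residue-number column.
    demo = "\n".join([
        "ATOM      1  N   MET A  27",
        "ATOM      2  CA  MET A  27",
        "ATOM      3  CA  ALA A1147",
        "ATOM      4  CA  GLY B  30",
    ])
    print(chain_termini(demo))  # {'A': (27, 1147), 'B': (30, 30)}
```

Because the S protein is a trimer, a file such as 6VXX contains several chains, so the termini are reported per chain rather than for the file as a whole.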

  • Left: Walls et al. 2020. Right: Protein Data Bank 3D structure viewer.
  • The N and C termini of each polypeptide chain were identified using the Rainbow color scheme in the Protein Data Bank viewer.
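The Rainbow scheme used to find the termini maps each residue's position along the chain to a hue, so the N terminus is blue and the C terminus red. The sketch below is one simple approximation (a linear sweep of HSV hue from 240° to 0°); the exact palette used by the PDB viewers may differ, and `rainbow_rgb` is a hypothetical name.

```python
import colorsys


def rainbow_rgb(index: int, n_residues: int) -> tuple:
    """RGB (0-255) for residue `index` (0-based) in a chain of `n_residues`.

    Hue runs linearly from 2/3 (blue, N terminus) to 0 (red, C terminus).
    """
    if not 0 <= index < n_residues:
        raise IndexError("index outside the chain")
    fraction = index / (n_residues - 1) if n_residues > 1 else 0.0
    hue = (2 / 3) * (1 - fraction)
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return round(r * 255), round(g * 255), round(b * 255)


if __name__ == "__main__":
    for i in (0, 500, 1000):  # N terminus, middle, C terminus of a 1,001-residue chain
        print(i, rainbow_rgb(i, 1001))
```

The three printed colors are pure blue, green, and red, so a residue's color gives a rough estimate of how far along the chain it lies.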

  • N terminus to C terminus: dark blue --> red
  • Secondary structures of the SARS-CoV-2 S protein were identified using the 3D viewer on the Protein Data Bank, with the following color key:

  • Pink: alpha-helix, Yellow: beta-strand, Blue: beta-turn, White: coil.
  • These secondary structures agree with the PredictProtein results: most of the protein is blue and white (turns and coil, i.e., loops), and the rest is split between beta-strands and alpha-helices.
  • Amino acids discussed in the paper were located in the Cn3D viewer by highlighting specific regions of the sequence.
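Percentages such as PredictProtein's 50% loops, 27% beta-sheets, and 20% helices are per-residue state counts divided by sequence length. A minimal sketch, assuming the per-residue predictions have been exported as strings using H (helix), E (strand), and L (loop), and b/e for buried/exposed; these letter codes are a common convention rather than a confirmed description of PredictProtein's output format, and the input strings below are made up.

```python
from collections import Counter


def composition(states: str) -> dict:
    """Percentage of residues in each state, e.g. {'E': 20.0, 'H': 40.0, 'L': 40.0}."""
    if not states:
        raise ValueError("empty state string")
    counts = Counter(states)
    return {state: 100.0 * count / len(states) for state, count in sorted(counts.items())}


if __name__ == "__main__":
    secondary = "LLLLHHHHEE"  # made-up per-residue secondary structure
    accessibility = "bbbee"   # made-up buried (b) / exposed (e) calls
    print(composition(secondary))      # {'E': 20.0, 'H': 40.0, 'L': 40.0}
    print(composition(accessibility))  # {'b': 60.0, 'e': 40.0}
```

The same function applied to a viewer's per-residue assignment would give numbers directly comparable to the predicted ones.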

  • The red box marks a strand of highlighted amino acids: the region containing 9 of the 14 amino acids essential for binding to ACE2. It makes sense that they sit at the very top of the structure, because this is the S1 subunit, which protrudes farthest from the viral membrane and contains the receptor-binding domain.
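Highlighting a stretch in the viewer's sequence panel amounts to selecting residues by number. Residue numbers are 1-based and inclusive, whereas Python slices are 0-based and end-exclusive, and a deposited structure may number its first modeled residue as something other than 1. The helper below (hypothetical name and parameters) handles both offsets; it assumes continuous numbering, which does not hold for structures with unresolved loops.

```python
def residue_range(sequence: str, start: int, end: int, first_residue_number: int = 1) -> str:
    """Residues numbered start..end (inclusive).

    first_residue_number is the number assigned to sequence[0]; numbering is
    assumed to be continuous (no missing residues).
    """
    if start > end:
        raise ValueError("start must not exceed end")
    i = start - first_residue_number
    j = end - first_residue_number + 1
    if i < 0 or j > len(sequence):
        raise IndexError("requested residues fall outside the sequence")
    return sequence[i:j]


if __name__ == "__main__":
    s_start = "MFVFLVLLPLVSSQCV"  # first 16 residues of the S protein
    print(residue_range(s_start, 1, 3))    # MFV
    print(residue_range(s_start, 14, 16))  # QCV
```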

Research Proposal

  1. What question will you answer about sequence-->structure-->function relationships in a SARS-CoV-2 protein?
    • How does the SARS-CoV-2 S protein interact with ACE2 orthologues from species not examined in the Wan et al. (2020) paper?
  2. What sequences will you use? (A multiple sequence alignment will be performed).
    • Species with ACE2 receptor sequences include human, mouse, rat, civet, bat, monkey, dog, cat, cow, horse, and chicken. We can therefore compare the ACE2 sequences of most of these species and determine whether any significant amino acid changes could alter the structure-function relationship of ACE2 bound to SARS-CoV-2.
  3. What protein tools will you use for analysis and answering your question?
    • We will use UniProt to obtain the ACE2 amino acid sequences of these species. We will then perform a multiple sequence alignment to identify important residue changes and build a phylogenetic tree of these species based on their protein sequences. Finally, we will locate these residue changes in a 3D viewer of ACE2 and assess how they might affect S protein binding.
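Once the ACE2 sequences are aligned, the "important residue changes" step and a crude distance for tree-building can be sketched in Python. This assumes the alignment is already available as equal-length gapped strings exported from an alignment tool; the species names and short sequences below are made up, and the simple p-distance shown is only a stand-in for the distance or likelihood methods a dedicated phylogenetics tool would use.

```python
from itertools import combinations


def variable_columns(alignment: dict, reference: str) -> list:
    """List (column, reference residue, {species: residue}) for every alignment
    column where some sequence differs from the reference. Columns are 1-based;
    a gap '-' counts as a difference."""
    ref = alignment[reference]
    if any(len(seq) != len(ref) for seq in alignment.values()):
        raise ValueError("aligned sequences must have equal length")
    changes = []
    for col, ref_residue in enumerate(ref):
        diffs = {name: seq[col] for name, seq in alignment.items()
                 if name != reference and seq[col] != ref_residue}
        if diffs:
            changes.append((col + 1, ref_residue, diffs))
    return changes


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment: dict) -> dict:
    """p-distance for every pair of species."""
    return {(a, b): p_distance(alignment[a], alignment[b])
            for a, b in combinations(alignment, 2)}


if __name__ == "__main__":
    toy = {"human": "QEKLMK", "species_a": "QEKLTK", "species_b": "Q-KLTR"}  # made up
    for change in variable_columns(toy, "human"):
        print(change)
    print(distance_matrix(toy))
```

Using human ACE2 as the reference keeps the reported column numbers tied to one sequence, so each variable column can be checked against the human ACE2 structure in the 3D viewer.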

Data and Files

Scientific Conclusion

The SARS-CoV-2 Spike protein was investigated using several databases and servers to better understand its structure-function relationships. UniProt, a well-known protein database, provided information such as Gene Ontology terms, mutations, and secondary structure. PredictProtein, on the other hand, predicts much of this information from the protein sequence alone. Most of PredictProtein's predictions were very close to the known information about the S protein, which shows how much can be predicted from the sequence alone, such as secondary structures, the effects of point mutations, and binding sites. Lastly, the S protein was observed using 3D viewers such as Cn3D and the Protein Data Bank viewers, in which settings such as the color scheme, the highlighting of particular amino acids or sequences, and the viewing perspective can be modified. In conclusion, these databases and servers together give a better understanding of the structure-function relationships of a protein of interest.

Acknowledgments

  • I copied and modified the protocol from the Week 13 assignment page.
  • My homework partners for this week are Maya and Karina
  • I asked Maya for help on how to highlight specific sequences on the 3D viewer of the S protein.
  • Dr. Dahlquist, Maya, and I met on a Zoom call on April 21 to address questions about the final project.
  • Except for what is noted above, this individual journal entry was completed by me and not copied from another source.

Mking44 (talk) 12:34, 19 April 2020 (PDT)

References

  1. Cn3D. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; 2020 – [cited 2020 Apr 22]. Available from:
  2. ExPASy - Translate tool (2020). SIB Swiss Institute of Bioinformatics. Retrieved April 22, 2020, from
  3. OpenWetWare. (2020). BIOL368/S20:Week 13. Retrieved April 19, 2020, from
  4. Rose et al. (2018) NGL viewer: web-based molecular graphics for large complexes. Bioinformatics doi:10.1093/bioinformatics/bty41
  5. PredictProtein (2020). RostLab. Retrieved April 22, 2020, from
  6. PDB ID: 6VXX. Walls et al., Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein. Cell. DOI: 10.1016/j.cell.2020.02.058
  7. Structure [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; 2020 – [cited 2020 Apr 22]. Available from:
  8. UniProtKB - P59594 (2020). UniProt Consortium. Retrieved April 22, 2020, from