# BIOL368/S20:Week 13

BIOL368-01: Bioinformatics Laboratory

Loyola Marymount University

Spring 2020

This journal entry is due on Thursday, April 23, at 12:01am Pacific time.

## Overview

The learning objectives for this assignment are:

• Learn several ways to analyze and visualize protein structures.
• Ask your own questions and develop your own hypotheses to explain sequence-->structure-->function relationships in the SARS-CoV-2 spike protein (or other protein).

## Individual Journal Assignment

### Homework Partners

• You will be expected to consult with your partners, in order to complete this assignment and the final project and presentation.
• Each partner must submit his or her own work as the individual journal entry (direct copies of each other's work are not allowed).
• You must give the details of the interaction with your partner in the Acknowledgments section of your journal assignment.
• Homework partners for this week and the remainder of the semester are:
• Annika, Christina, Sahil
• Carolyn, Jenny, Nathan
• Drew, Jack, Nick

### Format and Content Checklist

1. Store this journal entry as "username Week 13" (i.e., this is the text to place between the square brackets when you link to this page).
2. Write something in the summary field each time you save an edit. You are aiming for 100%.
3. Invoke the template that you made as part of the Week 1 assignment on your individual page.
4. Combined Methods/Results (Electronic Lab Notebook): documentation of your workflow for this exercise. It should include:
• The protocol you followed, in enough detail that you or another person could repeat the investigation based solely on your notebook. You may copy protocol instructions onto your page and modify them to reflect what you actually did, as long as you provide appropriate attribution.
• Answers to any specific questions posed in the exercise.
• Data and files: links to all data and files used and generated.
• Files left on the Desktop or My Documents or Downloads folders on the Seaver 120 computers will be deleted upon restart of the computers. Files stored on the T: drive will be saved. However, it is not a good idea to trust that they will be there when you next use the computer.
• Thus, it is a critical skill for data and computer literacy to back up your data and files in at least two ways.
• References to data and files should be made within the methods and results section. In addition to these inline links, create a "Data and Files" section of your notebook to make a list of the files generated in this exercise.
5. Scientific Conclusion: a summary statement of the main result of the exercise/research. It should mirror the purpose. Length should be 2-3 sentences, up to a paragraph.
6. Acknowledgments section (see Week 1 assignment for more details.)
• You must acknowledge your homework partner with whom you worked, giving details of the nature of the collaboration. You should include when and how you met and what content you worked on together.
• Acknowledge anyone else you worked with who was not your assigned partner. This could be the instructor, the TA, other students in the class, or even other students or faculty outside of the class.
• If you copied wiki syntax or a particular style from another wiki page, acknowledge that here. Provide the user name of the original page, if possible, and provide a link to the page from which you copied the syntax or style.
• If you copied any part of the assignment or protocol and then modified it, acknowledge that here and also include a formal citation in the Reference section.
• You must also include this statement:
• "Except for what is noted above, this individual journal entry was completed by me and not copied from another source."
• Sign your Acknowledgments section with your wiki signature (four tildes, ~~~~).
7. References section (see Week 1 assignment for more details.)
• Use the APA format.
• Cite this assignment page.
• Cite any protocols that you copied and modified (this must also be noted in the Acknowledgments section).
• Cite any other methods, software, websites, data, facts, images, documents (including the scientific literature) that were used to generate content on your page.
• Do not include extraneous references that you do not cite or use on your page.

### Spike Protein Structure Exercise

This exercise is based on the four papers that were assigned for journal club for the Week 11 assignment (they are listed with their PDB IDs in step 4 of the exercise below).

#### Data

>spike protein (Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1) DNA sequence
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCA
ATTACCCCCTGCATACACTAATTCTTTCACACGTGGTGTTTATTACCCTGACAAAGTTTTCAGATCCTCA
GTTTTACATTCAACTCAGGACTTGTTCTTACCTTTCTTTTCCAATGTTACTTGGTTCCATGCTATACATG
TCTCTGGGACCAATGGTACTAAGAGGTTTGATAACCCTGTCCTACCATTTAATGATGGTGTTTATTTTGC
TTCCACTGAGAAGTCTAACATAATAAGAGGCTGGATTTTTGGTACTACTTTAGATTCGAAGACCCAGTCC
CTACTTATTGTTAATAACGCTACTAATGTTGTTATTAAAGTCTGTGAATTTCAATTTTGTAATGATCCAT
TTTTGGGTGTTTATTACCACAAAAACAACAAAAGTTGGATGGAAAGTGAGTTCAGAGTTTATTCTAGTGC
GAATAATTGCACTTTTGAATATGTCTCTCAGCCTTTTCTTATGGACCTTGAAGGAAAACAGGGTAATTTC
AAAAATCTTAGGGAATTTGTGTTTAAGAATATTGATGGTTATTTTAAAATATATTCTAAGCACACGCCTA
TTAATTTAGTGCGTGATCTCCCTCAGGGTTTTTCGGCTTTAGAACCATTGGTAGATTTGCCAATAGGTAT
TAACATCACTAGGTTTCAAACTTTACTTGCTTTACATAGAAGTTATTTGACTCCTGGTGATTCTTCTTCA
GGTTGGACAGCTGGTGCTGCAGCTTATTATGTGGGTTATCTTCAACCTAGGACTTTTCTATTAAAATATA
ATGAAAATGGAACCATTACAGATGCTGTAGACTGTGCACTTGACCCTCTCTCAGAAACAAAGTGTACGTT
GAAATCCTTCACTGTAGAAAAAGGAATCTATCAAACTTCTAACTTTAGAGTCCAACCAACAGAATCTATT
GTTAGATTTCCTAATATTACAAACTTGTGCCCTTTTGGTGAAGTTTTTAACGCCACCAGATTTGCATCTG
TTTATGCTTGGAACAGGAAGAGAATCAGCAACTGTGTTGCTGATTATTCTGTCCTATATAATTCCGCATC
ATTTTCCACTTTTAAGTGTTATGGAGTGTCTCCTACTAAATTAAATGATCTCTGCTTTACTAATGTCTAT
GCAGATTCATTTGTAATTAGAGGTGATGAAGTCAGACAAATCGCTCCAGGGCAAACTGGAAAGATTGCTG
ATTATAATTATAAATTACCAGATGATTTTACAGGCTGCGTTATAGCTTGGAATTCTAACAATCTTGATTC
TAAGGTTGGTGGTAATTATAATTACCTGTATAGATTGTTTAGGAAGTCTAATCTCAAACCTTTTGAGAGA
GATATTTCAACTGAAATCTATCAGGCCGGTAGCACACCTTGTAATGGTGTTGAAGGTTTTAATTGTTACT
TTCCTTTACAATCATATGGTTTCCAACCCACTAATGGTGTTGGTTACCAACCATACAGAGTAGTAGTACT
TTCTTTTGAACTTCTACATGCACCAGCAACTGTTTGTGGACCTAAAAAGTCTACTAATTTGGTTAAAAAC
AAATGTGTCAATTTCAACTTCAATGGTTTAACAGGCACAGGTGTTCTTACTGAGTCTAACAAAAAGTTTC
TGCCTTTCCAACAATTTGGCAGAGACATTGCTGACACTACTGATGCTGTCCGTGATCCACAGACACTTGA
GATTCTTGACATTACACCATGTTCTTTTGGTGGTGTCAGTGTTATAACACCAGGAACAAATACTTCTAAC
CAGGTTGCTGTTCTTTATCAGGATGTTAACTGCACAGAAGTCCCTGTTGCTATTCATGCAGATCAACTTA
CTCCTACTTGGCGTGTTTATTCTACAGGTTCTAATGTTTTTCAAACACGTGCAGGCTGTTTAATAGGGGC
TGAACATGTCAACAACTCATATGAGTGTGACATACCCATTGGTGCAGGTATATGCGCTAGTTATCAGACT
CAGACTAATTCTCCTCGGCGGGCACGTAGTGTAGCTAGTCAATCCATCATTGCCTACACTATGTCACTTG
GTGCAGAAAATTCAGTTGCTTACTCTAATAACTCTATTGCCATACCCACAAATTTTACTATTAGTGTTAC
CACAGAAATTCTACCAGTGTCTATGACCAAGACATCAGTAGATTGTACAATGTACATTTGTGGTGATTCA
ACTGAATGCAGCAATCTTTTGTTGCAATATGGCAGTTTTTGTACACAATTAAACCGTGCTTTAACTGGAA
TAGCTGTTGAACAAGACAAAAACACCCAAGAAGTTTTTGCACAAGTCAAACAAATTTACAAAACACCACC
AATTAAAGATTTTGGTGGTTTTAATTTTTCACAAATATTACCAGATCCATCAAAACCAAGCAAGAGGTCA
TTTATTGAAGATCTACTTTTCAACAAAGTGACACTTGCAGATGCTGGCTTCATCAAACAATATGGTGATT
GCCTTGGTGATATTGCTGCTAGAGACCTCATTTGTGCACAAAAGTTTAACGGCCTTACTGTTTTGCCACC
TTTGCTCACAGATGAAATGATTGCTCAATACACTTCTGCACTGTTAGCGGGTACAATCACTTCTGGTTGG
ACCTTTGGTGCAGGTGCTGCATTACAAATACCATTTGCTATGCAAATGGCTTATAGGTTTAATGGTATTG
GAGTTACACAGAATGTTCTCTATGAGAACCAAAAATTGATTGCCAACCAATTTAATAGTGCTATTGGCAA
AATTCAAGACTCACTTTCTTCCACAGCAAGTGCACTTGGAAAACTTCAAGATGTGGTCAACCAAAATGCA
CAAGCTTTAAACACGCTTGTTAAACAACTTAGCTCCAATTTTGGTGCAATTTCAAGTGTTTTAAATGATA
TCCTTTCACGTCTTGACAAAGTTGAGGCTGAAGTGCAAATTGATAGGTTGATCACAGGCAGACTTCAAAG
TTTGCAGACATATGTGACTCAACAATTAATTAGAGCTGCAGAAATCAGAGCTTCTGCTAATCTTGCTGCT
ACTAAAATGTCAGAGTGTGTACTTGGACAATCAAAAAGAGTTGATTTTTGTGGAAAGGGCTATCATCTTA
TGTCCTTCCCTCAGTCAGCACCTCATGGTGTAGTCTTCTTGCATGTGACTTATGTCCCTGCACAAGAAAA
GAACTTCACAACTGCTCCTGCCATTTGTCATGATGGAAAAGCACACTTTCCTCGTGAAGGTGTCTTTGTT
TCAAATGGCACACACTGGTTTGTAACACAAAGGAATTTTTATGAACCACAAATCATTACTACAGACAACA
CATTTGTGTCTGGTAACTGTGATGTTGTAATAGGAATTGTCAACAACACAGTTTATGATCCTTTGCAACC
TGAATTAGACTCATTCAAGGAGGAGTTAGATAAATATTTTAAGAATCATACATCACCAGATGTTGATTTA
GGTGACATCTCTGGCATTAATGCTTCAGTTGTAAACATTCAAAAAGAAATTGACCGCCTCAATGAGGTTG
CCAAGAATTTAAATGAATCTCTCATCGATCTCCAAGAACTTGGAAAGTATGAGCAGTATATAAAATGGCC
ATGGTACATTTGGCTAGGTTTTATAGCTGGCTTGATTGCCATAGTAATGGTGACAATTATGCTTTGCTGT
ATGACCAGTTGCTGTAGTTGTCTCAAGGGCTGTTGTTCTTGTGGATCCTGCTGCAAATTTGATGAAGACG
ACTCTGAGCCAGTGCTCAAAGGAGTCAAATTACATTACACATAA
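
As an offline cross-check on the web tools used in the exercise below, here is a minimal Python sketch of a six-frame translation using the standard genetic code. The function names (`reverse_complement`, `translate`, `six_frames`, `longest_orf`) are illustrative and not part of the assignment. Ranking frames by their longest open reading frame is only one common heuristic, not necessarily the reasoning the exercise is looking for. The example uses just the first 30 bases of the sequence above.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Translate all six reading frames; keys are '+1'..'+3' and '-1'..'-3'."""
    dna = "".join(dna.split()).upper()  # drop line breaks and spaces
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def longest_orf(protein):
    """Longest stretch that starts at M and runs to a stop (or the end)."""
    best = ""
    for segment in protein.split("*"):
        start = segment.find("M")
        if start != -1 and len(segment) - start > len(best):
            best = segment[start:]
    return best


# First 30 bases of the spike DNA sequence given above.
fragment = "ATGTTTGTTTTTCTTGTTTTATTGCCACTA"
frames = six_frames(fragment)
for name, protein in frames.items():
    print(name, protein, "longest ORF length:", len(longest_orf(protein)))
```

To run it on the whole gene, paste the full sequence (line breaks are fine) into `six_frames` and compare the frames with what ORF Finder or the Translate tool reports.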


#### Exploring the Spike Protein Structure

1. Convert the spike protein DNA sequence into a protein sequence using either the NCBI Open Reading Frame Finder or the ExPASy Translate tool. Paste a screenshot of your results into your wiki.
• How do you know which of the six frames is the correct reading frame (without looking up the answer)?
• Once you answered the question above, you can check your answer with the NCBI protein record.
2. Find out what is already known about the spike protein in the UniProt Knowledgebase (UniProtKB). UniProtKB has two parts: Swiss-Prot, which contains entries for proteins that have been manually reviewed, and TrEMBL (which stands for "Translated EMBL"), which contains automated translations of the coding sequences in the EMBL/GenBank/DDBJ databases. SARS-CoV-2 is so new that it has not yet been added to the UniProt database; it is scheduled to be added with the April 22 release.
• If you search on the keyword "SARS-CoV" (which refers to the first SARS virus) in the main UniProt search field, how many results do you get?
• Use the entry with accession number "P59594", which corresponds to the reference entry for the SARS-CoV spike protein.
3. We are going to use the PredictProtein server to analyze the SARS-CoV-2 spike protein.
• Paste the SARS-CoV or SARS-CoV-2 spike protein amino acid sequence that was discussed in your paper into the input field and submit. (Ask Dr. Dahlquist for help with finding the sequence if you need it.)
• Paste a screenshot of the results into your wiki. Note that you can zoom in on different parts of the protein by using the slider at the top. Explore the types of information provided (in the menu options at the left). How does this information relate to what is stored in the UniProt database for the SARS-CoV spike protein?
4. View the structure of the SARS-CoV or SARS-CoV-2 spike protein from your assigned journal club article at the Protein Data Bank.
• Walls et al. (2020): 6VXX or 6VYB
• Wan et al. (2020): 2AJF
• Wrapp et al. (2020): 6VSB
• Yan et al. (2020): 6M17
• Click on the "Structure" link underneath the image of the structure on the upper left side of the page. This will open a window where you will be able to interact with the structure image.
• At the bottom right of the screen, you will see a drop-down menu that says "Select a different viewer". Select "JSMol" or "NGL" to access a palette of options for viewing the structure (they are slightly different from each other).
• Alternatively, NCBI has a web-based structure viewer called iCn3D. To use it:
• Go to the NCBI Structure home page and paste the PDB ID for your structure (above) into the search field and click "search".
• On the results page, you will see an image of the protein structure. Click on the button for the "full-featured 3D viewer" found on the bottom right corner of the structure image.
• This will open up a window with a similar image viewer to the one found on PDB. The interface is a little clunkier and the images are less elegant, but the iCn3D viewer has the advantage of showing the structure in a way that makes it easier to identify the N- and C-termini (there are arrows for the beta strands and the helices are also pointed). It also allows you to see the sequence at the same time and find and highlight particular sequences in the structure.
• If you would prefer to work offline, you can install the stand-alone Cn3D viewer on your own computer.
• When answering the following questions, provide a screenshot pasted into your wiki:
• Create a view of the protein from your paper that recreates one of the figures from your paper. You may not be able to get this to be exactly the same in terms of colors or backbone style, but you should try to rotate the structure to match the view shown in the article.
• Find the N-terminus and C-terminus of each polypeptide tertiary structure.
• Locate all the secondary structure elements. Do these match the predictions made by the PredictProtein server?
• Locate particular amino acids that were discussed in your paper and show a screenshot that highlights them.
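
The N- and C-termini asked about above can also be cross-checked from the coordinate file itself. The Python sketch below reads the fixed-width `ATOM` records of PDB-format text and reports the first and last residue listed for each chain. It assumes the entry is available in the legacy PDB format rather than only as mmCIF. The helper `fake_atom_line` and the five residues it generates are made up for illustration and are not taken from any of the structures listed above; with a real file you would pass its text to `chain_termini` instead. Keep in mind that the first and last residues present in a file are the first and last modeled residues, which can fall short of the true ends of the polypeptide when terminal regions were not resolved.

```python
def chain_termini(pdb_text):
    """Map each chain ID to ((first_res_name, first_res_num),
    (last_res_name, last_res_num)), in the order residues appear in the file."""
    termini = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        # Fixed columns of a PDB ATOM record (1-based):
        # 18-20 residue name, 22 chain ID, 23-26 residue sequence number.
        res_name = line[17:20].strip()
        chain_id = line[21]
        res_num = int(line[22:26])
        residue = (res_name, res_num)
        if chain_id not in termini:
            termini[chain_id] = (residue, residue)
        else:
            first, _ = termini[chain_id]
            termini[chain_id] = (first, residue)
    return termini


def fake_atom_line(serial, res_name, chain_id, res_num):
    """Hypothetical helper: build a truncated CA-only ATOM line for the demo."""
    return f"ATOM  {serial:>5}  CA  {res_name:>3} {chain_id}{res_num:>4}"


# Synthetic demo data (not from a real structure).
demo_text = "\n".join([
    "HEADER    SYNTHETIC EXAMPLE",
    fake_atom_line(1, "MET", "A", 1),
    fake_atom_line(2, "PHE", "A", 2),
    fake_atom_line(3, "VAL", "A", 3),
    fake_atom_line(4, "GLN", "B", 14),
    fake_atom_line(5, "CYS", "B", 15),
])

termini = chain_termini(demo_text)
for chain_id, (n_term, c_term) in termini.items():
    print(f"chain {chain_id}: first modeled {n_term}, last modeled {c_term}")
```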

1. What question will you answer about sequence-->structure-->function relationships in a SARS-CoV-2 protein?
• Based on the papers you read for journal club, it will likely be easiest to work on the spike protein or the ACE2 protein. However, if you are interested in one of the other SARS-CoV-2 proteins that are potential drug targets, you can focus on that.
• If you have a different type of SARS-CoV-2 question that you want to answer, I am open to that, as long as it involves some type of data analysis component. Please schedule an office hour appointment ASAP to talk to me about it, though.
2. What sequences will you use? I want you to take advantage of sequence data available to perform a multiple sequence alignment as part of your project.
3. What protein tools will you use for analysis and answering your question?
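
Once you have a multiple sequence alignment, one simple way to start connecting sequence to structure is to list the alignment columns where the sequences differ. The Python sketch below assumes the sequences are already aligned by a separate tool (equal length, gaps written as `-`). The function name `variable_columns` and the three short sequences are invented for illustration and are not real spike protein data.

```python
def variable_columns(aligned):
    """Return (1-based column, residues in that column) for every alignment
    column in which the sequences are not all identical."""
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    variable = []
    for position, column in enumerate(zip(*aligned.values()), start=1):
        if len(set(column)) > 1:
            variable.append((position, "".join(column)))
    return variable


# Toy alignment (made-up sequences, gaps as '-').
toy_alignment = {
    "seq1": "MFVFLV",
    "seq2": "MFIFLV",
    "seq3": "MF-FLV",
}

for position, residues in variable_columns(toy_alignment):
    print(f"column {position}: {residues}")
```

Columns flagged this way are natural candidates to locate and highlight in the 3D structure viewer.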

I will approve all project questions. If you want me to approve your project in advance of next week's class so that you can start work, please schedule an office hour appointment with me to discuss it.

Don't forget that you all already asked some coronavirus questions on our shared page. I'm slowly posting answers, so check it out.

## Shared Journal Assignment

• Compose your journal entry in the shared Class Journal Week 13 page. If this page does not exist yet, go ahead and create it (congratulations on getting in first :) )
• Sign your portion of the journal with the standard wiki signature shortcut (~~~~).