# BIOL368/F20:Week 4

BIOL368-01: Bioinformatics Laboratory

Loyola Marymount University

Fall 2020

This journal entry is due on Thursday, October 1, at 12:01am Pacific time.

## Overview

The learning objectives for this assignment are:

• Learn how to obtain sequence data, compare it using multiple sequence alignments, and analyze it for phylogenetic relationships using trees.

## Individual Journal Assignment

### Homework Partners

• You will be expected to consult with your partner, in order to complete the assignment.
• Each partner must submit his or her own work as the individual journal entry (direct copies of each other's work are not allowed).
• You must give the details of the interaction with your partner in the Acknowledgments section of your journal assignment.
• Homework partners for this week are:
• Fatimah & Kam
• Nathan & Macie
• Aiden & Anna
• JT & Yaniv
• Owen & Ian
• Taylor & Nida

### Format and Content Checklist

1. Store this journal entry as "username Week 4" (i.e., this is the text to place between the square brackets when you link to this page).
2. Write something in the summary field each time you save an edit. You are aiming for 100%.
3. Invoke the template that you made as part of the Week 1 assignment on your individual page. Your template should contain:
• A link to the template page itself.
• A list or table of all of the Assignment pages for the course.
• A list or table of all of your individual journal pages for the course.
• A list or table of all the shared class journal pages for the course.
• The category "BIOL368/F20".
4. Purpose: a statement of the scientific purpose of the assignment. Note that this is different from the learning objective stated on the assignment page. What science will be discovered by completing this assignment?
5. Combined Methods/Results (Electronic Lab Notebook): documentation of your workflow for this exercise. It should include:
• The protocol you followed, in enough detail that you or another person could re-do the same investigation based solely on your notebook. You may copy protocol instructions onto your page and modify them to reflect what you actually did, as long as you provide appropriate attribution.
• Answers to any specific questions posed in the exercise.
• Data and files: links to all data and files used and generated.
• It is a critical skill for data and computer literacy to back up your data and files in at least two ways:
• References to data and files should be made within the methods and results section. In addition to these inline links, create a "Data and Files" section of your notebook to make a list of the files generated in this exercise.
6. Scientific Conclusion: a summary statement of the main result of the exercise/research. It should mirror the purpose. Length should be 2-3 sentences, up to a paragraph.
7. Acknowledgments section (see Week 1 assignment for more details.)
• You must acknowledge your homework partner with whom you worked, giving details of the nature of the collaboration. You should include when and how you met and what content you worked on together.
• Acknowledge anyone else you worked with who was not your assigned partner. This could be the instructor, the TA, other students in the class, or even other students or faculty outside of the class.
• If you copied wiki syntax or a particular style from another wiki page, acknowledge that here. Provide the user name of the original page's author, if possible, and a link to the page from which you copied the syntax or style.
• If you copied any part of the assignment or protocol and then modified it, acknowledge that here and also include a formal citation in the Reference section.
• You must also include this statement:
• "Except for what is noted above, this individual journal entry was completed by me and not copied from another source."
• Sign your Acknowledgments section with your wiki signature (four tildes, ~~~~).
8. References section (see Week 1 assignment for more details.)
• Use the APA format.
• Cite this assignment page.
• Cite any protocols that you copied and modified (this must also be noted in the Acknowledgments section).
• Cite any other methods, software, websites, data, facts, images, or documents (including the scientific literature) that were used to generate content on your page.
• Do not include extraneous references that you do not cite or use on your page.

### Protocols

#### Part 1: GenBank

In this section you will take a closer look at a GenBank record and the type of data that is stored there. Once you reach the sequence data associated with the Wan et al. (2020) paper, you will see that there are several different ways to view the data.

Choose one of the GenBank records from the Data & Resources section above and view both the full record and the FASTA formatted sequence.

• What was the accession number of the sequence you chose?
• What information is provided in the GenBank record?
• Click the Send to link in the upper right of the page. Select Complete Record, File as the Destination, and FASTA as the format. Click the Create File button. Be careful to remember where you put the file and what you name it so that you can find it later.
• Open the file that you saved with a word processor (or a plain-text editor) to confirm that you have the sequence and that it is in FASTA format. In FASTA format, each sequence is preceded by a label line that begins with the greater-than sign (>). For example, the label and the first 10 sequence lines of the SARS-CoV-2 genome are:
```text
>MN908947.3 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome
ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA
CGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAGTGCACTCACGCAGTATAATTAATAAC
TAATTACTGTCGTTGACAGGACACGAGTAACTCGTCTATCTTCTGCAGGCTGCTTACGGTTTCGTCCGTG
TTGCAGCCGATCATCAGCACATCTAGGTTTCGTCCGGGTGTGACCGAAAGGTAAGATGGAGAGCCTTGTC
CCTGGTTTCAACGAGAAAACACACGTCCAACTCAGTTTGCCTGTTTTACAGGTTCGCGACGTGCTCGTAC
GTGGCTTTGGAGACTCCGTGGAGGAGGTCTTATCAGAGGCACGTCAACATCTTAAAGATGGCACTTGTGG
CTTAGTAGAAGTTGAAAAAGGCGTTTTGCCTCAACTTGAACAGCCCTATGTGTTCATCAAACGTTCGGAT
GCTCGAACTGCACCTCATGGTCATGTTATGGTTGAGCTGGTAGCAGAACTCGAAGGCATTCAGTACGGTC
GTAGTGGTGAGACACTTGGTGTCCTTGTCCCTCATGTGGGCGAAATACCAGTGGCTTACCGCAAGGTTCT
TCTTCGTAAGAACGGTAATAAAGGAGCTGGTGGCCATAGTTACGGCGCCGATCTAAAGTCATTTGACTTA
...continued
```
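The FASTA rules above (a `>` label line, followed by sequence lines up to the next label) are simple enough to check with a few lines of code. The following is an optional, minimal Python sketch using only the standard library; it is not part of the required workflow, and the function name `parse_fasta` is hypothetical. It can be handy for confirming that a downloaded file contains the accession and sequence length you expect.

```python
def parse_fasta(text):
    """Split FASTA-formatted text into a list of (header, sequence) tuples.

    A header line starts with '>'. All following lines, up to the next
    header, belong to that record and are joined without line breaks.
    """
    records = []
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()  # drop the '>' label marker
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # A shortened fragment of the record shown above.
    example = (
        ">MN908947.3 Severe acute respiratory syndrome coronavirus 2 "
        "isolate Wuhan-Hu-1, complete genome\n"
        "ATTAAAGGTTTATACC\n"
        "TTCCCAGGTAACAAAC\n"
    )
    for header, seq in parse_fasta(example):
        accession = header.split()[0]  # the accession is the first word of the label
        print(accession, len(seq))  # prints: MN908947.3 32
```

Running it on your saved file (read in with `open(...).read()`) should report one record whose length matches the length shown in the GenBank record.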

• While we could create a phylogenetic tree with the entire genome sequences of the viruses, in this analysis we are mainly interested in the spike protein. Links have been provided to the individual spike protein sequences corresponding to each of the viral genome records listed in the Data & Tools section. We are going to "crowdsource" gathering the sequence data from 12 other viral strains that are listed in Figure 2 of Wan et al. (2020).
• Each student will be assigned a nucleotide sequence accession number from Figure 2 in class.
• Search for the GenBank record associated with that sequence. Add a hyperlink to the GenBank record to the list of sequences in the Data & Tools section.
• Locate the spike protein accession number in the GenBank record. (Note that the spike protein is sometimes called the "S" protein.)
• Add a hyperlink to the spike protein record to the list of sequences in the Data & Tools section. Be sure to format the list in the same way as it is already formatted.
• Download your assigned protein sequence in FASTA format, just like you did for the whole genome sequence.
• Note that if you begin a line with a space character, the wiki will display it in a fixed-width font, and the sequences will line up nicely on the page.
• Also add the protein sequence to the talk page for this assignment. We will be creating a list of sequences for everyone in the class to use.
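If you prefer to script the download instead of using the Send to menu, NCBI's E-utilities service provides an EFetch endpoint that can return a record as FASTA text. The sketch below is an optional alternative, assuming network access and that EFetch's `db`, `id`, `rettype`, and `retmode` parameters behave as described in NCBI's documentation; `efetch_fasta_url`, `fetch_fasta`, and `wiki_preformat` are hypothetical helper names. The last helper adds the leading space described above so that a pasted sequence lines up in a fixed-width font.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of NCBI's EFetch E-utility.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, db="protein"):
    """Build an EFetch URL that returns one record as FASTA text.

    Use db="nuccore" for genome (nucleotide) records and db="protein"
    for spike protein records.
    """
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH_BASE + "?" + urlencode(params)


def fetch_fasta(accession, db="protein"):
    """Download the FASTA text for one accession (requires network access)."""
    with urlopen(efetch_fasta_url(accession, db)) as response:
        return response.read().decode("utf-8")


def wiki_preformat(text):
    """Prefix every line with a space so the wiki shows it in a fixed-width font."""
    return "\n".join(" " + line for line in text.splitlines())


if __name__ == "__main__":
    print(efetch_fasta_url("MN908947.3", db="nuccore"))
    # Uncomment to download the record and print it ready to paste into the wiki:
    # print(wiki_preformat(fetch_fasta("MN908947.3", db="nuccore")))
```

NCBI asks scripted users to keep request rates low (about three requests per second without an API key) and to identify themselves with `tool` and `email` parameters, so check the E-utilities documentation before running many queries. For a single assigned sequence, the Send to menu is just as quick.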

#### Part 2: Creating a phylogenetic tree with Phylogeny.fr

In order to analyze the sequence data, we will use Phylogeny.fr, a free, simple-to-use web service dedicated to reconstructing and analyzing phylogenetic relationships between molecular sequences.

1. In your browser, go to the website www.phylogeny.fr. Scroll down on the page to the section labeled ‘Phylogeny analysis’, and click on the text ‘One Click’.
2. Click in the large text field labeled ‘Upload your set of sequences in FASTA, EMBL, or NEXUS format’. Copy the list of sequences from the talk page and use Ctrl-V (or command-V) to paste your sequences here, then click the “Submit” button.
3. You will see a page named Alignment results. After your alignment is complete, you will see a new page named Phylogeny results. Finally, you will see a page named Tree rendering results. You will come back to these pages later. For now, find the numbered tabs located just beneath the text One Click Mode, and click on the tab labeled 3. Alignment.
• Within the alignment, individual positions are color-coded to indicate their conservation, or how similar the sequences are to each other at that position. Blue highlighting indicates high conservation (i.e., the sequences are identical or at least very similar), while gray highlighting indicates lower conservation and white highlighting indicates little if any conservation.
4. Near the bottom of the page, under Outputs, click on Alignment in Clustal format. This will display your alignment in a text-only format in which each position's conservation is indicated by a symbol underneath the alignment block (“*” for invariant, “:” for highly conserved, “.” for weakly conserved, and a space for not conserved). Copy and paste this entire alignment into your individual journal entry. Use the space character at the beginning of each line so that the sequence lines up properly on your page.
5. Now go back and click on the tab 6. Tree Rendering, and you will see a phylogenetic tree of the sequences you submitted.
• On this tree, horizontal lines (branches) represent individual evolutionary lineages. By contrast, vertical lines (splits) represent divergence events, in which one lineage splits into two, and the vertical length of each split is drawn purely for visual clarity with no biological meaning. The left-most split is called the root of the tree and represents a hypothesis about the most recent common ancestor (MRCA) of the sequences within your tree.
• In Figure 2 of Wan et al. (2020), an outgroup called BtSCoV PDF2386 is used. However, I was unable to find this sequence in GenBank for us to use. Instead, the MERS-CoV and HKU4 sequences from Figure 3C are provided, and these essentially serve as two outgroups.
• The length of each branch represents the percentage change in amino acid sequence occurring along that branch, relative to the scale bar shown at the bottom of the tree. The scale bar will be a number between 0 and 1 and can be reinterpreted as a percentage; for example, 0.05 would be 5%. The tree may also show support values for each clade in red on the branches, also expressed as a number between 0 and 1 (for example, 0.95 would be 95%). In general, a higher support value indicates higher statistical confidence in a particular clade.
• Save the image to a file, upload it to the wiki, and display it on your individual journal page.
6. Compare the tree to the multiple sequence alignment. See if you can relate the differences in the sequences to the topology of the tree diagram. Describe the relationship in your individual journal page.
7. Relate your alignment to Figure 3 of the Wan et al. (2020) paper.
• Find, in your alignment, the amino acid sequences that are highlighted in the figure and mentioned in the text. Once you find them, you may find it helpful to copy and paste just that portion of the alignment into a new section of your individual journal page.
• What are the similarities and differences between your alignment and the one shown in Figure 3?
8. Compare your tree to Figure 2 of the Wan et al. (2020) paper.
• What are the similarities and differences between your tree and the one shown in Figure 2?
9. Is enough information provided by Wan et al. (2020) in their paper for us to reproduce their analysis? Explain your answer.
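Steps 4 through 6 ask you to read the Clustal conservation line and to relate sequence differences to branch lengths on the tree. The Python sketch below illustrates both ideas on a small toy alignment, assuming equal-length aligned sequences with `-` for gaps. The residue groups are assumed here to match the "strong" (`:`) and "weak" (`.`) groups used by ClustalW and Clustal Omega; verify them against the Clustal documentation. Leaving every gapped column blank is a simplification. The helper `p_distance` (a hypothetical name) computes the uncorrected proportion of differing, ungapped positions. It is only a rough stand-in for the model-based branch lengths Phylogeny.fr reports, but it shows why sequences that differ more end up farther apart on the tree.

```python
# Residue groups assumed to match Clustal's conservation-line rules.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK", "NDEQHK",
               "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column):
    """Return '*', ':', '.', or ' ' for one alignment column (a string of residues)."""
    residues = set(column.upper())
    if "-" in residues:
        return " "  # simplification: columns containing a gap get no symbol
    if len(residues) == 1:
        return "*"  # invariant position
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # all residues fall in one strongly similar group
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # all residues fall in one weakly similar group
    return " "  # not conserved


def conservation_line(aligned):
    """Build a Clustal-style conservation line for equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    return "".join(
        column_symbol("".join(seq[i] for seq in aligned)) for i in range(length)
    )


def p_distance(a, b):
    """Proportion of ungapped aligned positions at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    differences = sum(1 for x, y in pairs if x != y)
    return differences / len(pairs)


if __name__ == "__main__":
    aligned = ["MKSAL", "MRAAW", "MKCA-"]  # toy example, not real spike sequences
    for seq in aligned:
        print(seq)
    print(conservation_line(aligned))  # prints "*:.* " (the last column has a gap)
    d = p_distance(aligned[0], aligned[2])
    print(f"{d:.2f} = {d:.0%}")  # prints "0.25 = 25%"
```

Reading the last line the same way as the scale bar (0.25 as 25%) may help when you describe how the differences in your alignment relate to the branch lengths in your tree.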

## Shared Journal Assignment

• Compose your journal entry in the shared Class Journal Week 4 page. If this page does not exist yet, go ahead and create it (congratulations on getting in first :) )
• Sign your portion of the journal with the standard wiki signature shortcut (~~~~).