phylogenetic analysis, utility
Part 1: Bird microbiome analysis
You will take several steps to analyze your bird stool sequencing data, as individuals, pairs, and ultimately the entire class:
- For each ### clone of yours (e.g., #716-1 through #716-8), you will trim and combine the forward and reverse sequencing results to get one intact 16S rRNA gene.
- For each sequence, you will use BLAST to determine the closest known bacterial species to that sequence.
- You will post both the sequences and a summary of the species that you found, according to a specific template.
- You will then pair with your ### clone partner (see the Day 2 Talk page) to align all your sequences, up to 16 of them, in a program called MEGA, and then construct a phylogenetic tree.
- These trees will be posted so that cross-class comparisons can be made.
- At a later time (e.g., at the Thursday office hour), you will ideally apply the MEGA alignment and tree construction to all samples for a given gull, up to 32 of them. We will figure out a fair way to divide and share this work, so that no one person has to analyze 256 sequences on his or her own!
- Finally, you may compare the MA versus AK trees by inspection, as some of them will be pretty homogeneous, or you may optionally run a UniFrac analysis. Some guidance about UniFrac will be posted later this week.
Part A: Understand possible insert orientations within vector
- Recall from Day 2 the sequences of the forward and reverse primers used to broadly amplify bacterial 16S rRNA genes:
- Forward: 5' AGAGTTTGATCCTGGCTCAG
- Reverse: 5' ACGGGCGGTGTGTACA
- Based on these sequences, you might expect that your insert will always begin with "AGA" and always end with "CGT." (Draw a picture to make sure you understand why the last three bases are as they are written here.)
- However, in blunt-end cloning, the insert (here, our PCR product) can face in either orientation. Take a moment to figure out which other bases you might expect to see at the beginning or end of your sequenced insert. For now, pretend that you are using forward sequencing primers only, which will read out the coding strand.
- The kind of cloning we are doing is called non-directional cloning. Directional cloning is possible when, for example, two different restriction enzymes are used to create overhangs that are complementary to the vector but not to each other.
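Once you have worked out the two orientations by hand, you can check your picture with the short Python sketch below. It reverse-complements the two 16S primers listed above and reports the first and last three bases of the insert for each orientation, as read by a forward (vector) sequencing primer. Only the primer sequences come from this page; the function names are our own.

```python
# A minimal sketch of the insert-orientation reasoning in Part A.
# Primer sequences are from this page; function names are illustrative.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

FORWARD_PRIMER = "AGAGTTTGATCCTGGCTCAG"
REVERSE_PRIMER = "ACGGGCGGTGTGTACA"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def expected_ends(n: int = 3) -> dict:
    """First and last n bases of the insert, on the strand read by a
    vector forward sequencing primer, for both possible orientations."""
    # Orientation 1: the 16S forward primer sits at the 5' end, and the
    # 3' end is the reverse complement of the 16S reverse primer.
    orient1 = (FORWARD_PRIMER[:n], reverse_complement(REVERSE_PRIMER)[-n:])
    # Orientation 2: the whole PCR product is flipped, so the read starts
    # with the 16S reverse primer and ends with the reverse complement
    # of the 16S forward primer.
    orient2 = (REVERSE_PRIMER[:n], reverse_complement(FORWARD_PRIMER)[-n:])
    return {"orientation 1": orient1, "orientation 2": orient2}


if __name__ == "__main__":
    for name, (start, end) in expected_ends().items():
        print(f"{name}: starts {start}..., ends ...{end}")
```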
Part B: How to download a sequence
- The data from Genewiz is available at the company website, linked here.
- Choose the "Login" link and then use "email@example.com" and "be20109" to log in.
- At the bottom right should be a section called Recent Results. Click on More to expand it, and then click the icon under the Results column for your particular plate.
- T/R orders were placed on 2/27, and W/F orders were placed on 2/28.
- T/R Blue and T/R Platinum: your last two samples were moved to Plate 3. I had to move them because Genewiz uses wells 95 and 96 for controls.
- The quickest way to start working with a particular sequence is to follow the "View" link under the Seq File heading. For ambiguous data, you may want to look directly at the Trace File as well.
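If you copy the text behind the "View" link into a file or a script, it may carry a header line and line breaks along with the bases. Below is a minimal sketch, assuming the Seq File text is either bare sequence or FASTA-style (a first line starting with ">"). It strips everything except the base calls and reports how many Ns the read contains, a quick way to flag reads worth checking against the Trace File. The function names are hypothetical.

```python
import re
from typing import Tuple


def clean_sequence(raw_text: str) -> str:
    """Drop FASTA-style header lines, then keep only letters (the base
    calls), upper-cased. Digits, spaces and line breaks are discarded."""
    body = [line for line in raw_text.splitlines() if not line.startswith(">")]
    return re.sub(r"[^A-Za-z]", "", "".join(body)).upper()


def ambiguity_report(seq: str) -> Tuple[int, float]:
    """Return the number of N calls and their fraction of the read length."""
    n_count = seq.count("N")
    return n_count, (n_count / len(seq) if seq else 0.0)
```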
Part C: Prepare sequences for analysis
- Begin by downloading this file, which contains the DNA sequence of the vector we are using in GenBank format. Open the file in ApE (A plasmid Editor, created by M. Wayne Davis at the University of Utah), which is found on your desktop. Three items of interest are highlighted: the forward priming site, the reverse priming site, and the two basepairs between which your sequence should be inserted.
- Follow the steps below for each clone that had successful forward and reverse sequencing reactions. In cases where only one reaction was successful, briefly check whether you can locate an insert. However, note that there is a known problem with this cloning procedure wherein sometimes an incomplete vector (with no insert and also missing a chunk of the vector) is returned. You should also scroll down to the bottom to check if any of your failed reactions were repeated; these are noted with an "R" and in some cases worked the second time around.
- Paste the forward sequence of your first candidate into a new ApE file. Locate where the vector ends and the insert begins; trim away the vector.
- The easiest way to find the insert is Edit → Find (or Apple-F), searching for the bases just before the insert should begin. Note, however, that the string "CCC" may be mis-sequenced as "CC" or "CCCC", because long stretches of the same base (particularly Gs and Cs) are prone to sequencing error.
- Paste the reverse sequence of your first candidate into yet another ApE file. Immediately use Edit → Reverse Complement to adjust the sequence, and again trim away the vector.
- Why is it more convenient to work with the reverse complement when sequencing from the reverse direction?
- In ApE, use Tools → Align Sequence to find where the forward and reverse sequences overlap. Combine them into one sequence with no repeated parts; where both forward and reverse sequence have coverage of the gene, choose whatever combination has the fewest Ns (ideally none!). Save this sequence as a new file called YourTeamDay-YourTeamColor_YourSampleID-"C"Candidate Number (e.g., WF-Purple_737-C1).
- You may find it easiest to print out the alignment in order to choose where to switch from using forward to using reverse sequence.
- In pilot testing, we have run into one case in which the forward and reverse sequences have almost no overlap. It's not clear what caused this error. Before assuming that this error has struck your data, too, be sure that you reverse-complemented your reverse sequence!
- Finally, depending on the orientation of your insert, you may want to reverse complement the entire sequence. Use the original sequences of the forward and reverse 16S primers to guide your decision.
- You must now save each sequence in .txt format. If anyone can figure out how to do this task directly in ApE, let us know! Otherwise, you can copy-paste the sequence into a program such as TextEdit, choose File → Save, and in the pulldown menu select Plain Text.
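For anyone comfortable with a little scripting, the ApE steps above (trim the vector, reverse-complement the reverse read, merge at the overlap, orient, save as .txt) can be sketched in Python. This is not a replacement for inspecting the alignment in ApE. It is a minimal sketch that assumes the reads agree exactly apart from Ns and one-base length errors in homopolymer runs. The vector flank sequences are placeholders you would copy from the highlighted region of the vector file, and every function name is hypothetical. The orient step assumes you want each final sequence to begin with the 16S forward primer.

```python
import re
from pathlib import Path

# Complement table for the IUPAC codes a sequencer may report
# (R<->Y, K<->M, B<->V, D<->H; S, W and N are their own complements).
COMPLEMENT = str.maketrans("ACGTRYKMBDHVSWN", "TGCAYRMKVHDBSWN")
FORWARD_16S = "AGAGTTTGATCCTGGCTCAG"  # 16S forward primer from Part A


def reverse_complement(seq):
    """Reverse complement of a DNA string (upper-cased)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def flank_pattern(flank):
    """Regex for a vector flank in which any run of 3+ identical bases may be
    read one base short or long (e.g. CCC matches CC, CCC or CCCC)."""
    parts = []
    for run in re.finditer(r"(.)\1*", flank.upper()):
        base, n = run.group(1), len(run.group(0))
        if n >= 3:
            parts.append(f"{re.escape(base)}{{{n - 1},{n + 1}}}")
        else:
            parts.append(re.escape(run.group(0)))
    return re.compile("".join(parts))


def trim_vector(read, upstream=None, downstream=None):
    """Drop vector sequence: everything through `upstream` and everything
    from `downstream` on. A flank that is given but not found raises
    ValueError, so a read with no recognizable insert is not silently kept."""
    seq = read.upper()
    if upstream:
        m = flank_pattern(upstream).search(seq)
        if m is None:
            raise ValueError("upstream vector flank not found")
        seq = seq[m.end():]
    if downstream:
        m = flank_pattern(downstream).search(seq)
        if m is None:
            raise ValueError("downstream vector flank not found")
        seq = seq[:m.start()]
    return seq


def merge_reads(fwd, rev_rc, min_overlap=20, max_mismatches=0):
    """Join the trimmed forward read to the trimmed, reverse-complemented
    reverse read at their longest suffix/prefix overlap. An N matches any
    base, and inside the overlap the non-N call is kept, so the result has
    as few Ns as the two reads allow. If a real mismatch is tolerated, the
    forward base is kept (an arbitrary choice; check the traces instead)."""
    for k in range(min(len(fwd), len(rev_rc)), min_overlap - 1, -1):
        a, b = fwd[len(fwd) - k:], rev_rc[:k]
        mismatches = sum(1 for x, y in zip(a, b) if x != y and "N" not in (x, y))
        if mismatches <= max_mismatches:
            overlap = "".join(y if x == "N" else x for x, y in zip(a, b))
            return fwd[:len(fwd) - k] + overlap + rev_rc[k:]
    raise ValueError("no acceptable overlap; was the reverse read reverse-complemented?")


def orient(seq, primer=FORWARD_16S, probe_len=10):
    """Return seq reading from the 16S forward primer, flipping it if only
    its reverse complement starts with the primer's first bases."""
    probe = primer[:probe_len]
    if seq.startswith(probe):
        return seq
    if reverse_complement(seq).startswith(probe):
        return reverse_complement(seq)
    return seq  # primer not found exactly; decide by eye in ApE


def save_txt(seq, name, folder="."):
    """Write the sequence as plain text, e.g. WF-Purple_737-C1.txt."""
    path = Path(folder) / f"{name}.txt"
    path.write_text(seq + "\n")
    return path
```

Hypothetical usage, with UP_FLANK and DOWN_FLANK copied from the vector map: trim the forward read with trim_vector(fwd_raw, upstream=UP_FLANK), trim the reverse read with trim_vector(reverse_complement(rev_raw), downstream=DOWN_FLANK), then call save_txt(orient(merge_reads(fwd, rev)), "WF-Purple_737-C1"). If merge_reads raises an error, the reads may genuinely have little overlap, as in the pilot case described above, so look at the alignment in ApE before trusting any output.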
Part D: Identify species from sequences
Align with "nucleotide blast" from NCBI
- The alignment program can be accessed through the NCBI BLAST page or directly from this link.
- Paste the sequence text that you prepared above into the "Query" box. Ambiguous positions in your sequencing results appear as "N" rather than "A," "T," "G," or "C"; it's fine to include Ns in the query.
- Under "choose search set," select "16S ribosomal RNA sequences (Bacteria and Archaea)" from the database pulldown menu.
- Click on the BLAST button. In each pairwise alignment, matches are shown by vertical lines between the aligned sequences, mismatches by a blank space in that middle row, and gaps by dashes within the sequences themselves.
- Because this gene is highly conserved, a number of species should come up as highly matched. However, one should usually stand out as the best match. Write down this strain and its accession number, along with its max score, query coverage, and max identity; if the second-best match is close, record the same parameters for it. Particularly if there are Ns in your sequence, you might need to scroll down and closely compare the aligned sequences to know which match is truly best.
- You should print a screenshot of each alignment to PDF (and to paper if you desire). These will be used to prepare a figure showing what you found today. You might want to email yourself the alignment screenshots or post them to your wiki userpage.
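When two hits are close and your query contains Ns, it can help to tally the alignment position by position. The sketch below assumes you have copied the Query and Sbjct rows of a BLAST pairwise alignment (gaps included as "-") into two strings of equal length. It counts identities, true mismatches, gaps, and positions where your query has an N. Ns are counted as non-identical, which as far as we know matches how the Identities line in BLAST output treats them, so they lower percent identity without telling you which reference is correct. The hit labels and function names are hypothetical.

```python
def compare_alignment(query_aln, subject_aln):
    """Tally one pairwise alignment given its aligned Query and Sbjct rows."""
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned rows must be the same length")
    counts = {"identical": 0, "mismatch": 0, "query_N": 0, "gap": 0}
    for q, s in zip(query_aln.upper(), subject_aln.upper()):
        if q == "-" or s == "-":
            counts["gap"] += 1
        elif q == "N":
            counts["query_N"] += 1  # ambiguous call: not evidence either way
        elif q == s:
            counts["identical"] += 1
        else:
            counts["mismatch"] += 1
    # Percent identity over the full alignment length, gaps included.
    counts["percent_identity"] = 100.0 * counts["identical"] / len(query_aln)
    return counts


def rank_hits(alignments):
    """Order hits by true differences (mismatches + gaps), ignoring query Ns.
    `alignments` maps a hit label to its (query_row, subject_row) pair."""
    def differences(label):
        c = compare_alignment(*alignments[label])
        return c["mismatch"] + c["gap"]
    return sorted(alignments, key=differences)
```

The design choice here is to separate "the query is ambiguous" (N) from "the query disagrees with the reference" (mismatch or gap), since only the latter helps you choose between two closely matched species.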
Part E: Align sequences and construct tree
Part 2: Microsporidia primer analysis
For next time