Angela A. Garibaldi Week 4

=Exploring HIV-1 Exercises=

Part 2: GenBank

 * 1) Search the NCBI site under GenBank - Nucleotide using one of the codes from the Markham paper for a subject's env gene sequence.
 * 2) The Accession number of the S14V4-8 sequence is GenBank: AF089617.1
 * 3) This first HIV sequence is from Subject 14, Visit 4, Variant 8. The name code of the sequence therefore indicates which subject the HIV was collected from, at which visit, and which variant it is.
 * 4) Click on FASTA to get the sequence in FASTA format.
 * 5) Copy and paste 6 sequences into a .txt file for easy access. This way the full descriptive name of each sequence is listed along with its accession number.

[[Media:S14_Subject.txt |List of (6) Subject 14 Sequences]]
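The name code described in step 3 can be split into its parts automatically. The following is a minimal Python sketch, assuming every code follows the pattern S<subject>V<visit>-<variant> seen in S14V4-8; the function name `parse_sequence_code` is hypothetical and is not part of any NCBI tool.

```python
import re

# Assumed pattern: S<subject>V<visit>-<variant>, e.g. "S14V4-8".
CODE_PATTERN = re.compile(r"S(\d+)V(\d+)-(\d+)")


def parse_sequence_code(text):
    """Return (subject, visit, variant) as integers, or None if no code is found."""
    match = CODE_PATTERN.search(text)
    if match is None:
        return None
    subject, visit, variant = (int(group) for group in match.groups())
    return subject, visit, variant


print(parse_sequence_code("S14V4-8"))  # (14, 4, 8)
```

Because the function searches anywhere in the string, it should also work on a full FASTA description line that contains the code.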

Part 3: Introduction to the Biology Workbench

 * 1) Register for Biology Workbench and log in.
 * 2) Click on the Nucleic Tools button on the bottom of the page
 * 3) Select Add new sequence from the list and click run.
 * 4) Select Choose file and upload file to upload the FASTA sequence file you created in Part 2.
 * 5) Use the NEXT and PREVIOUS links on the top to toggle through each sequence you included in your .txt file or simply scroll down.
 * 6) Save these sequences.
 * 7) Now go back to the list of Nucleic Tools, choose the ClustalW tool, and select the saved sequences.
 * 8) Run the tool on these sequences and click submit. This will result in a sequence alignment and a phylogenetic tree.
 * My tree looks unusual because all of the sequences I chose come from the same subject, just from different visits. I probably should have put my sequences in a more logical order by visit number. Even so, the mutations line up at the same base pair positions along the aligned sequences.

Part 1: Looking at clustering across subjects

 * 1) Upload two txt files, one with all of the first visit sequences from Subjects 1-9, and the second with the same information about Subjects 10-15. (Two separate files due to the max upload of 64 sequences at once).
 * 2) The sequence data is accessed via the Bedrock link on the HIV assignment page.
 * 3) Once all of these sequences are uploaded, do a practice tree in which you generate a multiple sequence alignment and distance tree for 12 sequences (3 clones from each of 4 subjects).
 * 4) Record the data for the clones you used in the chart provided in the handout.

Phylogenetic Tree of Subjects 15, 12, 10, and 8


The clones from each subject do cluster together. Subject 15 shows the most diversity: each clone sits on a long branch relative to the other clones, implying that they are less genetically similar. Furthermore, only clones 10 and 11 meet at a common node. Clone 12 diverges significantly, which is consistent with the point made by the Markham paper that rapid progressors show more diversity among viral clones. Since this was only the first visit, the tree suggests that patterns of higher diversity and divergence are already prominent in this rapid progressor.

The clones of Subjects 8 and 10 are very closely related within each subject: their branches are very short, close together, and stem from a common node. Subject 8 is a moderate progressor and Subject 10 is a rapid progressor. Given how these subjects' HIV progressed, the tree for these specific clones does not show the diversity and divergence described in the Markham paper. However, since this is only the first visit, this tree may simply be the starting point from which divergence and diversity increase over time. Subject 12, a non-progressor, has two clones that are closely related and one that is quite different. Clones 2 and 3 have very short branches and are therefore highly similar, which fits the Markham conclusion that non-progressors show less diversity and divergence. However, clone 4 diverges early on, contrary to that conclusion.

In my particular tree, the different subjects do not cluster with one another: each subject's group of clones is relatively distant from the other subjects' groups. The exception is that the viral clones of Subjects 10 and 15 are more closely related to each other. This makes sense in that they are both rapid progressors, which suggests that the viral clones of rapid progressors may be more closely related to each other than to those of other progressor groups.

Calculating S

 * The S statistic can be used to quantify the diversity of sequences in a population.
 * The S value is the number of positions that vary (are not identical) across all sequences in an alignment.
 * 1) Select all clones from one subject and align them.
 * 2) Calculate S by counting the number of positions where there is at least one nucleotide difference across the collection of clones. Enter the data into Table 3.
 * 3) Import your alignment so you can use it in a later section.
 * 4) Run the same analysis for a second and third subject and record in Table 3.
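The counting in step 2 can also be done programmatically. Below is a minimal Python sketch, assuming the aligned sequences have been copied out of the alignment as equal-length strings; the function name and the toy sequences are hypothetical. It treats a gap character as just another symbol, which may or may not match how the handout wants gaps counted.

```python
def count_segregating_sites(aligned_sequences):
    """Count alignment columns that contain more than one distinct character."""
    if not aligned_sequences:
        return 0
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("All aligned sequences must have the same length")
    upper = [seq.upper() for seq in aligned_sequences]
    # zip(*upper) walks the alignment column by column.
    return sum(1 for column in zip(*upper) if len(set(column)) > 1)


# Toy alignment (made-up sequences, not data from the Markham study).
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACCTACGT"]
print(count_segregating_sites(toy_alignment))  # 2
```

In the toy alignment only the third and the last columns vary, so S is 2.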

Calculating Theta

 * Theta is an estimate of the average pairwise genetic distance. It does not take into account the frequency of the different clones in the sample.
 * Theta = S / (1 + 1/2 + 1/3 + ... + 1/(N-1)), where N is the number of clones. The denominator is the partial sum of the harmonic series up to N-1.
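As a worked example of the Theta calculation, here is a short Python sketch. It assumes that the "partial sum" in the formula is the harmonic sum 1 + 1/2 + ... + 1/(N-1), which is the form used in Watterson's estimator; the function name and the example values of S and N are hypothetical.

```python
def theta_from_s(s, n_clones):
    """Divide S by the harmonic partial sum 1 + 1/2 + ... + 1/(N-1)."""
    if n_clones < 2:
        raise ValueError("At least two clones are needed")
    harmonic_sum = sum(1.0 / i for i in range(1, n_clones))
    return s / harmonic_sum


# Hypothetical values: S = 10 variable positions among N = 4 clones.
print(round(theta_from_s(10, 4), 2))  # 5.45
```

With N = 4 the denominator is 1 + 1/2 + 1/3, or about 1.83, so Theta is about 5.45.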

Calculating Minimum and Maximum

 * Minimum and Maximum pairwise distances look only at the extremes in terms of similarity and difference.
 * 1) Use the Clustdist tool in the alignment tool set to generate a distance matrix.
 * 2) Select the highest and lowest pairwise scores and convert each percentage difference score into the raw number of differences by multiplying by the length of the sequences. Round to the nearest integer.
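The conversion in step 2 can be sketched in Python as follows. This assumes the distance scores have been typed in by hand as fractions between 0 and 1 (divide by 100 first if they are shown as percentages); the function name, the clone labels, the scores, and the sequence length of 300 are all hypothetical.

```python
def min_max_raw_differences(pairwise_scores, sequence_length):
    """Convert the lowest and highest fractional scores to raw difference counts."""
    scores = list(pairwise_scores.values())

    def to_raw(score):
        # Adding 0.5 before truncating rounds to the nearest integer.
        return int(score * sequence_length + 0.5)

    return to_raw(min(scores)), to_raw(max(scores))


# Hypothetical scores for three clones, keyed by the pair of clone names.
toy_scores = {
    ("clone1", "clone2"): 0.01,
    ("clone1", "clone3"): 0.04,
    ("clone2", "clone3"): 0.03,
}
print(min_max_raw_differences(toy_scores, 300))  # (3, 12)
```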

Comparing Maximum and Minimum Differences Across Subjects

 * 1) Create an alignment with all of the sequences from 2 subjects.
 * 2) Use Clustdist to generate a pairwise distance matrix for the alignment across subjects. Find the minimum and maximum differences between the two subjects' sequences, looking only at pairs made up of one sequence from each subject.
 * 3) Repeat for all pairings.
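The cross-subject comparison can be illustrated with a small Python sketch. Note that this is not the Clustdist calculation: it counts raw mismatches directly between aligned strings, and it assumes the sequences are already aligned to the same length. The function names and toy sequences are hypothetical.

```python
from itertools import product


def count_differences(seq_a, seq_b):
    """Count positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("Sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def between_subject_min_max(subject_a_clones, subject_b_clones):
    """Compare every clone of one subject with every clone of the other."""
    differences = [
        count_differences(a, b)
        for a, b in product(subject_a_clones, subject_b_clones)
    ]
    return min(differences), max(differences)


# Toy data (made-up sequences, two clones per subject).
subject_a = ["ACGTACGT", "ACGTACGA"]
subject_b = ["TCGTACGA", "TCCTACCA"]
print(between_subject_min_max(subject_a, subject_b))  # (1, 4)
```

Only pairs with one clone from each subject are compared, so differences within a subject do not affect the result.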