Bobak Seddighzadeh Week 4

Activity 1
Part 2: GenBank


 * I downloaded four sequences in FASTA format to my local hard drive by selecting several at the same time in summary view.
 * The accession number of the sequence I chose was: AF089154
 * Which subject of the study was that HIV sequence from? Which section of the record contains information about who the HIV was collected from? The HIV sequence was collected from subject 4. The record includes a clone code that identifies who the HIV was collected from. The code for this sequence is S4V2-4, which indicates that the clone is from subject 4 on his second visit and that it is his fourth clone.
 * These are the sequences I chose: [[Media:HIVSequences1.txt|HIV Sequences]]
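
The clone code described above can be split into its parts programmatically. This is a minimal Python sketch, assuming every code follows the pattern S<subject>V<visit>-<clone> seen in S4V2-4; the function name parse_clone_code is hypothetical and not part of GenBank or Biology Workbench.

```python
import re

# Assumed pattern: S<subject>V<visit>-<clone>, for example "S4V2-4".
CLONE_CODE = re.compile(r"S(\d+)V(\d+)-(\d+)")


def parse_clone_code(code):
    """Return (subject, visit, clone) as integers, or raise ValueError."""
    match = CLONE_CODE.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"unrecognized clone code: {code!r}")
    subject, visit, clone = (int(group) for group in match.groups())
    return subject, visit, clone


print(parse_clone_code("S4V2-4"))  # prints (4, 2, 4)
```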

Part 3: Introduction to the Biology Workbench
 * To analyze the sequences I used Biology Workbench
 * 1) Log onto Biology Workbench
 * 2) Use nucleic sequence data
 * 3) Select Add new sequences, press run; then upload your sequences
 * 4) Select all your sequences and run a multiple sequence alignment using ClustalW

Activity 2
In this activity, I will use sequence data from fifteen individuals who participated in a study in Baltimore. I will characterize the populations of HIV within an individual and quantify the differences between individuals. I will use the ClustalW tool to build my alignments and distance-based unrooted trees.

Part 1:


 * Access sequencing information from Bedrock link on HIV assignment.
 * Upload visit one from subjects 1 through 15 into the nucleic acid tool set of Biology Workbench.
 * Preliminary analysis: generate a multiple sequence alignment and distance tree for 12 of these sequences.
 * Record the data of your clones in Table 2
 * The following sequences were analyzed using ClustalW: Subject 1 clones 1,2,3; Subject 8 clones 1,2,3; Subject 5 clones 1,2,3; Subject 15 clones 1, 2, 3.
 * Do the clones from each subject cluster together? Yes, for each of subjects 1, 5, 8, and 15, that subject's clones cluster with one another.
 * Do some subjects' clones show more diversity than others? Yes, subject 15's clones show the most diversity among the subjects I analyzed: its branches are the furthest apart, which indicates a greater genetic distance between its clones. In class last week, we covered the paper and learned that subject 15 is a rapid progressor, so a large amount of genetic divergence among subject 15's clones would be expected. Subject 8 appears to have the least genetic divergence among its clones because its branches are closest together. Subjects 5 and 1 each appear to have only two terminal nodes, which I interpret to mean that two of the three clones show little to no genetic divergence.
 * Do some of the subjects cluster together? None of the subjects cluster together. However, the internal branch between the four subjects is relatively short, which suggests that they may not be genetically far from one another. Subjects 1 and 15 are closer to each other than either is to subject 5 or 8, which suggests that subjects 1 and 15 may be more closely related to each other than to subjects 5 and 8. The same applies to subjects 5 and 8 in relation to subjects 1 and 15.
 * My interpretation of similarities and potential evolutionary relationships between the sequences: Since genetic distance can serve as a first-order approximation of evolutionary relationships, my analysis is based on the genetic distances between the sequences. I think that subject 1's HIV clones came first on the evolutionary path because they have the shortest branches, and that they gave rise to subject 15's clones. I also think that the clones of subjects 5 and 8 arose after those of subjects 1 and 15 because they have longer branches, and that the clones of subjects 5 and 8 began evolving at around the same time.

Part 2

Calculating the S value: S value "quantifies the diversity of sequence in a population" (143 Donovan).
 * Select all the clones from one subject and align them. From the alignment, calculate S by counting the number of positions where there is at least one nucleotide difference across the collection of clones.
 * Import your alignment
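
The counting step above can be sketched in Python. This is only an illustration under stated assumptions: the sequences are already aligned to the same length, a gap character counts as a difference, and the toy sequences and the function name count_segregating_sites are hypothetical rather than taken from the study data.

```python
def count_segregating_sites(aligned):
    """Count alignment columns where the clones do not all share one character.

    Assumes the sequences are already aligned (equal length). A gap ("-")
    is treated like any other character, so it counts as a difference.
    """
    if not aligned:
        return 0
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    upper = [seq.upper() for seq in aligned]
    # zip(*upper) walks the alignment column by column.
    return sum(1 for column in zip(*upper) if len(set(column)) > 1)


# Hypothetical toy alignment of three clones (not real data).
toy_clones = ["ACGTAC", "ACGTTC", "ATGTAC"]
print(count_segregating_sites(toy_clones))  # prints 2
```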

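Calculating Theta: an estimate of the average pairwise genetic distance
 * theta = S / (1 + 1/2 + 1/3 + ... + 1/(N-1)), where N is the number of clones. The denominator is the partial sum of the harmonic series through the term 1/(N-1).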
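
The theta calculation can be written as a short Python function. This sketch reads the "partial sum" in the formula as the harmonic sum 1 + 1/2 + ... + 1/(N-1), which is an assumption about the intended formula; the function name theta_from_s and the example values are hypothetical.

```python
def theta_from_s(s, n):
    """Return S divided by the harmonic partial sum 1 + 1/2 + ... + 1/(n-1).

    s: number of variable positions in the alignment (the S value)
    n: number of clones in the alignment (must be at least 2)
    """
    if n < 2:
        raise ValueError("need at least two clones to estimate theta")
    harmonic_sum = sum(1.0 / i for i in range(1, n))
    return s / harmonic_sum


# Hypothetical example: S = 6 across N = 3 clones.
# The denominator is 1 + 1/2 = 1.5, so theta is 4.0.
print(theta_from_s(6, 3))
```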

Investigate the minimum and maximum differences between any pair of sequences in an alignment. This compares and contrasts the extremes.
 * Use the Clustdist tool in the alignment tool set to generate a distance matrix.
 * Select the highest and lowest pairwise scores and convert each percentage difference score into the raw number of differences by multiplying by the length of the sequences. Round to the nearest integer.
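
The conversion from a pairwise score to a raw count is simple arithmetic, sketched here in Python. It assumes the score is expressed as a fraction between 0 and 1 (divide by 100 first if the score is a true percentage); the score, the sequence length, and the function name raw_differences are hypothetical examples, not values from my distance matrix.

```python
import math


def raw_differences(fraction_different, sequence_length):
    """Convert a fractional pairwise difference into a whole number of differences.

    Uses floor(x + 0.5) so that ties round up; Python's built-in round()
    would round ties to the nearest even number instead.
    """
    return math.floor(fraction_different * sequence_length + 0.5)


# Hypothetical example: a score of 0.0351 over a 285-nucleotide alignment.
print(raw_differences(0.0351, 285))  # prints 10
```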

Look at maximum and minimum differences across subjects
 * Create an alignment with all of the sequences from 2 subjects.
 * Use Clustdist to generate a pairwise distance matrix for the alignment across subjects. Find the minimum and maximum differences between the subjects' sequences. Look only at the pairwise scores.
 * Repeat the analysis for all pairings.
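
The cross-subject comparison can also be sketched in Python. Note that this is not what Clustdist does internally: it substitutes a direct count of mismatched positions between already aligned sequences, and it assumes that only pairs made of one clone from each subject are compared. The subject labels, toy sequences, and function names are hypothetical.

```python
from itertools import combinations, product


def count_mismatches(seq_a, seq_b):
    """Number of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def cross_subject_min_max(clones_a, clones_b):
    """Return (min, max) differences over pairs with one clone from each subject."""
    diffs = [count_mismatches(a, b) for a, b in product(clones_a, clones_b)]
    return min(diffs), max(diffs)


# Hypothetical toy data: subject label -> aligned clone sequences.
subjects = {
    "X": ["ACGTAC", "ACGTTC"],
    "Y": ["ATGTAC", "TTGTTG"],
    "Z": ["ACGTAC", "ACCTAC"],
}

# Repeat the comparison for every pairing of subjects.
for name_a, name_b in combinations(sorted(subjects), 2):
    low, high = cross_subject_min_max(subjects[name_a], subjects[name_b])
    print(name_a, name_b, low, high)
```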