BIOL368/F14:Isabel Gonzaga Week 4


Exploring HIV Evolution In-Class Activity

Methods

Activity 1 // Part 2: GenBank

The GenBank database on the NCBI website was searched for the Markham et al. (1998) HIV-1 sequences. One sequence was selected at random, and its full record and FASTA-formatted sequence were examined.
Five sequences were then selected from the Summary search view and downloaded as a single file in FASTA format. This file was opened in Microsoft Word to screen for errors.

Activity 1 // Part 3: Introduction to Biology Workbench

The five randomly selected sequences from Activity 1 Part 2 were uploaded and saved to Biology Workbench under the Nucleic Tools tab. All five sequences were then selected, and ClustalW was used to perform a multiple sequence alignment.

Activity 2 // Part 1: Looking at Clustering Across Subjects

The provided Visit 1 sequence files were downloaded and uploaded separately to Biology Workbench. ClustalW was used to perform a multiple sequence alignment and generate a distance tree for 12 sequences, three clones from each of the first four subjects: S1V1-1, S1V1-2, S1V1-3, S2V1-1, S2V1-2, S2V1-3, S3V1-1, S3V1-2, S3V1-3, and three clones from Subject 4. The resulting unrooted tree was examined for potential evolutionary relationships.

Activity 2 // Part 2: Quantifying Diversity Within and Between Subjects

ClustalW was used to perform a multiple sequence alignment of all clones from Subject 1. The S value was determined by counting the number of alignment positions at which the clones differed. Theta was determined by dividing the subject's S value by the sum 1 + 1/2 + ... + 1/(n-1), where n is the number of clones. The alignment was then imported for further analysis.
Under the Alignment tab in Biology Workbench, the alignment for Subject 1 was selected and the Clustaldist tool was used to generate a distance matrix. The minimum and maximum values in this matrix were identified and multiplied by the total number of base pairs (285) to give the raw number of differences. This process was repeated for the clones from Subjects 2 and 3.
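
The S, theta, and raw-difference calculations described above can be sketched in Python. This is only an illustration under stated assumptions, not the Biology Workbench procedure: the function names and the toy alignment are hypothetical, gaps are treated like any other character, and the Clustaldist value is assumed to be a fraction of differing sites (0.0491 is an illustrative value corresponding to about 14 of 285 positions).

```python
def segregating_sites(aligned):
    """Count alignment columns in which the clones do not all share one character."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    return sum(1 for column in zip(*aligned) if len(set(column)) > 1)


def harmonic_sum(n):
    """Return 1 + 1/2 + ... + 1/(n - 1) for n clones."""
    return sum(1.0 / i for i in range(1, n))


def theta(s, n):
    """Divide the number of segregating sites S by the harmonic sum for n clones."""
    return s / harmonic_sum(n)


def raw_differences(fraction, length=285):
    """Convert a fractional distance into a whole number of differing positions."""
    return round(fraction * length)


# Hypothetical toy alignment: columns 3 and 5 vary, so S = 2.
toy_alignment = ["ACGTAC", "ACGTTC", "ACCTAC"]
print(segregating_sites(toy_alignment))

# Subject 1 values from the Results: S = 26 across n = 13 clones.
print(round(theta(26, 13), 1))

# Illustrative (assumed) fractional distance over 285 base pairs.
print(raw_differences(0.0491))
```

If the distance matrix reports percentages rather than fractions, the value would need to be divided by 100 before calling `raw_differences`.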

Differences between subjects were then compared. Under the Nucleic Tools tab, all sequences for Subjects 1 and 2 were selected for a ClustalW multiple sequence alignment. This alignment was imported, and Clustaldist was run on it. The minimum and maximum values were identified and multiplied by the total number of base pairs (285) to give the raw minimum and maximum differences between subjects. This process was repeated to compare Subjects 1 and 3, and then Subjects 2 and 3.
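
The between-subject comparison can be illustrated with a short Python sketch. It assumes a simple count of mismatched positions (Hamming distance) between aligned sequences, which may not match the exact distance measure Clustaldist reports, and it considers only pairs that take one clone from each subject. The function names and sequences are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Count the positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def between_subject_range(group_a, group_b):
    """Return (min, max) differences over all cross-subject clone pairs."""
    distances = [hamming(a, b) for a, b in product(group_a, group_b)]
    return min(distances), max(distances)


# Hypothetical aligned clones for two subjects.
subject_a = ["ACGTAC", "ACGTTC"]
subject_b = ["ACGAAC", "TCGAAG"]

low, high = between_subject_range(subject_a, subject_b)
print(low, high)
```

Restricting the comparison to cross-subject pairs keeps within-subject distances (which can be 0) from being reported as the between-subject minimum.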

Results

Activity 1 // Part 2: GenBank

  1. Accession Number: AF089142
  2. Subject: 3, determined from the 'Definition' section of the record

Activity 1 // Part 3: Introduction to Biology Workbench

Activity 2 // Part 1: Looking at Clustering Across Subjects

Figure: Unrooted tree of HIV-1 viral clones for Subjects 1, 2, 3 and 4 at Visit 1

  1. Yes, the clones from each subject cluster together.
  2. Subject 3 shows some viral diversity among its clones: the branch separating Clone 3 from Clones 1 and 2 is fairly long compared with the branch lengths in the other subjects.
  3. Viral clones for Subjects 1 and 2 clustered together, showing similar viral genetic identities between the two subjects.
  4. Based on branch lengths, Subjects 3 and 4 are genetically distinct, whereas Subjects 1 and 2 are more similar. Subject 3 shows the greatest distance from the rest of the sequences, implying greater divergence. Additionally, because a long branch separates Clone 3 from Clones 1 and 2 in Subject 3, Subject 3 shows the greatest diversity among the HIV-1 clones represented in the tree. The clustering of Subjects 1 and 2 may indicate infection by the same (or similar) strains of HIV-1.

Activity 2 // Part 2: Quantifying Diversity Within and Between Subjects

Table 1. Clustaldist Analysis of Distance Within Subjects

Subject   Number of Clones   S    Theta   Min Difference (nt)   Max Difference (nt)
1         13                 26   8.4     0                     14
2         6                  5    2.4     1                     3
3         4                  6    2.6     1                     5



Table 2. Clustaldist Analysis of Distance Between Subject Pairs

Subjects Compared   Min Difference (nt)   Max Difference (nt)
1 & 2               1                     11
1 & 3               34                    41
2 & 3               35                    40

Conclusion

Biology Workbench was used to analyze and compare HIV-1 sequence data from Markham et al. (1998). The unrooted phylogenetic tree displays the evolutionary relationships among three viral clones from each of Subjects 1, 2, 3 and 4 at their first visit. From an unrooted tree, conclusions may be drawn about the genetic distances between sequences. Based on branch lengths, Subjects 3 and 4 are genetically distinct, whereas Subjects 1 and 2 are more similar. Subject 3 shows the greatest distance from the rest of the sequences, implying the greatest sequence difference. Additionally, because a long branch separates Clone 3 from Clones 1 and 2 in Subject 3, Subject 3 shows the greatest diversity among its HIV-1 clones in the tree. The clustering of Subjects 1 and 2 may indicate infection by the same (or similar) strains of HIV-1. This is supported by the distance analysis, which found only 1 to 11 nucleotide differences between clones from the two subjects. Because these viruses were sequenced at the first visit (after initial seroconversion), the strains are likely to resemble the initially transmitted viruses, as they have had less time to mutate in response to selection pressures.
Clustaldist analysis was also used to examine viral diversity within and between subjects. Subject 1 showed the greatest within-subject divergence, with a maximum of 14 nucleotide differences between its clones. As Subject 1 was a rapid progressor, this may have implications for the increase in diversity over time in that subgroup. Another rapid progressor, Subject 3, showed the next highest divergence, with a maximum of 5 nucleotide differences, while Subject 2 (a non-progressor) showed the lowest maximum difference of 3 nucleotides. The analysis also showed Subject 1 to be most similar to Subject 2, as suggested by the unrooted phylogenetic tree, with a maximum difference of 11 nucleotides between sequences. These findings also confirm that Subject 3 has the greatest genetic difference from Subjects 1 and 2, with minimum differences of 34 to 35 nucleotides and maximum differences of 40 to 41.
It is also important to note that the data sets differ in size, which may bias the interpretation of these results: Subject 1 had 13 clones, whereas Subject 2 had 6 and Subject 3 had only 4.

Defining Your Research Project

  1. What is your question?
    • Is there a relationship between disease status and viral genetic identity: do the viral strains in subjects with, or progressing towards, an AIDS diagnosis differ from those in subjects who are not progressing?
  2. Make a prediction (hypothesis) about the answer to your question before you begin your analysis.
    • I predict that, across these three categories (diagnosed with AIDS, trending towards AIDS, and no trend), higher viral diversity will be found in the AIDS-diagnosed subjects and in those trending towards AIDS. I also hypothesize that the 'no trend' group will maintain some genetic similarity across viruses.
  3. Which subjects, visits, and clones will you use to answer your question?
    • Subjects with AIDS
      • Subject 3
        • S3V1 (1-3), S3V6 (3,5,6)
      • Subject 10
        • S10V1 (3,5,7), S10V6 (4,6,8)
      • Subject 15
        • S15V1 (6,9,12), S15V4 (2,6,8)
    • Subjects Progressing Towards AIDS
      • Subject 8
        • S8V1 (1,2,4), S8V7 (3,5,7)
      • Subject 9
        • S9V1 (2-4), S9V8 (2,4,8)
      • Subject 14
        • S14V1 (2,5,6), S14V9 (3,6,9)
    • Subjects Not Progressing
      • Subject 5
        • S5V1 (2,4,6), S5V5 (1,3,4)
      • Subject 6
        • S6V1 (1-3), S6V8(4,6,8)
      • Subject 13
        • S13V1 (2-4), S13V5 (1,2,4)
  4. Justify why you chose the subjects, visits, and clones you did.
    • A subject is defined as having AIDS if their CD4 T cell count drops below 200 cells/µL. Using this criterion, subjects were chosen based on trends in CD4 T cell levels from Figure 1 of Markham et al. (1998). Subjects were chosen for having multiple data points and an overall steady rate of change in T cell levels.
    • The first and last visits were chosen for each subject, as provided by the BEDROCK HIV Problem database. The first visit gives the starting point for the HIV-1 virus, at the initial point of seroconversion. Comparing first visits will potentially show whether subjects were infected by genetically similar viruses. The last visits serve as the basis for determining the relationship between viral identities at the end of the study. Because each subject's status at the end of the study determines their category (AIDS diagnosed, progressing towards AIDS, or not progressing), the relationship between category and viral genetic makeup can be analyzed.
    • Three clones were chosen at random for each visit (with the exception of S6V1, where only 3 clones were available in the data set). This serves to prevent bias in the analysis. Furthermore, the number of clone sequences used for each visit and each subject was held constant, also to prevent bias. A total of 54 clone sequences were used, and this larger data set should support stronger conclusions.
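
The clone selection described above can be sketched in Python as a simple random draw. This is a hypothetical illustration, not the procedure actually used: the function name, the seed, and the list of available clone numbers are assumptions, and visits with three or fewer clones simply return every clone (as for S6V1).

```python
import random


def pick_clones(available, k=3, seed=None):
    """Choose k clone numbers at random; return all clones if k or fewer exist."""
    clones = list(available)
    if len(clones) <= k:
        return sorted(clones)
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return sorted(rng.sample(clones, k))


# Hypothetical visit with eight sequenced clones, numbered 1 to 8.
print(pick_clones(range(1, 9), seed=1))

# A visit with only three clones available returns all three.
print(pick_clones([1, 2, 3]))

# Total sequences in the design: 9 subjects x 2 visits x 3 clones.
print(9 * 2 * 3)
```

Recording the seed alongside the chosen clone numbers would let the same selection be reproduced later.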
