First, I opened the link to the NCBI website and searched for the Markham et al. article. Once I had found it, I switched the search type to nucleotide and searched again using the same search criteria that had led me to the original paper (Markham RB[AUTHOR]). Then I selected a gene from subject 2 from the list and saved it in FASTA format.
I navigated to the Biology Workbench site and created an account, then entered the workbench and selected "nucleic tools". I selected "Add New Nucleic Sequence" and hit "Run". I uploaded the fasta file I had created before and saved it to my workbench under the label "S2V4C2".
I then imported the rest of the files that I had saved, and saved them under the same labeling scheme (Subject, Visit, Clone number).
I selected all of the files I'd uploaded into my workbench, chose the option "ClustalW - Multiple Sequence Alignment", and ran it. I submitted it with the default settings.
I clicked "Session Tools" and started a new session.
I selected the following sequences to use for my ClustalW: S7V1-1, S7V1-4, S7V1-8, S15V1-3, S15V1-7, S15V1-11, S13V1-1, S13V1-2, S13V1-3, S1V1-3, S1V1-6, S1V1-9. I ran ClustalW on these sequences with the default settings.
Activity 2, Part 2
I selected the four clones from Subject 13 (S13V1-1, S13V1-2, S13V1-3, S13V1-4) and ran ClustalW on them using the default settings. I counted the number of differences in the resulting alignment. I then clicked the "Import Alignment(s)" button, selected the alignment I had just imported, and ran "CLUSTALDIST - Generate Distance Matrix with Clustal W" using the default settings. I used the values from the Clustal distance matrix to calculate the min and max differences by multiplying the smallest and largest values, respectively, by the total number of base pairs.
I repeated the above steps with the 13 clones from subject 1, then the 10 clones from subject 7, then the combined 9 clones of subjects 2 and 4 (6 from 2, 3 from 4), then the combined 15 clones from subjects 5 and 10 (8 from 5 and 7 from 10), then the combined 7 clones from subjects 6 and 12 (3 from 6 and 4 from 12).
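The min and max difference arithmetic described above can be sketched in a few lines of Python. This is only an illustration, assuming the distance matrix holds the fraction of sites that differ between each pair of clones; the function name `min_max_differences` and the example matrix are hypothetical and are not the actual Workbench output.

```python
from itertools import combinations


def min_max_differences(dist, n_sites):
    """Return (min, max) pairwise differences in whole bases.

    dist    -- square, symmetric matrix of pairwise distances
               (assumed to be the fraction of sites that differ)
    n_sites -- alignment length in base pairs
    """
    # Look at each pair of clones once; the zero diagonal is skipped.
    pairs = [dist[i][j] for i, j in combinations(range(len(dist)), 2)]
    return round(min(pairs) * n_sites), round(max(pairs) * n_sites)


# Made-up distances for four clones, for illustration only.
example = [
    [0.000, 0.004, 0.007, 0.004],
    [0.004, 0.000, 0.004, 0.007],
    [0.007, 0.004, 0.000, 0.004],
    [0.004, 0.007, 0.004, 0.000],
]
print(min_max_differences(example, 285))
```

With these made-up values and an alignment length of 285 base pairs, the smallest distance corresponds to about 1 base and the largest to about 2 bases.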
Results
Activity 1, Part 3
AF016767.2 and AF016768.2 both came from subject 1, but AF016768.2 is slightly more similar to AF016818.2 than AF016767.2, having 6 differences with AF016767.2 and 5 differences with AF016818.2. AF016767.2 and AF016818.2 are different, having 11 differences between the two of them. AF089153.2 is the most different from all of them, having 21 locations where it is completely different from the other 3, three of them being areas that do not exist in the gene (Fig. 1).
Of the four genes used, three of the four (AF016768.2, AF016767.2, and AF016818.2) have a relatively high degree of similarity, which can be seen in the unrooted tree. Those three are clustered relatively closely together, with AF016768.2 having a closer similarity to AF016818.2 and AF016767.2 than AF016818.2 and AF016767.2 have with each other. AF089153.2 is far away from the others, indicating a large amount of dissimilarity between it and the other three genes (Fig. 2).
Activity 2, Part 1
The greatest similarity is seen between clones from the same subject, as expected. However, a higher degree of dissimilarity than would be expected is seen among the clones of subject 15, with 24 areas in which 1 clone is different from the rest. Conversely, the clones from subject 13 are extremely similar (with 2 areas in which 1 clone is different from the rest), as are the clones from subject 1 (which also had 2 areas in which 1 clone is different from the rest). Subject 7 lies in between, with 14 areas in which 1 clone is different from the rest (Fig. 3).
The tree confirms what we saw in the sequence alignment. Subject 15 has the highest degree of difference, and its branches are quite far apart. The clones from subject 1 and the clones on subject 13 are extremely similar and are almost entirely overlapping due to their similarity. The clones from subject 7 are not as similar as those from subjects 1 and 13, but not as different as the clones from subject 15 (Fig. 4).
Table 1: The S, θ (theta), min difference, and max difference values from subjects 13, 1, and 7. S is calculated using the number of differences present in the sequence alignment. θ is calculated using the formula [math]\displaystyle{ S/(\sum_{i=1}^{n-1} \frac{1}{i}) }[/math]. The min and max differences are calculated by taking the lowest and highest numbers in the clustal distance matrix respectively and multiplying them by the number of base pairs, which was 285 in every case. The number is then rounded to the nearest integer.
The values varied depending on how similar each subject's clones were to one another. The most similar clones were found in subject 13, and the most diverse in subject 7.
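The S and θ calculations described in the Table 1 caption can be sketched as follows. This is a minimal illustration rather than the exact procedure used in the Workbench: it assumes S is the number of alignment columns in which the clones are not all identical, and it treats a gap character as a difference. The function names and the toy alignment are hypothetical.

```python
from fractions import Fraction


def segregating_sites(alignment):
    """Count alignment columns in which the sequences are not all identical.

    Assumption: a gap ('-') is treated like any other character, so a
    column containing a gap in only some sequences counts as a difference.
    """
    return sum(1 for column in zip(*alignment) if len(set(column)) > 1)


def theta(s, n):
    """Return S divided by the sum of 1/i for i = 1 .. n-1, as an exact fraction."""
    harmonic = sum(Fraction(1, i) for i in range(1, n))
    return Fraction(s) / harmonic


# Toy alignment of three sequences (not real data): two columns vary.
toy = ["ACGTAC", "ACGTTC", "ACCTAC"]
s = segregating_sites(toy)
print(s, theta(s, len(toy)))

# S = 26 with 13 clones and S = 28 with 10 clones, as reported for subjects 1 and 7.
print(theta(26, 13), float(theta(26, 13)))
print(theta(28, 10), float(theta(28, 10)))
```

Using exact fractions keeps θ in the same form as the formula and avoids rounding until the final conversion to a decimal.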
| Pair of subjects being compared | Min difference | Max difference |
|---|---|---|
| 2 and 4 | 21 | 26 |
| 5 and 10 | 30 | 36 |
| 6 and 12 | 25 | 28 |
Table 2: The min and max differences from the clones of Subjects 2 and 4, 5 and 10, and 6 and 12. The differences were taken comparing across subjects, not within each subject's clones. The min and max differences were calculated the same way as in Table 1.
The highest difference was seen between Subjects 5 and 10, indicating that their clones were the most different of the group. Subjects 2 and 4 had the lowest degree of dissimilarity, with a range of only 21 to 26. Subjects 6 and 12 were in between, but had a difference of only 3 bases between their max and min differences.
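Restricting the comparison to clones from different subjects, as in Table 2, can be sketched like this. It is an illustration under the same assumption as before (distances are fractions of differing sites); the function name `between_subject_range`, the labels, and the distance values are all made up for the example.

```python
from itertools import combinations


def between_subject_range(labels, dist, n_sites):
    """Min and max differences (in bases) over pairs of clones
    that come from different subjects.

    labels -- one subject identifier per row of dist
    dist   -- square, symmetric matrix of pairwise distances
    """
    values = [
        dist[i][j]
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] != labels[j]  # skip within-subject pairs
    ]
    return round(min(values) * n_sites), round(max(values) * n_sites)


# Two clones labelled "2" and one labelled "4"; distances are made up.
labels = ["2", "2", "4"]
dist = [
    [0.000, 0.010, 0.075],
    [0.010, 0.000, 0.090],
    [0.075, 0.090, 0.000],
]
print(between_subject_range(labels, dist, 285))
```

The within-subject distance of 0.010 is ignored here, so the reported range reflects only the cross-subject pairs.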
Questions
Activity 1, Part 2
What was the accession number of the sequence you chose?
The Accession number was AF016818.
Which subject of the study was that HIV sequence from? Which section of the record contains information about who the HIV was collected from?
The sequence was from subject 2. The "Definition" section of the record contained the information about where the HIV was collected from.
Activity 2, Part 1
As a preliminary analysis you should generate a multiple sequence alignment and distance tree for 12 of these sequences (3 clones from each of 4 subjects).
See figures 3 and 4 above in results.
Use the data table below (Table 2) to keep a record of the data you analyzed.
| Subject | Clone # |
|---|---|
| 1 | 3 |
| 1 | 6 |
| 1 | 9 |
| 7 | 1 |
| 7 | 4 |
| 7 | 8 |
| 13 | 1 |
| 13 | 2 |
| 13 | 3 |
| 15 | 3 |
| 15 | 7 |
| 15 | 11 |
Do the clones from each subject cluster together?
The clones from each subject form clusters that range from very loose to tight. The clones from subjects 1 and 13 are clustered very close together. The clones from subject 7 are a little farther apart than those seen in 1 and 13, with clones 1 and 4 being closer to each other than to clone 8. However, the clones from subject 15 are very far apart from each other, indicating that they are only very loosely related to one another.
Do some subjects' clones show more diversity than others?
Yes. Subject 7's clones show more diversity than those of subjects 1 and 13. Subject 15's clones show more diversity than all of the other clones, with each of the branches being very long and far apart.
Write a brief description of your tree and how you interpret the clustering pattern with respect to the similarities and potential evolutionary relationships between the subject's HIV sequences.
My tree contains a relatively short central trunk, with branches of varying length extending from it. The branch with subject 13's clones is very distant from the rest, indicating that these clones are the most different from all of the other subjects' clones. However, the clones from subject 13 are extremely close together, with clone 2 located at the branching point from which the others separate. This indicates that it may be a common ancestor of the other two, or that it may be equally related to both of them. Clone 1 is a very short distance from clone 2, on a branch so small that it is hard to see, indicating that it is very closely related to the other clones. Clone 3 is the furthest up the branch, indicating the most diversity. Overall, the clones from subject 13 are very closely related and not very diverse. The next longest branch of the main trunk is Subject 7. Subject 7's clones are much more loosely related than subject 13's. However, the clones are still relatively close together, indicating that they are closely related. Clones 1 and 4 are more closely related to each other than to clone 8, as clone 8 has a branch to itself while clones 1 and 4 stem from the same branching point. The next longest branch off the trunk is Subject 1. Like subject 13, subject 1's clones are very closely related. Clone 3 is located at the branching point that splits into clones 6 and 9, indicating that it may be the common ancestor of 6 and 9, which are also closely related. The final branch is for subject 15, the most dissimilar of the subjects. Each of the clones of subject 15 is located on a long branch away from those around it, indicating that they are not very similar and are much more diverse. Clone 3 is the most diverse of the clones, having diverged first and being on a branch of its own. Clones 7 and 11 split from the same branching point, but their respective branches are very long, indicating that they are not very similar.
Copy and paste your tree from the Biology Workbench into a word processor and print it out to share with the class.
See figure 4 above.
Conclusions
Activity 1, Part 3
In Activity 1, Part 3, four clones were compared: two from subject 1 (AF016768.2 and AF016767.2), one from subject 2 (AF016818.2), and one from subject 4 (AF089153.2). Contrary to what might have been expected, the two clones from subject 1 were not the most similar pair; instead, AF016768.2 was most similar to AF016818.2, from subject 2. This may be due to random chance, or subjects 1 and 2 may be infected with a similar strain of the virus. It is difficult to tell from only these three clones; more clones from subjects 1 and 2 would need to be compared before drawing any conclusions. The one conclusion that can reasonably be drawn is that the clone from subject 4, AF089153.2, is very dissimilar from the other three clones and is likely a completely different strain.
Activity 2, Part 1
In Activity 2, Part 1, 3 clones each from subjects 1, 7, 13, and 15 were compared. As expected, the highest degree of similarity was seen among the clones of the same subject, and no clone was more similar to a clone from a different subject than to another clone from its own subject. However, the overall similarity between the clones of an individual subject varied greatly. Subject 13's clones were so similar that each clone differed from another by only 1-2 bases. Subject 1 had an equally high degree of similarity, with each clone differing from each other clone by only 1-2 bases. Subject 7 had 14 differences between its clones, a much higher amount of dissimilarity than seen in subjects 1 and 13. Subject 15 had the highest amount of difference among its clones, with 24 areas in which one of the clones differed from the rest. The tree (Fig. 4) shows the large amount of space between each of the clones of subject 15, as well as the extremely close relationships between the clones of subjects 1 and 13. There are several potential reasons for this difference. It is possible that Subject 15's virus mutates at a much more rapid rate than that of Subjects 1 and 13. It is also possible that the clones chosen from subjects 1 and 13 simply happened to be more similar to one another than those subjects' other clones. It could be a combination of both factors. A ClustalW would need to be run on all of the clones of each subject to see which is ultimately the case.
Activity 2, Part 2
Activity 2, Part 2 aimed to numerically display the diversity between the clones of a particular subject, and also between the clones of different subjects. Diversity was measured using S, θ, and the minimum and maximum difference values. Among the three subjects tested (subjects 13, 1, and 7), specifically chosen because of their respective progressor groups, subject 13 had the lowest diversity. As subject 13 was a nonprogressor, this was expected. The minimum difference was 1, and the maximum was 2. It can therefore be concluded that subject 13's clones were not very diverse. Subject 1 was a rapid progressor, and subject 7 was a moderate progressor. It would have been expected that subject 1 would show the highest amount of diversity. However, this was not the case. Subject 1 had an S value of 26, while subject 7 had an S value of 28. Subject 1's θ was [math]\displaystyle{ \textstyle\frac{55440}{6617} }[/math], or about 8.38. Subject 7's θ was [math]\displaystyle{ \textstyle\frac{70560}{7129} }[/math], or about 9.90. The min and max differences were 1 and 14 for subject 1, and 2 and 18 for subject 7. Looking at Figure 1 in Markham et al.'s paper, it becomes apparent why the difference arose: subject 7 was more diverse than subject 1 at the time of the first visit. This also shows why subject 13's clones were so similar, as its diversity was very low. In the comparisons across subjects, only the min and max differences were considered. Among the three pairs of subjects tested (2 and 4, 5 and 10, 6 and 12), the most diverse pair was subjects 5 and 10, and the least diverse was subjects 2 and 4. Subjects 5 and 10 were considerably more diverse than the rest, with a min difference value higher than the max difference values of the other two pairs. It is possible that the HIV strains in subjects 5 and 10 are entirely different strains that diverged long ago.
While the diversity within subjects 5 and 10 is relatively low at the first visit, they are extremely different from each other, reinforcing this possibility. Subjects 2 and 4 were the least diverse pair, with a min difference of 21 and a max difference of 26. The two subjects are clearly very different, but not nearly as different as 5 and 10. Subjects 6 and 12 are an interesting case, as the min difference is 25 but the max difference is only 28, meaning there is very little difference between the most different and least different pairs of clones from subjects 6 and 12. This could be due to the clones within each subject being relatively similar, so that every cross-subject comparison gave nearly the same result. A ClustalW could be run to test this hypothesis.
Overall Conclusions
In this exercise, the aim was to observe differences between clones of various subjects during the first visit, and eventually learn to quantify the differences between them. In Activity 1, Part 3, it was found that one of the analyzed clones from subject 1, AF016768.2, was more similar to a clone from subject 2, AF016818.2, than to the other clone examined from subject 1, AF016767.2, a result which was not expected. This could be due simply to the clones chosen from subjects 1 and 2, or perhaps to a similarity between the HIV strains in subjects 1 and 2. Further testing would need to be done to determine which is the case. However, the clone from subject 4, AF089153.2, was markedly different from the rest, as expected. In Activity 2, Part 1, 3 clones each from subjects 1, 7, 13, and 15 were examined. The clones within a subject were more similar to one another than to any clone from a different subject, as expected. The clones from subjects 1 and 13 were extremely similar, with each subject having only 2 differences among its clones. Conversely, subject 15 had 24 differences among its clones, indicating a much higher degree of diversity, perhaps due to a higher rate of mutation. Subject 7 had 14 differences, meaning it is intermediate in diversity between subjects 1/13 and 15. Finally, in Activity 2, Part 2, a method was learned to quantify diversity by looking at the S, θ, minimum difference, and maximum difference values. The four clones from subject 13 were found to be very similar, with a minimum difference of only 1 and a maximum difference of 2. Only 3 total differences were observed across these clones. The other subjects, 1 and 7, were much more diverse, with subject 7 being the more diverse of the two even though its status as a moderate progressor might suggest the opposite. 
Subject 1 had a minimum difference of 1 and a maximum difference of 14, whereas subject 7 had a minimum difference of 2 and a maximum difference of 18. Lastly, diversity was quantified between subjects using the minimum and maximum differences. Between the three pairs of subjects studied, subjects 2 and 4 were the least diverse, with a minimum difference of 21 and a maximum difference of 26. Subjects 6 and 12 were in the middle on the scale of diversity, with a minimum difference of 25 and a maximum difference of 28. Subjects 5 and 10 had the highest degree of diversity, with a minimum difference higher than the maximum difference of either of the previous two subjects. The minimum difference was 30, and the maximum difference was 36, indicating an extremely high degree of dissimilarity between the two.
Electronic Lab Notebook
What is your question?
Do clones from a particular subject's group (rapidly progressing, moderately progressing, and nonprogressing) share any genetic similarity with one another? Are clones from subjects in the same group the most similar, or are they as dissimilar as clones from other groups? Does the amount of variation and similarity, if any, change from the time of the first visit to a visit after about 2 years of infection with the virus?
Make a prediction (hypothesis) about the answer to your question before you begin your analysis.
If a subject is in a particular group (rapid, moderate, nonprogressing), their clones will be most similar to one another, but will share a higher degree of similarity with the clones of other subjects in the same group than with those in the other groups. The variation between clones of a specific group will change between visits, with the rapid progressors' clones becoming the most different from both other groups' clones and each other, and the nonprogressors' clones remaining relatively unchanged.
Which subjects, visits, and clones will you use to answer your question?
I will be using the following subjects, visits, and clones to answer my question:
Subject 2: Visit 1, clones 3, 4, and 5; Visit 3, clones 5, 6, and 7.
Subject 3: Visit 1, clones 1, 2, and 3; Visit 3, clones 2, 7, and 9.
Subject 6: Visit 1, clones 1, 2, and 3; Visit 5, clones 2, 4, and 8.
Subject 8: Visit 1, clones 1, 3, and 5; Visit 4, clones 4, 5, and 6.
Subject 11: Visit 1, clones 3, 5, and 7; Visit 3, clones 3, 6, and 9.
Subject 12: Visit 1, clones 1, 2, and 3; Visit 5, clones 2, 5, and 10.
Subject 13: Visit 1, clones 1, 3, and 4; Visit 3, clones 1, 2, and 4.
Subject 14: Visit 1, clones 2, 3, and 6; Visit 5, clones 1, 6, and 7.
Subject 15: Visit 1, clones 3, 6, and 12; Visit 4, clones 1, 3, and 4.
This is a total of 9 subjects, 18 visits, and 54 clones.
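The totals above can be checked by writing the selection out as a small data structure. This is only a bookkeeping sketch; the dictionary layout and the name `selection` are illustrative choices, while the subject, visit, and clone numbers are copied from the list above.

```python
# Selection copied from the list above: subject -> {visit: [clone numbers]}
selection = {
    2:  {1: [3, 4, 5],  3: [5, 6, 7]},
    3:  {1: [1, 2, 3],  3: [2, 7, 9]},
    6:  {1: [1, 2, 3],  5: [2, 4, 8]},
    8:  {1: [1, 3, 5],  4: [4, 5, 6]},
    11: {1: [3, 5, 7],  3: [3, 6, 9]},
    12: {1: [1, 2, 3],  5: [2, 5, 10]},
    13: {1: [1, 3, 4],  3: [1, 2, 4]},
    14: {1: [2, 3, 6],  5: [1, 6, 7]},
    15: {1: [3, 6, 12], 4: [1, 3, 4]},
}

n_subjects = len(selection)
n_visits = sum(len(visits) for visits in selection.values())
n_clones = sum(
    len(clones) for visits in selection.values() for clones in visits.values()
)
print(n_subjects, n_visits, n_clones)
```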
Three subjects from each progressor group were chosen. Subjects 2, 12, and 13, from the nonprogressor group, were chosen out of necessity, as there were only 3 nonprogressors available. Subjects 6, 8, and 14, from the moderate progressor group, were chosen due to their difference from one another in diversity, divergence, and CD4 T cell count at the conclusion of testing, and also due to them all having a test occurring just around the 2-year mark. Subject 6 had an ultimate increase in CD4 T cell count, making it an ideal subject to choose due to its difference from the rest of the group. Subject 14 had a general increase in diversity before having a sharp decline in diversity, and subject 8 had relatively low diversity and divergence before the midpoint visit, at which point it began to rise until it plateaued at the final visit. These factors made them unique to look at together, alongside the relatively high diversity and divergence of subject 6, as it appears they are quite different from one another. Subjects 3, 11, and 15 were selected from the rapid progressor group for similar reasons. They all had a visit around the 2 year mark, but also were very different from one another in terms of diversity and divergence. This makes it ideal to see whether, despite their differences, there are any underlying similarities between them. Subject 15 had extremely high diversity that spiked rapidly. Subject 11 had consistently upward sloping diversity, but not nearly as dramatic as subject 15, and its divergence was low and rose very slowly. Subject 3 ended up with a decrease in diversity and divergence, overall.
The first visit was selected for each subject to check genetic diversity at the start. The second visit was whichever visit was listed as the "midpoint visit" for each subject, with the exception of subjects 11 and 15. For subject 11, Figure 1 of the Markham et al. paper showed that visit 3 was closest to the midpoint, so visit 3 was chosen over the listed midpoint visit, visit 2. Subject 15's visit was moved for the same reason. Its midpoint visit was listed as visit 3, but visit 4 was ultimately selected due to its proximity to the 2-year mark, also found in Figure 1 of the Markham et al. paper. Most of these visits occurred around the 2-year mark, with the exception of subject 2, whose second (and thus midpoint) visit was halfway through the third year. However, there were no other choices for a nonprogressor, as there were only three options to begin with, so subject 2 was selected. Clones were selected at random to prevent bias toward intentionally similar or different clones, as the ultimate goal is to see differences and similarities among any clones in a particular progressor group, and in comparison to the other progressor groups over time.