BIOL368/F14:Nicole Anguiano Week 5

Electronic Lab Notebook

Answering the question: Do clones from a particular subject's group (rapidly progressing, moderately progressing, or nonprogressing) share any genetic similarity with one another? Are clones from subjects in the same group the most similar, or are they as dissimilar as clones from other groups? Does the amount of variation and similarity, if any, change from the time of the first visit to a visit after about 2 years of infection with the virus?

Below are the clones chosen and the subjects/visits they are from:

Progressor Group | Subject | First Visit # | First-Visit Clones | Mid-Visit # | Mid-Visit Clones
Rapid | 3 | 1 | 1, 2, 3 | 3 | 2, 7, 9
Rapid | 11 | 1 | 3, 5, 7 | 3 | 3, 6, 9
Rapid | 15 | 1 | 3, 6, 12 | 4 | 1, 3, 4
Moderate | 6 | 1 | 1, 2, 3 | 5 | 2, 4, 8
Moderate | 8 | 1 | 1, 3, 5 | 4 | 4, 5, 6
Moderate | 14 | 1 | 2, 3, 6 | 5 | 1, 6, 7
Nonprogressor | 2 | 1 | 3, 4, 5 | 3 | 5, 6, 7
Nonprogressor | 12 | 1 | 1, 2, 3 | 4 | 2, 5, 10
Nonprogressor | 13 | 1 | 1, 3, 4 | 3 | 1, 2, 4

Methods and Results

• First, I compiled the sequences of each of the clones that I'd specified last week (also listed above) from the BioQUEST website. I went to the Biology Workbench and created a new session called "Research". In this session, I uploaded the file I had just created with all of the sequences.
Visit 1
• I began by running a CLUSTALW on each of the first visit clones. I ran one on the rapid progressors, then the moderate progressors, then the nonprogressors, importing the alignments each time so that further analysis could be performed on each alignment through CLUSTALDIST.
Figure 1: The visit 1 sequence alignment for subjects 3, 11, and 15 (the rapid progressors), generated by CLUSTALW. The S value in Table 1 was calculated by counting how many places there was a difference between the sequences.
Figure 2: The visit 1 tree for subjects 3, 11, and 15 (the rapid progressors), generated by CLUSTALW.
• The rapid progressors have a total of 85 differences between their collective sequences (S=85, Table 1). Each subject's clones are most similar to one another, as expected, although clone 6 from subject 15 is quite different from the other subject 15 clones. The large number of differences between that clone and the rest of the subject 15 clones may be inflating the S value. The rapid progressors are also the only group in which more than one subject has additional nucleotides, absent from the other sequences, in more than one place. This could also contribute to the very high S value.
Figure 3: The visit 1 sequence alignment for subjects 6, 8, and 14 (the moderate progressors), generated by CLUSTALW. The S value in Table 1 was calculated by counting how many places there was a difference between the sequences.
Figure 4: The visit 1 tree for subjects 6, 8, and 14 (the moderate progressors), generated by CLUSTALW.
• The moderate progressors have a total of 49 differences between their collective sequences (S=49, Table 1). Each progressor is most similar to its own clones, as expected. No one subject has a particularly different clone, unlike the rapid progressors (Fig. 2). Despite also having one area in which one subject has additional nucleotides, the S value is relatively low.
Figure 5: The visit 1 sequence alignment for subjects 2, 12, and 13 (the nonprogressors), generated by CLUSTALW. The S value in Table 1 was calculated by counting how many places there was a difference between the sequences.
Figure 6: The visit 1 tree for subjects 2, 12, and 13 (the nonprogressors), generated by CLUSTALW.
• The nonprogressors have a total of 55 differences between their collective sequences (S=55, Table 1). Each progressor is most similar to its own clones, as expected. No one subject has a particularly different clone, unlike the rapid progressors. There are no areas in which a clone has extra nucleotides or missing nucleotides, so the differences are only in the base 288 bases.
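The S values above were counted by hand from the CLUSTALW alignments. As a rough cross-check, the same count can be sketched in a few lines of Python. This is only a sketch under assumptions: the sequences here are toy data rather than the study clones, the function name is hypothetical, and a gap in a column is treated as a difference (which matches the observation that extra nucleotides can raise S, but may not be exactly how the hand count handled gaps).

```python
def count_segregating_sites(aligned):
    """Count alignment columns in which the sequences are not all identical."""
    lengths = {len(seq) for seq in aligned}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    # zip(*aligned) walks the alignment column by column.
    return sum(1 for column in zip(*aligned) if len(set(column)) > 1)


# Toy aligned sequences (hypothetical, not the study data); '-' marks a gap.
toy_alignment = [
    "ACGTACGT-A",
    "ACGTTCGT-A",
    "ACGAACGTGA",
]

s_value = count_segregating_sites(toy_alignment)
print(s_value)
```

In the toy alignment, three columns differ (two substitutions and one gap column), so the count is 3.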
• After running the CLUSTALW, I ran a CLUSTALDIST on each group to find the clustal distance matrix, and calculate the theta, min difference, and max difference (Table 1).
Figure 7: The visit 1 clustal distance matrix for subjects 3, 11, and 15 (the rapid progressors), generated by CLUSTALDIST. The min and max difference in Table 1 was calculated by comparing across subjects and seeing what the highest and lowest values were, then multiplying by 291.
• The minimum value is 0.135, and the maximum is 0.177. This is used to calculate the minimum and maximum differences in Table 1 using the gene length of 291.
Figure 8: The visit 1 clustal distance matrix for subjects 6, 8, and 14 (the moderate progressors), generated by CLUSTALDIST. The min and max difference in Table 1 was calculated by comparing across subjects and seeing what the highest and lowest values were, then multiplying by 288.
• The minimum value is 0.049, and the maximum is 0.112. This is used to calculate the minimum and maximum differences in Table 1 using the gene length of 288.
Figure 9: The visit 1 clustal distance matrix for subjects 2, 12, and 13 (the nonprogressors), generated by CLUSTALDIST. The min and max difference in Table 1 was calculated by comparing across subjects and seeing what the highest and lowest values were, then multiplying by 285.
• The minimum value is 0.077, and the maximum is 0.154. This is used to calculate the minimum and maximum differences in Table 1 using the gene length of 285.
Group | S | θ | Min Difference | Max Difference
Rapid Progressors | 85 | $\textstyle\frac{170}{3}$ | 39 | 52
Moderate Progressors | 49 | $\textstyle\frac{98}{3}$ | 14 | 32
Nonprogressors | 55 | $\textstyle\frac{110}{3}$ | 22 | 44
• Table 1: The S, θ, and minimum/maximum differences within the individual progressor groups. Unusually, the moderate progressors are the most similar group, with the rapid progressors predictably having the most differences between them. It might have been predicted that the nonprogressors would have the fewest differences, but the moderate progressors came out the most similar, with the maximum difference of the nonprogressors being 12 over the maximum difference of the moderate progressors.
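The arithmetic behind Table 1 can be sketched as follows. The min and max differences are the extreme clustal distances multiplied by the gene length and rounded, as described in the figure captions. The θ formula is not written out in this notebook; the sketch assumes θ = S / (1 + 1/2 + ... + 1/(n-1)) with n = 3, which is an inference from the tabulated fractions (for example, 85 / 1.5 = 170/3), and the function names are hypothetical.

```python
from fractions import Fraction


def theta_from_s(s, n):
    """theta = S / (1 + 1/2 + ... + 1/(n-1)), kept as an exact fraction."""
    harmonic = sum(Fraction(1, i) for i in range(1, n))
    return Fraction(s) / harmonic


def distance_to_differences(distance, length):
    """Convert a per-site distance into a whole number of differing positions."""
    return round(distance * length)


# Visit 1 values reported above: (S, min distance, max distance, gene length).
visit1 = {
    "Rapid Progressors": (85, 0.135, 0.177, 291),
    "Moderate Progressors": (49, 0.049, 0.112, 288),
    "Nonprogressors": (55, 0.077, 0.154, 285),
}

N = 3  # assumption: inferred from the theta values in Table 1, not stated in the notebook

rows = {
    group: (
        theta_from_s(s, N),
        distance_to_differences(dmin, length),
        distance_to_differences(dmax, length),
    )
    for group, (s, dmin, dmax, length) in visit1.items()
}

for group, (theta, low, high) in rows.items():
    print(f"{group}: theta = {theta}, min = {low}, max = {high}")
```

Under these assumptions the sketch gives the same θ fractions and min/max differences as Table 1.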
• After comparing each group with itself, I then compared across groups. I ran a CLUSTALW and CLUSTALDIST on the first visit clones from the three rapid progressors and the three moderate progressors, then the three rapid progressors and the three nonprogressors, then the three moderate progressors and the three nonprogressors. After the comparison, I ran a CLUSTALW and CLUSTALDIST on all of the visit 1 clones.
Figure 10: The visit 1 sequence alignment for the rapid and moderate progressors, generated by CLUSTALW.
Figure 11: The visit 1 tree for the rapid and moderate progressors, generated by CLUSTALW.
Figure 12: The visit 1 clustal distance matrix for the rapid and moderate progressors, generated by CLUSTALDIST. The min and max difference in Table 2 was calculated by comparing across subjects and seeing what the highest and lowest values were, then multiplying by 291.
• There is not as large a difference between the rapid and moderate progressors as would have been expected. Subjects 14 and 15 seem more similar to each other than to the other subjects in their own moderate and rapid groups, respectively. Subject 11 remains vastly different from the other two, while subjects 8 and 6 are relatively similar. The lowest value in the clustal distance matrix was 0.064, and the highest was 0.191; these were used to calculate the min and max differences in Table 2. Comparisons across the clustal distance matrix were made between groups and not within them, so rapid progressors were compared only with moderate progressors and vice versa. This comparison scheme is used for the rest of the clustal distance matrices.
Figure 13: The visit 1 sequence alignment for the rapid and nonprogressors, generated by CLUSTALW.
Figure 14: The visit 1 tree for the rapid and nonprogressors, generated by CLUSTALW.
Figure 15: The visit 1 clustal distance matrix for the rapid and nonprogressors, generated by CLUSTALDIST. The min and max difference in Table 2 was calculated by comparing across subjects and seeing what the highest and lowest values were, then multiplying by 291.
• As might have been expected, the rapid progressors and nonprogressors are quite different, much more so than the moderate and rapid progressors. No close similarities exist like the one between subjects 14 and 15 (Fig. 11). The nonprogressors appear different from both each other and the rapid progressors, and the rapid progressors appear different from both each other and the nonprogressors. The lowest value in the clustal distance matrix was 0.082, and the highest was 0.186; these were used to calculate the min and max differences in Table 2.
Figure 16: The visit 1 sequence alignment for the moderate and nonprogressors, generated by CLUSTALW.
Figure 17: The visit 1 tree for the moderate and nonprogressors, generated by CLUSTALW.
Figure 18: The visit 1 clustal distance matrix for the moderate and nonprogressors, generated by CLUSTALDIST. The min and max difference in Table 2 was calculated by comparing across subjects and seeing what the highest and lowest values were, then multiplying by 288.
• Subjects 6 and 8 from the moderate progressors are relatively similar, but there are no close similarities between the moderate progressors and the nonprogressors. Overall, the moderate progressors and nonprogressors are relatively different and do not share many similarities. The lowest value in the clustal distance matrix was 0.074, and the highest was 0.165; these were used to calculate the min and max differences in Table 2.
Figure 19: The visit 1 tree for all three progressor groups, generated by CLUSTALW.
Figure 20: The visit 1 clustal distance matrix for all three progressor groups, generated by CLUSTALDIST. The min and max difference in Table 2 was calculated by comparing across subjects and seeing what the highest and lowest values were, then multiplying by 291.
• Unusually, the two most similar subjects were 14 and 15, a moderate progressor and a rapid progressor, respectively. The next most similar are 6 and 8, both moderate progressors. Outside of those, the subjects were all relatively different. The lowest value in the clustal distance matrix was 0.071 and the highest was 0.186.
Groups Being Compared | Min Difference | Max Difference
Rapid and Moderate | 19 | 56
Rapid and Nonprogressor | 24 | 54
Moderate and Nonprogressor | 21 | 48
All | 21 | 54
• Table 2: The minimum and maximum differences between the progressor groups for the first visit. Strangely, the rapid and moderate groups have both the lowest minimum difference and the highest maximum difference, indicating the largest range of difference between the two of them. The rapid progressors are more similar to the moderate progressors than they are to each other, having a lower minimum difference between the two groups than within the rapid progressors alone. However, they also have a higher maximum difference, indicating that there are clones that are more varied. The rapid progressors and the nonprogressors differ more from each other than the clones within either group do, though again the minimum difference is lower than it is among the rapid progressors alone. The moderate progressors and nonprogressors are likewise more different overall, with the minimum difference being only one lower than the minimum difference within the nonprogressors.
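The between-group comparison scheme (only pairs drawn from different groups are considered) can be sketched in Python. This is an illustration under assumptions: the labels R1, R2, M1, M2 and most of the distances are made-up placeholders, not values read from the CLUSTALDIST output. Only the two extremes, 0.064 and 0.191, are the rapid/moderate values reported above, and which pair they belong to is not recorded here.

```python
from itertools import combinations


def cross_group_range(distances, group_of):
    """Return (min, max) distance over pairs whose members are in different groups."""
    values = [
        distances[frozenset((a, b))]
        for a, b in combinations(sorted(group_of), 2)
        if group_of[a] != group_of[b]  # skip within-group pairs
    ]
    return min(values), max(values)


# Placeholder labels: R = rapid progressor clone, M = moderate progressor clone.
group_of = {"R1": "rapid", "R2": "rapid", "M1": "moderate", "M2": "moderate"}

# Placeholder pairwise distances, apart from the reported extremes 0.064 and 0.191.
distances = {
    frozenset(("R1", "R2")): 0.150,  # within-group, ignored
    frozenset(("M1", "M2")): 0.049,  # within-group, ignored
    frozenset(("R1", "M1")): 0.120,
    frozenset(("R1", "M2")): 0.191,
    frozenset(("R2", "M1")): 0.110,
    frozenset(("R2", "M2")): 0.064,
}

low, high = cross_group_range(distances, group_of)
min_diff, max_diff = round(low * 291), round(high * 291)
print(min_diff, max_diff)
```

Note that the smallest distance in the toy matrix (0.049) is a within-group pair and is correctly skipped, so the reported minimum comes from a rapid/moderate pair.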
Mid-Visit
• I moved from the first visit clones to the mid-visit clones, and ran a CLUSTALW on each of the mid-visit clones. I ran one on the rapid progressors, then the moderate progressors, then the nonprogressors, importing the alignments each time so that further analysis could be performed on each alignment through CLUSTALDIST.
Figure 21: The midvisit alignment for subjects 3, 11, and 15 (the rapid progressors), generated by CLUSTALW. The S value in Table 3 was calculated by counting how many places there was a difference between the sequences.
Figure 22: The midvisit tree for subjects 3, 11, and 15 (the rapid progressors), generated by CLUSTALW.
• The rapid progressors have a total of 83 differences between their collective sequences (S=83, Table 3). Each progressor is most similar to their own clones, as expected, with the three clones from subject 11 being extremely similar. Unlike the first visit, none of the subjects have one clone that is very different from the rest. This removes the possibility that the extreme difference of one clone is a significant cause of the high S value, and indicates that the rapid progressors are simply more different from one another than the subjects in the other groups are.
Figure 23: The midvisit sequence alignment for subjects 6, 8, 14 (the moderate progressors), generated by CLUSTALW. The S value in Table 3 was calculated by counting how many places there was a difference between the sequences.
Figure 24: The midvisit tree for subjects 6, 8, 14 (the moderate progressors), generated by CLUSTALW.
• The moderate progressors have a total of 50 differences between their collective sequences (S=50, Table 3). Each progressor is most similar to their own clones, as expected. Clones 4 and 8 from subject 6 are more different than the rest (as well as more different than what was seen in the visit 1 clones), but not significantly.
Figure 25: The midvisit alignment for subjects 2, 12, 13 (the nonprogressors), generated by CLUSTALW. The S value in Table 3 was calculated by counting how many places there was a difference between the sequences.
Figure 26: The midvisit tree for subjects 2, 12, 13 (the nonprogressors), generated by CLUSTALW.
• The nonprogressors have a total of 58 differences between their collective sequences (S=58, Table 3). Each progressor is most similar to their own clones, as expected. The subject 13 clones are extremely similar, as are the clones of subject 12, with the clones of subject 2 being more diverse than the rest. However, overall, they are all quite similar.
• After running the CLUSTALW, I ran a CLUSTALDIST on each group to find the clustal distance matrix, and calculate the theta, min difference, and max difference (Table 3).
Figure 27: The midvisit clustal distance matrix for subjects 3, 11, and 15 (the rapid progressors), generated by CLUSTALDIST. The min and max difference in Table 3 was calculated by comparing across subjects and seeing what the highest and lowest values were, then multiplying by 291.
• The minimum value is 0.142, and the maximum is 0.177. This is used to calculate the minimum and maximum differences in Table 3 using the gene length of 291.
Figure 28: The midvisit clustal distance matrix for subjects 6, 8, and 14 (the moderate progressors), generated by CLUSTALDIST. The min and max difference in Table 3 was calculated by comparing across subjects and seeing what the highest and lowest values were, then multiplying by 288.
• The minimum value is 0.042, and the maximum is 0.112. This is used to calculate the minimum and maximum differences in Table 3 using the gene length of 288.
Figure 29: The midvisit clustal distance matrix for subjects 2, 12, and 13 (the nonprogressors), generated by CLUSTALDIST. The min and max difference in Table 3 was calculated by comparing across subjects and seeing what the highest and lowest values were, then multiplying by 285.
• The minimum value is 0.077, and the maximum is 0.158. This is used to calculate the minimum and maximum differences in Table 3 using the gene length of 285.
Group | S | θ | Min Difference | Max Difference
Rapid Progressors | 83 | $\textstyle\frac{166}{3}$ | 41 | 52
Moderate Progressors | 50 | $\textstyle\frac{100}{3}$ | 12 | 32
Nonprogressors | 58 | $\textstyle\frac{116}{3}$ | 22 | 45
• Table 3: The S, θ, and minimum/maximum differences within the individual progressor groups. Again, the moderate progressors are the most similar of the groups. The rapid progressors are the most different, with the minimum and maximum differences only differing by a value of 11. The nonprogressors, again, are in the middle, but have the largest range, with the minimum and maximum differences differing by 23.
• After comparing each group with itself, I then compared across groups. I ran a CLUSTALW and CLUSTALDIST on the mid-visit clones from the three rapid progressors and the three moderate progressors, then the three rapid progressors and the three nonprogressors, then the three moderate progressors and the three nonprogressors. After the comparison, I ran a CLUSTALW and CLUSTALDIST on all of the mid-visit clones.
Figure 30: The midvisit sequence alignment for the rapid and moderate progressors, generated by CLUSTALW.
Figure 31: The midvisit tree for the rapid and moderate progressors, generated by CLUSTALW.
Figure 32: The midvisit clustal distance matrix for the rapid and moderate progressors, generated by CLUSTALDIST. The min and max difference in Table 4 was calculated by comparing across subjects and seeing what the highest and lowest values were, then multiplying by 291.
• Again, subjects 14 and 15 seem more similar to each other than to the other subjects in their own moderate and rapid groups, respectively. Subjects 8 and 6 are again relatively similar. Subjects 3 and 11 are quite different from all the other subjects. The lowest value in the clustal distance matrix was 0.060, and the highest was 0.184; these were used to calculate the min and max differences in Table 4.
Figure 33: The midvisit sequence alignment for the rapid and nonprogressors, generated by CLUSTALW.
Figure 34: The midvisit tree for the rapid and nonprogressors, generated by CLUSTALW.
Figure 35: The midvisit clustal distance matrix for the rapid and nonprogressors, generated by CLUSTALDIST. The min and max difference in Table 4 was calculated by comparing across subjects and seeing what the highest and lowest values were, then multiplying by 291.
• There do not seem to be any significant similarities between any of the rapid progressors and nonprogressors. Subjects 2 and 15 are the most similar across the two groups, but the similarity is relatively small. The lowest value in the clustal distance matrix was 0.078, and the highest was 0.179; these were used to calculate the min and max differences in Table 4.
Figure 36: The midvisit sequence alignment for the moderate and nonprogressors, generated by CLUSTALW.
Figure 37: The midvisit tree for the moderate and nonprogressors, generated by CLUSTALW.
Figure 38: The midvisit clustal distance matrix for the moderate and nonprogressors, generated by CLUSTALDIST. The min and max difference in Table 4 was calculated by comparing across subjects and seeing what the highest and lowest values were, then multiplying by 288.
• While the moderate progressors and nonprogressors seem more closely related than the rapid progressors and nonprogressors, there are still no significant similarities. The most closely related are subjects 6 and 8, both moderate progressors. The lowest value in the clustal distance matrix was 0.067, and the highest was 0.158; these were used to calculate the min and max differences in Table 4.
Figure 39: The midvisit tree for all three progressor groups, generated by CLUSTALW.
Figure 40: The midvisit clustal distance matrix for all three progressor groups, generated by CLUSTALDIST. The min and max difference in Table 4 was calculated by comparing across subjects and seeing what the highest and lowest values were, then multiplying by 291.
• Subjects 6 and 8, as well as subjects 14 and 15, were again the most similar. The former two were from the moderate progressors, and the latter two were from the moderate and rapid progressor groups respectively. The lowest value in the clustal distance matrix was 0.070 and the highest was 0.179, which was used to calculate the min and max differences in table 4.
Groups Being Compared | Min Difference | Max Difference
Rapid and Moderate | 17 | 54
Rapid and Nonprogressor | 23 | 52
Moderate and Nonprogressor | 19 | 46
All | 20 | 52
• Table 4: The minimum and maximum differences between the progressor groups for the middle visit. The rapid and moderate groups have both the lowest minimum difference and the highest maximum difference again. The trends from table 2 remained, with the rapid and nonprogressors being in the middle, the moderate and nonprogressors being on the bottom (despite having a lower minimum difference than the rapid and nonprogressors), and the rapid and moderate progressors having the highest maximum difference.

Group(s) | Visit | Min Difference | Max Difference
Rapid | 1 | 39 | 52
Rapid | Mid | 41 | 52
Moderate | 1 | 14 | 32
Moderate | Mid | 12 | 32
Nonprogressor | 1 | 22 | 44
Nonprogressor | Mid | 22 | 45
Rapid and Moderate | 1 | 19 | 56
Rapid and Moderate | Mid | 17 | 54
Rapid and Nonprogressor | 1 | 24 | 54
Rapid and Nonprogressor | Mid | 23 | 52
Moderate and Nonprogressor | 1 | 21 | 48
Moderate and Nonprogressor | Mid | 19 | 46
All | 1 | 21 | 54
All | Mid | 20 | 52
• Table 5: Comparison of all the min and max differences from tables 1, 2, 3, and 4.
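To make the visit-to-visit comparison explicit, the within-group rows of Table 5 can be differenced as in the sketch below. The numbers are the within-group min and max differences from Tables 1 and 3; the dictionary layout and variable names are just one hypothetical way to organize them.

```python
# Within-group (min difference, max difference) at each visit, from Tables 1 and 3.
within_group = {
    "Rapid": {"1": (39, 52), "Mid": (41, 52)},
    "Moderate": {"1": (14, 32), "Mid": (12, 32)},
    "Nonprogressor": {"1": (22, 44), "Mid": (22, 45)},
}

# Change from visit 1 to the mid visit: (change in min, change in max).
changes = {}
for group, visits in within_group.items():
    first_min, first_max = visits["1"]
    mid_min, mid_max = visits["Mid"]
    changes[group] = (mid_min - first_min, mid_max - first_max)

for group, (d_min, d_max) in changes.items():
    print(f"{group}: min {d_min:+d}, max {d_max:+d}")
```

The changes are small in every group: the rapid progressors' minimum rises by 2, the moderate progressors' minimum falls by 2, and the nonprogressors' maximum rises by 1.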

Conclusions

After testing, I have come to the conclusion that my hypothesis, stated here, was false. The moderate progressors and nonprogressors, for the most part, shared the highest amount of genetic similarity within their own groups. The rapid progressors, however, had a high level of difference within their own group, and were overall much more similar to other groups than to their own. That being said, they often had a lower maximum difference within their own group than when compared to the other groups, though the minimum difference was much higher within their own group than when compared to the others (15-24 higher within the rapid progressors). This indicates that the rapid progressors were uniformly different from one another. Unusually, subject 14, one of the moderate progressors, and subject 15, one of the rapid progressors, were very similar when viewed on the trees (Fig. 11 and Fig. 31) in both visits. While they were slightly more different at the middle visit, overall they shared much more similarity with each other than subject 15 did with either of the other rapid progressors, or than subject 14 did with either of the other moderate progressors. This may indicate that they were infected with a similar strain, but it does not explain why subject 14 was only a moderate progressor while subject 15 was a rapid progressor. Unusually, the least amount of difference was seen in the moderate progressors rather than the nonprogressors, in whom it might have been expected. The moderate progressors decreased in minimum difference at the middle visit, while the nonprogressors increased by one in maximum difference at the middle visit. The rapid progressors had a higher minimum difference at the middle visit than at their first visit.
Overall, it can be concluded that while genetic similarity may play a role in the difference between progressor groups, it will take a much larger sample to be able to come to any distinct conclusions about the behavior of the virus in each of the progressor groups. At the moment, with the data possessed, it seems that it is relatively random whether an HIV-infected person will become a rapid progressor or not, though there may be a similarity between the viruses of the moderate progressors that could be studied further. Also, it seems that the progression of the virus doesn't change from the time of infection to the point two years following infection. A larger data sample with a greater number of timepoints would be necessary to be able to reach any further conclusions.

With regard to other papers, none of them appears to have studied anything similar to what I examined in this study.