# AninditaVarshneya BIOL368 Week 4

## Electronic Lab Notebook

### Purpose

This activity is meant to teach us about S values, theta values, and other statistical values, as well as how to analyze the genetic differences and variances between different clones of HIV sequences collected from the subjects of the ALIVE study.

### Methods and Results

#### Part 1

• Navigate back to the Biology Workbench website.
• Download the "visit_1_S1_S9.txt" and "visit_1_S10_S15.txt" files from the Week 4 Assignment Page.
• Upload both files to the Biology Workbench website.
• Select "Add Nucleic Sequences" and hit "Run".
• Hit "Run" on the resulting page.
• Generate a multiple sequence alignment and distance tree for 12 of these sequences (3 clones from each of 4 subjects).
Table 1. Subjects and clone numbers used in the distance tree.
• Select 12 sequences. Select "ClustalW Multiple Sequence Alignment" from the scrolling menu and hit "Run".
• On the next page, make sure the tree is set to "Unrooted Trees" and hit "Submit".
• Clones from each subject cluster together, separate from the clones of the other subjects.
• Clones from subject 3 are the most divergent in this tree: they have the greatest genetic distance from the clones of every other subject. In contrast, clones from subject 2 are most closely related to those from subjects 4 and 5.
• The tree shows that clones from subject 3 have the greatest genetic distance from those of subjects 4 and 5, and therefore share a much older common ancestor with them than the clones from subjects 2, 4, and 5 share with one another. The HIV sequences in subjects 2, 4, and 5 are more closely related because their most recent common ancestor is comparatively recent. We can therefore expect clones from subjects 2, 4, and 5 to have more genetic similarities with one another than clones from subject 3 have with any other clones in this analysis.
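The observation that clones cluster by subject can also be checked numerically. The sketch below compares the mean pairwise distance within a subject to the mean distance between subjects. It is only an illustration: it uses a simple proportion-of-differences distance, which is an assumption and not necessarily the distance ClustalW uses to build its tree, and the function names and toy sequences are invented here and are not ALIVE data.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def mean_within_and_between(clones):
    """clones maps (subject, clone_id) -> aligned sequence.

    Returns (mean within-subject distance, mean between-subject distance).
    """
    within, between = [], []
    for (key_a, seq_a), (key_b, seq_b) in combinations(clones.items(), 2):
        d = p_distance(seq_a, seq_b)
        # Pairs from the same subject go in one list, all other pairs in the other.
        (within if key_a[0] == key_b[0] else between).append(d)
    return sum(within) / len(within), sum(between) / len(between)


# Toy aligned sequences, invented for illustration only (not ALIVE data).
toy = {
    ("A", 1): "ACGTACGT",
    ("A", 2): "ACGTACGA",
    ("B", 1): "TCGAACCT",
    ("B", 2): "TCGAACCA",
}
within, between = mean_within_and_between(toy)
print(f"within = {within:.3f}, between = {between:.3f}")
```

A within-subject mean that is clearly smaller than the between-subject mean is what a tree with one tight cluster per subject would suggest.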

#### Part 2

• Select all of the clones for one subject and align them using the ClustalW tool as described earlier.
• Count the number of positions where at least one nucleotide is different across all of the clones (the number of columns with black font) to calculate S.
• Select "Import alignment" to save these alignments for the next step.
• Repeat this procedure for 2 more subjects.
• Calculate theta in Wolfram Alpha with the following formula, where S is the value counted above and n is the total number of clones for that subject:
   S / ((1/n) + (1/(n-1)) + ... + (1/1))
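The counting and division steps above can be sketched in Python. This is a minimal illustration, not the Biology Workbench or Wolfram Alpha procedure itself: the function names, the `include_n_term` flag, and the toy alignment are all assumptions made here. By default the sum runs from 1/1 to 1/n, as the formula is written above. The commonly cited form of Watterson's estimator stops at 1/(n-1), and the flag switches to that form. Gaps are treated like any other character, which may not match how the alignment view colors its columns.

```python
def count_segregating_sites(alignment):
    """Count alignment columns in which at least two different characters appear."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*alignment) walks the alignment column by column.
    return sum(1 for column in zip(*alignment) if len(set(column)) > 1)


def theta(S, n, include_n_term=True):
    """Divide S by a sum of reciprocals.

    include_n_term=True follows the formula as written in this notebook:
        S / (1/n + 1/(n-1) + ... + 1/1)
    include_n_term=False stops at 1/(n-1), the usual textbook form of
    Watterson's estimator.
    """
    if n < 2:
        raise ValueError("need at least two clones")
    last = n if include_n_term else n - 1
    return S / sum(1 / i for i in range(1, last + 1))


# Toy alignment, invented for illustration only (not ALIVE data).
toy_alignment = ["ACGT", "ACGA", "TCGA"]
S = count_segregating_sites(toy_alignment)
n = len(toy_alignment)
print(S, theta(S, n), theta(S, n, include_n_term=False))
```

Because the denominator grows with n, theta partly corrects for the fact that subjects with more clones tend to show more variable positions.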

• Switch to the "Alignment Tools" tab.
• Select one of the alignments and run the ClustalWDist alignment tool.
• To calculate the min and max difference, find the lowest and highest pairwise values and multiply each by the length of the sequence. Round each result to the nearest integer.
• The length of the sequence can be found by scrolling past the pairwise values, where it is reported as the number of base pairs in the clone.
Table 2. Data collected using the ClustalW multiple alignment tool. S is the number of positions where at least one nucleotide differs across all clones from a subject. Theta is calculated as S / ((1/n) + (1/(n-1)) + ... + (1/1)), where n is the total number of clones collected from that subject. Min and max differences were calculated from the sequence length in base pairs and the pairwise values reported by ClustalWDist.
• Create new alignments with all of the sequences from 2 subjects using the ClustalW tool in the "Nucleic Tools" tab.
• Use ClustalWDist to generate another pairwise distance matrix and calculate the min and max differences as in the previous steps.
Table 3. Data collected using ClustalWDist. Min and max differences are based on data from both of the subjects indicated in the first column and were calculated with the number of base pairs and the pairwise numbers as presented by ClustalWDist.
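The min and max difference calculation described in this part amounts to scaling the smallest and largest pairwise distances by the sequence length. The sketch below shows the arithmetic under stated assumptions: the matrix is taken to be symmetric with fractional distances, the function name is hypothetical, and the toy matrix and the length of 300 are invented for illustration and are not values from ClustalWDist or the ALIVE data.

```python
def min_max_differences(matrix, seq_length):
    """Return (min, max) nucleotide differences from a symmetric distance matrix.

    matrix[i][j] is the pairwise distance between clones i and j, expressed as
    a fraction of positions. Only pairs with i < j are used, so the zero
    diagonal is ignored.
    """
    n = len(matrix)
    pairwise = [matrix[i][j] for i in range(n) for j in range(i + 1, n)]
    if not pairwise:
        raise ValueError("need at least two clones")
    # Python's round() sends exact .5 ties to the nearest even integer.
    return round(min(pairwise) * seq_length), round(max(pairwise) * seq_length)


# Toy distance matrix and length, invented for illustration only.
toy_matrix = [
    [0.00, 0.01, 0.05],
    [0.01, 0.00, 0.02],
    [0.05, 0.02, 0.00],
]
print(min_max_differences(toy_matrix, 300))
```

The same function applies to a combined alignment of two subjects, since the matrix then simply contains every pair of clones from both.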

### Conclusion

Clones from subject 3 are the most genetically distant from clones from subjects 2, 4, and 5 according to the unrooted tree produced with ClustalW. This means that subject 3 shares an older common ancestor with subjects 2, 4, and 5 than the common ancestor that subjects 2, 4, and 5 share with each other.

The ClustalWDist alignment tool revealed that clones from subject 3 are more similar to one another than the clones from subjects 15 and 7 are. However, this comparison may be skewed because subject 3 has only 4 clones, while subjects 15 and 7 have 12 and 10 clones respectively. Subjects with more clones would be expected to show more differences among them, so despite the statistical differences between these clones, nothing can be said definitively about how similar each subject's clones are.

When sequences were compared across three pairs of subjects, subjects 8 and 9 had the most similar sequences, with a max difference of 33. For the other two pairs, subjects 2 and 3 and subjects 4 and 5, the max differences are 42 and 36 respectively. This is consistent with the unrooted tree of clones from subjects 2, 3, 4, and 5, because subjects 2 and 3 are less genetically similar than subjects 4 and 5.

Overall, this project provided information about the genetic similarity of HIV sequences from several subjects, based on unrooted trees, S values, theta values, and min and max differences. These analyses indicate that subject 3 is the most different from subjects 2, 4, and 5.

## Defining the Research Project

• Would a correlation analysis of sequence data from Markham et al. reveal a more informative trend between genetic diversity and decline in CD4 T cell counts than the categorical analysis that was done in the paper?
• We predict that a correlation analysis across several data points from several subjects will indicate that there is a consistent trend between HIV genetic diversity and CD4 T cell count.
3. Which subjects, visits, and clones will you use to answer your question?
• We will use data from subjects 6, 8, 9, and 14 because these subjects have the most data across the greatest number of visits in the trial. This suggests that these subjects were the most compliant with the protocols outlined by the ALIVE study researchers and therefore may have the fewest non-project-related influences on their health. We will use all clone data from all available visits to generate the correlation analysis, for a total of 64 clones. If we find an analysis procedure that allows us to use more clones, we will continue to add subjects according to the number of visits they completed.
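One possible form for the proposed correlation analysis is a Pearson correlation between a per-visit diversity measure and the CD4 T cell count at that visit. The sketch below is only an illustration of that statistic, not a committed analysis plan: the choice of Pearson's r, the function name, and the variable names are assumptions, and the numbers are invented placeholders that do not come from Markham et al. or the ALIVE data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Placeholder values, invented for illustration only (not study data).
diversity = [1.0, 2.0, 3.0, 4.0]   # e.g. one theta value per visit
cd4_count = [800, 600, 400, 200]   # CD4 T cell count at the same visits
r = pearson_r(diversity, cd4_count)
print(f"r = {r:.2f}")
```

A value of r near -1 or 1 would indicate a strong linear relationship, while a value near 0 would indicate none. Repeated visits from the same subject are not independent observations, which a real analysis would need to account for.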

## Acknowledgements

Thank you to Mia Huddleston for working with me on the procedures outlined above. Mia and I met online on Sunday afternoon to finalize details for our future research. Mia and I are in agreement on the details of our research, and we wrote the section titled "Defining the Research Project" together. While I worked with the people noted above, this individual journal entry was completed by me and not copied from another source. -- Anindita Varshneya 23:49, 25 September 2016 (EDT)

## References

1. Donovan, S., & Weisstein, A.E. (2003). Exploring HIV Evolution: An Opportunity for Research. In J.R. Jungck, M.R. Fass, & E.D. Stanley (Eds.), Microbes Count! West Chester, Pennsylvania: Keystone Digital Press.
2. Markham, R.B., Wang, W.C., Weisstein, A.E., Wang, Z., Munoz, A., Templeton, A., Margolick, J., Vlahov, D., Quinn, T., Farzadegan, H., & Yu, X.F. (1998). Patterns of HIV-1 evolution in individuals with differing rates of CD4 T cell decline. Proc Natl Acad Sci U S A. 95, 12568-12573. doi: 10.1073/pnas.95.21.12568
3. Vlahov, D., Anthony, J.C., Munoz, A., Margolick, J., Nelson, K.E., Celentano, D.D., Solomon, L., Polk, B.F. (1991). The ALIVE study, a longitudinal study of HIV-1 infection in intravenous drug users: description of methods and characteristics of participants. NIDA Res Monogr 109, 75-100.
4. Week 4 Assignment Page