AninditaVarshneya BIOL368 Week 6

From OpenWetWare

Electronic Lab Notebook


Complete a correlation analysis comparing CD4 T cell count and genetic diversity. We predict a positive correlation between CD4 T cell count and genetic diversity.

Methods and Results

  • Created a spreadsheet with a separate sheet for each of the four subjects (6, 8, 9, and 14)
    • Each sheet has the headings: Visit Number, Number of Clones, S, and Theta
  • For the sake of efficiency, Colin, Will, Mia, and I each calculated all necessary values for one subject. I calculated the data for subject 6.
  • Go to the Markham et al. Sequence Data and download all of the data associated with subject 6 as a .txt file.
  • Open Biology Workbench and add all of the sequence data by selecting Nucleic Tools, and running "Add New Nucleic Sequence"
  • Open the file containing all of the sequence data for subject 6 and select "Upload File"
  • Select Save from the menu on the top of the screen
  • Select all of the sequences from a single visit and run "ClustalW Multiple Sequence Alignment"
  • Count the number of nucleotide differences among all of the aligned sequences to calculate "S"
  • Calculate theta using the following formula:
  • Repeat this process for the sequences from each of the other visits.
  • Once S and theta have been calculated for every visit, add the corresponding CD4 T cell counts and visit dates from the original Sequence Data presented in the Markham et al. paper to the spreadsheet alongside the calculated values.
  • Create scatter plots of CD4 counts against S values and of CD4 counts against theta values.
    • Add regression curves and R^2 values.
AV 20161009 S value vs cd4.PNG
AV 20161009 Theta vs cd4.PNG
  • Create two additional plots over time, each with two y axes: one comparing CD4 counts and S values, and one comparing CD4 counts and theta values.
AV 20161009 Data over time.PNG
  • Select all sequences associated with a single subject and run "ClustalW Multiple Sequence Alignment".
    • Select "Rooted Tree" and submit sequence data.
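
The per-visit calculation of S and theta can be sketched in Python. This is a minimal sketch under two assumptions that this entry does not confirm: that S is counted as the number of alignment columns in which the sequences are not all identical, and that theta is S divided by 1 + 1/2 + ... + 1/(n-1) for n clones (Watterson's estimator). The function names and the toy sequences are hypothetical and are not study data.

```python
def count_segregating_sites(aligned):
    """Count alignment columns where the sequences are not all identical.

    Assumption: this is how "S" is counted. Gap characters ("-") are
    treated like any other character, which is a simplification.
    """
    if not aligned:
        return 0
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    # zip(*aligned) walks the alignment one column at a time.
    return sum(1 for column in zip(*aligned) if len(set(column)) > 1)


def theta_from_s(s, n_clones):
    """Divide S by 1 + 1/2 + ... + 1/(n-1) (assumed Watterson's estimator)."""
    if n_clones < 2:
        raise ValueError("at least two clones are needed")
    harmonic = sum(1.0 / i for i in range(1, n_clones))
    return s / harmonic


# Toy alignment for one visit (invented sequences, not subject 6 data).
visit = ["ACGTACGT", "ACGTACGA", "ACCTACGT", "ACGTACGT"]
s = count_segregating_sites(visit)
theta = theta_from_s(s, len(visit))
print(f"clones={len(visit)} S={s} theta={theta:.3f}")
```

Dividing by the harmonic sum adjusts S for the number of clones, so visits with different numbers of clones can be compared on the same scale.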


Our analysis found no correlation between genetic diversity, as measured by S values and theta values, and CD4 T cell count. The graphs showed no correlation between CD4 T cell count and either S values or theta values. S values represent the number of nucleotide differences between genetic sequences, and theta values represent the pairwise genetic distance between genetic sequences. This lack of correlation could be due to incomplete data. Because our project only used data from the 4 most consistent subjects in the study, the number of data points is limited. With more data points across more subjects, a more definitive conclusion could be drawn about the correlation, or lack thereof, between genetic diversity and CD4 counts. Despite the lack of a definite correlation, the beginning of a trend between genetic diversity and CD4 counts over time was observed. Though this trend is not conclusive, it suggests that a correlation could emerge with more data points.
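
The strength of a linear relationship like the one assessed above can be checked numerically as well as from the plots. The following Python sketch computes the Pearson correlation coefficient, the least-squares regression line, and R^2 (which equals r squared for a straight-line fit with one predictor). The function names are hypothetical, and the CD4 and theta values are invented placeholders, not the values calculated in this notebook.

```python
import math


def _centered_sums(xs, ys):
    """Return (sxy, sxx, syy, mean_x, mean_y) for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy, sxx, syy, mx, my


def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    sxy, sxx, syy, _, _ = _centered_sums(xs, ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def linear_fit(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    sxy, sxx, _, mx, my = _centered_sums(xs, ys)
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    slope = sxy / sxx
    return slope, my - slope * mx


# Placeholder values for illustration only (not data from the study).
cd4_counts = [520, 480, 410, 390, 350]
theta_values = [2.0, 3.5, 3.0, 5.0, 4.5]

r = pearson_r(cd4_counts, theta_values)
r_squared = r ** 2
slope, intercept = linear_fit(cd4_counts, theta_values)
print(f"r={r:.3f} R^2={r_squared:.3f} slope={slope:.4f} intercept={intercept:.3f}")
```

With only a handful of points per subject, r and R^2 can swing widely when a single visit is added or removed, which is consistent with the caution about limited data above.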

Data and Files


The PDF version of our presentation can be found here.


Mia Huddleston, Colin Wikholm, Will Fuchs, and I worked together in class to collect statistical data from the sequence data. Mia and I also met on Sunday night to complete and practice our presentation. While I worked with the people noted above, this individual journal entry was completed by me and not copied from another source.


  1. Donovan S and Weisstein AE (2003) Exploring HIV Evolution: An Opportunity for Research. In Jungck JR, Fass MR, and Stanley ED, eds. Microbes Count! West Chester, Pennsylvania: Keystone Digital Press.
  2. Markham, R.B., Wang, W.C., Weisstein, A.E., Wang, Z., Munoz, A., Templeton, A., Margolick, J., Vlahov, D., Quinn, T., Farzadegan, H., & Yu, X.F. (1998). Patterns of HIV-1 evolution in individuals with differing rates of CD4 T cell decline. Proc Natl Acad Sci U S A. 95, 12568-12573. doi: 10.1073/pnas.95.21.12568
  3. Vlahov, D., Anthony, J.C., Munoz, A., Margolick, J., Nelson, K.E., Celentano, D.D., Solomon, L., Polk, B.F. (1991). The ALIVE study, a longitudinal study of HIV-1 infection in intravenous drug users: description of methods and characteristics of participants. NIDA Res Monogr 109, 75-100.
  4. James, M. M., Wang, L., Musoke, P., Donnell, D., Fogel, J., Towler, W. I., ... & Eshleman, S. H. (2011). Association of HIV diversity and survival in HIV-infected Ugandan infants. PLoS One, 6(4), e18642. DOI: 10.1371/journal.pone.0018642
  5. Araújo, L. A. L., & Almeida, S. E. (2013). HIV-1 diversity in the envelope glycoproteins: implications for viral entry inhibition. Viruses, 5(2), 595-604. DOI: 10.3390/v5020595
  6. Rachinger, A., Kootstra, N. A., Gijsbers, E. F., van den Kerkhof, T. L., Schuitemaker, H., & van 't Wout, A. B. (2012). HIV-1 envelope diversity 1 year after seroconversion predicts subsequent disease progression. AIDS, 26(12), 1517-1522. DOI: 10.1097/QAD.0b013e328354f539
  7. Week 6 Assignment Page

Other Links

User Page: Anindita Varshneya

Bioinformatics Lab: Fall 2016

Class Page: BIOL 368-01: Bioinformatics Laboratory, Fall 2016


SURP 2015

Links: Electronic Lab Notebook