Robert W Arnold Week 3

From OpenWetWare

Electronic Lab Notebook

Robert W Arnold

Week 3 Assignment

HIV Evolution Project


  • Created Biology WorkBench account.
  • Searched and found Markham article on NCBI website.
  • Clicked on nucleotide tab, found the subject 4, visit 2, 4th clone. Located in GenBank as: AF016818.2.
  • Added AF016818.2 into WorkBench and examined sequence. Known as Seq 1.
  • Uploaded 4 other sequences known as Seq 2-5 from NCBI.
  • Used ClustalW tool to examine sequences and compare them. Left all settings as default.
  • All sequences are 285 base pairs long except Sequence 3, which consists of 282 base pairs.
  • Sequences somehow got reordered on WorkBench during the examination. Here are the accession numbers for each sequence and the numbers they now have on the ClustalW page:
    • Sequence 1 - AF016818.2 - Sequence 1 on ClustalW results
    • Sequence 2 - AF016768.2 - Sequence 5 on ClustalW results
    • Sequence 3 - AF089153.2 - Sequence 4 on ClustalW results
    • Sequence 4 - AF016767.2 - Sequence 3 on ClustalW results
    • Sequence 5 - AF089142.1 - Sequence 2 on ClustalW results
  • Here are the pairwise alignments for the sequences:
    • Sequences (1:2) Aligned. Score: 84
    • Sequences (1:3) Aligned. Score: 96
    • Sequences (1:4) Aligned. Score: 91
    • Sequences (1:5) Aligned. Score: 97
    • Sequences (2:3) Aligned. Score: 83
    • Sequences (2:4) Aligned. Score: 83
    • Sequences (2:5) Aligned. Score: 85
    • Sequences (3:4) Aligned. Score: 90
    • Sequences (3:5) Aligned. Score: 97
    • Sequences (4:5) Aligned. Score: 91
    • Time for pairwise alignment: 0.053411
  • When ClustalW's reassignment of numbers is taken into account, the list should read as follows:
    • Sequences (1:5) Aligned. Score: 84
    • Sequences (1:4) Aligned. Score: 96
    • Sequences (1:3) Aligned. Score: 91
    • Sequences (1:2) Aligned. Score: 97
    • Sequences (5:4) Aligned. Score: 83
    • Sequences (5:3) Aligned. Score: 83
    • Sequences (5:2) Aligned. Score: 85
    • Sequences (4:3) Aligned. Score: 90
    • Sequences (4:2) Aligned. Score: 97
    • Sequences (3:2) Aligned. Score: 91
  • This list renames the sequences to match how they are currently listed in my database. I believe they got shifted out of order due to an edit I had to make, which turned the order into Seq 1, 5, 4, 3, 2.
  • When the renumbered scores are compared to the unrooted tree, it is easy to see that the higher the score, the more closely related the two sequences are, and the lower the score, the less related. Sequences 1 and 2 and sequences 4 and 2 shared the highest score of 97, while sequences 5 and 4 and sequences 5 and 3 shared the lowest at 83.
  • Here is a picture of the unrooted tree. unrooted tree
  • Began Activity 2 by uploading both S1_S9 and S10_S15 sequences.
  • Subjects picked for preliminary analysis
    • S1V1-3
    • S1V1-5
    • S1V1-8
    • S5V1-2
    • S5V1-4
    • S5V1-8
    • S11V1-2
    • S11V1-3
    • S11V1-5
    • S14V1-1
    • S14V1-2
    • S14V1-5
  • Here is a picture of the unrooted tree from the 12 samples used.

unrooted tree 2

  • Sequences from the same subject can clearly be distinguished from those of other subjects by the branch lengths between nodes. The highest scores here were multiple 99s, with the lowest being an 82.
  • For part 2, all seven of the clones from S10 were chosen. S was equal to 7. S10 consisted of 285 bps.
  • The second subject was S11, with all seven of its clones. S was equal to 10. S11 consisted of 288 bps.
  • The third subject chosen was S2, with all six of its clones. S was equal to 5. S2 consisted of 285 bps.
  • Θ for S10 was 2.86.
  • Θ for S11 was 4.08.
  • Θ for S2 was 2.19.
  • The min and max difference for S10 was 0.004 and 0.011. For S11, it was 0.003 and 0.024. For S2, it was also 0.004 and 0.011.
  • The raw number of differences for S10, S11, and S2 were 2.00, 6.05, and 3.00 respectively.
  • Excel graph can be found here. File:Excel spreadhiv1.xls
  • Currently extremely frustrated with this program. Cannot get the select 2 subjects to align. No idea why.
  • Continuing work on Activity 2, Part 2.
  • For comparison alignments, I used subjects 2, 10, and 11.
  • Here is the excel graph for the comparison of the HIV subjects. File:Excel file 2.xls
  • Here is the Clustal Distance graph for subjects 11 and 10. CDist11and10
  • Here is the Clustal Distance graph for subjects 10 and 2. CDist10and2
  • Here is the Clustal Distance graph for subjects 11 and 2. CDist11and2
  • New software can be difficult to use at times and it can also be buggy. I had to deal with multiple resets of Biology Workbench while putting together this journal, but I did find the program to be extremely useful. It is amazing to me how much data can be stored in one place and retrieved so efficiently.
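
The Θ values recorded above match what Watterson's estimator gives from S and the number of clones, so the sketch below assumes that is the estimator behind the reported numbers (the notebook does not name it). The function name `watterson_theta` is made up for illustration.

```python
def watterson_theta(segregating_sites: int, n_sequences: int) -> float:
    """Watterson's estimator: S divided by the sum 1 + 1/2 + ... + 1/(n-1)."""
    if n_sequences < 2:
        raise ValueError("need at least two sequences")
    harmonic = sum(1.0 / i for i in range(1, n_sequences))
    return segregating_sites / harmonic


# (S, number of clones) as recorded in the notebook entries for each subject
subjects = {"S10": (7, 7), "S11": (10, 7), "S2": (5, 6)}

for name, (s, n) in subjects.items():
    print(f"{name}: theta = {watterson_theta(s, n):.2f}")
```

Running this prints 2.86 for S10, 4.08 for S11, and 2.19 for S2, which agrees with the values written down above.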


  • Activity 1
    1. The accession number of the first sequence chosen was AF016818.
    2. The sample was collected from subject 2 from the USA. It was taken from the envelope glycoprotein V3 region gene.
  • Activity 2
    1. Yes, the clones from each subject cluster together due to similarity of sequence. In the genetic trees, the more closely related the sequences are, the closer together they sit.
    2. Yes, some clones do show more diversity than others. This can be seen in the diagram by comparing the distance between S11V1-5 and S11V1-2 with the distance between S1V1-3 and S1V1-8. The clones in S11 are relatively far apart, showing more genetic diversity, whereas in S1 they are almost on top of each other.
    3. My tree involved 4 individual subjects, S1, S5, S11, and S14, with 3 clones taken from each. Because these are clones from the same subject, one would expect them to cluster together due to their genetic similarity. While this is primarily the case, in some instances the clones are branching away from each other at a faster pace than most. This is most likely due to genetic mutation during replication. As can be seen in the graph, S5 and its clones are virtually on the same path without any deviation. However, in samples like the ones from S11, the deviation can already be seen in the breaking away of clone 2 and clone 5. These 2 clones have already broken far enough away from each other to be significantly different genetically, for clones anyway. As the virus continues to replicate, more mutations are likely to happen and drive them away from each other even more. Going back to S5, it is unlikely that all the clones will stay that genetically similar, and given time, it would be expected that they pitchfork out similar to S11.

Week 4 Journal Club

Word Definitions

  1. seroconversion - the change of a serologic test from negative to positive, showing a development of antibodies in response to infection or immunization
  2. PCR - polymerase chain reaction, the first practical system for in vitro amplification of DNA and as such one of the most important recent developments in molecular biology
  3. cohort - a group of animals of the same species, identified by a common characteristic, which are studied over a period of time as part of a scientific or medical investigation
  4. heterogenous - A mixture that is not uniform in its composition; the components can be visibly distinguished
  5. plasma - Fluid through which cellular components of blood, lymph, or intramuscular fluid are suspended
  6. variant - Something which differs in form from another thing, though really the same
  7. divergence - A moving or spreading apart or in different directions
  8. homogeneous - Consisting of or composed of similar elements or ingredients, of a uniform quality throughout
  9. CD4 T Cell - A form of T lymphocyte with CD4 receptor on the cell surface that recognizes antigens of a virus-infected cell
  10. epitope - that part of an antigenic molecule to which the T-cell receptor responds; a site on a large molecule against which an antibody will be produced and to which it will bind (Biology Online, September 20 2011)

Journal Club Outline – Markham


  • HIV-1 env sequence evolution studied in 15 subjects with differences in CD4 T cell decline
    • Rate of increase in diversity and divergence higher in progressors than nonprogressors
    • Nonprogressors with low viral loads selected against nonsynonymous mutations that may have resulted in viruses with higher replication speed
    • Progressors selected for nonsynonymous mutations
    • 10 of 15 subjects didn’t show dominance of a single variant
    • Showed evolutionary patterns selecting against predominant virus or evolution in different environments in host
    • Subjects with different amounts of CD4 T cell decline differed both in the number of mutations that had accumulated in HIV-1 env and in the type of mutation most fit to survive in the surroundings provided by the host.
    • Environment - stable
    1. HIV-1 has a high mutation and replication rate, allowing for adaptation to host
    2. In a stable environment, most fit virus will become predominant; other mutations are not well represented
    3. E. coli, another organism with rapid replication, has shown a similar pattern, with a single strain coming to dominate in a stable environment
  • Environment – unstable
    1. Could have effects on gene pool; e.g., dynamic host immune response or, for HIV-1, differential coreceptors
    2. If the destabilizing force is powerful but indiscriminate, likely only a few or the more abundant variants survive
    3. If immune responses target most abundant variants, there will be a reduction in total viral load without losing viral diversity
      1. These variants will continue to mutate, allowing for more diversity and eventually producing something unaffected by the cell immune response
  • Stability or lack of stability of the host environment allows conclusions to be drawn about the type of forces influencing HIV-1 evolution and mutation patterns
  • Older studies involved a small number of infected subjects and did not directly examine sequence patterns
  • This experiment followed 15 subjects at frequent intervals over a period of 4 years
  • The experiment showed a difference between nonprogressors and progressors in their mutation selection pattern and that a higher level of genetic diversity correlates with a faster decline of CD4 T cells.


  • Study Population
  1. 15 subjects from a group of injection drug users in the ALIVE study in Baltimore, MD
  2. Subjects were followed at 6-month intervals; blood was taken for virologic and immunologic studies
  3. Subjects were selected at HIV-1 seroconversion and had different CD4 T cell levels
    1. Rapid progressors – fewer than 200 CD4 T cells within 2 years of seroconversion
    2. Moderate progressors – 200-650 CD4 T cells over 4 year period
    3. Nonprogressors – kept CD4 T cells over 650 over whole observation period
  • HIV-1 env Gene Sequencing
  1. PCR used to amplify a 285-bp sequence of the env gene from peripheral blood mononuclear cells (PBMC)
  2. Two external env primers and two nested primers used
  3. Amplified sequence of nested PCR cloned in pUC19
  4. Single round PCR with limiting dilution used to screen for viral DNA copy number
  5. Most or all clones as result of PCR were derived from unique viral genome template
  • Plasma viral load determined by reverse transcription PCR
  • Generation of Phylogenetic Trees
  1. MEGA computer package used to construct trees with the Tamura-Nei distance measure, correcting for base composition and transition/transversion bias
  • Correlation Analysis
  1. Correlation between genetic diversity and CD4 T cell count 1 year later determined with formula using 76 time points of 15 subjects
  2. Used to determine how diversity is related to different CD4 T cell counts for future reference in subjects that began at similar CD4 T cell counts
  • Determination of dS/dN Ratio
  1. Each sequence computed and compared with others, each difference either synonymous or nonsynonymous
  2. Results were corrected for bias on unequal sample size and totally random mutations
  • Examination of Greater Initial Visit Diversity in Subjects 9 and 15
  1. Thought that each may have been infected with 2 different viruses due to high diversity
  2. They were grouped as monophyletic viruses
  3. All subjects were HIV-1 seronegative up to 7 months before first visit
  • Comparison of Rate of Change of Divergence and Diversity
  1. A regression line of divergence/diversity over time was fit for each individual
  2. Slopes were averaged within the 3 groups, which were defined by progression as determined by decline of CD4 T cell counts
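
The three progression groups in the Study Population notes can be written as a small rule. This is only a rough sketch of the thresholds as I outlined them (200 and 650 CD4 T cells); the function name `classify_progression` and the visit data are hypothetical, and the paper's actual grouping procedure may have had details not captured here.

```python
from typing import List, Tuple


def classify_progression(visits: List[Tuple[float, int]]) -> str:
    """Classify a subject from (years since seroconversion, CD4 T cell count) pairs.

    Simplified reading of the outline: rapid if the count falls below 200
    within 2 years, nonprogressor if it stays above 650 at every visit,
    moderate otherwise.
    """
    if not visits:
        raise ValueError("need at least one visit")
    if any(years <= 2 and cd4 < 200 for years, cd4 in visits):
        return "rapid progressor"
    if all(cd4 > 650 for _, cd4 in visits):
        return "nonprogressor"
    return "moderate progressor"


# Hypothetical visit records, not data from the study
print(classify_progression([(0.0, 800), (1.0, 400), (1.5, 150)]))
print(classify_progression([(0.0, 700), (2.0, 500), (4.0, 300)]))
print(classify_progression([(0.0, 720), (2.0, 690), (4.0, 700)]))
```

The three made-up subjects come out as rapid, moderate, and nonprogressor in that order.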


  • CD4 decline varied over subjects
    1. Median annual change in subjects’ CD4 T cell number ranged from increase of 53 cells a year to decrease of 593 cells per year
    2. Nonprogressor group had low viral load at early points
    3. Viral load alone was not able to distinguish the moderate from the rapid progressors
  • Figure 1 – Patterns of HIV-1 evolution in individuals with differing rates of CD4 T cell decline
    • Data were collected at intervals over 4 years and included CD4 T cell levels, diversity (nucleotide differences between intravisit clones), and divergence (percentage of nucleotides that have changed compared to the original sequence).
    • Shows the CD4 T cell count in each of the 15 subjects from first seropositive visit to end of 4 year observation period. Individual subjects grouped into rapid, moderate, and nonprogressors. Solid dashes on left vertical line indicate the 200 and 650 CD4 T cell level counts to show this grouping. In each box there is a dotted line representing CD4 T cell count, a diamond line representing diversity, and a square line showing divergence. In the diamond/diversity line, the axis values show nucleotide differences between intravisit clones. In the divergence/square line, the values show the percent of nucleotides that have changed from original postseroconversion sequence.
    • High initial CD4 T cell count does not necessarily correlate with ending at a high CD4 T cell count. The three subjects with the highest initial CD4 T cell count (3, 4, 10) all ended up as rapid progressors, while subject 6, who started with the lowest CD4 T cell count, ended up on an upward trajectory in cell count. Higher genetic diversity was associated in most cases with a lower CD4 T cell count. The one exception to that rule is subject 6. The divergence square line showed the percent of nucleotides that had changed from the original postseroconversion sequence. Higher divergence levels correlated with lower CD4 T cell levels in almost all subjects.

All subjects except the nonprogressors and subject 6 showed a rate of annual decline for CD4 cells.
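
The two sequence measures plotted in Figure 1 can be illustrated with a short sketch: diversity as the mean number of nucleotide differences between clones from one visit, and divergence as the percent of nucleotides changed from a baseline sequence. This assumes aligned sequences of equal length and uses raw difference counts rather than the Tamura-Nei distance used in the paper. The function names and the toy sequences are made up for illustration.

```python
from itertools import combinations
from typing import List


def count_differences(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def diversity(clones: List[str]) -> float:
    """Mean pairwise nucleotide differences among clones from one visit."""
    pairs = list(combinations(clones, 2))
    if not pairs:
        raise ValueError("need at least two clones")
    return sum(count_differences(a, b) for a, b in pairs) / len(pairs)


def divergence(clones: List[str], baseline: str) -> float:
    """Mean percent of nucleotides that differ from the baseline sequence."""
    total = sum(count_differences(c, baseline) for c in clones)
    return 100.0 * total / (len(clones) * len(baseline))


# Toy 10-nucleotide sequences, not real env clones
baseline = "ACGTACGTAC"
visit_clones = ["ACGTACGTAC", "ACGTACGTAA", "ACGAACGTAA"]

print(f"diversity:  {diversity(visit_clones):.2f} differences per pair")
print(f"divergence: {divergence(visit_clones, baseline):.1f}% changed from baseline")
```

For the toy clones this gives a diversity of about 1.33 differences per pair and a divergence of 10.0%.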

  • Table 1 – Summary data of 15 Seroconverters
    1. Rapid Progressors – Subjects 4, 10, 11, 15, 3, 1 with baseline CD4 cell counts of 1,028, 833, 753, 707, 819, 464 respectively
    2. Moderate Progressors – Subjects 7, 8, 14, 5, 9, 6 with baseline CD4 cell counts of 1,072, 538, 523, 749, 489, 405 respectively
    3. Nonprogressors – Subjects 2, 12, 13 with baseline CD4 cell counts of 714, 772, 671 respectively
    4. Total number of observations for each subject varied from 3 to 9 times
    5. Majority of the median intravisit nucleotide differences in clones varied between 0.87 and 2.82
    6. Table 1 primarily showed the hard numbers that were seen in Figure 1 along with the dS/dN ratio
  • A total of 873 clones were examined
  • HIV-1 changes were determined by 2 factors
    1. Genetic diversity or nucleotide differences between clones
    2. Divergence or percent of different nucleotides in clones
  • In all subjects, diversity and divergence increased over the observed period, most likely due to more time for mutation to occur
  • Difference between rapid progressors and moderate progressors in divergence was negligible while the difference between rapid and nonprogressors was significant with a P of less than 0.001
  • For diversity, there was a significant difference between nonprogressors and moderate progressors
  • Figure 2 shows the difference in diversity and divergence of the three progressive groups against original postseroconversion sequence
    1. Nonprogressors had the smallest amount of diversity and divergence, moderate progressors were in the middle, and rapid progressors had the greatest amount in both categories
    2. Rapid progressors also had the greatest range in both categories
  • Those with “greater genetic diversity or divergence at one visit were likely to experience greater CD 4 T cell decline over next year”
  • The dS/dN ratio equals around 1 in the case of random mutation
  • When dS/dN is below 1, it shows a higher proportion of, or a selection for, nonsynonymous mutations
  • Nonprogressors had a ratio above 1, showing selection against nonsynonymous mutations (favoring synonymous ones)
  • Overall dS for each group was similar, with the main difference in each ratio being the dN level and selection for or against nonsynonymous mutations
  • Figure 3 focused in on subject 9 and how different strains took over dominance at different periods of time
  • Figure 4 expands on Figure 3 showing how there is no set dominance of a viral strain for an extended period of time
  • Single branches are often quickly branched off by other mutations, creating a new, temporary, dominant strain of virus
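
The way the dS/dN ratio is read in these notes can be sketched as a small helper. The function name `interpret_ds_dn`, the 0.1 tolerance for "around 1", and the input values are all hypothetical choices for illustration; the paper's own calculation also corrected for sample size and random mutation, which this does not attempt.

```python
def interpret_ds_dn(ds: float, dn: float, tolerance: float = 0.1) -> str:
    """Describe a dS/dN ratio following the reading used in the outline.

    ds and dn are synonymous and nonsynonymous difference rates; the
    tolerance around 1 is an arbitrary choice, not a value from the paper.
    """
    if dn <= 0:
        raise ValueError("dn must be positive to form the ratio")
    ratio = ds / dn
    if abs(ratio - 1.0) <= tolerance:
        return "about 1: consistent with random mutation"
    if ratio > 1.0:
        return "above 1: selection against nonsynonymous changes"
    return "below 1: selection for nonsynonymous changes"


# Hypothetical rates, not values from Table 1
print(interpret_ds_dn(0.03, 0.01))
print(interpret_ds_dn(0.01, 0.03))
print(interpret_ds_dn(0.02, 0.02))
```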


  • This research showed that the more genetically diverse the virus is, the greater the decline of CD4 T cells over the next year
  • Comparable dS levels were seen across all subjects, with the difference being the dN levels
  • These results were inconsistent with 2 recent studies
    1. McDonald et al.
      • Study comparing gene diversity between rapid and slow progressors at two points 30 months apart
      • All subjects originally below 400 CD4 T cells
      • Rapid progressors studied in McDonald showed more genetic divergence like Markham, but also showed less intravisit diversity than slow progressors at the 30 month mark
      • Results likely vary because subjects in McDonald were not studied from seroconversion and were observed much less frequently
    2. Wolinsky et al.
      • Study of 6 adult subjects
      • Similar analysis to Markham, 2 highest rapid progressors showed less genetic diversity than those with slower CD4 T cell decline
      • 5 of 6 rapid progressors in Markham “showed high diversity and divergence” with subject 11 being the exception and acting like Wolinsky’s 2 highest progressors
      • Possible that these 3 may all be special cases of subjects that were unable to mount any response to the virus in their system, which would allow for a stable environment in which a single strain could become dominant
      • Wolinsky failed to determine if there even was an immune response for his 2 highest progressors
  • Nowak’s model is consistent for the majority of the subjects
  • The higher genetic diversity and divergence leads to lower CD4 T cell count
  • Nowak contends this is because the immune response system is unable to respond to the diversity of clones
  • 6 of the 15 subjects developed AIDS during the study
  • Overall, the difference between the strains in the subjects studied was their tendency toward selection for or against nonsynonymous changes.

