BIOL368/F14:Isabel Gonzaga Week 9

From OpenWetWare
Revision as of 00:01, 29 October 2014 by Isabel Gonzaga (talk | contribs) (fixed table)

HIV Structure Project

Defining Your HIV Structure Research Project

This research project will be completed in conjunction with Nicole Anguiano and Chloe Jones.

Question

How does the structure of the V3 protein region affect the HIV status (diagnosed, progressing or non-trending) of the patient?

Hypothesis

We hypothesize that diagnosed groups will express greater variability in the V3 region in their protein structure, in comparison to the non-trending groups. Initial comparisons show that diagnosed groups and progressing groups expressed greater genetic variability than non-trending groups. These changes may affect the third variable region, affecting the host's ability to adapt to the changes and generate sufficient immune response.

Subject Data

Using the BEDROCK HIV Sequence Data Table, I determined which of the subjects in my study actually developed AIDS. All 3 AIDS Diagnosed subjects were confirmed to have the disease by their final visit. In the AIDS Progressing group, subjects developed AIDS within 1 year after their final visit. The Non-Trending subjects all maintained high CD4 T cell counts above the threshold, even after the study was conducted. Sequences for each visit and subject were chosen using a Random Integer Generator to avoid selection bias.
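The random selection step could be reproduced with a short script. This is a minimal sketch, assuming the clones available at a visit are numbered from 1 and that three are drawn without replacement; the function name `pick_clones`, the pool size of 10, and the seed are illustrative and do not come from the study.

```python
import random

def pick_clones(n_available, n_pick=3, seed=None):
    """Draw n_pick distinct clone numbers from 1..n_available (hypothetical helper)."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return sorted(rng.sample(range(1, n_available + 1), n_pick))

# Example: choose 3 of 10 clones for one subject and visit (pool size is assumed).
print(pick_clones(10, seed=368))
```

Recording the seed alongside the chosen clone numbers would let a reader repeat the same draw.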


The following sequences were taken from the BEDROCK HIV Problem Space Database, from the Markham et al. (1998) study.
Table 1: Sequences analyzed

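Group             Subject  Visit  Sequences
AIDS Diagnosed    3        1      1, 2, 4
AIDS Diagnosed    3        6      3, 4, 5
AIDS Diagnosed    10       1      3, 6, 7
AIDS Diagnosed    10       6      2, 4, 8
AIDS Diagnosed    15       1      2, 3, 4
AIDS Diagnosed    15       4      5, 8, 10
AIDS Progressing  7        1      2, 3, 9
AIDS Progressing  7        5      2, 8, 9
AIDS Progressing  8        1      1, 4, 5
AIDS Progressing  8        7      1, 6, 7
AIDS Progressing  14       1      2, 3, 4
AIDS Progressing  14       9      9, 10, 11
No Trend          5        1      1, 3, 8
No Trend          5        5      4, 5, 2
No Trend          6        1      1, 2, 3
No Trend          6        9      6, 7, 9
No Trend          13       1      1, 3, 4
No Trend          13       5      3, 5, 4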



Protein sequences for each data set were taken from BEDROCK HIV Problem Space and converted to the following .txt files using word processor programs:

DNA sequences for each data set were also taken from BEDROCK HIV Problem Space and converted to the following .txt files: AIDS Diagnosed Sequences, AIDS Progressing Sequences, No Trend Sequences.

Protein Sequence Multiple Sequence Alignment

ClustalW was run for each group category from Visit 1 using the Biology Workbench Protein tools. A multiple sequence alignment was generated, and unrooted phylogenetic trees were analyzed. The alignment was used to determine diversity within each category at each amino acid residue. ClustalW was also run for each group category from the final visit.
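The per-residue diversity read off a ClustalW alignment comes from its consensus line (`*`, `:`, `.`, or blank). The sketch below labels each column of a toy alignment in the same way. It assumes the standard ClustalW strong and weak residue groups as listed here (worth checking against the ClustalW documentation), and treats any gapped column as having no consensus; `column_symbol` and `consensus_line` are hypothetical helper names, and the toy sequences are not study data.

```python
from collections import Counter

# Assumed ClustalW residue groups (strong -> ':', weak -> '.').
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
        "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]

def column_symbol(residues):
    """Return the consensus symbol for one alignment column."""
    distinct = set(residues)
    if "-" in distinct:
        return " "                      # gapped column: treated as no consensus
    if len(distinct) == 1:
        return "*"                      # fully conserved residue
    if any(distinct <= set(group) for group in STRONG):
        return ":"                      # all residues fall in one strong group
    if any(distinct <= set(group) for group in WEAK):
        return "."                      # all residues fall in one weak group
    return " "                          # no consensus

def consensus_line(aligned):
    """Build the consensus line for equal-length aligned sequences."""
    return "".join(column_symbol(col) for col in zip(*aligned))

toy = ["CTRPNKT", "CTRPNRI", "CTRPSKT"]   # toy alignment, not study data
line = consensus_line(toy)
print(repr(line))
print(Counter(line))                       # tally of each symbol type
```

Counting the `:`, `.`, and blank columns in such a line gives the strong, weak, and no-consensus tallies reported in the figure captions.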

Multiple sequence alignment was also used to compare differences between amino acid sequences and DNA sequences. The DNA sequences for each clone were uploaded to Biology Workbench under the 'nucleic tools' tab. ClustalW was run for each group category three times: for visit 1, for the final visit, and for both visits combined.

There seem to be fewer differences, in absolute count, among the amino acid sequences than among the DNA sequences. This is likely due to the redundancy (degeneracy) of the genetic code (i.e., several different codons code for the same amino acid residue, so some DNA changes leave the protein sequence unchanged).
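The effect of codon redundancy can be illustrated with two made-up DNA fragments. This is a sketch only: the sequences are invented for illustration, and the codon table is a small excerpt of the standard genetic code covering just the codons used here.

```python
# Excerpt of the standard genetic code (only the codons needed below).
CODON_TABLE = {
    "GGA": "G", "GGC": "G", "GGG": "G", "GGT": "G",   # glycine
    "CCA": "P", "CCT": "P",                           # proline
    "AGA": "R",                                       # arginine
    "AAA": "K",                                       # lysine
}

def translate(dna):
    """Translate a DNA string codon by codon using the excerpt above."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

def count_differences(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

clone_a = "GGACCTGGAAGA"   # hypothetical clone: G P G R
clone_b = "GGGCCAGGCAAA"   # hypothetical clone: G P G K

dna_diffs = count_differences(clone_a, clone_b)
aa_diffs = count_differences(translate(clone_a), translate(clone_b))
print(dna_diffs, aa_diffs)   # 4 1
```

Three of the four nucleotide changes are synonymous, so only one amino acid differs between the two toy clones.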

Protein sequence alignments were downloaded as a .txt file for further analysis. The alignments can be found here:

Diagnosed Group Multiple Sequence Alignments

Figure 1. Multiple Sequence alignment comparing all AIDS diagnosed sequences at the first visit. Blue asterisks (*) denote single, fully conserved residues. Green colons (:) denote conservation of strong groups, and Dark Blue periods (.) denote conservation of weak groups. Black (no symbol) indicates no consensus.
Figure 2. Tree for the AIDS diagnosed patients at the first visit. Each clone is most closely related to clones from its own subject, as expected.
Figure 3. Multiple Sequence alignment comparing all AIDS diagnosed sequences at the final visit. Blue asterisks (*) denote single, fully conserved residues. Green colons (:) denote conservation of strong groups, and Dark Blue periods (.) denote conservation of weak groups. Black (no symbol) indicates no consensus.
Figure 4. Tree for the AIDS diagnosed patients at the final visit. Each clone is most closely related to clones from its own subject, as expected.
Figure 5. Multiple Sequence alignment comparing all AIDS diagnosed sequences. Blue asterisks (*) denote single, fully conserved residues. Green colons (:) denote conservation of strong groups, and Dark Blue periods (.) denote conservation of weak groups. Black (no symbol) indicates no consensus.
Figure 6. Tree for the AIDS diagnosed patients at the first and final visits combined. Each clone is most closely related to clones from its own subject, as expected.


Progressing Group Multiple Sequence Alignments

Figure 7. Multiple Sequence alignment comparing all progressing amino acid sequences at the first visit. Blue asterisks (*) denote single, fully conserved residues. Green colons (:) denote conservation of strong groups, and Dark Blue periods (.) denote conservation of weak groups. Black (no symbol) indicates no consensus. For all first visit progressing sequences, 9 residues had no consensus, 17 strong groups were conserved and 9 weak groups were conserved.
Figure 8. Phylogenetic tree generated for all AIDS progressing sequences at the first visit. Clones from the same subject were more genetically similar to each other than to clones from other subjects.
Figure 9. Multiple Sequence alignment comparing all progressing amino acid sequences at the final visit. Blue asterisks (*) denote single, fully conserved residues. Green colons (:) denote conservation of strong groups, and Dark Blue periods (.) denote conservation of weak groups. Black (no symbol) indicates no consensus. For all final visit progressing sequences, 12 residues had no consensus, 24 strong groups were conserved and 3 weak groups were conserved.
Figure 10. Phylogenetic tree generated for all AIDS progressing sequences at the final visit. Clones from the same subject were more genetically similar to each other than to clones from other subjects.
Figure 11. Multiple Sequence alignment comparing all progressing amino acid sequences. Blue asterisks (*) denote single, fully conserved residues. Green colons (:) denote conservation of strong groups, and Dark Blue periods (.) denote conservation of weak groups. Black (no symbol) indicates no consensus. For all progressing sequences, 14 residues had no consensus, 23 strong groups were conserved and 7 weak groups were conserved.
Figure 12. Phylogenetic tree generated for all AIDS progressing sequences at the first and last visits. Clones from the same subject were still more genetically similar to each other than to clones from other subjects.

Non-trending Group Multiple Sequence Alignments

Figure 13. Multiple Sequence alignment comparing all non-trending sequences at the first visit.
Figure 14. Phylogenetic tree comparing all non-trending sequences at the first visit.
Figure 15. Multiple Sequence alignment comparing all non-trending sequences at the final visit.
Figure 16. Phylogenetic tree comparing all non-trending sequences at the final visit.
Figure 17. Multiple Sequence alignment comparing all non-trending sequences.
Figure 18. Phylogenetic tree comparing all non-trending sequences.


DNA Sequence Multiple Sequence Alignment

AIDS Diagnosed Groups

Figure 19. Multiple Sequence alignment comparing all AIDS Diagnosed DNA sequences at the first visit. Blue asterisks (*) denote single, fully conserved residues. Black indicates no consensus. 69 total differences were observed between the DNA sequences.
Figure 20. Multiple Sequence alignment comparing all AIDS Diagnosed DNA sequences at the final visit. Blue asterisks (*) denote single, fully conserved residues. Black indicates no consensus. 79 total differences were observed between the DNA sequences.
Figure 21. Multiple Sequence alignment comparing all AIDS Diagnosed DNA sequences. Blue asterisks (*) denote single, fully conserved residues. Black indicates no consensus. 94 total differences were observed between the DNA sequences.
Visit Type Number of Differences Percentage Different
1 Amino Acids 37 37/95 = 38.9%
1 DNA 69 69/285 = 24.2%
Final Amino Acids 45 45/95 = 47.4%
Final DNA 79 79/285 = 27.7%
Both Amino Acids 49 49/95 = 51.6%
Both DNA 94 94/285 = 33%

Table 2. Number and percentage of differences among the amino acid sequences and among the DNA sequences of the AIDS Diagnosed group at the first visit, the final visit, and both visits combined.
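The percentages in these tables are the number of differing positions divided by the alignment length (95 amino acid residues, or 285 nucleotides for the corresponding DNA). A small sketch of that arithmetic follows; the function name `percent_different` is illustrative.

```python
AA_LENGTH = 95               # amino acid residues per sequence (from the tables)
DNA_LENGTH = AA_LENGTH * 3   # 285 nucleotides, three per codon

def percent_different(n_differences, length):
    """Percentage of alignment positions that differ, rounded to one decimal."""
    return round(100 * n_differences / length, 1)

print(percent_different(37, AA_LENGTH))    # 38.9
print(percent_different(69, DNA_LENGTH))   # 24.2
```

Because the DNA alignment is three times longer, a group can show more DNA differences in absolute count and still a lower percentage than the amino acid alignment.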

AIDS Progressing Groups

Figure 22. Multiple Sequence alignment comparing all progressing DNA sequences at the initial visit. Blue asterisks (*) denote single, fully conserved residues. Black indicates no consensus. 52 differences were observed between the DNA sequences.
Figure 23. Multiple Sequence alignment comparing all progressing DNA sequences at the final visit. Blue asterisks (*) denote single, fully conserved residues. Black indicates no consensus. 66 differences were observed between the DNA sequences.
Figure 24a. Part 1 of the multiple Sequence alignment comparing all progressing DNA sequences. Blue asterisks (*) denote single, fully conserved residues. Black indicates no consensus. 80 total differences were observed between the DNA sequences.
Figure 24b. Part 2 of the multiple Sequence alignment comparing all progressing DNA sequences. Blue asterisks (*) denote single, fully conserved residues. Black indicates no consensus. 80 total differences were observed between the DNA sequences.

Sequence Differences Between Progressor Groups

Visit Type Number of Differences Percentage Different
1 Amino Acids 31 31/95 = 32.6%
1 DNA 52 52/285 = 18.2%
Final Amino Acids 39 39/95 = 41%
Final DNA 66 66/285 = 23.2%
Both Amino Acids 44 44/95 = 46%
Both DNA 80 80/285 = 28.1%

Table 3. The calculated percentage differences of amino acid and DNA residues for the Progressor group at the initial visits, the final visits, and both visits combined. The data suggest a trend towards increasing diversity over time within the progressor group. Amino acid sequences also show a higher percentage of differing positions (less consensus) than DNA sequences.

Non Trending Groups

Visit Type Number of Differences Percentage Different
1 Amino Acids 36 36/95 = 37.9%
1 DNA 67 67/285 = 23.5%
Final Amino Acids 38 38/95 = 40%
Final DNA 67 67/285 = 23.5%
Both Amino Acids 43 43/95 = 45.3%
Both DNA 79 79/285 = 27.7%

Table 4. Percent difference in the protein sequences and DNA sequences for Subjects 5, 6, and 13 at the first visit, the final visit, and both visits combined.

Secondary Structure Prediction in V3 Fragment

The PSIPRED Protein Sequence Analysis Workbench was accessed. The multiple sequence alignments created through ClustalW were uploaded for analysis for each group at the initial visit and the final visit, and then for each subject. The program generated the following results. Yellow indicates beta sheets, pink cylinders indicate helices, and black indicates coil. The program's level of confidence is indicated by the height and darkness of the blue bar above each sequence.

AIDS Diagnosed

Figure 25. PsiPred result for the first visit of all the AIDS diagnosed patients. The V3 region is located at 29-53. Like subject 15, it does not have a beta sheet at residues 47-49, likely due to the influence of subject 15.
Figure 26. PsiPred result for the final visit of all the AIDS diagnosed patients. The V3 region is located at 29-53. Like subject 15, it does not have a beta sheet at residues 47-49, likely due to the influence of subject 15.
Figure 27. PsiPred result for both the first and last visits of subject 3. The V3 region is located at 29-53.
Figure 28. PsiPred result for both the first and last visits of subject 10. The V3 region is located at 29-53.
Figure 29. PsiPred result for both the first and last visits of subject 15. The V3 region is located at 29-53. This is the only subject in which there is not a beta sheet at residues 47-49.
  • With the exception of subject 15, each of the sequences matched the consensus sequence. Perhaps due to the influence of subject 15, the first and final visit results for all the sequences are also missing the beta sheet that subject 15 lacks.

AIDS Progressing

Figure 30. PSIPred secondary structure prediction for the Progressing groups at their initial visit. Beta sheets and alpha helices are predicted throughout the protein. The V3 region (residue 29-63) contains a 3 residue beta sheet (47-49), and a 6 residue alpha helix (56-61).
Figure 31. PSIPred secondary structure prediction for the Progressing groups at their final visit. Beta sheets and alpha helices are predicted throughout the protein. The V3 region (residue 29-63) contains a 3 residue beta sheet (47-49), and a 6 residue alpha helix (56-61). The final visit prediction is the same as the initial visit prediction.
Figure 32. PSIPred secondary structure prediction for subject 7 at their first and final visit. Beta sheets and alpha helices are predicted throughout the protein. The V3 region (residue 29-63) contains a 3 residue beta sheet (47-49), and a 6 residue alpha helix (56-61). The predicted structure is the same as all other progressing group predictions.
Figure 33. PSIPred secondary structure prediction for subject 8 at their first and final visit. Beta sheets and alpha helices are predicted throughout the protein. The V3 region (residue 29-63) contains a 3 residue beta sheet (47-49), and a 6 residue alpha helix (56-61). The predicted structure is the same as all other progressing group predictions.
Figure 34. PSIPred secondary structure prediction for subject 14 at their first and final visit. Beta sheets and alpha helices are predicted throughout the protein. The V3 region (residue 29-63) contains a 3 residue beta sheet (47-49), and a 6 residue alpha helix (56-61). The predicted structure is the same as all other progressing group predictions.

Non Trending

Figure 35. The results from PsiPred using the amino acid sequences from Subjects 5, 6, and 13 for Visit 1.
Figure 36. The results from PsiPred using the amino acid sequences from Subjects 5, 6, and 13 for the last visit.
Figure 37. The results from PsiPred using the amino acid sequences from Subject 5 for the first and last visits.
Figure 38. The results from PsiPred using the amino acid sequences from Subject 6 for the first and last visits.
Figure 39. The results from PsiPred using the amino acid sequences from Subject 13 for the first and last visits.
Figure 40. The results from PsiPred using the amino acid sequences from Subjects 5, 6, and 13 for the first and last visits.


Analysis of V3 Structure

Huang et al. (2005) Structure 2B4C was uploaded onto StarBiochem. Images were developed by selecting various structural levels and adjusting size of atoms and groups. Using StarBiochem, the V3 region was isolated and the sequence was found.

Four polypeptide subunits were defined using the Quaternary structure methods:

  • Chain G (Yellow)
    • Residues: 84:G - 492:G
    • Amino end: Valine (position 84)
    • Carboxyl end: Glutamate (position 492)
  • Chain C (Light Pink)
    • Residues: 1:C - 175:C
    • Amino end: Lysine (position 1)
    • Carboxyl end: Valine (position 175)
  • Chain L (Green)
    • Residues: 1:L - 214:L
    • Amino end: Glutamate (position 1)
    • Carboxyl end: Cysteine (position 214)
  • Chain H (Dark Red)
    • Residues: 2:H - 216:H
    • Amino end: Glutamine (position 2)
    • Carboxyl end: Cysteine (position 216)
Figure 41. Amino and carboxyl termini of the gp120 protein polypeptide subunits depicted using StarBiochem.
Figure 42. The four polypeptide quaternary structures of the gp120 protein depicted using StarBiochem.

Secondary Structure Elements:

  • Beta Sheets: 22
  • Alpha Helices: 17
  • Random Coil: 84
Figure 43. Alpha helices (pink) and beta sheets (blue) as depicted as secondary structures within the gp120 protein using StarBiochem.

V3 Region

  • Located between 296:G and 331:G
  • Sequence as follows:
    • C T R P N Q N T R K S I H I G P G R A F Y T T G E I I G D I R Q A H C
  • V3 Amino Acid Properties
    • Nonpolar: 12
    • Polar: 11
    • Positively charged: 6
    • Negatively charged: 2
    • Aromatic: 2
Figure 44. Image of the isolated V3 region from the gp120 protein as imaged on StarBiochem. The region was determined between 296:G and 331:G. This region contains a beta sheet and alpha helix, similar to the previous PSIPreds for the other
Figure 45. Image of the V3 region in the context of the entire gp120 protein structure as depicted on StarBiochem.
Figure 46. V3 region sequence from 296:G to 331:G in the amino acid sequence viewer of StarBiochem.
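The residue composition of the V3 sequence listed above can also be tallied programmatically. This is a sketch under a stated assumption: the grouping of amino acids into classes below is one common textbook scheme chosen for illustration, not necessarily the scheme StarBiochem uses, so its tallies may differ from the counts reported above (for example, depending on where histidine, cysteine, or glycine is placed).

```python
from collections import Counter

# V3 sequence as listed in the text (spaces removed).
V3 = "CTRPNQNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"

# Assumed classification; other schemes assign some residues differently.
CLASSES = {
    "positive": set("KRH"),
    "negative": set("DE"),
    "aromatic": set("FWY"),
    "polar": set("STNQC"),
    "nonpolar": set("GAVLIPM"),
}

def classify(residue):
    """Return the class name for a one-letter residue code."""
    for name, members in CLASSES.items():
        if residue in members:
            return name
    raise ValueError(f"unclassified residue: {residue}")

composition = Counter(classify(r) for r in V3)
print(len(V3), dict(composition))
```

Printing the length alongside the class totals is a quick check that every residue in the region has been assigned to exactly one class.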

The gp120 protein from Huang et al. was run on PSIPred in order to predict secondary structures. The V3 region sequence was identified at residues 293 through 326. This was compared to the PSIPred results for the Markham et al. sequences, in which the V3 region was identified at residues 29-63. In each of the PSIPred runs, a corresponding beta sheet and alpha helix exist, as expected. This suggests that the proper portion of the Markham et al. sequences was identified as the V3 region.

Figure 47. PSIPred of the Huang et al. gp120 protein. The V3 sequence region was identified from residues 293 to 326

Amino Acid Sequence Effects on V3

The non-conserved residues within the V3 region (residues 29-63 of the 95 amino acid sequence) of the Markham et al. sequences were analyzed for the progressing group.

Visit Position (from start of V3 sequence) Conservation Residues
1 10 strong K, R
1 13 none N, S, P, L
1 14 none T, I
1 20 strong F
1 22 strong A, T
1 25 strong D, E
1 29 strong D, N
1 10 none N, S, P, L
F 5 strong N, H
F 10 strong K, E
F 11 none R, S
F 13 none S, H, N
F 14 strong L, I
F 19 weak V, A
F 20 none Y, F, L
F 22 strong T, A
F 25 none Q, E, K, A
F 29 strong D, N
F 32 strong K, Q
F 34 strong Y, H
All 5 strong N, H
All 10 weak K, E
All 11 none R, S
All 13 none S, H, N, L, P
All 14 none L, I
All 19 weak V, A
All 20 none Y, F, L
All 22 strong T, A
All 25 none Q, E, K, A, D
All 29 strong D, N
All 32 strong K, Q
All 34 strong Y, H

Table 5. Non-conserved positions in the V3 region for the progressor group at the first visit (1), the final visit (F), and both visits combined (All).
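Positions in this table are counted from the start of the V3 region, while the alignments number residues along the full 95-residue fragment. A minimal sketch of the conversion, assuming V3 spans residues 29-63 of the fragment as stated above; `to_fragment_position` is a hypothetical helper name.

```python
V3_START = 29   # first V3 residue within the 95-residue fragment (from the text)
V3_END = 63     # last V3 residue within the fragment (from the text)

def to_fragment_position(v3_position):
    """Convert a 1-based position within V3 to a 1-based fragment position."""
    v3_length = V3_END - V3_START + 1
    if not 1 <= v3_position <= v3_length:
        raise ValueError(f"V3 position must be between 1 and {v3_length}")
    return V3_START + v3_position - 1

for position in (10, 13, 25):
    print(position, "->", to_fragment_position(position))
```

For example, V3 position 10 corresponds to residue 38 of the fragment under this assumption.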

Presentation

HIV Structure Project
