BIOL368/F14:Isabel Gonzaga Week 8

From OpenWetWare

Defining Your HIV Structure Research Project

This research project will be completed in conjunction with Nicole Anguiano and Chloe Jones.


How does HIV status (diagnosed, progressing or non-trending) affect the structure of the V3 protein region?


We hypothesize that the diagnosed groups will show greater variability in the protein structure of the V3 region than the non-trending groups. Initial comparisons show that the diagnosed and progressing groups had greater genetic variability than the non-trending groups. These changes may alter the third variable (V3) region and limit the host's ability to adapt to them and generate a sufficient immune response.

Subject Data

Using the BEDROCK HIV Sequence Data Table, I determined which of the subjects in my study actually developed AIDS. All 3 AIDS diagnosed subjects were confirmed to have the disease by their final visit. In the AIDS progressing groups, subjects developed AIDS within 1 year after their final visit. The non-trending groups all maintained high CD4 T cell counts above the threshold, even after the study was conducted. Sequences for each visit and subject were chosen using a random integer generator to eliminate selection bias.
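The random selection step can be sketched in Python as below. This is only an illustration, not the generator actually used: the function name `pick_sequences`, the count of 12 available clones, and the seed are all hypothetical, and the original tool and per-visit clone counts are not recorded here.

```python
import random


def pick_sequences(n_available, k=3, seed=None):
    """Pick k distinct 1-based sequence numbers out of n_available clones.

    Sampling without replacement avoids choosing the same clone twice.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return sorted(rng.sample(range(1, n_available + 1), k))


# Hypothetical example: a visit with 12 clones available, 3 chosen
picks = pick_sequences(n_available=12, k=3, seed=0)
print(picks)
```

Recording the seed alongside the chosen numbers would let the same selection be regenerated later.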

The following sequences were taken from the BEDROCK HIV Problem Space Database, from the Markham et al. (1998) study.
Table 1: Sequences analyzed

Group Subject Visit Sequences
AIDS Diagnosed 3




1, 2, 4
3, 4, 5

3, 6, 7
2, 4, 8

2, 3, 4
5, 8, 10
AIDS Progressing 7




2, 3, 9
2, 8, 9

1, 4, 5
1, 6, 7

2, 3, 4
9, 10, 11
No Trend 5




1, 3, 12
4, 8, 9

3, 4, 10
6, 7, 9

1, 3, 8
3, 5, 7

Working with Protein Sequences In-class Activity

Source: Bioinformatics for Dummies, pp. 110-123.
Analysis using the HIV gp120 envelope protein

Chapter 4: Reading a Swiss-Prot Entry

  • The UniProt (previously known as Swiss-Prot) database was accessed
    • Searching HIV generates 600,415 results
    • Searching gp120 generates 182,286 results
  • Accession number "Q75760", which corresponds to the HIV gp120 structure used for Huang et al. (2005), was searched for and accessed on the database
    • General Information about the entry
      • Entry name: Q75760_9HIV1
      • Primary Accession Number: Q75760
      • Integrated into UniProt: November 1, 1996
      • Sequence last modified: November 1, 1996
      • Annotations last Modified: October 1, 2014
    • Name and Origin of the Protein
      • Protein Name: Envelope glycoprotein gp160
      • Gene Name: env
      • From: Human immunodeficiency virus 1; Taxonomic identifier 11676
      • Taxonomy: Viruses › Retro-transcribing viruses › Retroviridae › Orthoretrovirinae › Lentivirus › Primate lentivirus group
    • Publications
      • 9 listed
    • The Cross-References
    • The Keywords
      • Apoptosis
      • Fusion of virus membrane with host membrane
      • Host-virus interaction
      • Viral attachment to host cell
      • Viral immunoevasion
      • Viral penetration into host cytoplasm
      • Virus entry into host cell
    • The Features
      • Glycosylation at residue 298 (1 aa long); N-linked
      • 'Feature viewer' not active
    • The Sequence
      • Q75760 was downloaded in FASTA format
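A downloaded FASTA file can be read back into a plain sequence string with a few lines of Python. The sketch below is a minimal parser written for illustration; the record shown is a made-up toy entry, not the Q75760 sequence, and `read_fasta` is a hypothetical helper name.

```python
def read_fasta(text):
    """Return a dict mapping each header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may wrap over lines
    return {h: "".join(parts) for h, parts in records.items()}


# Toy record (hypothetical, NOT the real gp160 sequence)
example = """>toy_record hypothetical example
MKTAYIAK
QRQISFVK
"""
parsed = read_fasta(example)
print(parsed)
```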

Chapter 5: ORFing your DNA sequence

Taken from Bioinformatics for Dummies, 2ed. pp. 146-147

Figure 1. ExPASy ProtScale using Eisenberg et al. for envelope glycoprotein gp160

Chapter 6: Working with a single protein sequence

Taken from Bioinformatics for Dummies, 2ed. pp. 159-195. Analyzed using HIV gp120 env protein, sequence from UniProt.

Predicting the main physico-chemical properties of a protein

  • ExPASy ProtParam was accessed
  • Accession number Q75760, which corresponds to the HIV gp120 structure used for Huang et al. (2005), was inputted
  • Molecular Weight: 96160.4 Da (theoretical)
  • Extinction Coefficients:
    • Assuming all pairs of Cys residues form cystines: 184145 M⁻¹ cm⁻¹
    • Assuming all Cys residues are reduced: 182770 M⁻¹ cm⁻¹
  • Instability Index: 37.91 (classified as stable)
  • Half-Life:
    • 30 hours (mammalian reticulocytes, in vitro)
    • >20 hours (yeast, in vivo)
    • >10 hours (Escherichia coli, in vivo).
Figure 2. Composition of glycoprotein gp160 from ExPASy ProtParam
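The two extinction coefficients differ only in the cystine term. As a rough sketch, ProtParam's 280 nm estimate is a weighted count of Tyr (1490), Trp (5500) and cystine (125) in M⁻¹ cm⁻¹. The function below applies that formula to a toy sequence; the name `extinction_coefficients` and the toy input are illustrative assumptions, not ProtParam's own code.

```python
def extinction_coefficients(seq):
    """Estimate molar extinction coefficients at 280 nm (M^-1 cm^-1).

    Returns (all Cys paired as cystines, all Cys reduced).
    """
    n_tyr = seq.count("Y")
    n_trp = seq.count("W")
    n_cystine = seq.count("C") // 2  # each cystine is a pair of Cys
    reduced = n_tyr * 1490 + n_trp * 5500
    oxidized = reduced + n_cystine * 125
    return oxidized, reduced


# Toy sequence (hypothetical): one Trp, one Tyr, two Cys
toy_result = extinction_coefficients("WYCC")
print(toy_result)
```

Under this formula, the gap between the two reported values (184145 - 182770 = 1375) would correspond to 11 cystine pairs.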

Digesting a protein in a computer

  • ExPASy PeptideCutter tool was accessed and Q75760 accession number was inputted
  • The following enzymes were selected (arbitrarily) for cleavage: Arg-C proteinase, Pepsin (pH 1.3), Proteinase K
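An in-silico digest reduces to splitting the sequence at each cleavage site. The sketch below uses the simplest rule, cutting on the C-terminal side of every Arg, as a stand-in for Arg-C proteinase. It ignores the neighboring-residue exceptions that tools such as PeptideCutter apply for other enzymes, and `digest_after` and the toy peptide are hypothetical.

```python
def digest_after(seq, residues="R"):
    """Cleave seq on the C-terminal side of every residue in `residues`."""
    peptides = []
    start = 0
    for i, aa in enumerate(seq):
        if aa in residues:
            peptides.append(seq[start:i + 1])  # fragment ends at the cut residue
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # trailing fragment with no cut site
    return peptides


# Toy peptide (hypothetical), simplified Arg-C rule
fragments = digest_after("MKRAGRWTRR")
print(fragments)  # ['MKR', 'AGR', 'WTR', 'R']
```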

Doing primary structure analysis

Looking for transmembrane segments
  • ExPASy ProtScale was accessed and Q75760 accession number was inputted
  • Hphob. / Kyte & Doolittle selected, window size set to 19
  • Results were converted to .gif format
    • 4 transmembrane domains identified
  • Repeated ProtScale analysis using Eisenberg et al. to confirm findings
  • TMHMM page of CBS site accessed, and Q75760 sequence was inputted in FASTA format
    • 5 transmembrane domains observed
Figure 3. ExPASy ProtScale using Kyte & Doolittle for envelope glycoprotein gp160
Figure 4. ExPASy ProtScale using Eisenberg et al. for envelope glycoprotein gp160
Figure 5. TMHMM diagram for envelope glycoprotein gp160 indicates 5 transmembrane domains
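The Kyte & Doolittle profile is a sliding-window average of per-residue hydropathy values. The sketch below reproduces that idea with a window of 19 and counts stretches above 1.6, the cutoff commonly quoted for this window size. The function names and the toy sequence are hypothetical, and ProtScale's own options (edge weighting, normalization) are not modeled.

```python
# Kyte & Doolittle (1982) hydropathy values
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    if len(seq) < window:
        return []
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def count_hydrophobic_segments(profile, threshold=1.6):
    """Count contiguous runs of windows scoring above the threshold."""
    count = 0
    inside = False
    for value in profile:
        if value > threshold and not inside:
            count += 1  # a new run starts here
        inside = value > threshold
    return count


# Toy sequence (hypothetical): two hydrophobic stretches in a polar background
toy = "D" * 25 + "L" * 22 + "K" * 25 + "I" * 22 + "E" * 25
profile = hydropathy_profile(toy)
n_segments = count_hydrophobic_segments(profile)
print(n_segments)  # 2
```

A threshold count like this is only a first pass; dedicated predictors such as TMHMM use a trained model, which may explain why the two approaches report different numbers of domains.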
Looking for coiled-coil regions
  • COILS server at EMBnet accessed and Q75760 sequence ID inputted
  • Coiled-coil regions determined at ~5 places (largest between 600-700 aa residues)
Figure 6. Coiled-coil regions for envelope glycoprotein gp160

Predicting post-translational modifications in your protein

  • ScanProsite accessed and Q75760 Accession Number inputted
    • Unchecked the "Exclude motifs with a high probability of occurrence" box; checked "Do not scan profiles"
Figure 7. ScanProsite proposed structure pattern for envelope glycoprotein gp160
Figure 8. ScanProsite glycosylation patterns within glycoprotein gp160. Glycosylation was the pattern with highest probability.
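The N-glycosylation pattern that dominates these results is short enough to scan for directly. The sketch below searches for the PROSITE-style motif N-{P}-[ST]-{P} with a regular expression and reports 1-based positions of the Asn. The helper `n_glyc_sites` and the toy sequence are hypothetical, and a motif hit only marks a potential site, not a confirmed modification.

```python
import re

# N, then any residue except P, then S or T, then any residue except P.
# The lookahead makes matches zero-width so overlapping sites are all found.
NGLYC = re.compile(r"(?=N[^P][ST][^P])")


def n_glyc_sites(seq):
    """Return 1-based positions of Asn residues in candidate N-glyc motifs."""
    return [m.start() + 1 for m in NGLYC.finditer(seq)]


# Toy sequence (hypothetical, not gp160)
sites = n_glyc_sites("AANGTKANPSGNNTSA")
print(sites)  # [3, 12, 13]
```

The NPSG stretch at position 8 is skipped because proline follows the Asn, which is why such a common motif is usually excluded by default in ScanProsite.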

Finding Known Domains in your Protein


InterProScan was accessed and the FASTA sequence for glycoprotein gp160 was entered in the text box. Run was submitted.

Figure 9. Domain determined for glycoprotein gp160 using InterProScan, based on all available domain databases.

NCBI CD Server was accessed and Q75760 accession number was inputted. Expect Value Threshold set to 1 and 'Submit Query' button clicked.

Figure 10. Domains determined for glycoprotein gp160 using CD Server. Red denotes SMART domains, and ragged ends indicate partial matches. Low E-values for both partial domains indicate that the matches are significant.

Motif-Scan was not accessible via the provided website.

Chapter 11: Working with Protein 3D Structures

Taken from Bioinformatics for Dummies, 2ed. pp. 330-336.

Predicting the secondary structure of a protein sequence: PsiPred

PsiPred was accessed. FASTA amino acid sequence for the Q75760 accession number was inputted.

Figure 11. Secondary structure prediction for glycoprotein gp160 by PsiPred. The predominant predicted secondary structure is alpha helix.

Predicting additional structural features: PredictProtein

PredictProtein was accessed. A personal account was created and validated. The FASTA amino acid sequence for Q75760 accession number was inputted.

Figure 12. Protein structure prediction from PredictProtein for glycoprotein gp160.

Crystal Structure Comparison

The Cn3D file for the Huang et al. (2005) crystallized protein was downloaded from the NCBI website. Alpha helices were colored green and beta sheets were colored yellow, using the coloring shortcuts. The true structure of the protein is predominantly composed of beta sheets and random coil, despite the PsiPred prediction that alpha helices would be more prevalent.

Figure 13. Cn3D crystal structure determined by Huang et al. (2005) for structure 2B4C, color coded by secondary structure.
