BIOL368/F14:Nicole Anguiano Week 8

Defining Your HIV Structure Research Project

This project will be carried out in conjunction with Isabel Gonzaga and Chloe Jones. The text below is taken from Isabel Gonzaga's Week 8 journal, since the project we are working on uses the same question, hypothesis, and subject data.


How does HIV status (diagnosed, progressing or non-trending) affect the structure of the V3 protein region?


We hypothesize that the diagnosed groups will show greater variability in the protein structure of the V3 region than the non-trending groups. Initial comparisons show that the diagnosed and progressing groups expressed greater genetic variability than the non-trending groups. These changes may alter the third variable region and reduce the host's ability to adapt to the changes and generate a sufficient immune response.

Subject Data

Using the BEDROCK HIV Sequence Data Table, I was able to determine which of the subjects used in my study actually developed AIDS. All three AIDS-diagnosed subjects were confirmed to have the disease by their final visit. In the AIDS-progressing group, subjects developed AIDS within 1 year after their final visit. The non-trending subjects all maintained high CD4 T cell counts above the threshold, even after the study was conducted. Sequences for each subject and visit were chosen using a random integer generator to eliminate selection bias.
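The random selection step could be reproduced with a short script. The sketch below is only an illustration: the function name `pick_clones`, the number of clones available per visit, and the seed are hypothetical and are not taken from the study.

```python
import random

def pick_clones(clones_available, how_many=3, seed=None):
    """Pick `how_many` distinct clone numbers from 1..clones_available."""
    rng = random.Random(seed)  # a fixed seed makes the choice repeatable
    return sorted(rng.sample(range(1, clones_available + 1), how_many))

# Hypothetical example: a visit with 10 sequenced clones.
chosen = pick_clones(clones_available=10, how_many=3, seed=368)
print(chosen)
```

Sampling without replacement guarantees that the same clone is never chosen twice for a visit.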

The following sequences were taken from the BEDROCK HIV Problem Space Database, from the Markham et al. (1998) study.
Table 1: Sequences analyzed

Group | Subject | Visit | Sequences
AIDS Diagnosed | 3 | |
 | | 1, 2, 4 | 3, 4, 5
 | | 3, 6, 7 | 2, 4, 8
 | | 2, 3, 4 | 5, 8, 10
AIDS Progressing | 7 | |
 | | 2, 3, 9 | 2, 8, 9
 | | 1, 4, 5 | 1, 6, 7
 | | 2, 3, 4 | 9, 10, 11
No Trend | 5 | |
 | | 1, 3, 8 | 4, 5, 2
 | | 1, 2, 3 | 6, 7, 9
 | | 1, 3, 4 | 3, 5, 4

Working with Protein Sequences In-class Activity

Reading a SWISS-PROT Entry

  • I navigated to UniProt. I searched for "Q75760". Here is a portion of the results from the protein that came up from the search.

Entry Information

Entry Name: Q75760_9HIV1
Primary (citable) accession number: Q75760
Integrated into UniProtKB/TrEMBL: November 1, 1996
Last sequence update: November 1, 1996
Last modified: October 1, 2014

Names & Taxonomy

Protein names: Envelope glycoprotein gp160
Gene names: env
Organism: Human immunodeficiency virus 1
Taxonomic identifier: 11676
Taxonomic lineage: Viruses › Retro-transcribing viruses › Retroviridae › Orthoretrovirinae › Lentivirus › Primate lentivirus group


The envelope glycoprotein gp160 precursor down-modulates cell surface CD4 antigen by interacting with it in the endoplasmic reticulum and blocking its transport to the cell surface.
The gp120-gp41 heterodimer allows rapid transcytosis of the virus through CD4 negative cells such as simple epithelial monolayers of the intestinal, rectal and endocervical epithelial barriers. Both gp120 and gp41 specifically recognize glycosphingolipids galactosyl-ceramide (GalCer) or 3' sulfo-galactosyl-ceramide (GalS) present in the lipid rafts structures of epithelial cells. Binding to these alternative receptors allows the rapid transcytosis of the virus through the epithelial cells. This transcytotic vesicle-mediated transport of virions from the apical side to the basolateral side of the epithelial cells does not involve infection of the cells themselves.


Binary Interactions

With Entry #Exp IntAct Notes
P84801 2 EBI-8453491,EBI-8453570 From a different organism.
ath Q9KWN0 2 EBI-8453491,EBI-8453511 From a different organism.
UDA1 P11218 2 EBI-8453491,EBI-8453649 From a different organism.

Protein-protein interaction databases

Dip DIP-59960N.
IntAct Q75760. 3 interactions.
MINT MINT-8414778.

Subcellular Location

Virion membrane; Single-pass type I membrane protein. Host cell membrane; Single-pass type I membrane protein. Host endosome membrane; Single-pass type I membrane protein.

PTM / Processing

Amino Acid modifications

Feature Key Position(s) Length Description
Glycosylation 298-298 1 N-linked (GlcNAc...)


Keywords - Technical term


  • This section contained a variety of references. There were sequence databases (EMBL, GenBank, DDBJ, PIR), 3D structure databases (PDBe, RCSB PDB, PDBj, ProteinModelPortal, SMR, ModBase, MobiDB), protein-protein interaction databases (DIP, IntAct, MINT), protocols and materials databases (Structural Biology Knowledgebase), miscellaneous databases (EvolutionaryTrace), and family and domain databases (Gene3D, InterPro, Pfam, SUPFAM, ProtoNet).


See table under PTM / Processing.


  1. If you search on the keywords "HIV" and "gp120", how many results do you get?
  • Searching "hiv" returned 600,415 results. Searching "gp120" returned 182,286 results. Searching "hiv AND gp120" returned 180,227 results.

ORFing your DNA sequence

Subject 15, visit 4, clone 3, open reading frames
Figure 1: The six possible open reading frames for subject 15, visit 4, clone 3.
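The six reading frames shown in Figure 1 can also be generated programmatically. The following is a minimal sketch using the standard genetic code; the function names are illustrative, and the short DNA string is a hypothetical example back-translated from the first six residues of the clone's protein sequence, not the actual clone DNA.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: the 64 codons in TCAG order paired with
# one-letter amino acid codes; "*" marks a stop codon.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna):
    """Translate from the first base, ignoring a trailing partial codon."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")  # "X" for unknown codons
                   for i in range(0, len(dna) - 2, 3))

def six_frames(dna):
    """Translate all three offsets on both strands."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames

# Hypothetical DNA fragment (assumption, for illustration only).
example_dna = "GAAGTAGTAATTAGATCT"
for name, protein in six_frames(example_dna).items():
    print(name, protein)
```

The frame with the longest stretch free of stop codons (`*`) is usually the best candidate for the real coding frame.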
  • Comparing against the FASTA sequence of the UniProt protein above, I can see that the first reading frame is most likely the correct one. Its amino acid sequence, "EVVIRSENFTNNAKIIIVHLNESVVINCTRPNNNTRRKIPIGPGSSFYTTGIIGDIRQAHCNISGSKWNNTLKQIVNKLREQFVNKTIIFNQSS", is extremely similar to the sequence contained in the UniProt protein, "EVVIRSDNFTNNAKTIIVQLKESVEINCTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHCNISRAKWNDTLKQIVIKLREQFENKTIVFNHSS". There are relatively few differences between them, indicating that the clone most likely corresponds to this region of the env protein.
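The similarity between the two sequences can be checked roughly in code. The sketch below uses Python's `difflib.SequenceMatcher`, which is a general text-comparison tool and not a biological alignment method, so its output should be read as an approximation only.

```python
from difflib import SequenceMatcher

clone = ("EVVIRSENFTNNAKIIIVHLNESVVINCTRPNNNTRRKIPIGPGSSFYTTGIIGDIRQAHC"
         "NISGSKWNNTLKQIVNKLREQFVNKTIIFNQSS")
uniprot = ("EVVIRSDNFTNNAKTIIVQLKESVEINCTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
           "NISRAKWNDTLKQIVIKLREQFENKTIVFNHSS")

matcher = SequenceMatcher(None, clone, uniprot, autojunk=False)
# Total number of residues that fall inside identical blocks.
matched = sum(block.size for block in matcher.get_matching_blocks())

print(f"clone length: {len(clone)}, UniProt fragment length: {len(uniprot)}")
print(f"residues in matching blocks: {matched}")
print(f"similarity ratio: {matcher.ratio():.2f}")

# List every stretch where the two sequences differ.
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":
        print(tag, clone[i1:i2] or "-", "->", uniprot[j1:j2] or "-")
```

A dedicated pairwise alignment tool would give a more rigorous comparison, but this is enough to list where the two sequences disagree.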

Working with a single protein sequence


  • I navigated to ProtParam, and inputted the sequence from the clone above, then selected "Compute Parameters". The result was as follows:

Number of amino acids: 94
Molecular weight: 10625.1
Theoretical pI: 10.14

Amino acid composition:

Ala (A) 2 2.1%
Arg (R) 6 6.4%
Asn (N) 14 14.9%
Asp (D) 1 1.1%
Cys (C) 2 2.1%
Gln (Q) 4 4.3%
Glu (E) 4 4.3%
Gly (G) 5 5.3%
His (H) 2 2.1%
Ile (I) 14 14.9%
Leu (L) 3 3.2%
Lys (K) 6 6.4%
Met (M) 0 0.0%
Phe (F) 4 4.3%
Pro (P) 3 3.2%
Ser (S) 8 8.5%
Thr (T) 7 7.4%
Trp (W) 1 1.1%
Tyr (Y) 1 1.1%
Val (V) 7 7.4%
Pyl (O) 0 0.0%
Sec (U) 0 0.0%
(B) 0 0.0%
(Z) 0 0.0%
(X) 0 0.0%

Total number of negatively charged residues (Asp + Glu): 5
Total number of positively charged residues (Arg + Lys): 12

Atomic composition:

Carbon C 466
Hydrogen H 759
Nitrogen N 141
Oxygen O 139
Sulfur S 2

Formula: C466H759N141O139S2
Total number of atoms: 1507

Extinction coefficients:
Extinction coefficients are in units of M⁻¹ cm⁻¹, at 280 nm measured in water.
Ext. coefficient 7115
Abs 0.1% (=1 g/l) 0.670, assuming all pairs of Cys residues form cystines
Ext. coefficient 6990
Abs 0.1% (=1 g/l) 0.658, assuming all Cys residues are reduced

Estimated half-life:
The N-terminal of the sequence considered is E (Glu).
The estimated half-life is: 1 hours (mammalian reticulocytes, in vitro), 30 min (yeast, in vivo), >10 hours (Escherichia coli, in vivo).

Instability index:
The instability index (II) is computed to be 45.96
This classifies the protein as unstable.
Aliphatic index: 94.26
Grand average of hydropathicity (GRAVY): -0.362
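Several of the values above can be recomputed directly from the sequence as a sanity check. The sketch below assumes the formulas described in the ProtParam documentation: the extinction coefficient as 5500 per Trp, 1490 per Tyr and 125 per cystine, and the aliphatic index as the mole percentages of Ala + 2.9 × Val + 3.9 × (Ile + Leu). The variable names are my own.

```python
from collections import Counter

seq = ("EVVIRSENFTNNAKIIIVHLNESVVINCTRPNNNTRRKIPIGPGSSFYTTGIIGDIRQAHC"
       "NISGSKWNNTLKQIVNKLREQFVNKTIIFNQSS")

counts = Counter(seq)
n = len(seq)

# Amino acid composition as (count, percent), most frequent first.
for aa, k in counts.most_common():
    print(f"{aa} {k:2d} {100 * k / n:4.1f}%")

negative = counts["D"] + counts["E"]  # Asp + Glu
positive = counts["R"] + counts["K"]  # Arg + Lys

# Extinction coefficient at 280 nm (M^-1 cm^-1).
ext_reduced = 5500 * counts["W"] + 1490 * counts["Y"]
ext_cystine = ext_reduced + 125 * (counts["C"] // 2)  # all Cys paired

# Aliphatic index from mole percentages of Ala, Val, Ile and Leu.
aliphatic = 100 * (counts["A"] + 2.9 * counts["V"]
                   + 3.9 * (counts["I"] + counts["L"])) / n

print(n, negative, positive, ext_cystine, ext_reduced, round(aliphatic, 2))
```

For this sequence the script gives the same length, charged-residue counts, extinction coefficients and aliphatic index as the ProtParam output above.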


  • I navigated to ProtScale and entered the amino acid sequence. I changed the "Window Size" dropdown to 19, then hit Submit. I saved the image as a .gif (Fig. 2).
Protscale result for subject 15, visit 4, clone 3's amino acid sequence.
Figure 2: The ProtScale result for subject 15, visit 4, clone 3's amino acid sequence.
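A sliding-window profile like the one in Figure 2 can be approximated as follows. This is a sketch under two assumptions that are not stated above: that the Kyte-Doolittle hydropathy scale was used, and that every position in the 19-residue window is weighted equally.

```python
# Kyte-Doolittle hydropathy values (assumed scale).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

seq = ("EVVIRSENFTNNAKIIIVHLNESVVINCTRPNNNTRRKIPIGPGSSFYTTGIIGDIRQAHC"
       "NISGSKWNNTLKQIVNKLREQFVNKTIIFNQSS")

def sliding_mean(sequence, window=19):
    """Return (centre position, mean hydropathy) for each full window."""
    half = window // 2
    profile = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        centre = start + half + 1  # 1-based position of the window centre
        profile.append((centre, sum(KD[aa] for aa in segment) / window))
    return profile

profile = sliding_mean(seq, window=19)
# Averaging over the whole sequence gives the GRAVY value.
gravy = sum(KD[aa] for aa in seq) / len(seq)

print(f"{len(profile)} windows, centred on positions "
      f"{profile[0][0]} to {profile[-1][0]}")
print(f"GRAVY: {gravy:.3f}")
```

Averaging the same scale over the whole sequence reproduces the GRAVY value reported by ProtParam (-0.362), which suggests the scale values are entered correctly.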


  • Next, I navigated to TMHMM, pasted in the sequence, then hit submit, then saved the image (Fig. 3).
TMHMM result for subject 15, visit 4, clone 3's amino acid sequence.
Figure 3: The TMHMM result for subject 15, visit 4, clone 3's amino acid sequence. Note the lack of any visible lines.


  • I navigated to ScanProsite and inputted the amino acid sequence. I deselected "Exclude motifs with a high probability of occurrence from the scan", and then hit "START THE SCAN".
ScanProsite Result Part 1
Figure 4: The ScanProsite result showing the location of the sites on the amino acid sequence.
ScanProsite Result Part 2
Figure 5: The ScanProsite result showing the exact sites and what they are on the amino acid sequence. Note the many glycosylation sites.
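The N-glycosylation sites can be cross-checked with a regular expression. The sketch below searches only for the PROSITE N-glycosylation pattern N-{P}-[ST]-{P} (PS00001); ScanProsite reports other motifs as well, which this example does not cover.

```python
import re

seq = ("EVVIRSENFTNNAKIIIVHLNESVVINCTRPNNNTRRKIPIGPGSSFYTTGIIGDIRQAHC"
       "NISGSKWNNTLKQIVNKLREQFVNKTIIFNQSS")

# N, then any residue except P, then S or T, then any residue except P.
# The lookahead lets overlapping sites be reported as well.
SEQUON = re.compile(r"(?=(N[^P][ST][^P]))")

# 1-based start position and matched residues for each site.
sites = [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq)]
for position, motif in sites:
    print(position, motif)
```

Positions are relative to the 94-residue fragment, not to the full-length gp160 sequence.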


  • I navigated to InterProScan and inputted the amino acid sequence and hit submit.
InterProScan5 Results
Figure 6: The InterProScan results showing the predicted domains of the protein. The results show that the protein belongs to the gp160 family.

CD Server

  • I navigated to CD Server and inputted the amino acid sequence. Then I changed the Expect Value threshold to 1, and hit submit.
CD Server Results
Figure 7: The results from CD Server, showing that the inputted string is a part of the gp120 protein.

Predicting the Secondary Structure of a Protein


  • I navigated to PsiPred. I inputted the amino acid sequence and gave it the identifier "S15V4C3", then hit Predict. I waited about 15 minutes until it finished the prediction.
PsiPred result
Figure 8: The results from PsiPred using the amino acid sequence from subject 15, visit 4, clone 3. Note the two alpha helices and presence of many beta sheets.


  • I navigated to Predict Protein. I created an account so I could utilize the service. Then I validated my account and returned to the site. I logged in and inputted the amino acid sequence. I then resubmitted the job to get current results. The detailed results are visible here.
PredictProtein Old results
Figure 9: The results using the autogenerated PredictProtein results. The red bars are alpha helices.
PredictProtein New results
Figure 10: The results using the newly generated PredictProtein results. The red bars are alpha helices. Note the one large alpha helix and the one smaller one.

Crystal Structure Comparison

  • I navigated to NCBI and downloaded the structure as a CN3D file. I opened the file in CN3D (Fig. 11), and selected the stretch of the structure's amino acid sequence that was most similar to the sequence Translate returned for the clone (Fig. 1). I selected "Show Selected Residues" to display only what was selected (Fig. 12). The presence of a smaller alpha helix in both PsiPred (Fig. 8) and PredictProtein (Fig. 10) indicates that mutations in the protein may have caused an alpha helix to form. However, the one large alpha helix is likely the alpha helix present in the original crystal structure. The many beta sheets in the crystal structure are consistent with the beta sheets predicted by PsiPred (Fig. 8).
gp120 crystal structure
Figure 11: The crystal structure of gp120.
selected amino acid sequence in gp120
Figure 12: The portion of the gp120 crystal structure whose amino acid sequence is closest to the sequence from subject 15, visit 4, clone 3.


Nicole Anguiano
BIOL 368, Fall 2014
