BIOL368/F14:Nicole Anguiano Week 8

Defining Your HIV Structure Research Project

This project will be worked on in conjunction with Isabel Gonzaga and Chloe Jones. The text below is taken from Isabel Gonzaga Week 8, but the project we are working on uses the same question, hypothesis, and subject data.

Question

How does HIV status (diagnosed, progressing or non-trending) affect the structure of the V3 protein region?

Hypothesis

We hypothesize that diagnosed groups will express greater variability in the V3 region in their protein structure, in comparison to the non-trending groups. Initial comparisons show that diagnosed groups and progressing groups expressed greater genetic variability than non-trending groups. These changes may affect the third variable region, affecting the host's ability to adapt to the changes and generate sufficient immune response.

Subject Data

Using the BEDROCK HIV Sequence Data Table, I was able to determine which of the subjects used within my study actually developed AIDS. All 3 AIDS Diagnosed subjects were confirmed to have the disease by their final visit. In the AIDS Progressing group, subjects developed AIDS within 1 year after their final visit. The Non-Trending subjects all maintained CD4 T cell counts above the threshold, even after the study was conducted. Sequences for each subject and visit were chosen using a Random Integer Generator to eliminate selection bias.

The following sequences were taken from the BEDROCK HIV Problem Space Database, from the Markham et al. (1998) study.
Table 1: Sequences analyzed

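Group	Subject	Visit	Sequences
AIDS Diagnosed	3	1	1, 2, 4
AIDS Diagnosed	3	6	3, 4, 5
AIDS Diagnosed	10	1	3, 6, 7
AIDS Diagnosed	10	6	2, 4, 8
AIDS Diagnosed	15	1	2, 3, 4
AIDS Diagnosed	15	4	5, 8, 10
AIDS Progressing	7	1	2, 3, 9
AIDS Progressing	7	5	2, 8, 9
AIDS Progressing	8	1	1, 4, 5
AIDS Progressing	8	7	1, 6, 7
AIDS Progressing	14	1	2, 3, 4
AIDS Progressing	14	9	9, 10, 11
No Trend	5	1	1, 3, 8
No Trend	5	5	4, 5, 2
No Trend	6	1	1, 2, 3
No Trend	6	9	6, 7, 9
No Trend	13	1	1, 3, 4
No Trend	13	5	3, 5, 4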
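
The random clone selection described above can be sketched in Python as follows. This is only an illustration: the function name `pick_clones`, the number of clones available per visit, and the seed are assumptions, not the actual Random Integer Generator settings used for Table 1.

```python
import random


def pick_clones(n_available, n_pick=3, seed=None):
    """Pick n_pick distinct clone numbers from 1..n_available (no repeats)."""
    rng = random.Random(seed)  # passing a seed makes the draw reproducible
    return rng.sample(range(1, n_available + 1), n_pick)


# Hypothetical example: a visit for which 12 clones were sequenced.
print(pick_clones(12, seed=368))
```

Sampling without replacement guarantees that the same clone is never chosen twice for one visit.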

Working with Protein Sequences In-class Activity

Reading a SWISS-PROT Entry

I navigated to UniProt. I searched for "Q75760". Here is a portion of the results from the protein that came up from the search.

Entry Information

Entry Name: Q75760_9HIV1
Primary (citable) accession number: Q75760
Integrated into UniProtKB/TrEMBL: November 1, 1996
Last sequence update: November 1, 1996
Last modified: October 1, 2014

Names & Taxonomy

Protein names: Envelope glycoprotein gp160
Gene names: env
Organism: Human immunodeficiency virus 1
Taxonomic identifier: 11676
Taxonomic lineage: Viruses › Retro-transcribing viruses › Retroviridae › Orthoretrovirinae › Lentivirus › Primate lentivirus group

Function

The envelope glycoprotein gp160 precursor down-modulates cell surface CD4 antigen by interacting with it in the endoplasmic reticulum and blocking its transport to the cell surface.
The gp120-gp41 heterodimer allows rapid transcytosis of the virus through CD4 negative cells such as simple epithelial monolayers of the intestinal, rectal and endocervical epithelial barriers. Both gp120 and gp41 specifically recognize glycosphingolipids galactosyl-ceramide (GalCer) or 3' sulfo-galactosyl-ceramide (GalS) present in the lipid rafts structures of epithelial cells. Binding to these alternative receptors allows the rapid transcytosis of the virus through the epithelial cells. This transcytotic vesicle-mediated transport of virions from the apical side to the basolateral side of the epithelial cells does not involve infection of the cells themselves.

Interaction

Binary Interactions

With	Entry	#Exp	IntAct	Notes
	P84801	2	EBI-8453491,EBI-8453570	From a different organism.
ath	Q9KWN0	2	EBI-8453491,EBI-8453511	From a different organism.
UDA1	P11218	2	EBI-8453491,EBI-8453649	From a different organism.

Protein-protein interaction databases

Dip	DIP-59960N.
IntAct	Q75760. 3 interactions.
MINT	MINT-8414778.

Subcellular Location

Virion membrane; Single-pass type I membrane protein. Host cell membrane; Single-pass type I membrane protein. Host endosome membrane; Single-pass type I membrane protein

PTM / Processing

Amino Acid modifications

Feature Key	Position(s)	Length	Description
Glycosylation	298-298	1	N-linked (GlcNAc...)

Miscellaneous

Keywords - Technical term
3D-structure

Cross-References

This section contained a variety of references. There were sequence databases (EMBL, GenBank, DDBJ, PIR), 3D structure databases (PDBe, RCSB PDB, PDBj, ProteinModelPortal, SMR, ModBase, MobiDB), protein-protein interaction databases (DIP, IntAct, MINT), protocols and materials databases (Structural Biology Knowledgebase), miscellaneous databases (EvolutionaryTrace), and family and domain databases (Gene3D, InterPro, Pfam, SUPFAM, ProtoNet).

Features

See table under PTM / Processing.

Question

If you search on the keywords "HIV" and "gp120", how many results do you get?

Searching "hiv" returned 600,415 results. Searching "gp120" returned 182,286 results. Searching "hiv AND gp120" returned 180,227 results.

ORFing your DNA sequence

I chose to ORF the DNA sequence from subject 15, visit 4, clone 3. Using Translate, I inputted the DNA sequence and obtained the 6 possible open reading frames.
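
What a six-frame translation does can be sketched in a few lines of Python. This is a minimal sketch assuming the standard genetic code, not the Translate tool's own implementation. The names `translate` and `six_frames` are mine, and the DNA string is a short hypothetical stretch chosen so that frame 1 begins with EVVIR; it is not the actual clone sequence.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons from the first base; unknown codons become X."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return the three forward and three reverse-complement translations."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"5'3' Frame {offset + 1}"] = translate(dna[offset:])
        frames[f"3'5' Frame {offset + 1}"] = translate(reverse[offset:])
    return frames


# Hypothetical 15-base example, not the subject 15 clone.
for name, protein in six_frames("GAAGTAGTAATTAGA").items():
    print(name, protein)
```

Stop codons appear as `*`, so the frame with the longest uninterrupted stretch is usually the candidate open reading frame.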

Comparing to the FASTA sequence of the UniProt protein above, I can see that the first open reading frame is most likely the correct one. The amino acid sequence, "EVVIRSENFTNNAKIIIVHLNESVVINCTRPNNNTRRKIPIGPGSSFYTTGIIGDIRQAHCNISGSKWNNTLKQIVNKLREQFVNKTIIFNQSS", is extremely similar to the sequence contained in the UniProt protein, "EVVIRSDNFTNNAKTIIVQLKESVEINCTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHCNISRAKWNDTLKQIVIKLREQFENKTIVFNHSS". There are very few differences between them, indicating that the clone sequence likely corresponds to this region of the env protein.
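
The similarity between the two sequences can be put into numbers with a small global alignment. The sketch below is a plain Needleman-Wunsch alignment written for illustration. The scoring values (match 1, mismatch -1, gap -2) and the function names are my assumptions, and a dedicated tool with a substitution matrix would be the better choice for real analysis.

```python
CLONE = (
    "EVVIRSENFTNNAKIIIVHLNESVVINCTRPNNNTRRKIPIGPGSSF"
    "YTTGIIGDIRQAHCNISGSKWNNTLKQIVNKLREQFVNKTIIFNQSS"
)
UNIPROT = (
    "EVVIRSDNFTNNAKTIIVQLKESVEINCTRPNNNTRKSIHIGPGRAF"
    "YTTGEIIGDIRQAHCNISRAKWNDTLKQIVIKLREQFENKTIVFNHSS"
)


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    score = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        score[i][0] = i * gap
    for j in range(1, len(b) + 1):
        score[0][j] = j * gap
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def count_identities(aligned_a, aligned_b):
    """Count aligned columns where both sequences have the same residue."""
    return sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))


aligned_clone, aligned_ref = global_align(CLONE, UNIPROT)
print(aligned_clone)
print(aligned_ref)
print("identical positions:", count_identities(aligned_clone, aligned_ref))
```

Because the clone is 94 residues long and the UniProt stretch is 95, at least one gap is needed, which is why a simple position-by-position comparison would understate the similarity.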

Working with a single protein sequence

ProtParam

I navigated to ProtParam, and inputted the sequence from the clone above, then selected "Compute Parameters". The result was as follows:

Number of amino acids: 94
Molecular weight: 10625.1
Theoretical pI: 10.14

Amino acid composition:

Ala (A)	2	2.1%
Arg (R)	6	6.4%
Asn (N)	14	14.9%
Asp (D)	1	1.1%
Cys (C)	2	2.1%
Gln (Q)	4	4.3%
Glu (E)	4	4.3%
Gly (G)	5	5.3%
His (H)	2	2.1%
Ile (I)	14	14.9%
Leu (L)	3	3.2%
Lys (K)	6	6.4%
Met (M)	0	0.0%
Phe (F)	4	4.3%
Pro (P)	3	3.2%
Ser (S)	8	8.5%
Thr (T)	7	7.4%
Trp (W)	1	1.1%
Tyr (Y)	1	1.1%
Val (V)	7	7.4%
Pyl (O)	0	0.0%
Sec (U)	0	0.0%
(B)	0	0.0%
(Z)	0	0.0%
(X)	0	0.0%

Total number of negatively charged residues (Asp + Glu): 5
Total number of positively charged residues (Arg + Lys): 12
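
The composition table and the charged-residue totals can be cross-checked directly from the sequence. This is a minimal sketch using only the Python standard library; the variable names are mine.

```python
from collections import Counter

SEQUENCE = (
    "EVVIRSENFTNNAKIIIVHLNESVVINCTRPNNNTRRKIPIGPGSSF"
    "YTTGIIGDIRQAHCNISGSKWNNTLKQIVNKLREQFVNKTIIFNQSS"
)

counts = Counter(SEQUENCE)
length = len(SEQUENCE)

# One line per residue: one-letter code, count, percentage of the sequence.
for residue, n in sorted(counts.items()):
    print(f"{residue}\t{n}\t{100 * n / length:.1f}%")

negative = counts["D"] + counts["E"]  # Asp + Glu
positive = counts["R"] + counts["K"]  # Arg + Lys
print("negatively charged:", negative)
print("positively charged:", positive)
```

The excess of Arg and Lys over Asp and Glu fits the high theoretical pI reported above.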

Atomic composition:

Carbon	C	466
Hydrogen	H	759
Nitrogen	N	141
Oxygen	O	139
Sulfur	S	2

Formula: C₄₆₆H₇₅₉N₁₄₁O₁₃₉S₂
Total number of atoms: 1507

Extinction coefficients:
Extinction coefficients are in units of M⁻¹ cm⁻¹, at 280 nm measured in water.
Ext. coefficient 7115
Abs 0.1% (=1 g/l) 0.670, assuming all pairs of Cys residues form cystines
Ext. coefficient 6990
Abs 0.1% (=1 g/l) 0.658, assuming all Cys residues are reduced
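
The extinction coefficients follow from the Trp, Tyr, and cystine counts. The sketch below assumes the per-residue values that ProtParam documents for 280 nm (5500 for Trp, 1490 for Tyr, 125 per cystine) and takes the molecular weight from the result above; the function name is mine.

```python
SEQUENCE = (
    "EVVIRSENFTNNAKIIIVHLNESVVINCTRPNNNTRRKIPIGPGSSF"
    "YTTGIIGDIRQAHCNISGSKWNNTLKQIVNKLREQFVNKTIIFNQSS"
)
MOLECULAR_WEIGHT = 10625.1  # taken from the ProtParam result, not recomputed

# Molar extinction values at 280 nm in water (M^-1 cm^-1), assumed from ProtParam.
EXT_TRP, EXT_TYR, EXT_CYSTINE = 5500, 1490, 125


def extinction_coefficients(sequence):
    """Return (all Cys paired as cystines, all Cys reduced)."""
    reduced = sequence.count("W") * EXT_TRP + sequence.count("Y") * EXT_TYR
    cystines = sequence.count("C") // 2  # each cystine uses two Cys residues
    return reduced + cystines * EXT_CYSTINE, reduced


with_cystines, all_reduced = extinction_coefficients(SEQUENCE)
for label, ext in (("cystines", with_cystines), ("reduced", all_reduced)):
    # Abs 0.1% (= 1 g/l) is the extinction coefficient divided by the mass.
    print(label, ext, round(ext / MOLECULAR_WEIGHT, 3))
```

With only one Trp and one Tyr in the sequence, the coefficient is low, so absorbance at 280 nm would be a weak signal for this fragment.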

Estimated half-life:
The N-terminal of the sequence considered is E (Glu).
The estimated half-life is: 1 hours (mammalian reticulocytes, in vitro), 30 min (yeast, in vivo), >10 hours (Escherichia coli, in vivo).

Instability index:
The instability index (II) is computed to be 45.96
This classifies the protein as unstable.
Aliphatic index: 94.26
Grand average of hydropathicity (GRAVY): -0.362
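
The aliphatic index and GRAVY values can be reproduced by hand. This sketch assumes the usual definitions: aliphatic index = X(Ala) + 2.9 × X(Val) + 3.9 × (X(Ile) + X(Leu)) with X in mole percent, and GRAVY as the mean Kyte & Doolittle hydropathy of all residues. The function names are mine.

```python
SEQUENCE = (
    "EVVIRSENFTNNAKIIIVHLNESVVINCTRPNNNTRRKIPIGPGSSF"
    "YTTGIIGDIRQAHCNISGSKWNNTLKQIVNKLREQFVNKTIIFNQSS"
)

# Kyte & Doolittle hydropathy values.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def aliphatic_index(sequence):
    """Relative volume occupied by Ala, Val, Ile and Leu side chains."""
    def mole_percent(residue):
        return 100 * sequence.count(residue) / len(sequence)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


print(f"Aliphatic index: {aliphatic_index(SEQUENCE):.2f}")
print(f"GRAVY: {gravy(SEQUENCE):.3f}")
```

A negative GRAVY means the fragment is hydrophilic on average, even though the 14 Ile residues push the aliphatic index up.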

ProtScale

I navigated to ProtScale and entered the amino acid sequence. I changed the "Window Size" dropdown to 19, then hit Submit. I saved the image as a .gif (Fig. 2).
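
The plot ProtScale produces is a sliding-window average. The sketch below assumes the Kyte & Doolittle hydropathy scale and equal weights across the window, since neither setting is recorded above. Only the window size of 19 comes from my run; the function name is mine.

```python
SEQUENCE = (
    "EVVIRSENFTNNAKIIIVHLNESVVINCTRPNNNTRRKIPIGPGSSF"
    "YTTGIIGDIRQAHCNISGSKWNNTLKQIVNKLREQFVNKTIIFNQSS"
)

# Kyte & Doolittle hydropathy values (assumed scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19, scale=KYTE_DOOLITTLE):
    """Return (center position, mean score) for every full window (1-based)."""
    half = window // 2
    profile = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        mean = sum(scale[aa] for aa in segment) / window
        profile.append((start + half + 1, mean))
    return profile


for position, score in hydropathy_profile(SEQUENCE):
    print(position, round(score, 3))
```

With 94 residues and a window of 19, the profile covers positions 10 through 85; the first and last 9 residues have no full window around them.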

TMHMM

Next, I navigated to TMHMM, pasted in the sequence, then hit submit, then saved the image (Fig. 3).

ScanProsite

I navigated to ScanProsite and inputted the amino acid sequence. I deselected "Exclude motifs with a high probability of occurrence from the scan", and then hit "START THE SCAN".

ScanProsite Result Part 2 — **Figure 5**: The ScanProsite result showing the exact sites and what they are on the amino acid sequence. Note the many glycosylation sites.
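
The N-glycosylation hits can be approximated with a regular expression. This sketch assumes the PROSITE N-glycosylation pattern N-{P}-[ST]-{P} (entry PS00001) and checks only that one motif, so it does not reproduce the full ScanProsite scan. The function name is mine.

```python
import re

SEQUENCE = (
    "EVVIRSENFTNNAKIIIVHLNESVVINCTRPNNNTRRKIPIGPGSSF"
    "YTTGIIGDIRQAHCNISGSKWNNTLKQIVNKLREQFVNKTIIFNQSS"
)

# N, then anything but P, then S or T, then anything but P.
# The lookahead lets overlapping candidate sites be reported.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_glycosylation_sites(sequence):
    """Return (1-based position of the Asn, matched four residues) per site."""
    return [(m.start() + 1, m.group(1)) for m in N_GLYCOSYLATION.finditer(sequence)]


for position, motif in find_glycosylation_sites(SEQUENCE):
    print(position, motif)
```

A pattern this short matches often by chance, which is why ScanProsite excludes it by default and the exclusion had to be deselected.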

InterProScan

I navigated to InterProScan and inputted the amino acid sequence and hit submit.

CD Server

I navigated to CD Server and inputted the amino acid sequence. Then I changed the Expect Value Threshold to 1, and hit submit.

Predicting the Secondary Structure of a Protein

PsiPred

I navigated to PsiPred. I inputted the amino acid sequence and gave it the identifier "S15V4C3", then hit Predict. I waited about 15 minutes until it finished the prediction.

PredictProtein

I navigated to PredictProtein. I created an account so I could utilize the service. Then I validated my account and returned to the site. I logged in and inputted the amino acid sequence. I then resubmitted the job to get current results. The detailed results are visible here.

PredictProtein New results — **Figure 10**: The newly generated PredictProtein results. The red bars are alpha helices. Note the one large alpha helix and the one smaller one.

Crystal Structure Comparison

I navigated to NCBI and downloaded the structure as a Cn3D file. I opened the file in Cn3D (Fig. 11) and selected the residues corresponding to the sequence that most closely matches the amino acid sequence Translate returned for the clone (Fig. 1). I selected "Show Selected Residues" to display only what was selected (Fig. 12). The presence of a smaller alpha helix in both the PsiPred (Fig. 8) and PredictProtein (Fig. 10) predictions indicates that mutations in the protein may have caused an additional alpha helix to form. However, the one large alpha helix is likely the alpha helix present in the original crystal structure. The many beta sheets in the crystal structure are consistent with the beta sheets predicted by PsiPred (Fig. 8).

gp120 crystal structure — **Figure 11**: The crystal structure of gp120.

selected amino acid sequence in gp120 — **Figure 12**: The portion of gp120 whose amino acid sequence is closest to the sequence from subject 15, visit 4, clone 3.

Links

Nicole Anguiano
BIOL 368, Fall 2014

