BIOL368/F14:Nicole Anguiano Week 8: Difference between revisions

From OpenWetWare
(→‎Working with Protein Sequences In-class Activity: Added PsiPred and PredictProtein)

Revision as of 13:47, 21 October 2014

Defining Your HIV Structure Research Project

This project will be carried out in conjunction with Isabel Gonzaga and Chloe Jones. The text below is taken from Isabel Gonzaga Week 8, because the project we are working on uses the same question, hypothesis, and subject data.

Question

How does HIV status (diagnosed, progressing or non-trending) affect the structure of the V3 protein region?

Hypothesis

We hypothesize that the diagnosed groups will show greater variability in the protein structure of the V3 region than the non-trending groups. Initial comparisons show that the diagnosed and progressing groups expressed greater genetic variability than the non-trending groups. These changes may affect the third variable (V3) region and, in turn, the host's ability to adapt to the changes and generate a sufficient immune response.

Subject Data

Using the BEDROCK HIV Sequence Data Table (http://bioquest.org/bedrock/problem_spaces/hiv/HIV_data_table_README.pdf), I was able to determine which of the subjects used in my study actually developed AIDS. All 3 AIDS Diagnosed subjects were confirmed to have the disease by their final visit. In the AIDS Progressing group, subjects developed AIDS within 1 year after their final visit. The Non-Trending subjects all maintained high CD4 T cell counts above the threshold, even after the study was conducted. Sequences for each visit and subject were chosen using a Random Integer Generator (http://www.random.org/integers/) to eliminate selection bias.
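The random selection described above could also be reproduced in code. The following is only a minimal Python sketch under stated assumptions, not the procedure actually used: the clone numbers in this study came from the Random Integer Generator at random.org, whereas Python's `random` module is pseudo-random. The function name `pick_clones` and the figure of 12 available clones are hypothetical.

```python
import random


def pick_clones(n_available, n_pick=3, seed=None):
    """Pick n_pick distinct clone numbers from 1..n_available without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return sorted(rng.sample(range(1, n_available + 1), n_pick))


if __name__ == "__main__":
    # Hypothetical example: a visit with 12 sequenced clones, 3 to be analyzed.
    print(pick_clones(12, 3))
```

Sampling without replacement guarantees that the same clone is never chosen twice for one subject and visit.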


The following sequences were taken from the BEDROCK HIV Problem Space Database (http://bioquest.org/bedrock/problem_spaces/hiv/nucleotide_sequences.php), from the Markham et al. (1998) study.
Table 1: Sequences analyzed

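Group             Subject  Visit  Sequences
AIDS Diagnosed    3        1      6, 5, 8
AIDS Diagnosed    3        6      3, 4, 7
AIDS Diagnosed    10       1      6, 7, 8
AIDS Diagnosed    10       6      2, 4, 8
AIDS Diagnosed    15       1      2, 3, 4
AIDS Diagnosed    15       4      5, 8, 10
AIDS Progressing  7        1      2, 3, 9
AIDS Progressing  7        5      2, 8, 10
AIDS Progressing  8        1      3, 6, 10
AIDS Progressing  8        7      1, 6, 7
AIDS Progressing  14       1      6, 7, 12
AIDS Progressing  14       9      9, 10, 11
No Trend          5        1      1, 3, 12
No Trend          5        5      4, 8, 9
No Trend          6        1      3, 4, 10
No Trend          6        9      6, 7, 9
No Trend          13       1      1, 3, 8
No Trend          13       5      3, 5, 7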

Working with Protein Sequences In-class Activity

Reading a SWISS-PROT Entry

  • I navigated to UniProt. I searched for "Q75760". Here is a portion of the results from the protein that came up from the search.

Entry Information

Entry Name: Q75760_9HIV1
Primary (citable) accession number: Q75760
Integrated into UniProtKB/TrEMBL: November 1, 1996
Last sequence update: November 1, 1996
Last modified: October 1, 2014

Names & Taxonomy

Protein names: Envelope glycoprotein gp160
Gene names: env
Organism: Human immunodeficiency virus 1
Taxonomic identifier: 11676
Taxonomic lineage: Viruses › Retro-transcribing viruses › Retroviridae › Orthoretrovirinae › Lentivirus › Primate lentivirus group

Function

The envelope glycoprotein gp160 precursor down-modulates cell surface CD4 antigen by interacting with it in the endoplasmic reticulum and blocking its transport to the cell surface.
The gp120-gp41 heterodimer allows rapid transcytosis of the virus through CD4 negative cells such as simple epithelial monolayers of the intestinal, rectal and endocervical epithelial barriers. Both gp120 and gp41 specifically recognize glycosphingolipids galactosyl-ceramide (GalCer) or 3' sulfo-galactosyl-ceramide (GalS) present in the lipid rafts structures of epithelial cells. Binding to these alternative receptors allows the rapid transcytosis of the virus through the epithelial cells. This transcytotic vesicle-mediated transport of virions from the apical side to the basolateral side of the epithelial cells does not involve infection of the cells themselves.

Interaction

Binary Interactions

With  Entry   #Exp  IntAct                   Notes
-     P84801  2     EBI-8453491,EBI-8453570  From a different organism.
ath   Q9KWN0  2     EBI-8453491,EBI-8453511  From a different organism.
UDA1  P11218  2     EBI-8453491,EBI-8453649  From a different organism.

Protein-protein interaction databases

Dip DIP-59960N.
IntAct Q75760. 3 interactions.
MINT MINT-8414778.

Subcellular Location

Virion membrane; Single-pass type I membrane protein. Host cell membrane; Single-pass type I membrane protein. Host endosome membrane; Single-pass type I membrane protein

PTM / Processing

Amino Acid modifications

Feature Key    Position(s)  Length  Description
Glycosylation  298-298      1       N-linked (GlcNAc...)

Miscellaneous

Keywords - Technical term
3D-structure

Cross-References

  • This section contained a variety of references. There were sequence databases (EMBL, GenBank, DDBJ, PIR), 3D structure databases (PDBe, RCSB PDB, PDBj, ProteinModelPortal, SMR, ModBase, MobiDB), protein-protein interaction databases (DIP, IntAct, MINT), protocols and materials databases (Structural Biology Knowledgebase), miscellaneous databases (EvolutionaryTrace), and family and domain databases (Gene3D, InterPro, Pfam, SUPFAM, ProtoNet).

Features

See table under PTM / Processing.

Question

  1. If you search on the keywords "HIV" and "gp120", how many results do you get?
  • Searching "hiv" returns 600,415 results. Searching "gp120" returns 182,286 results. Searching "hiv AND gp120" returns 180,227 results.

ORFing your DNA sequence

Figure 1: The six possible open reading frames for subject 15, visit 4, clone 3.
  • Comparing to the FASTA sequence of the UniProt protein above, I can see that the correct open reading frame is most likely the first. Its amino acid sequence, "EVVIRSENFTNNAKIIIVHLNESVVINCTRPNNNTRRKIPIGPGSSFYTTGIIGDIRQAHCNISGSKWNNTLKQIVNKLREQFVNKTIIFNQSS", is extremely similar to the corresponding sequence in the UniProt protein, "EVVIRSDNFTNNAKTIIVQLKESVEINCTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHCNISRAKWNDTLKQIVIKLREQFENKTIVFNHSS". There are very few differences between them, indicating that the cloned env sequence most likely maps to that region of the overall protein.
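The frame-picking step above can be sketched in Python. This is an illustrative sketch only, not the ORF tool used here: it translates a DNA string in all six frames with the standard genetic code and reports the frame whose translation is most similar to a reference protein, using `difflib.SequenceMatcher` as a rough similarity score rather than a true sequence alignment. The function names and the toy DNA string are hypothetical; the clone's nucleotide sequence is not reproduced on this page.

```python
from difflib import SequenceMatcher
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon; unknown codons become 'X', stops become '*'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return translations for frames +1, +2, +3, -1, -2, -3."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def best_frame(dna, reference):
    """Name of the frame whose translation is most similar to the reference."""
    frames = six_frames(dna)
    return max(frames, key=lambda name: SequenceMatcher(
        None, frames[name], reference, autojunk=False).ratio())


if __name__ == "__main__":
    toy_dna = "ATGGCCAAATGA"  # hypothetical input, not the clone sequence
    for name, protein in six_frames(toy_dna).items():
        print(name, protein)
    print("best frame:", best_frame(toy_dna, "MAK"))
```

With the real clone DNA as input and the UniProt sequence as the reference, the same call would indicate which of the six frames resembles gp160 most closely.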

Working with a single protein sequence

ProtParam

  • I navigated to ProtParam, and inputted the sequence from the fasta file above, then selected "Compute Parameters". The result was as follows:

Number of amino acids: 847
Molecular weight: 96160.4
Theoretical pI: 8.55

Amino acid composition:

Ala (A) 46 5.4%
Arg (R) 52 6.1%
Asn (N) 59 7.0%
Asp (D) 30 3.5%
Cys (C) 22 2.6%
Gln (Q) 39 4.6%
Glu (E) 55 6.5%
Gly (G) 58 6.8%
His (H) 11 1.3%
Ile (I) 65 7.7%
Leu (L) 84 9.9%
Lys (K) 42 5.0%
Met (M) 16 1.9%
Phe (F) 24 2.8%
Pro (P) 29 3.4%
Ser (S) 47 5.5%
Thr (T) 60 7.1%
Trp (W) 27 3.2%
Tyr (Y) 23 2.7%
Val (V) 58 6.8%
Pyl (O) 0 0.0%
Sec (U) 0 0.0%
(B) 0 0.0%
(Z) 0 0.0%
(X) 0 0.0%


Total number of negatively charged residues (Asp + Glu): 85
Total number of positively charged residues (Arg + Lys): 94

Atomic composition:

Carbon C 4286
Hydrogen H 6778
Nitrogen N 1192
Oxygen O 1246
Sulfur S 38


Formula: C4286H6778N1192O1246S38
Total number of atoms: 13540

Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 184145
Abs 0.1% (=1 g/l) 1.915, assuming all pairs of Cys residues form cystines
Ext. coefficient 182770
Abs 0.1% (=1 g/l) 1.901, assuming all Cys residues are reduced

Estimated half-life:
The N-terminal of the sequence considered is M (Met).
The estimated half-life is: 30 hours (mammalian reticulocytes, in vitro), >20 hours (yeast, in vivo), >10 hours (Escherichia coli, in vivo).

Instability index:
The instability index (II) is computed to be 37.91
This classifies the protein as stable.
Aliphatic index: 93.90
Grand average of hydropathicity (GRAVY): -0.220
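Several of the ProtParam values above can be cross-checked directly from the reported amino acid composition. The sketch below is a hedged Python illustration, not ProtParam's own code. It assumes the formulas given in the ProtParam documentation: extinction coefficient = 1490 x Tyr + 5500 x Trp + 125 x cystines; aliphatic index = X(Ala) + 2.9 x X(Val) + 3.9 x (X(Ile) + X(Leu)), with X in mole percent; and GRAVY as the mean Kyte-Doolittle hydropathy. The residue counts are copied from the output above; the function names are hypothetical.

```python
# Residue counts copied from the ProtParam output above.
COUNTS = {
    "A": 46, "R": 52, "N": 59, "D": 30, "C": 22, "Q": 39, "E": 55,
    "G": 58, "H": 11, "I": 65, "L": 84, "K": 42, "M": 16, "F": 24,
    "P": 29, "S": 47, "T": 60, "W": 27, "Y": 23, "V": 58,
}

# Kyte-Doolittle hydropathy values (assumed to be the scale behind GRAVY).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def charged_residues(counts):
    """Return (negatively charged, positively charged) residue totals."""
    return counts["D"] + counts["E"], counts["R"] + counts["K"]


def extinction_coefficient(counts, cystines=True):
    """Molar extinction coefficient at 280 nm, in M-1 cm-1."""
    coeff = 1490 * counts["Y"] + 5500 * counts["W"]
    if cystines:
        coeff += 125 * (counts["C"] // 2)  # every Cys pair forms a cystine
    return coeff


def aliphatic_index(counts):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    total = sum(counts.values())

    def mole_percent(residue):
        return 100.0 * counts[residue] / total

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def gravy(counts, scale=KYTE_DOOLITTLE):
    """Grand average of hydropathicity: summed hydropathy / residue count."""
    return sum(scale[aa] * n for aa, n in counts.items()) / sum(counts.values())


if __name__ == "__main__":
    print("residues:", sum(COUNTS.values()))
    print("Asp+Glu, Arg+Lys:", charged_residues(COUNTS))
    print("ext. coefficient (cystines):", extinction_coefficient(COUNTS, True))
    print("ext. coefficient (reduced):", extinction_coefficient(COUNTS, False))
    print(f"aliphatic index: {aliphatic_index(COUNTS):.2f}")
    print(f"GRAVY: {gravy(COUNTS):.3f}")
```

Under these assumptions the sketch reproduces the reported totals (847 residues, 85 negative and 94 positive residues), both extinction coefficients, the aliphatic index, and the GRAVY value, which suggests that the composition table and the derived parameters are internally consistent.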

ProtScale

  • I navigated to ProtScale and entered the amino acid sequence. I changed the "Window Size" dropdown to 19, then hit Submit. I saved the image as a .gif (Fig. 2).
Figure 2: The ProtScale result for Q75760.
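The profile in Figure 2 is a sliding-window average. As a rough illustration of what a window size of 19 means, the Python sketch below averages a per-residue scale over each 19-residue window and assigns the mean to the window's central residue. It assumes the Kyte-Doolittle hydropathy scale and equal weights for every position in the window; the scale selected in ProtScale is not recorded above, so both are assumptions, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def sliding_window_profile(sequence, scale=KYTE_DOOLITTLE, window=19):
    """Return (1-based center position, mean score) for every full window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    profile = []
    for center in range(half, len(sequence) - half):
        segment = sequence[center - half:center + half + 1]
        score = sum(scale[aa] for aa in segment) / window
        profile.append((center + 1, score))
    return profile


if __name__ == "__main__":
    # Clone fragment from the ORF section, used only as example input.
    fragment = ("EVVIRSENFTNNAKIIIVHLNESVVINCTRPNNNTRRKIPIGPGSSFYTTGIIGDIRQAHC"
                "NISGSKWNNTLKQIVNKLREQFVNKTIIFNQSS")
    for position, score in sliding_window_profile(fragment)[:5]:
        print(position, round(score, 3))
```

A wide window such as 19 smooths out single-residue noise, so sustained hydrophobic stretches such as candidate membrane-spanning segments stand out as broad peaks.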

TMHMM

  • Next, I navigated to TMHMM, pasted in the sequence, hit Submit, and saved the image (Fig. 3).
Figure 3: The TMHMM result for Q75760.

ScanProsite

  • I navigated to ScanProsite and inputted the amino acid sequence. I deselected "Exclude motifs with a high probability of occurrence from the scan", and then hit "START THE SCAN".
Figure 4: The ScanProsite result showing the location of the sites on the amino acid sequence.
Figure 5: The ScanProsite result showing the exact sites and what they are on the amino acid sequence. Note the many glycosylation sites.
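Many of the glycosylation hits come from a very short motif. As a hedged illustration, the Python sketch below searches for the PROSITE N-glycosylation pattern PS00001, N-{P}-[ST]-{P} (Asn, then any residue except Pro, then Ser or Thr, then any residue except Pro), using a regular expression with a lookahead so that overlapping sites are all reported. It is not ScanProsite itself, the function name is hypothetical, and the clone fragment from the ORF section is used only as example input.

```python
import re

# PROSITE PS00001, N-{P}-[ST]-{P}, written as a regex. The lookahead keeps
# the match zero-width after the N so that overlapping sites are found.
N_GLYCOSYLATION = re.compile(r"N(?=[^P][ST][^P])")


def n_glycosylation_sites(sequence):
    """Return 1-based positions of Asn residues that start an N-{P}-[ST]-{P} motif."""
    return [match.start() + 1 for match in N_GLYCOSYLATION.finditer(sequence)]


if __name__ == "__main__":
    # Clone fragment from the ORF section, used only as example input.
    fragment = ("EVVIRSENFTNNAKIIIVHLNESVVINCTRPNNNTRRKIPIGPGSSFYTTGIIGDIRQAHC"
                "NISGSKWNNTLKQIVNKLREQFVNKTIIFNQSS")
    for position in n_glycosylation_sites(fragment):
        print(position, fragment[position - 1:position + 3])
```

Because the motif is only four residues long it occurs often by chance, which is presumably why ScanProsite hides it unless "Exclude motifs with a high probability of occurrence from the scan" is deselected.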

InterProScan

  • I navigated to InterProScan and inputted the amino acid sequence and hit submit.
Figure 6: The InterProScan results showing the predicted domains of the protein.

CD Server

  • I navigated to CD Server and inputted the amino acid sequence. Then I changed the Expect Value Threshold to 1, and hit submit.
Figure 7: The results from CD Server, showing that the inputted string is a part of the gp120 protein.


Predicting the Secondary Structure of a Protein

PsiPred

  • I navigated to PsiPred. I inputted the amino acid sequence and gave it the identifier "S15V4C3", then hit Predict. I waited about 15 minutes until it finished the prediction.
Figure 8: The results from PsiPred using the amino acid sequence from subject 15, visit 4, clone 3.

PredictProtein

  • I navigated to PredictProtein. I created an account so I could use the service, validated the account, and returned to the site. I logged in and inputted the amino acid sequence. I then resubmitted the job to get current results.

Links

Nicole Anguiano
BIOL 368, Fall 2014
