BIOL368/F14:Chloe Jones Week 8: Difference between revisions

Revision as of 07:07, 22 October 2014

Working with Protein Sequences In-class Activity

Chapter 4: Reading a SWISS-PROT entry

Source: Bioinformatics for Dummies pp.110-123

Used the UniProtKB database, which has two subsections: Swiss-Prot and TrEMBL.

If you search on the keywords "HIV" and "gp120", how many results do you get?
- 180,227 results.

General Information about the entry

Entry name: 9HIV1
Primary accession number: Q75760
Secondary accession numbers: N/A
Integrated into Swiss-Prot on: November 1, 1996
Sequence was last modified on: November 1, 1996
Annotations were last modified on: October 1, 2014

Name and origin of the protein

Protein name: Envelope glycoprotein gp160
Synonyms: N/A
Gene name: Env
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Viruses › Retro-transcribing viruses › Retroviridae › Orthoretrovirinae › Lentivirus › Primate lentivirus group

References

Number of references: 9

Comments: ??
Cross-references: 172

ENA, European Nucleotide Archive (more information possibly)

Keywords

Apoptosis, fusion of virus membrane with host membrane, host-virus interaction, viral attachment to host cell, viral immunoevasion, viral penetration into host cytoplasm, virus entry into host cell

Chapter 5: ORFing your DNA sequence

Source: Bioinformatics for Dummies pp.146-147

The FASTA-format sequence of envelope glycoprotein gp160 obtained from UniProtKB was placed in the input box of ORF Finder (Open Reading Frame Finder). Six parallel horizontal bars were displayed, each labeled with an integer corresponding to a reading frame.
Putting in DNA sequences: Subject 1, visit 1, clone 1:
ORF Finder projected the open reading frame to be in frame #2 or #3. In the ExPASy output, 5'3' Frame 3 had an open reading frame with no stop codons interrupting or truncating the length of the sequence.
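
The frame check described above can be sketched in Python. This is only an illustration, assuming the standard genetic code; the function names and the short test sequence are hypothetical and are not the Subject 1 clone or the code behind either web tool.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (last base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X', stops become '*'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Translate all three forward (5'3') and three reverse (3'5') frames."""
    dna = dna.upper().replace("U", "T")
    frames = {}
    for label, strand in (("5'3'", dna), ("3'5'", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{label} Frame {offset + 1}"] = translate(strand[offset:])
    return frames


def frames_without_stops(dna):
    """List the frames whose translation contains no stop codon."""
    return [name for name, protein in six_frame_translations(dna).items()
            if "*" not in protein]


if __name__ == "__main__":
    example = "ATGGCCTAA"  # hypothetical toy sequence
    for name, protein in six_frame_translations(example).items():
        print(name, protein)
```

A frame that appears in the `frames_without_stops` list is uninterrupted over the whole input, which is the property noted above for 5'3' Frame 3.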

The features Media***

Chapter 6: Working with a single protein sequence

Source: Bioinformatics for Dummies pp.159-195

Predicting the main physico-chemical properties of a protein

Used the ExPASy ProtParam program to compute the physical and chemical parameters of a given protein
Input the Swiss-Prot/TrEMBL accession number Q75760, then click the Compute parameters button > Submit
- Corresponds to the HIV gp120 sequence that was used for the crystal structure in the Huang et al. (2005) paper.
- Can also enter a raw sequence
Save the file
media ***

Interpreting ProtParam results

Used ExPASy ProtParam for data.
Molecular weight: 96160.4 Daltons
Extinction coefficients:
- assuming all pairs of Cys residues form cystines = 184145 M⁻¹ cm⁻¹
- assuming all Cys residues are reduced = 182770 M⁻¹ cm⁻¹
- These tell you how much light (visible or invisible) your protein absorbs at a certain wavelength.
Instability
- The instability index (II) is computed to be 37.91
- This classifies the protein as stable
Half-life
- The estimated half-life is:
  - 30 hours (mammalian reticulocytes, in vitro)
  - >20 hours (yeast, in vivo)
  - >10 hours (Escherichia coli, in vivo)
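
The two extinction coefficients above can be related with a short sketch. It assumes the documented ProtParam formula for absorbance at 280 nm (1490 per Tyr, 5500 per Trp, 125 per cystine, in M⁻¹ cm⁻¹); the function names are hypothetical, and the sketch does not reproduce the other ProtParam values.

```python
# Molar extinction contributions at 280 nm (M^-1 cm^-1), as used by ProtParam.
EXT_TYR = 1490
EXT_TRP = 5500
EXT_CYSTINE = 125


def extinction_coefficient(sequence, cystines=True):
    """Estimate the 280 nm extinction coefficient of a protein sequence.

    With cystines=True every pair of Cys residues is assumed to form a cystine;
    with cystines=False all Cys residues are assumed to be reduced.
    """
    seq = sequence.upper()
    n_cystine = seq.count("C") // 2 if cystines else 0
    return (EXT_TYR * seq.count("Y")
            + EXT_TRP * seq.count("W")
            + EXT_CYSTINE * n_cystine)


def implied_cystines(ext_with_cystines, ext_reduced):
    """Number of cystines implied by the gap between the two reported values."""
    return (ext_with_cystines - ext_reduced) // EXT_CYSTINE


if __name__ == "__main__":
    # Values reported above for Q75760.
    print(implied_cystines(184145, 182770))
```

Under this formula, the difference between the two reported values (1375) corresponds to 11 cystines, which gives a quick consistency check on the ProtParam output.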

Digesting a protein in a computer

Used ExPASy PeptideCutter
Input accession number: Q75760
The program cuts the protein at specific sites
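
As an illustration of what cutting at specific sites means, here is a minimal sketch of an in-silico digest. The enzyme is an assumption (the notes do not say which one was selected): it uses the simple trypsin rule of cleaving after K or R unless the next residue is P. The function name and test peptide are hypothetical.

```python
def trypsin_digest(sequence):
    """Split a protein after every K or R that is not followed by P."""
    seq = sequence.upper()
    peptides = []
    start = 0
    for i, residue in enumerate(seq):
        next_residue = seq[i + 1] if i + 1 < len(seq) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        # Keep the C-terminal fragment that does not end in K or R.
        peptides.append(seq[start:])
    return peptides


if __name__ == "__main__":
    print(trypsin_digest("AKRPGKM"))  # hypothetical toy peptide
```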

Doing primary structure analysis

Looking for transmembrane segments

Used ExPASy ProtScale (http://web.expasy.org/cgi-bin/protscale/protscale.pl)
Input accession number: Q75760
The amino acid scale Hphob. / Kyte & Doolittle was preselected
Changed the window size to 19 because it is the best window value for viewing transmembrane regions
Convert the image to GIF format
Save the file
- Interpreting ProtScale results
- To confirm, use the Hphob. / Eisenberg et al. scale and set the threshold to 1.6
  1. Use a piece of paper to cover the results
  2. Lower the paper until the strongest peaks are visible
  3. Keep lowering the threshold as long as you keep seeing sharp peaks
- 4 sharp peaks, 4 transmembrane domains
- Running TMHMM
- Used TMHMM; the Q75760 sequence was input in FASTA format > Submit
- 5 transmembrane domains identified (Figure #)
  Figure 4. Identifying transmembrane domains using the TMHMM server. Transmembrane regions are denoted in red.
- Looking for coiled regions
- Used the COILS server at EMBnet, input accession number Q75760
- Changed the input sequence format to SwissProtID or AC
  Figure 4. COILS output for Q75760, using the COILS server at EMBnet. Look at the regions between 600-700.
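
The ProtScale step earlier in this section (Kyte & Doolittle scale, window size 19, a threshold lowered over the peaks) can be sketched as a sliding-window average. This is a rough illustration, not the ProtScale implementation: the function names are hypothetical, the window is unweighted, and the default threshold of 1.6 is taken from the notes above.

```python
# Kyte & Doolittle hydropathy values per amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Return (center position, mean hydropathy) for every full window.

    Positions are 1-based and refer to the residue at the window center.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    half = window // 2
    return [
        (start + half + 1, sum(scores[start:start + window]) / window)
        for start in range(len(scores) - window + 1)
    ]


def segments_above(profile, threshold=1.6):
    """Group consecutive window centers scoring above the threshold."""
    segments = []
    current = None
    for position, score in profile:
        if score > threshold:
            current = (current[0], position) if current else (position, position)
        elif current:
            segments.append(current)
            current = None
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    # Hypothetical toy sequence: a hydrophobic stretch flanked by charged residues.
    toy = "D" * 20 + "L" * 25 + "D" * 20
    print(segments_above(hydropathy_profile(toy)))
```

Each segment returned by `segments_above` plays the role of one sharp peak counted in the plot, so it is a candidate transmembrane region rather than a confirmed one.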

Predicting post-translational modifications in your protein

Looking for PROSITE patterns
- Used ScanProsite, input accession number Q75760
- Uncheck the option that excludes motifs with a high probability of occurrence, check the box that excludes profiles from the scan > Start the scan
Interpreting ScanProsite results

Figure 4. Types of patterns found in the protein using ScanProsite. Each color represents its own pattern family.

- Slide the mouse over the colored rectangles to see the name displayed; click to receive more information
- The list of segments contains the patterns found within the protein; numbers indicate position, capital letters are residues specified by the pattern, and lowercase letters are residues left unspecified by the pattern
- Be careful with short patterns
- Weak signals add up to give strong signals: two related patterns at a close distance are significant
- Weak patterns can be eliminated using a multiple sequence alignment
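
To show why short patterns deserve caution, here is a minimal sketch that converts a PROSITE-style pattern into a regular expression and scans a sequence with it. It handles only the basic syntax (`x`, `[..]`, `{..}`, and repeat counts), the function names are hypothetical, and the example uses the N-glycosylation pattern N-{P}-[ST]-{P} as an assumed illustration of a short, frequently matching motif; the test sequence is made up.

```python
import re


def prosite_to_regex(pattern):
    """Convert a basic PROSITE pattern such as 'N-{P}-[ST]-{P}' to a regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Separate the residue specification from an optional (n) or (n,m) repeat.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except those listed
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(core + repeat)
    return "".join(parts)


def find_pattern(sequence, pattern):
    """Return (1-based start, matched segment) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    print(find_pattern("ANKTAANPSA", "N-{P}-[ST]-{P}"))  # hypothetical toy sequence
```

A four-residue pattern like this one matches by chance in many unrelated proteins, which is why a hit is more convincing when related patterns cluster together or the site is conserved in a multiple sequence alignment.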

Finding Known Domains in Your Protein

Finding Domains with InterProScan
- InterProScan (http://www.ebi.ac.uk/Tools/pfa/iprscan5/), input the FASTA format of the Q75760 sequence > Submit

[[Image:interproscan.png|thumb|right|300px|= number of domains.|Figure 4. Domain determined for glycoprotein gp160 using InterProScan when comparing sequences with domain databases]]

Finding domains with the CD server
- Used NCBI CD Server, input accession number Q75760
- Deselect the "apply low complexity filter" check box, change the threshold to 1 > Submit Query
Interpreting and Understanding CD results
- Red domains are from SMART
- Ragged edges indicate partial matches
- A lower E-value corresponds to a better score

Finding domains with Motif Scan
- Unfortunately, the Motif Scan server had problems generating results when the FASTA-format sequence was submitted.

Chapter 11: Predicting the secondary structure of a protein sequence and additional structural features

Source: Bioinformatics for Dummies pp.330-336

From the Primary to Secondary Structures

Predicting the secondary structure of a protein sequence

- Used PsiPred, input protein sequence S1V1-1 in FASTA format, and gave the sequence a short identifier (s1v1-1).

Predicting additional structural features

- Used Predict Protein. An account had to be created in order to use the program. Input the amino acid sequence for S1V1-1.

[[Image:predict protein .png|thumb|right|300px|= PDB.|Figure 4. The results from Predict Protein for S1V1-1. Red denotes a helix, blue denotes portions that are exposed, and yellow denotes portions that are buried]]

Comparison of crystal structure of gp120

The gp120 crystal structure from the Huang et al. (2005) paper was downloaded from the NCBI website as a Cn3D file. To make it easier to see, I used the secondary structure shortcut to color the helices green and the beta sheets yellow so they could be better differentiated and analyzed. The structure from the NCBI website favors a composition with more beta sheets, together with random coil, whereas the PsiPred prediction had them in equal abundance.

[[Image:comparison.png|thumb|right|300px|= comparison .|Figure 4. The crystal structure determined by Huang et al. (2005), viewed in the Cn3D program (using the secondary structure shortcut)]]
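
The comparison of helix, sheet, and coil content can be made quantitative with a small sketch. It assumes each structure is written as a per-residue string using the PsiPred letters H (helix), E (strand), and C (coil); the function name and the two strings below are hypothetical placeholders, not the actual S1V1-1 prediction or the Huang et al. (2005) structure.

```python
from collections import Counter


def secondary_structure_fractions(states):
    """Return the fraction of residues in helix (H), strand (E), and coil (C)."""
    states = states.upper()
    if not states:
        raise ValueError("empty secondary structure string")
    counts = Counter(states)
    return {state: counts.get(state, 0) / len(states) for state in "HEC"}


if __name__ == "__main__":
    predicted = "CCHHHHHHCCEEEEEECC"  # hypothetical prediction string
    observed = "CCEEEEEECCCEEEEHHC"   # hypothetical crystal-structure assignment
    for label, states in (("predicted", predicted), ("observed", observed)):
        fractions = secondary_structure_fractions(states)
        print(label, {state: round(value, 2) for state, value in fractions.items()})
```

Comparing the two dictionaries side by side shows at a glance whether one structure is richer in strands or helices than the other.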

@@ Line 98: / Line 98: @@
 <u>Finding Known Domains in Your Protein</u>
 *<b>Finding Domains with InterProScan</b>
-**[http://www.ebi.ac.uk/Tools/pfa/iprscan5/ InterProScan], imput fasta format of Q75760 sequence>submit
+**[ http://www.ebi.ac.uk/Tools/pfa/iprscan5/ InterProScan], imput fasta format of Q75760 sequence>submit
-***
+ [[Image:interproscan.png|thumb|right|300px|= number of domains.|<b>Figure 4.</b> Domain determined for glycoprotein gp160 using InterProScan when comparing sequences with domain databases ]
+*<b>Finding domains with the CD server </b>
+**Used [http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi NCBI CD Server], input accession number Q75760,
+**Deselect the apply low complexity filter check box, change threshold set to 1 >Submit Query'
+*<b>Interpreting and Understanding CD results </b>
+**Red domain are from SMART
+**Ragged indicates partial matches
+**lower e-value correlates to a better score
+[[Image:cd results .png|thumb|right|300px|= conserved domains.|<b>Figure 4.</b> Conservered  domains for glycoprotein gp120 found using the CD Server]
+*<b> Finding domains with Motif Scan </b>
+**Unfortunately, the [http://myhits.isb-sib.ch/cgi-bin/motif_scan the Motif-Scan provider] was having complications with generating the information upon putting in the FASTA format.
+===Chapter 11: Predicting the secondary structure of a protein sequence and additional structural features ===
+Source: Bioinformatics for Dummies pp.330-336
+<u>From the Primary to Secondary Structures </u>
+*<b>Predicting the secondary structure of a protein sequence</b>
+ **Used [http://bioinf.cs.ucl.ac.uk/psipred/ PsiPred], input protein sequence S1V1-1 via FASTA format, and gave the sequence a short identifier (s1v1-1).
+After the job was submitted it ____minutes to generate the information
+[[Image:Psipred .png|thumb|right|300px|= Psipred.|<b>Figure 4.</b> The results from PsiPred using the amino acid from S1V1-1]
+*<b>Predicting additional structural features</b>
+ **Used [https://www.predictprotein.org/ Predict Protein]. In order to use program had to make an account. Input amino acid for S1V1-1.
+[[Image:predict protein .png|thumb|right|300px|= PDB.|<b>Figure 4.</b> The results using Predict Protein database for S1V1-1. Red denotes a helix, Blue are portions that are exposed, and yellow means buried ]
+<u> Comparison of crystal structure of gp120 </u>
+From the NCBI website the Hunag et al. (2005) paper was downloaded via [http://www.ncbi.nlm.nih.gov/Structure/mmdb/mmdbsrv.cgi?uid=2B4C CN3D file].To make it more visible I used a secondary shortcut to make the helices (green) and beta sheets (yellow) so they could be better differentiated and analyzed. The structure of the protein form NCBI website favors a composition favors a composition where there are more beta sheets present with random coil in comparison to the prediction from PsiPred that had them being in equal abundance.
+[[Image:comparison.png|thumb|right|300px|= comparison .|<b>Figure 4.</b> The crystal structure determined by Huang et al. (2005) using CN3D program. (used secondary shortcut)]
