BIOL368/F14:Chloe Jones Week 8
Working with Protein Sequences In-class Activity
Chapter 4: Reading a SWISS-PROT entry
Source: Bioinformatics for Dummies pp.110-123
- Using the database UniProt KB, which has two subsections: Swiss-Prot and TrEMBL.
- If you search on the keywords "HIV" and "gp120", how many results do you get?
- 180,227 results.
General Information about the entry
- Entry name: 9HIV1
- Primary Accession Number: Q75760
- Secondary Accession Numbers: N/A
- Integrated into Swiss-Prot on: November 1, 1996
- Sequence was last modified on: November 1, 1996
- Annotations were last modified on: October 1, 2014
Name and origin of the protein
- Protein name: Envelope glycoprotein gp160
- Synonyms: N/A
- Gene name: Env
- From: Homo sapiens (Human) [TaxID: 9606]
- Taxonomy: Viruses › Retro-transcribing viruses › Retroviridae › Orthoretrovirinae › Lentivirus › Primate lentivirus group
References
- Number of references: 9
The comments: ??
The cross references: 172
- ENA, European Nucleotide Archive (more information possibly)
Keyword
- Apoptosis, fusion of virus membrane with host membrane, host-virus interaction, viral attachment to host cell, viral immunoevasion, viral penetration into host cytoplasm, virus entry into host cell
Chapter 5: ORFing your DNA sequence
Source: Bioinformatics for Dummies pp.146-147
- The FASTA format of Envelope glycoprotein gp160 obtained from UniProt KB was placed in the input box of ORF Finder (Open Reading Frame Finder). 6 parallel horizontal bars were present, with integers correlating to the reading frame.
- Putting in DNA sequences: Subject 1 visit 1, clone 1:
- ORF Finder projected the open reading frame of the protein to be in frame #2 or #3. In the ExPASy output, 5'3' Frame 3 had an open reading frame with no stop codons interrupting or truncating the length of the sequence.
The features Media***
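As a rough illustration of what the ORF search above is doing, the sketch below translates a DNA sequence in all six reading frames and reports the longest stretch without a stop codon in each. It assumes the standard genetic code; the function names and the toy sequence are made up for illustration and are not the S1V1-1 clone or the actual ORF Finder algorithm.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, offset):
    """Translate dna starting at offset 0, 1 or 2; unknown codons become 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna):
    """Return translations keyed '+1'..'+3' (forward) and '-1'..'-3' (reverse)."""
    frames = {}
    for sign, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = translate(strand, offset)
    return frames


def longest_open_stretch(protein):
    """Longest run of residues not interrupted by a stop codon."""
    return max(protein.split("*"), key=len)


toy_dna = "ATGGCCTAAGGTTGA"  # hypothetical sequence, for illustration only
for name, protein in six_frames(toy_dna).items():
    print(name, protein, longest_open_stretch(protein))
```

A frame whose translation contains no `*` at all, like Frame 3 in the notes above, gives a longest open stretch equal to the whole translation.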
Chapter 6: Working with a single protein sequence
Source: Bioinformatics for Dummies pp.159-195
Predicting the main physico-chemical properties of a protein
- Used the program ExPASy ProtParam for computation of physical and chemical parameters of a given protein
- Input the Swiss-Prot/TrEMBL accession number Q75760. Then click the compute parameters button > Submit
- Corresponds to the HIV gp120 sequence that was used for the crystal structure for the Huang et al. (2005) paper.
- Can also enter raw sequence
- Save the file
- media ***
Interpreting ProtParam results
- Used ExPASy ProtParam for data.
- Molecular Weight: 96160.4 Daltons
- Extinction Coefficients:
- assuming all pairs of Cys residues form cystines = 184145 M⁻¹ cm⁻¹
- assuming all Cys residues are reduced = 182770 M⁻¹ cm⁻¹
- ~Tells you how much light (visible or invisible) your protein absorbs at a certain wavelength.
- Instability
- instability index (II) is computed to be 37.91
- classifies the protein as stable
- Half-Life
- The estimated half-life is:
- 30 hours (mammalian reticulocytes, in vitro)
- >20 hours (yeast, in vivo)
- >10 hours (Escherichia coli, in vivo)
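The two extinction coefficients reported above can be related with a little arithmetic. ProtParam estimates the coefficient at 280 nm from the number of Trp, Tyr and cystine residues, using molar values of 5500, 1490 and 125 M⁻¹ cm⁻¹ respectively. The two reported values differ by 184145 − 182770 = 1375 = 11 × 125, which is consistent with 11 cystines. The sketch below shows the calculation on a made-up sequence; the function name and the toy sequence are hypothetical, and this is not the Q75760 sequence.

```python
# Molar extinction values at 280 nm (M^-1 cm^-1) for Trp, Tyr and cystine.
EXT_TRP, EXT_TYR, EXT_CYSTINE = 5500, 1490, 125


def extinction_coefficients(sequence):
    """Return (all Cys paired as cystines, all Cys reduced) coefficients."""
    seq = sequence.upper()
    base = seq.count("W") * EXT_TRP + seq.count("Y") * EXT_TYR
    cystines = seq.count("C") // 2  # every two Cys can form one cystine
    return base + cystines * EXT_CYSTINE, base


toy_protein = "MWYCCKWYC"  # hypothetical: 2 Trp, 2 Tyr, 3 Cys
paired, reduced = extinction_coefficients(toy_protein)
print(paired, reduced)
```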
Digesting a protein in a computer
- Used ExPASy Peptide Cutter
- Input accession number: Q75760
- Program cuts the protein at specific sites
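To make "cuts the protein at specific sites" concrete, here is a minimal sketch of an in-silico digest. The notes do not say which enzyme was selected, so trypsin is an assumption here, and the rule used (cut after K or R unless the next residue is P) is a simplification of what Peptide Cutter applies. The function name and the toy sequence are hypothetical.

```python
import re


def trypsin_digest(protein):
    """Simplified trypsin rule: cleave after K or R unless followed by P."""
    # Zero-width split: position preceded by K/R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", protein.upper())
    return [piece for piece in pieces if piece]


toy_protein = "MKRPAKGG"  # hypothetical sequence, for illustration only
print(trypsin_digest(toy_protein))
```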
Doing primary structure analysis
Looking for transmembrane segments
- Used [http://web.expasy.org/cgi-bin/protscale/protscale.pl ExPASy ProtScale]
- Input accession number: Q75760
- The amino acid scale Hphob. / Kyte & Doolittle was preselected
- Changed the window size to 19 because it is the best window value for viewing transmembrane regions
- Convert the image to GIF format
- Save file
- Interpreting ProtScale results
- To confirm, use the Hphob. / Eisenberg et al. scale; set threshold to 1.6
- Use a piece of paper to cover the results
- Lower the paper until the strongest peaks are visible
- Keep lowering the threshold as long as you keep seeing sharp peaks
- 4 sharp peaks, 4 transmembrane domains
- Running TMHMM
- Used TMHMM; the Q75760 sequence was input in FASTA format > Submit
- 5 transmembrane domains identified Figure. #
- Looking for coiled regions
- Used COILS server at EMBnet, input accession number Q75760
- Changed input sequence format to SwissProtID or AC
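The sliding-window hydropathy scan from the ProtScale steps above can be sketched in a few lines. The code averages Kyte & Doolittle hydropathy values over a window of 19 residues and reports the stretches of windows that score above a cutoff, which is the programmatic version of lowering the piece of paper. The 1.6 cutoff is taken from the notes; pairing it with the Kyte & Doolittle scale here is an assumption, and the function names and the toy sequence are hypothetical.

```python
# Kyte & Doolittle hydropathy values per amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def segments_above(profile, threshold=1.6):
    """(first, last) window indices of each run scoring above threshold."""
    segments, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(profile) - 1))
    return segments


# Hypothetical sequence: a 25-residue hydrophobic stretch between charged ends.
toy_protein = "D" * 10 + "L" * 25 + "K" * 10
profile = hydropathy_profile(toy_protein)
print(segments_above(profile))
```

Each reported segment corresponds to one peak on the ProtScale plot, so counting segments is one way to count candidate transmembrane regions before comparing with TMHMM.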
Predicting post-translational modifications in your protein
- Looking for PROSITE patterns
- Used ScanProsite, input accession number Q75760
- Uncheck the box that excludes motifs with a high probability of occurrence, check the box that excludes profiles from the scan > Start the scan
- Interpreting ScanProsite results
- Slide mouse over color rectangles to see name displayed, click to receive more information
- The list of segments shows the patterns found within the protein; numbers indicate position, capital letters are residues specified by the pattern, and lowercase letters are residues left unspecified by the pattern
- ~be careful with short patterns
- Weak signals add up to give strong signals: two related patterns at close distance are significant
- Eliminating weak patterns: multiple sequence alignment
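To show why short patterns need care, the sketch below converts a PROSITE-style pattern into a regular expression and lists every match in a sequence. The example pattern is the PROSITE N-glycosylation site, N-{P}-[ST]-{P}, which is only four residues long and therefore matches often by chance. The converter handles only the basic syntax (x, [..], {..}, repeat counts and the < and > anchors); the function names and the toy sequence are hypothetical, and this is not how ScanProsite itself is implemented.

```python
import re


def prosite_to_regex(pattern):
    """Convert basic PROSITE pattern syntax to a Python regular expression."""
    regex = pattern.rstrip(".").replace("-", "")
    regex = regex.replace("{", "[^").replace("}", "]")  # {P} -> any but P
    regex = regex.replace("(", "{").replace(")", "}")    # x(2,4) -> .{2,4}
    regex = regex.replace("x", ".")                      # x -> any residue
    return regex.replace("<", "^").replace(">", "$")     # terminus anchors


def find_sites(pattern, sequence):
    """Return (1-based position, matched segment), including overlaps."""
    # A lookahead lets overlapping matches be reported separately.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]


toy_protein = "MNATKNPSANGSW"  # hypothetical sequence, for illustration only
print(find_sites("N-{P}-[ST]-{P}", toy_protein))
```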
Finding Known Domains in Your Protein
- Finding Domains with InterProScan
- Used [http://www.ebi.ac.uk/Tools/pfa/iprscan5/ InterProScan], input FASTA format of Q75760 sequence > Submit
- Finding domains with the CD server
- Used NCBI CD Server, input accession number Q75760
- Deselect the apply low complexity filter check box, change the threshold to 1 > Submit Query
- Interpreting and Understanding CD results
- Red domains are from SMART
- Ragged ends indicate partial matches
- A lower E-value correlates to a better score
- Finding domains with Motif Scan
- Unfortunately, the Motif Scan provider was having complications generating the information after the FASTA format was input.
Chapter 11: Predicting the secondary structure of a protein sequence and additional structural features
Source: Bioinformatics for Dummies pp.330-336
From the Primary to Secondary Structures
- Predicting the secondary structure of a protein sequence
**Used PsiPred, input protein sequence S1V1-1 in FASTA format, and gave the sequence a short identifier (s1v1-1).
After the job was submitted it took ____ minutes to generate the information.
[[Image:Psipred .png|thumb|right|300px|= Psipred.|Figure 4. The results from PsiPred using the amino acid sequence from S1V1-1]]
- Predicting additional structural features
**Used Predict Protein. In order to use the program, I had to make an account. Input the amino acid sequence for S1V1-1.
[[Image:predict protein .png|thumb|right|300px|= PDB.|Figure 4. The results using the Predict Protein database for S1V1-1. Red denotes a helix, blue denotes portions that are exposed, and yellow means buried]]
Comparison of crystal structure of gp120
The structure from the Huang et al. (2005) paper was downloaded from the NCBI website as a Cn3D file. To make it more visible, I used a secondary shortcut to color the helices (green) and beta sheets (yellow) so they could be better differentiated and analyzed. The structure of the protein from the NCBI website favors a composition where more beta sheets are present along with random coil, in comparison to the prediction from PsiPred, which had them in equal abundance.
[[Image:comparison.png|thumb|right|300px|= comparison .|Figure 4. The crystal structure determined by Huang et al. (2005) using the Cn3D program (used secondary shortcut)]]
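The comparison above was made by eye. One way to put numbers on it is to count the fraction of residues in each state of a three-state secondary structure string (H = helix, E = strand, C = coil), which is the kind of string PsiPred reports. The sketch below does that for two made-up strings; the function name and both strings are hypothetical and are not the actual PsiPred or crystal-structure assignments for gp120.

```python
from collections import Counter


def ss_composition(ss_string):
    """Fraction of residues in helix (H), strand (E) and coil (C) states."""
    counts = Counter(ss_string.upper())
    total = sum(counts.values())
    if total == 0:
        return {state: 0.0 for state in "HEC"}
    return {state: counts.get(state, 0) / total for state in "HEC"}


# Hypothetical strings, for illustration only.
predicted = "CCHHHHHHCCEEEEEECC"
observed = "CCHHHHCCEEEEEECC"
print(ss_composition(predicted))
print(ss_composition(observed))
```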