Isaiah M. Castaneda Week 8
Today, I am working with the gp120 envelope protein of HIV-1. First, I went to the website given in our Bioinformatics for Dummies book, which took me to a different but related page. A search bar was not present there, so I navigated back to the homepage. There, I entered "gp 120 envelope protein" and chose the "UniProtKB" option from the drop-down menu. There were 4,970 results. On the 4th page, I chose entry Q2MDW6. I then opened and copied the FASTA format of the sequence. Chapter 4 of the book points out the many parts of the entry page. Here are some screenshots that show my protein entry page.
Chapter 5 of the book provides instructions on how to find open reading frames (ORFs) in a DNA sequence. Following these instructions, I went to http://www.ncbi.nlm.nih.gov/gorf/gorf.html. I pasted my FASTA-formatted sequence into the appropriate box and then clicked "OrfFind." Here is a picture of my output.
I believe this means that there are 5 possible ORFs at the threshold of 100 nucleotides. When I clicked on the rectangle of the 1st bar, the program showed that, 31 amino acids in, there is an ATG. This is shown below.
Perhaps this is a start codon!
I also used the ExPASy program, and it immediately highlighted a methionine. However, there was no stop codon. This makes sense because, as the information from UniProt indicated, the sequence is only a piece of a larger one. Here is a picture of my ExPASy results.
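The kind of scan that an ORF search performs can be sketched in a few lines of Python. This is only a minimal illustration, not the algorithm that NCBI's ORF Finder actually uses: it assumes the forward strand only (three frames instead of six), treats ATG as the only start codon, and uses a made-up toy sequence. The function name `find_orfs` and its `min_length` parameter are my own.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_length=100):
    """Return (frame, start, end) for ATG-to-stop ORFs on the forward strand.

    start and end are 0-based nucleotide positions; end is exclusive and
    includes the stop codon. ORFs shorter than min_length are skipped.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence one codon at a time in this reading frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon since the last stop
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_length:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ATG
    return orfs

# Hypothetical toy sequence: one ATG, 40 alanine codons, one stop codon.
toy = "CC" + "ATG" + "GCT" * 40 + "TAA" + "GG"
print(find_orfs(toy))
```

A sequence fragment like mine, with a start codon but no stop codon, would produce no hits in this sketch, because an ORF is only reported once a stop codon closes it.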
Now, it is time to go through chapter 6. I went to www.expasy.org/tools/#primary. I got a notice warning me that the page no longer gets updated. Scrolling down the page, I found the ProtParam program and clicked on it. I proceeded by entering my sequence in the appropriate location and clicking "compute parameters." It rejected my sequence, reporting that it was fewer than 5 residues long. I thought something was wrong with my sequence, so I tried many other gp120 sequences with the same result. I then tried a gp160 sequence, since those are much larger. The same error occurred. Finally, I just used the accession number, Q2MDW6, from my original sequence and it worked like a charm. Thinking back, I probably should not have pasted the label (the FASTA header line) along with the sequence when I tried the 1st method. I was very happy because, with all the work this sequence and I have been through, I now have a very special attachment to it.
The molecular weight is 21800.4 Da. The extinction coefficient is 29825 M⁻¹ cm⁻¹. The instability index is 33.64. Since it is below 40, the protein is classified as stable. The half-life ranges from 2 minutes to 1 hour depending on the organism it is in. Below is a screenshot of the computation.
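If the label really was what caused the rejection, stripping it before pasting would have avoided the problem. Here is a rough Python sketch of that cleanup, along with two of the ProtParam-style calculations. The extinction coefficient uses the standard 280 nm values (5500 per Trp, 1490 per Tyr, 125 per cystine), and the stability rule is the below-40 cutoff mentioned above. The function names and the toy FASTA record are my own, and I have not checked this code against the values ProtParam reported for Q2MDW6.

```python
def read_fasta_sequence(text):
    """Drop the '>' header line(s) and whitespace, keeping only residues."""
    lines = [line.strip() for line in text.splitlines()]
    return "".join(l for l in lines if l and not l.startswith(">")).upper()

def extinction_coefficient(seq, all_cys_paired=True):
    """Estimate the extinction coefficient at 280 nm in M^-1 cm^-1."""
    n_trp = seq.count("W")
    n_tyr = seq.count("Y")
    # Assumption: every pair of cysteines forms one cystine when paired.
    n_cystine = seq.count("C") // 2 if all_cys_paired else 0
    return 5500 * n_trp + 1490 * n_tyr + 125 * n_cystine

def stability_class(instability_index):
    """Apply the cutoff of 40 to an already computed instability index."""
    return "stable" if instability_index < 40 else "unstable"

# Hypothetical toy record, not my real gp120 entry.
toy_fasta = ">toy header line\nMWYC\nCA\n"
toy_seq = read_fasta_sequence(toy_fasta)
print(toy_seq, extinction_coefficient(toy_seq), stability_class(33.64))
```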
After this, I found the PeptideCutter tool, entered my sequence, and performed the cut. A few days later, I realized that the only examples that were assigned were ProtParam and looking for transmembrane segments. I was prepared to work through every example in chapter 6, but was relieved that it was not required. Screenshots of the peptide cutting can be viewed by clicking the following links:
- Peptide Cutter Screenshot 1
- Peptide Cutter Screenshot 2
- Peptide Cutter Screenshot 3
- Peptide Cutter Screenshot 4
- Peptide Cutter Screenshot 5
- Peptide Cutter Screenshot 6
- Peptide Cutter Screenshot 7
It is now time to look for transmembrane segments. I pointed my browser to www.expasy.org/cgi-bin/protscale.pl as requested by the text and was immediately directed to the appropriate tool. I entered my sequence accession number and saw that the desired radio button was already selected. I then changed my window size to 19 and clicked "Submit." I hit "Submit" once more so that the entire sequence would be analyzed. Below is the resulting output.
The "magic value" of 1.6 is the recommended threshold (for the Kyte & Doolittle hydrophobicity scale) for deciding whether peaks are indicative of transmembrane regions in the protein being examined. No peak in the graph above reaches that value, so the profile suggests that there are no transmembrane domains in my gp120 protein.
Repeating the analysis with the Eisenberg scale confirmed the textbook's statement that there is not much difference in the main features.
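What ProtScale does with the Kyte & Doolittle scale can be approximated with a simple sliding-window average. The sketch below uses a window of 19 and the 1.6 threshold from the text. It is only an approximation of the tool: it uses an unweighted window and indexes each average by the window's start position rather than its center residue. The function names and the toy sequence are my own.

```python
# Kyte & Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Average hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]

def windows_above(profile, threshold=1.6):
    """Start positions (0-based) of windows whose average exceeds the threshold."""
    return [i for i, value in enumerate(profile) if value > threshold]

# Hypothetical toy sequence: a 19-residue leucine stretch between
# two charged aspartate stretches.
toy = "D" * 20 + "L" * 19 + "D" * 20
hits = windows_above(hydropathy_profile(toy))
print("candidate transmembrane windows:", hits)
```

An empty list of hits corresponds to what I saw in my graph: no window averaging above 1.6.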
My next task is to run TMHMM. I visited the webpage www.cbs.dtu.dk/services/TMHMM-2.0 and entered the same sequence that I had been using throughout this whole assignment. I kept the default values as instructed by the text and clicked the "Submit" button. My heart was in mid-drop when a page popped up saying that I had to wait while my results were being processed, suggesting that I enter my email address so that I could be notified when the process was complete. I thought to myself, "Oh no, this is going to take forever!" Much to my delight, though, it took only about a second for the page to change and display my beloved results.
- Sequence Length: 196
- Sequence Number of predicted TMHs: 0
- Sequence Exp number of AAs in TMHs: 0.01612
- Sequence Exp number, first 60 AAs: 0.00699
- Sequence Total prob of N-in: 0.43226
- Sequence TMHMM2.0 outside 1 196
These results, once again, indicate that there are no transmembrane helices in my protein. This agrees with the ProtScale results, which is good.
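For anyone who wants to check a summary like this without reading it by eye, here is a small Python sketch that parses the lines exactly as I listed them above. The function name is my own, and the line format is assumed from my listing rather than from any official TMHMM specification.

```python
def parse_tmhmm_summary(lines):
    """Turn 'Sequence <field>: <value>' lines into a {field: float} dict."""
    summary = {}
    for line in lines:
        text = line.lstrip("-# ").strip()
        if ":" not in text:
            continue  # skip the topology line, which has no colon
        field, value = text.rsplit(":", 1)
        if field.startswith("Sequence "):
            field = field[len("Sequence "):]
        summary[field.strip()] = float(value)
    return summary

# The summary lines as they appear in my results.
report = [
    "- Sequence Length: 196",
    "- Sequence Number of predicted TMHs: 0",
    "- Sequence Exp number of AAs in TMHs: 0.01612",
    "- Sequence Exp number, first 60 AAs: 0.00699",
    "- Sequence Total prob of N-in: 0.43226",
    "- Sequence TMHMM2.0 outside 1 196",
]
summary = parse_tmhmm_summary(report)
print("predicted helices:", int(summary["Number of predicted TMHs"]))
```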
HIV Structure Research Project
Do the protein structures of a predominant strain differ greatly from the protein structures of viruses that are rapidly diverging? If so, how extensively do they differ and in what ways? What might this mean?
Patients to be Used
- Patients 2 & 13
  - Show clear signs of strain predominance
  - Two viruses from S2, V3
  - Two viruses from S13, V3
  - Viruses of these visits for each subject most followed the pattern of predominance
- Patients 3 & 9
  - Limited progression among a single branch
  - Two viruses from S3, V4
  - Two viruses from S9, V5
  - Viruses for these subjects from the indicated visits were most spread out among the evolutionary tree