Michael R. Pina Week 8


 * 1) I went to the UniProt website at http://www.uniprot.org/ and in the search field I entered "gp 120"
 * 2) *This yielded over 4000 pages of results
 * 3) *I selected the first result (accession number P04578, secondary accession number O09779), which is 856 amino acids long
 * 4) *I found that this contains protein sequences from both gp120 and gp41
 * 5) *I downloaded the FASTA file
 * 6) I went to the Biology Workbench site
 * 7) *I selected Nucleic Tools
 * 8) *I selected the nucleotide sequence (285bp long) from Subject 1, Visit 1-1
 * 9) *I viewed the sequence and copied it to my clipboard
 * 10) I navigated to the ORF finder at http://www.ncbi.nlm.nih.gov/projects/gorf/ and pasted my nucleotide sequence into it
 * 11) *I chose the ORF in frame +1 because it yielded the most amino acids
 * 12) *The length of the ORF was 75 aa
 * 13) I compared the output from the ORF finder with the FASTA file retrieved from UniProt
 * 14) *I selected a string of amino acids (about 15 aa long) and, using the "Find" feature, tried to locate that same sequence in the FASTA file
 * 15) **This did not work
 * 16) *I tried selecting aa sequences from other ORFs and searching for them in the same way, but I still found no matches
 * 17) I used the other ORF finder posted in the wiki
 * 18) *I found that the output of this website is a little bit easier to read than the other website shown in the For Dummies book
 * 19) *Many of the ORFs had stop codons but one did not, so I determined that this was probably the correct sequence:
 * 20) * E V V I R S E N F T N N A K I I I V Q L N E S V E I N C T R P N N N T R K S I H I G P G R A F Y T T G D I I G D I R Q A Y C N I S R A E W D N T L K Q I V I K L R E H F G N K T I V F N H S S
 * 21) I figured I was not getting any matching results due to changes in the amino acid sequence
 * 22) *I imported the S1V1-1 aa sequence into the Biology Workbench
 * 23) *I did the same thing for the FASTA file obtained from UniProt
 * 24) *I performed a multiple sequence alignment using CLUSTALW
 * 25) I found that most of the sequence was fully conserved between the two sequences
 * 26) *There were about a dozen discrepancies due to amino acid changes or insertions/deletions
 * 27) *Also of note: there were about a dozen amino acids that were not fully conserved
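
The frame-by-frame translation that the ORF finders performed in the steps above can be sketched in Python. This is only an illustration under stated assumptions: the short DNA string is a made-up example (not the 285 bp S1V1-1 read, which is not reproduced here), the function names are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate_frame(dna, offset):
    """Translate dna starting at offset 0, 1, or 2; unknown codons become X."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna):
    """Map frame names (+1..+3, -1..-3) to their translations."""
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate_frame(dna, offset)
        frames[f"-{offset + 1}"] = translate_frame(reverse_complement(dna), offset)
    return frames


# Hypothetical toy sequence, chosen so that frame +1 reads EVVIRS.
toy_dna = "GAAGTAGTAATTAGATCA"
frames = six_frames(toy_dna)

# Keep only frames without a stop codon, then take the longest one.
stop_free = {name: pep for name, pep in frames.items() if "*" not in pep}
best = max(stop_free, key=lambda name: len(stop_free[name]))

for name, pep in frames.items():
    print(name, pep)
print("longest stop-free frame:", best)
```

Picking the frame with no stop codons mirrors the reasoning in the steps above; a real ORF finder also looks at start codons and minimum ORF length, which this sketch leaves out.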

 P04578  EVVIRSVNFTDNAKTIIVQLNTSVEINCTRPN
 S1V1-1  EVVIRSENFTNNAKIIIVQLNESVEINCTRPN
         ****** ***:*** ****** **********

 P04578  NNTRKRIRIQRGPGRAFVTIG-KIGNMRQAHCNISRAKWNNTLKQIASKL
 S1V1-1  NNTRKSIHI--GPGRAFYTTGDIIGDIRQAYCNISRAEWDNTLKQIVIKL
         ***** *:*  ****** * *  **::***:******:*:******. **

 P04578  REQFGNNKTIIFKQSS
 S1V1-1  REHFG-NKTIVFNHSS
         **:** ****:*::**


 * Where * indicates full conservation, : indicates strong group conservation (but not necessarily full conservation), and . indicates weak group conservation
 * The top sequence corresponds to the FASTA file from UniProt
 * The bottom sequence corresponds to S1V1-1 from the Markham article
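
The counts quoted above can be checked directly from the aligned rows. The following is a minimal Python sketch, assuming the two rows are copied exactly from the CLUSTALW output in this entry; `summarize_alignment` is a hypothetical helper name, and only the aligned region is used, not the full 856 aa UniProt sequence.

```python
# Aligned rows copied from the CLUSTALW output in this entry ("-" marks a gap).
REF_ALN = (  # UniProt P04578, aligned region only
    "EVVIRSVNFTDNAKTIIVQLNTSVEINCTRPN"
    "NNTRKRIRIQRGPGRAFVTIG-KIGNMRQAHCNISRAKWNNTLKQIASKL"
    "REQFGNNKTIIFKQSS"
)
QRY_ALN = (  # S1V1-1
    "EVVIRSENFTNNAKIIIVQLNESVEINCTRPN"
    "NNTRKSIHI--GPGRAFYTTGDIIGDIRQAYCNISRAEWDNTLKQIVIKL"
    "REHFG-NKTIVFNHSS"
)


def summarize_alignment(ref, qry):
    """Count identical, gap, and substituted columns, and find the longest
    stretch of consecutive identical columns."""
    if len(ref) != len(qry):
        raise ValueError("aligned rows must have the same length")
    identical = gaps = substitutions = 0
    run = longest_run = ""
    for a, b in zip(ref, qry):
        if a == "-" or b == "-":
            gaps += 1
            run = ""
        elif a == b:
            identical += 1
            run += a
            if len(run) > len(longest_run):
                longest_run = run
        else:
            substitutions += 1
            run = ""
    return {
        "columns": len(ref),
        "identical": identical,
        "gaps": gaps,
        "substitutions": substitutions,
        "longest_run": longest_run,
    }


summary = summarize_alignment(REF_ALN, QRY_ALN)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With these rows the sketch reports 74 identical columns out of 98, 4 gap columns, and 20 substitutions. The longest stretch shared exactly by both sequences is the 15 aa run SVEINCTRPNNNTRK, so within this region an exact "Find" on a 15 aa string can only succeed if the string falls precisely on that stretch.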

[[Media:P04578.fasta.txt| FASTA file]]