Michael R. Pina Week 8
From OpenWetWare
- I went to the UniProt website at http://www.uniprot.org/ and entered "gp 120" in the search field
- This yielded over 4000 pages of results
- I selected the first result (accession number P04578, secondary accession number O09779), which had an amino acid length of 856
- I found that this contains protein sequences from both gp120 and gp41
- I downloaded the FASTA file
- I went to the Biology Workbench site
- I selected Nucleic Tools
- I selected the nucleotide sequence (285bp long) from Subject 1, Visit 1-1
- I viewed the sequence and copied it to my clipboard
- I navigated to the ORF finder at http://www.ncbi.nlm.nih.gov/projects/gorf/ and pasted my nucleotide sequence into it
- I chose the ORF in the +1 reading frame because it yielded the most amino acids
- The length of the ORF was 75 aa
- I compared the output of the ORF finder with the FASTA file retrieved from UniProt
- I selected a string of amino acids (about 15 aa long) and using the "Find" feature, tried to find that same sequence in the FASTA file
- This did not work
- I tried selecting aa sequences from other ORFs and tried to do the same thing but I still got the same results
- I used the other ORF finder posted in the wiki
- I found that the output of this website is a little bit easier to read than the other website shown in the For Dummies book
- Many of the ORFs had stop codons but one did not, so I determined that this was probably the correct sequence:
- E V V I R S E N F T N N A K I I I V Q L N E S V E I N C T R P N N N T R K S I H I G P G R A F Y T T G D I I G D I R Q A Y C N I S R A E W D N T L K Q I V I K L R E H F G N K T I V F N H S S
- I figured I was not getting any matching results due to changes in the amino acid sequence
- I imported the S1V1-1 aa sequence into the Biology Workbench
- I did the same thing for the FASTA file obtained from UniProt
- I performed a multiple sequence alignment using CLUSTALW
- I found that most of the sequence was fully conserved between the two sequences
- There were about a dozen discrepancies due to amino acid changes or deletions/insertions
- Also of note: about a dozen amino acids were not fully conserved
EVVIRSVNFTDNAKTIIVQLNTSVEINCTRPN
EVVIRSENFTNNAKIIIVQLNESVEINCTRPN
****** ***:*** ****** **********

NNTRKRIRIQRGPGRAFVTIG-KIGNMRQAHCNISRAKWNNTLKQIASKL
NNTRKSIHI--GPGRAFYTTGDIIGDIRQAYCNISRAEWDNTLKQIVIKL
***** *:*  ****** * *  **::***:******:*:******. **

REQFGNNKTIIFKQSS
REHFG-NKTIVFNHSS
**:** ****:*::**
- Where * indicates full conservation, : indicates strong group conservation (but not necessarily full conservation), and . indicates weak group conservation
- The top sequence corresponds to the FASTA file from UniProt
- The bottom sequence corresponds to S1V1-1 from the Markham article
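
The comparison above can also be checked with a short script. This is only a sketch, not part of the original workflow: it assumes the two aligned rows are copied exactly as shown in the CLUSTALW output (with "-" for gaps), and the helper names `tally` and `shared_windows` are made up for illustration. It counts identical, substituted, and gapped columns, then tests how many 15-residue stretches of S1V1-1 occur verbatim in the aligned part of the UniProt sequence, which is what the "Find" search was effectively doing. Note that it only looks at the aligned region, not the full 856-aa UniProt entry.

```python
# Aligned rows copied from the CLUSTALW output ("-" marks a gap).
uniprot = (
    "EVVIRSVNFTDNAKTIIVQLNTSVEINCTRPN"
    "NNTRKRIRIQRGPGRAFVTIG-KIGNMRQAHCNISRAKWNNTLKQIASKL"
    "REQFGNNKTIIFKQSS"
)
s1v1 = (
    "EVVIRSENFTNNAKIIIVQLNESVEINCTRPN"
    "NNTRKSIHI--GPGRAFYTTGDIIGDIRQAYCNISRAEWDNTLKQIVIKL"
    "REHFG-NKTIVFNHSS"
)


def tally(a, b):
    """Count identical, substituted, and gapped columns in two aligned rows."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    counts = {"identical": 0, "substituted": 0, "gapped": 0}
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            counts["gapped"] += 1
        elif x == y:
            counts["identical"] += 1
        else:
            counts["substituted"] += 1
    return counts


def shared_windows(query, target, size=15):
    """Return each window of `size` residues in query found verbatim in target."""
    windows = (query[i:i + size] for i in range(len(query) - size + 1))
    return [w for w in windows if w in target]


counts = tally(uniprot, s1v1)

# Remove gap characters before searching, as in a plain-text "Find".
hits = shared_windows(s1v1.replace("-", ""), uniprot.replace("-", ""))

print(counts)
print(len(hits), "exact 15-residue match(es):", hits)
```

With the rows above, almost every 15-residue window from S1V1-1 contains at least one substitution or gap relative to the UniProt row, so an exact text search on an arbitrarily chosen stretch is very likely to fail even though the alignment shows the two sequences are closely related.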