Michael R. Pina Week 8

  1. I went to the UniProt website at http://www.uniprot.org/ and entered "gp 120" in the search field
    • This yielded over 4000 pages of results
    • I selected the first result (accession number P04578, secondary accession number O09779), which had an amino acid length of 856
    • I found that this contains protein sequences from both gp120 and gp41
    • I downloaded the FASTA file
  2. I went to the Biology Workbench site
    • I selected nucleic tools
    • I selected the nucleotide sequence (285bp long) from Subject 1, Visit 1-1
    • I viewed the sequence and copied it to my clipboard
  3. I navigated to the ORF finder at http://www.ncbi.nlm.nih.gov/projects/gorf/ and pasted my nucleotide sequence into it
    • I chose the ORF in frame +1 because it yielded the most amino acids
    • The length of the ORF was 75 aa
  4. I compared the output from the ORF finder with the FASTA file retrieved from UniProt
    • I selected a string of amino acids (about 15 aa long) and, using the "Find" feature, tried to find that same sequence in the FASTA file
      • This did not work
    • I tried the same search with aa sequences from other ORFs, but I still got no matches
  5. I used the other ORF finder posted in the wiki
    • I found that the output of this website is a little bit easier to read than the other website shown in the For Dummies book
    • Many of the ORFs had stop codons but one did not, so I determined that this was probably the correct sequence:
    • E V V I R S E N F T N N A K I I I V Q L N E S V E I N C T R P N N N T R K S I H I G P G R A F Y T T G D I I G D I R Q A Y C N I S R A E W D N T L K Q I V I K L R E H F G N K T I V F N H S S
  6. I figured I was not getting any matching results because of amino acid differences between the two sequences
    • I imported the S1V1-1 aa sequence into the Biology Workbench
    • I did the same thing for the FASTA file obtained from UniProt
    • I performed a multiple sequence alignment using CLUSTALW
  7. What I found was that most of the sequence was fully conserved between the two sequences
    • There were about a dozen discrepancies due to amino acid changes or deletions/insertions
    • Also of note: about a dozen amino acids were not fully conserved
****** ***:*** ****** **********
***** *:*  ****** * *  **::***:******:*:******. **
**:** ****:*::**
  • Where * indicates full conservation and : indicates strong group conservation, but not necessarily full conservation
  • The top sequence corresponds to the FASTA file from UniProt
  • The bottom sequence corresponds to S1V1-1 from the Markham article
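
The conservation line above can be summarized by counting each symbol. This is only a minimal sketch: the function names `tally_conservation` and `percent_identity` and the example string are hypothetical, and it assumes the spacing of the conservation line is exactly as CLUSTALW printed it (a space marks a column with no conservation or a gap, and `.` marks weak group conservation).

```python
from collections import Counter

def tally_conservation(line):
    """Count CLUSTALW conservation symbols in one consensus line."""
    counts = Counter(line)
    return {
        "identical": counts["*"],  # fully conserved column
        "strong": counts[":"],     # strong group conservation
        "weak": counts["."],       # weak group conservation
        "none": counts[" "],       # mismatch or gap
    }

def percent_identity(line):
    """Share of alignment columns marked '*' (fully conserved)."""
    return 100.0 * line.count("*") / len(line) if line else 0.0

example = "****:* .**"  # hypothetical 10-column consensus line
print(tally_conservation(example))
print(round(percent_identity(example), 1))
```

If the spaces in a copied conservation line have been collapsed, the "none" count and the percentage will be off, so the line should be taken directly from the CLUSTALW output.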

FASTA file
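
Steps 3 through 5 could also be tried in a short script. The following is a minimal sketch, not the method used by either ORF finder site. It translates a nucleotide read in all six frames with the standard genetic code, keeps the frames that contain no stop codon, and then looks for an exact peptide match in a protein read from FASTA text. All function names, the 12-base read, and the FASTA record are hypothetical stand-ins for the real S1V1-1 read and the UniProt file.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna):
    """Translate from the first base; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def six_frames(dna):
    """Translate the three forward and three reverse reading frames."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames

def frames_without_stop(dna):
    """Keep only frames whose translation has no stop codon ('*')."""
    return {name: prot for name, prot in six_frames(dna).items()
            if "*" not in prot}

def read_fasta_sequence(text):
    """Return the sequence of the first record in FASTA-formatted text."""
    seq = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if seq:          # a second header ends the first record
                break
            continue
        seq.append(line)
    return "".join(seq)

def find_peptide(peptide, protein):
    """Exact search, like a browser 'Find'; returns -1 if absent."""
    return protein.find(peptide.replace(" ", ""))

read = "GAAGTAGTAATT"  # hypothetical 12-base read
print(frames_without_stop(read))
fasta = ">hypothetical record\nMKEVVIRS\nENFT\n"
print(find_peptide("E V V I", read_fasta_sequence(fasta)))
```

Because the search is exact, a single substitution anywhere in a 15 aa query returns -1. That is consistent with the "Find" approach in step 4 failing even though the alignment in step 7 shows the two sequences are mostly conserved.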

