User:R. Eric Collins/Goldschmidt
Goal: Find out what the Mystery Sequence is and to which metabolic pathway it belongs
- Go to the JGI website. The Joint Genome Institute sequences many environmental microbes, especially those relevant to energy, climate, and geobiology.
- Click on Find Genes --> BLAST
- Enter the mystery sequence into the text box
- Change the Program to 'blastx'. This translates the DNA sequence in all six reading frames (three on each strand) and searches the resulting amino acid sequences against a database of protein sequences from all available complete microbial genomes.
- Click 'Run Blast'
- When the program has finished running, scroll down to the alignments showing the best matches.
- How good is the hit? The Expect value (E-value) is the number of hits scoring at least this well that you would expect to find by chance in a database of this size, so the smaller the E-value, the less likely the hit is due to chance. For amino acid sequences, anything smaller than 1e-5 is considered a pretty good match. For nucleotide sequences, a more conservative 1e-10 may be used. Finding homologs (genes that share a common ancestor) or orthologs (homologs that have diverged through speciation) may require more sophisticated methods of calling hits. Paralogs (homologs that arose by duplication within a genome) and convergent evolution can complicate matters, and a match to only a shared 'domain' (a 'self-contained' building block out of which proteins are formed) can be a spurious hit.
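
To make the 'blastx' step and the E-value rule of thumb above more concrete, here is a minimal Python sketch. It assumes the standard genetic code and plain A/C/G/T input; the function names (`six_frame_translations`, `is_good_hit`) and the `protein_level` parameter are made up for illustration and are not part of BLAST or the JGI site. It only shows what "all six reading frames" means and how the two cutoffs might be applied; it does not perform any database search.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Translate both strands at offsets 0, 1, 2 (frames +1..+3 and -1..-3)."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def is_good_hit(evalue, protein_level=True):
    """Apply the rule-of-thumb cutoffs from the text (hypothetical helper):
    1e-5 for amino acid comparisons, 1e-10 for nucleotide comparisons."""
    threshold = 1e-5 if protein_level else 1e-10
    return evalue < threshold


if __name__ == "__main__":
    for name, protein in six_frame_translations("ATGGCCTAA").items():
        print(name, protein)
    print(is_good_hit(3e-42))  # True: well below the 1e-5 cutoff
```

Because blastx compares translated sequences against proteins, the amino acid cutoff (`protein_level=True`) is the relevant one for this exercise. The '*' characters in the output mark stop codons.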