Harvard:Biophysics 101/2007/Notebook:Xiaodi Wu/2007-3-15
Input sequence:
>example1
CACCCTCGCCAGTTACGAGCTGCCGAGCCGCTTCCTAGGCTCTCTGCGAATACGGACACG
CATGCCACCCACAACAACTTTTTAAAAGAATCAGACGTGTGAAGGATTCTATTCGAATTA
CTTCTGCTCTCTGCTTTTATCACTTCACTGTGGGTCTGGGCGCGGGCTTTCTGCCAGCTC
CGCGGACGCTGCCTTCGTCCAGCCGCAGAGGCCCCGCGGTCAGGGTCCCGCGTGCGGGGT
ACCGGGGGCAGAACCAGCGCGTGACCGGGGTCCGCGGTGCCGCAACGCCCCGGGTCTGCG
CAGAGGCCCCTGCAGTCCCTGCCCGGCCCAGTCCGAGCTTCCCGGGCGGGCCCCCAGTCC
GGCGATTTGCAGGAACTTTCCCCGGCGCTCCCACGCGAAGC
First step: Align this sequence with NCBI BLAST to find where in the genome it lies. The single result is shown below.
>ref|NT_030059.12|Hs10_30314 Homo sapiens chromosome 10 genomic contig, reference assembly
Length=44617998

Features flanking this part of subject sequence:
   3895 bp at 5' side: hypothetical protein
   425 bp at 3' side: HtrA serine peptidase 1

Score = 736 bits (398), Expect = 0.0
Identities = 400/401 (99%), Gaps = 0/401 (0%)
Strand=Plus/Plus

Query  1         CACCCTCGCCAGTTACGAGCTGCCGAGCCGCTTCCTAGGCTCTCTGCGAATACGGACACG  60
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  42968870  CACCCTCGCCAGTTACGAGCTGCCGAGCCGCTTCCTAGGCTCTCTGCGAATACGGACACG  42968929

Query  61        CATGCCACCCACAACAACTTTTTAAAAGAATCAGACGTGTGAAGGATTCTATTCGAATTA  120
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  42968930  CATGCCACCCACAACAACTTTTTAAAAGAATCAGACGTGTGAAGGATTCTATTCGAATTA  42968989

Query  121       CTTCTGCTCTCTGCTTTTATCACTTCACTGTGGGTCTGGGCGCGGGCTTTCTGCCAGCTC  180
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  42968990  CTTCTGCTCTCTGCTTTTATCACTTCACTGTGGGTCTGGGCGCGGGCTTTCTGCCAGCTC  42969049

Query  181       CGCGGACGCTGCCTTCGTCCAGCCGCAGAGGCCCCGCGGTCAGGGTCCCGCGTGCGGGGT  240
                 |||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||
Sbjct  42969050  CGCGGACGCTGCCTTCGTCCGGCCGCAGAGGCCCCGCGGTCAGGGTCCCGCGTGCGGGGT  42969109

Query  241       ACCGGGGGCAGAACCAGCGCGTGACCGGGGTCCGCGGTGCCGCAACGCCCCGGGTCTGCG  300
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  42969110  ACCGGGGGCAGAACCAGCGCGTGACCGGGGTCCGCGGTGCCGCAACGCCCCGGGTCTGCG  42969169

Query  301       CAGAGGCCCCTGCAGTCCCTGCCCGGCCCAGTCCGAGCTTCCCGGGCGGGCCCCCAGTCC  360
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  42969170  CAGAGGCCCCTGCAGTCCCTGCCCGGCCCAGTCCGAGCTTCCCGGGCGGGCCCCCAGTCC  42969229

Query  361       GGCGATTTGCAGGAACTTTCCCCGGCGCTCCCACGCGAAGC  401
                 |||||||||||||||||||||||||||||||||||||||||
Sbjct  42969230  GGCGATTTGCAGGAACTTTCCCCGGCGCTCCCACGCGAAGC  42969270
Conclusion -- the sequence maps to chromosome 10 and differs from the reference at a single base (query position 201: A in our sequence, G in the reference), which is a candidate SNP.
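The single mismatch can also be found programmatically rather than by scanning the alignment by eye. Below is a minimal Python sketch, assuming an ungapped alignment (true here, since Gaps = 0/401); the function name `mismatches` and the variable names are illustrative, and only the alignment block covering query positions 181 to 240 is copied in.

```python
def mismatches(query, subject, query_start=1, subject_start=1):
    """List (query_pos, subject_pos, query_base, subject_base) for every
    position where two ungapped, equal-length sequences differ."""
    if len(query) != len(subject):
        raise ValueError("ungapped comparison needs equal-length sequences")
    return [
        (query_start + i, subject_start + i, q, s)
        for i, (q, s) in enumerate(zip(query, subject))
        if q != s
    ]


# Alignment block for query positions 181-240, copied from the BLAST hit.
QUERY_BLOCK = "CGCGGACGCTGCCTTCGTCCAGCCGCAGAGGCCCCGCGGTCAGGGTCCCGCGTGCGGGGT"
SBJCT_BLOCK = "CGCGGACGCTGCCTTCGTCCGGCCGCAGAGGCCCCGCGGTCAGGGTCCCGCGTGCGGGGT"

hits = mismatches(QUERY_BLOCK, SBJCT_BLOCK,
                  query_start=181, subject_start=42969050)
for q_pos, s_pos, q_base, s_base in hits:
    print(f"query {q_pos}: {q_base}  vs  subject {s_pos}: {s_base}")
```

The start offsets are passed in so that the reported positions use the same numbering as the BLAST output.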
Second step: Examine the genome browser to find where exactly this is to help formulate the query in the next step. Result: 10q25, bases 124210300 to 124210800.
Third step: Look up SNPs in Entrez SNP using the coordinates from the genome browser, then compare the known SNPs to the sequence we currently have (query: "10[CHR] AND 124210300:124210800[CHRPOS]"). Two exist:
1: rs11200638 [Homo sapiens] AGCTCCGCGGACGCTGCCTTCGTCC[A/G]GCCGCAGAGGCCCCGCGGTCAGGGT
2: rs2672598 [Homo sapiens] CGCCGGACTGGGGGCCCGCCCGGGA[A/G]GCTCGGACTGGGCCGGGCAGGGACT
Fourth step: For the first SNP (rs11200638), our sequence has A at query position 201, where the reference sequence has G. For the second SNP (rs2672598), the flanking sequence is given on the minus strand; on the plus strand our sequence has T at query position 339, the complement of the A allele, in agreement with the reference. Search OMIM to find out more about these loci and the implications of having these alleles (queries: "rs11200638" and "rs2672598").
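The allele comparison in this step can be sketched in Python: search the input sequence for each SNP's flanking sequence on both strands and read off the base in between. This is an illustrative sketch only, assuming exact flank matches and a single variable base; `reverse_complement` and `call_allele` are hypothetical helper names, not part of any NCBI tool.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def call_allele(read, left_flank, right_flank):
    """Find left_flank + one base + right_flank in read on either strand.

    Returns the strand of the match, the 1-based position in the read,
    the base seen in the read, and that base in the flank's orientation.
    Returns None when the flanks are not found.
    """
    orientations = (
        ("+", left_flank, right_flank),
        ("-", reverse_complement(right_flank), reverse_complement(left_flank)),
    )
    for strand, left, right in orientations:
        start = read.find(left)
        while start != -1:
            snp_index = start + len(left)
            if read.startswith(right, snp_index + 1):
                base = read[snp_index]
                flank_base = base if strand == "+" else base.translate(COMPLEMENT)
                return {"strand": strand, "position": snp_index + 1,
                        "read_base": base, "flank_base": flank_base}
            start = read.find(left, start + 1)
    return None


# Input sequence (example1) from the top of this page.
READ = (
    "CACCCTCGCCAGTTACGAGCTGCCGAGCCGCTTCCTAGGCTCTCTGCGAATACGGACACG"
    "CATGCCACCCACAACAACTTTTTAAAAGAATCAGACGTGTGAAGGATTCTATTCGAATTA"
    "CTTCTGCTCTCTGCTTTTATCACTTCACTGTGGGTCTGGGCGCGGGCTTTCTGCCAGCTC"
    "CGCGGACGCTGCCTTCGTCCAGCCGCAGAGGCCCCGCGGTCAGGGTCCCGCGTGCGGGGT"
    "ACCGGGGGCAGAACCAGCGCGTGACCGGGGTCCGCGGTGCCGCAACGCCCCGGGTCTGCG"
    "CAGAGGCCCCTGCAGTCCCTGCCCGGCCCAGTCCGAGCTTCCCGGGCGGGCCCCCAGTCC"
    "GGCGATTTGCAGGAACTTTCCCCGGCGCTCCCACGCGAAGC"
)

# Flanking sequences as listed by Entrez SNP.
rs11200638 = call_allele(READ, "AGCTCCGCGGACGCTGCCTTCGTCC",
                         "GCCGCAGAGGCCCCGCGGTCAGGGT")
rs2672598 = call_allele(READ, "CGCCGGACTGGGGGCCCGCCCGGGA",
                        "GCTCGGACTGGGCCGGGCAGGGACT")
print(rs11200638)
print(rs2672598)
```

Reporting the base both as read and in the flank's orientation keeps the strand bookkeeping explicit, which matters for rs2672598 because its flanks match only after reverse-complementing.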
Conclusions -- Homozygosity for the A allele of the first SNP (rs11200638) results in a tenfold increased risk of wet (neovascular) age-related macular degeneration. No information is available for the other locus (rs2672598).
Last step and physician's advice: According to OMIM, age-related macular degeneration is a disorder that arises from many factors, genetics being only one. If this individual is heterozygous for the A allele, then the added risk to him or her is not appreciable, though the possibility of having children homozygous for the allele remains a concern. If the individual is homozygous, then the higher genetic risk suggests that environmental risk factors should be avoided where possible: this means not smoking, and monitoring diet and cholesterol levels.
Part II
(Mystery sequence)
>Example by Xiaodi
ACCTGGACCCCTGTGCCTTGTATGCATCTGAAGAGGAGATCGGGCAGTTGGTGAAGCAGATGCTGGATGA
CTTTGGACCACATCGCTACATTGCCAACCTGGGCCATGGGCTTTATCCTGACATGGACCCAGAACATGTG
GGCGCCTTTGTGGATGCTGTGCATAAACACTCACGTCTGCTTCGACAGAACTGAGTGTATACCTTTACCC
TCAAGTACCACTAACACAGATGATTGATCGTTTCCAGGACAATAAAAGTTTCGGAGTTGAACTATTGTGT
AGTTTTGTTTGTGAAAGATTGTGCCCATATCCTCAGTTCTTCTTAGCCTCTGCTCCTTCCCTGGGAACCC