Harvard:Biophysics 101/2007/Notebook:Resmi Charalel/2007-2-6


Assignment 1, due 2/6/07

Code for Assignment (this code performs all requested changes to the original code):

#!/usr/bin/env python

from Bio import GenBank
from Bio.Seq import translate

# We can create a GenBank object that will parse a raw record
# This facilitates extracting specific information from the sequences
record_parser = GenBank.FeatureParser()

# NCBIDictionary is an interface to GenBank
ncbi_dict = GenBank.NCBIDictionary('nucleotide', 'genbank', parser = record_parser)

# If you pass NCBIDictionary a GenBank id, it will download that record
parsed_record = ncbi_dict['6273291']  # Opuntia marenae rpl16 gene; partial intron sequence

print "GenBank id:", parsed_record.id

# Extract the sequence from the parsed_record
s = parsed_record.seq.tostring()
print "total sequence length:", len(s)

max_repeat = 9

print "method 1"
for i in range(max_repeat):
    substr = ''.join(['T' for n in range(i+1)])
    print substr, s.count(substr)

print "\nmethod 2"
for i in range(max_repeat):
    substr = ''.join(['T' for n in range(i+1)])
    count = 0
    pos = s.find(substr,0)
    while not pos == -1:
        count = count + 1
        pos = s.find(substr,pos+1)
    print substr, count

start = s.find('ATG')
orf = ''
c=start

for x in range(len(s)-start-4):
    orf = orf + s[c]
    c= c +1
    length = c-start
    remainder=length%3
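    if remainder == 0:
        # At a codon boundary: look at the next three bases
        codon = s[c:c+3]
        if codon in ('TAA', 'TAG', 'TGA'):
            # In-frame stop codon: end the ORF here without appending a
            # partial codon, so orf holds only whole codons
            break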

protein = translate(orf)

print 'protein sequence: ', protein
print 'protein length: ', len(protein)

rawdict = GenBank.NCBIDictionary('nucleotide', 'genbank')
rawrec = rawdict['6273291']
print "raw record: ", rawrec
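
The two counting methods in the script give different totals because `str.count` skips past each match it finds, while the `find` loop restarts one position after the previous hit and so also counts overlapping matches. The following is a minimal Python 3 sketch of that difference, not part of the original assignment; the function names and the toy sequence are hypothetical and chosen only for illustration.

```python
def count_non_overlapping(seq, motif):
    """Count motif the way str.count does: matches may not overlap."""
    return seq.count(motif)


def count_overlapping(seq, motif):
    """Count every start position of motif, so matches may overlap."""
    count = 0
    pos = seq.find(motif)
    while pos != -1:
        count += 1
        # Restart one base after the last hit instead of after the whole match
        pos = seq.find(motif, pos + 1)
    return count


# Toy sequence (not taken from the GenBank record): one run of five T's
toy = "ATTTTTA"
for n in range(1, 6):
    motif = "T" * n
    print(motif, count_non_overlapping(toy, motif), count_overlapping(toy, motif))
```

For the run of five T's, `TT` is counted 2 times without overlap and 4 times with overlap, which is the same kind of gap seen between method 1 and method 2 on the real sequence.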



Output of Code:

GenBank id: AF191665.1
total sequence length: 902
method 1
T 279
TT 68
TTT 14
TTTT 7
TTTTT 4
TTTTTT 2
TTTTTTT 0
TTTTTTTT 0
TTTTTTTTT 0

method 2
T 279
TT 84
TTT 25
TTTT 13
TTTTT 6
TTTTTT 2
TTTTTTT 0
TTTTTTTT 0
TTTTTTTTT 0
protein sequence: MRINGKAKERKK
protein length: 12
raw record: LOCUS       AF191665                 902 bp    DNA     linear   PLN 07-NOV-1999
DEFINITION  Opuntia marenae rpl16 gene; chloroplast gene for chloroplast
            product, partial intron sequence.
ACCESSION   AF191665
VERSION     AF191665.1  GI:6273291
KEYWORDS    .
SOURCE      chloroplast Opuntia marenae
  ORGANISM  Opuntia marenae
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons;
            Caryophyllales; Cactaceae; Opuntioideae; Opuntia.
REFERENCE   1  (bases 1 to 902)
  AUTHORS   Dickie,S.L. and Wallace,R.S.
  TITLE     Phylogeny of the subfamily Opuntioideae (Cactaceae)
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 902)
  AUTHORS   Dickie,S.L. and Wallace,R.S.
  TITLE     Direct Submission
  JOURNAL   Submitted (28-SEP-1999) Botany, Iowa State University, 353
            Bessey Hall, Ames, IA 50011-1020, USA
FEATURES             Location/Qualifiers
     source          1..902
                     /organism="Opuntia marenae"
                     /organelle="plastid:chloroplast"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:106980"
                     /note="subfamily Opuntioideae; synonym: Marenopuntia
                     marenae, Grusonia marenae"
     gene            <1..>902
                     /gene="rpl16"
     intron          <1..>902
                     /gene="rpl16"
ORIGIN
        1 tatacattaa aggaggggga tgcggataaa tggaaaggcg aaagaaagaa aaaaatgaat
       61 ctaaatgata taggattcca ctatgtaagg tctttgaatc atatcataaa agacaatgta
      121 ataaagcatg aatacagatt cacacataat tatctgatat gaatctattc atagaaaaaa
      181 gaaaaaagta agagcctccg gccaataaag actaagaggg ttggctcaag aacaaagttc
      241 attaagagct ccattgtaga attcagacct aatcattaat caagaagcga tgggaacgat
      301 gtaatccatg aatacagaag attcaattga aaaagatcct atgntcattg gaaggatggc
      361 ggaacgaacc agagaccaat tcatctattc tgaaaagtga taaactaatc ctataaaact
      421 aaaatagata ttgaaagagt aaatattcgc ccgcgaaaat tcctttttta ttaaattgct
      481 catattttct tttagcaatg caatctaata aaatatatct atacaaaaaa acatagacaa
      541 actatatata tatatatata taatatattt caaattccct tatatatcca aatataaaaa
      601 tatctaataa attagatgaa tatcaaagaa tctattgatt tagtgtatta ttaaatgtat
      661 atattaattc aatattatta ttctattcat ttttattcat tttcaaattt ataatatatt
      721 aatctatata ttaatttaga attctattct aattcgaatt caatttttaa atattcatat
      781 tcaattaaaa ttgaaatttt ttcattcgcg aggagccgga tgagaagaaa ctctcatgtc
      841 cggttctgta gtagagatgg aattaagaaa aaaccatcaa ctataacccc aaaagaacca
      901 ga
//
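
The 12-residue protein reported in the output can be re-derived by hand from the first 60 bases of the ORIGIN block: the first ATG begins at base 20, and the thirteenth codon in that frame is TGA. The following is a rough Python 3 sketch of that check without Biopython, not the script that produced the output. The names `CODON_TABLE`, `first_orf`, and `translate_orf` are hypothetical, and the table is assumed to be the standard genetic code written in T, C, A, G order.

```python
# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def first_orf(seq):
    """Return the bases from the first ATG up to, not including, the first in-frame stop."""
    seq = seq.upper()
    start = seq.find("ATG")
    if start == -1:
        return ""
    for pos in range(start, len(seq) - 2, 3):
        if CODON_TABLE.get(seq[pos:pos + 3]) == "*":
            return seq[start:pos]
    # No in-frame stop found: keep only whole codons
    end = start + (len(seq) - start) // 3 * 3
    return seq[start:end]


def translate_orf(orf):
    """Translate whole codons; codons with ambiguous bases (e.g. N) become 'X'."""
    return "".join(
        CODON_TABLE.get(orf[i:i + 3], "X") for i in range(0, len(orf) - 2, 3)
    )


# First 60 bases of the ORIGIN block above, with the spaces removed
first_60 = "tatacattaaaggagggggatgcggataaatggaaaggcgaaagaaagaaaaaaatgaat"
orf = first_orf(first_60)
protein = translate_orf(orf)
print("protein sequence:", protein)
print("protein length:", len(protein))
```

Slicing whole codons with `seq[pos:pos + 3]` avoids the index bookkeeping of building the ORF one base at a time, and leaving the stop codon out of the returned ORF keeps the translated length at 12.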