Harvard:Biophysics 101/2007/02/01

From OpenWetWare
Biophysics 101: Genomics, Computing, and Economics



Post comments/questions to the discussion page (the 2nd tab at the top)

Tasks to complete by Feb 6

  • Register an account on OpenWetWare (OWW)
  • Populate your user page with the following information
    • Name
    • Concentration
    • Affiliation & Year
    • Research/academic interests
    • Optional: anything else you wish to share (e.g. personal interests, photo, links)
  • Link to your user page from our People page
  • If you have problems with any of the following steps, the TFs will schedule meetings to provide individual walkthroughs.
  • Download and install Python and BioPython (details here)
  • Once you have Python and BioPython installed, try running the following code
    • This is done by copying and pasting the code into an empty text file and saving it on your machine.
    • The code can be executed in a number of ways (described here)
    • Also, look here and here for more information on getting started
#!/usr/bin/env python

from Bio import GenBank, Seq

# We can create a GenBank object that will parse a raw record
# This facilitates extracting specific information from the sequences
record_parser = GenBank.FeatureParser()

# NCBIDictionary is an interface to GenBank
ncbi_dict = GenBank.NCBIDictionary('nucleotide', 'genbank', parser = record_parser)

# If you pass NCBIDictionary a GenBank id, it will download that record
parsed_record = ncbi_dict['116496646']

print "GenBank id:", parsed_record.id

# Extract the sequence from the parsed_record
s = parsed_record.seq.tostring()
print "total sequence length:", len(s)

max_repeat = 9

print "method 1"
# str.count tallies non-overlapping matches only
for i in range(max_repeat):
    substr = 'A' * (i + 1)
    print substr, s.count(substr)

print "\nmethod 2"
# Advancing the search one position at a time also tallies overlapping matches
for i in range(max_repeat):
    substr = 'A' * (i + 1)
    count = 0
    pos = s.find(substr, 0)
    while pos != -1:
        count = count + 1
        pos = s.find(substr, pos + 1)
    print substr, count
  • Modify the code to do the following things
    • Process a different GenBank ID of your choosing
    • Tally stretches of poly-T instead of poly-A
    • Print the translated protein sequence (hint) and its length
    • Create a new NCBIDictionary without a parser and use it to print a raw record (hint)
  • Paste your code onto your user page when you are finished.
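The two counting methods in the script above can give different tallies for the same sequence, which is easier to see on a short string than on a full GenBank record. The following is a minimal sketch of that difference, not part of the assignment code: the function names and the toy sequence are made up for illustration, and it needs neither BioPython nor a network connection. The print line is written so that it behaves the same under Python 2 and Python 3.

```python
def count_non_overlapping(sequence, motif):
    """Tally matches the way str.count does: matches never share characters."""
    return sequence.count(motif)


def count_overlapping(sequence, motif):
    """Tally every start position where motif occurs, so matches may overlap."""
    count = 0
    pos = sequence.find(motif)
    while pos != -1:
        count += 1
        # Resume one character past the last hit so overlapping hits are found
        pos = sequence.find(motif, pos + 1)
    return count


# Hypothetical toy sequence (not a real GenBank record)
toy_sequence = "GAAAATTTCAAAT"

for length in range(1, 5):
    motif = "A" * length
    print("%s %d %d" % (motif,
                        count_non_overlapping(toy_sequence, motif),
                        count_overlapping(toy_sequence, motif)))
```

For the motif AA, the first function reports 3 and the second reports 5, because a run of four A's holds two non-overlapping copies of AA but three overlapping ones. Which tally is appropriate depends on what you mean by a "stretch" of poly-A or poly-T.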