Harvard:Biophysics 101/2007/02/01

Biophysics 101: Genomics, Computing, and Economics


Post comments/questions to the discussion page (the 2nd tab at the top)

Tasks to complete by Feb 6

Register an account on OWW
- See this page for instructions
Populate your user page with the following information
- Name
- Concentration
- Affiliation & Year
- Research/academic interests
- Optional: anything else you wish to share (e.g. personal interests, photo, links)
Link to your userpage from our People page
If anyone has problems with the following, the TFs will schedule meetings to provide individual walkthroughs.
Download and install Python and BioPython (details here)
Once you have Python and BioPython installed, try running the following code
- Copy and paste the code into an empty text file and save it on your machine.
- The code can be executed in a number of ways (described here)
- Also, look here and here for more information on getting started

#!/usr/bin/env python

from Bio import GenBank, Seq

# Create a GenBank parser that turns a raw record into a parsed record object
# This makes it easier to extract specific information, such as the sequence
record_parser = GenBank.FeatureParser()

# NCBIDictionary is an interface to GenBank
ncbi_dict = GenBank.NCBIDictionary('nucleotide', 'genbank', parser=record_parser)

# If you pass NCBIDictionary a GenBank id, it will download that record
parsed_record = ncbi_dict['116496646']

print "GenBank id:", parsed_record.id

# Extract the sequence from the parsed_record
s = parsed_record.seq.tostring()
print "total sequence length:", len(s)

max_repeat = 9

print "method 1"
# str.count tallies non-overlapping matches only
for i in range(max_repeat):
    substr = 'A' * (i + 1)
    print substr, s.count(substr)

print "\nmethod 2"
# Repeated str.find, advancing one position at a time, also counts overlapping matches
for i in range(max_repeat):
    substr = 'A' * (i + 1)
    count = 0
    pos = s.find(substr)
    while pos != -1:
        count += 1
        pos = s.find(substr, pos + 1)
    print substr, count
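
Method 1 and method 2 in the script above can report different tallies for the same substring, because `str.count` skips past each match while the `find` loop restarts one position later. The following is a minimal sketch of that difference on a made-up five-base string. The function names `count_nonoverlapping` and `count_overlapping` and the toy sequence are illustrative assumptions, not part of the course code, and no GenBank download is involved.

```python
def count_nonoverlapping(seq, sub):
    """Count matches the way str.count does: matches never share characters."""
    return seq.count(sub)


def count_overlapping(seq, sub):
    """Count every start position of sub in seq, so matches may overlap."""
    count = 0
    pos = seq.find(sub)
    while pos != -1:
        count += 1
        # Restart the search one position after the last hit
        pos = seq.find(sub, pos + 1)
    return count


toy = "AAAAA"  # made-up sequence, not from GenBank
for n in range(1, 4):
    sub = "A" * n
    print("%s %d %d" % (sub, count_nonoverlapping(toy, sub), count_overlapping(toy, sub)))
```

For single bases the two tallies agree. For longer runs the overlapping count is larger, which is worth keeping in mind when comparing the two outputs.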

Modify the code to do the following:
- Process a different GenBank ID of your choosing
- Tally stretches of poly-T instead of poly-A
- Print the translated protein sequence (hint) and its length
- Create a new NCBIDictionary without a parser and use it to print a raw record (hint)
Paste your code onto your user page when you are done.
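
For the translation task, it may help to see what translating a sequence involves before using the hinted BioPython route, which is presumably what the assignment expects. The sketch below is a plain-Python illustration, assuming the standard genetic code and reading frame 1. The names `CODON_TABLE` and `translate` and the toy sequence are hypothetical and are not BioPython's API.

```python
# Standard genetic code, listed with codon bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the 64-entry codon -> amino acid lookup table.
CODON_TABLE = {}
index = 0
for b1 in BASES:
    for b2 in BASES:
        for b3 in BASES:
            CODON_TABLE[b1 + b2 + b3] = AMINO_ACIDS[index]
            index += 1


def translate(dna):
    """Translate a DNA string in frame 1; '*' marks a stop codon, 'X' an unknown codon."""
    dna = dna.upper()
    protein = []
    # Step through complete codons only; a trailing 1-2 bases are ignored.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


toy = "ATGGCCTTTTAA"  # made-up sequence, not a GenBank record
protein = translate(toy)
print(protein)
print(len(protein))
```

This sketch translates straight through stop codons rather than halting at the first one, so the reported length includes any `*` characters.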
