Harvard:Biophysics 101/2007/02/01

Post comments/questions to the discussion page (the 2nd tab at the top)

Tasks to complete by Feb 6

 * Register an account on OWW
   * See this page for instructions
 * Populate your user page with the following information
   * Name
   * Concentration
   * Affiliation & Year
   * Research/academic interests
   * Optional: anything else you wish to share (e.g. personal interests, photo, links)
 * Link to your user page from our People page
 * If anyone has problems with the following, the TFs will schedule meetings to provide individual walkthroughs.
 * Download and install Python and BioPython (details here)
 * Once you have Python and BioPython installed, try running the following code
   * To do so, copy and paste the code into an empty text file and save it on your machine.
   * The code can be executed in a number of ways (described here)
   * Also, look here and here for more information on getting started
#!/usr/bin/env python

from Bio import GenBank, Seq

# We can create a GenBank object that will parse a raw record
# This facilitates extracting specific information from the sequences
record_parser = GenBank.FeatureParser()

# NCBIDictionary is an interface to GenBank
ncbi_dict = GenBank.NCBIDictionary('nucleotide', 'genbank', parser = record_parser)

# If you pass NCBIDictionary a GenBank id, it will download that record
parsed_record = ncbi_dict['116496646']

print "GenBank id:", parsed_record.id

# Extract the sequence from the parsed_record
s = parsed_record.seq.tostring()
print "total sequence length:", len(s)

max_repeat = 9

print "method 1"
for i in range(max_repeat):
    substr = ''.join(['A' for n in range(i+1)])
    print substr, s.count(substr)

print "\nmethod 2"
for i in range(max_repeat):
    substr = ''.join(['A' for n in range(i+1)])
    count = 0
    pos = s.find(substr,0)
    while not pos == -1:
        count = count + 1
        pos = s.find(substr,pos+1)
    print substr, count
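
The two methods do not report the same numbers for repeats longer than one base: the string method count() tallies non-overlapping matches, while the find() loop restarts one position after each hit and so tallies every start position. The sketch below illustrates the difference on a made-up toy string rather than a GenBank record. The helper name count_overlapping and the toy sequence are assumptions for illustration, and the print calls are written to run under either Python 2 or Python 3.

```python
def count_overlapping(s, substr):
    """Count every start position of substr in s (the find() loop of method 2)."""
    count = 0
    pos = s.find(substr)
    while pos != -1:
        count += 1
        # Restart one character after the last hit so overlapping matches count.
        pos = s.find(substr, pos + 1)
    return count


toy = "AAAAACGTAAA"  # hypothetical toy sequence, not a real record

for n in range(1, 4):
    substr = "A" * n
    # str.count() skips past each match, so overlapping copies are not counted.
    print("%s count()=%d find-loop=%d"
          % (substr, toy.count(substr), count_overlapping(toy, substr)))
# For "AA" this prints count()=3 and find-loop=6.
```

Which number you want depends on the question being asked: count() answers "how many separate copies fit", the find() loop answers "at how many positions does a run of this length start".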
 * Modify the code to do the following things
   * Process a different GenBank ID of your choosing
   * Tally stretches of poly-T instead of poly-A
   * Print the translated protein sequence (hint) and its length
   * Create a new NCBIDictionary without a parser and use that to print a raw record (hint)
 * Paste your code onto your user page when you are finished.
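
To see what the translation task involves before reaching for BioPython, here is a minimal sketch in plain Python. It is not the BioPython approach the hint points to. It assumes the standard genetic code, reads the sequence in the first frame only, and uses a made-up toy sequence; the names CODON_TABLE and translate_dna are hypothetical. It runs under Python 2 or Python 3.

```python
# Standard genetic code, listed in the conventional T, C, A, G order
# for the first, second, and third base of each codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {}
index = 0
for b1 in BASES:
    for b2 in BASES:
        for b3 in BASES:
            CODON_TABLE[b1 + b2 + b3] = AMINO_ACIDS[index]
            index += 1


def translate_dna(dna):
    """Translate a DNA string in the first reading frame.

    '*' marks a stop codon, codons with unknown letters become 'X',
    and trailing bases that do not fill a codon are ignored.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    protein = []
    for start in range(0, usable, 3):
        protein.append(CODON_TABLE.get(dna[start:start + 3], "X"))
    return "".join(protein)


toy_dna = "ATGGCCTTTTAA"  # hypothetical toy sequence: ATG GCC TTT TAA
protein = translate_dna(toy_dna)
print("protein: %s" % protein)      # protein: MAF*
print("length: %d" % len(protein))  # length: 4
```

A real record may need a different reading frame or codon table, which is one reason to use the library routine for the assignment itself.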