Harvard:Biophysics 101/2007/Notebook:Christopher Nabel/2007-2-20

Background
I have successfully installed Clustal (thanks to the help on this assignment's discussion page) and wrote a brief outline of how I might compare conserved sequences. At this point, though, I'm not sure which marker would be best for the comparisons. The choice depends on one's specific interests, and mine are still too varied to settle on a single one. That said, I assembled some crude code that sets ApoE as the reference and lets you build a list of sequences for comparison from their GenBank ID numbers. The rest of the code will follow once I decide what I'd like to do.

Preliminary Code

#!/usr/bin/env python

# PART 1: import the GenBank tools necessary to complete this analysis

# Import the GenBank and Clustal modules
from Bio import GenBank, Seq
from Bio import Clustalw

# Parser that extracts sequence data from a GenBank entry
seq_parser = GenBank.FeatureParser()

# Interface to GenBank
ncbi_dict = GenBank.NCBIDictionary('nucleotide', 'genbank', parser = seq_parser)


# PART 2: load our ApoE as a reference string for sequence comparison

# Download ApoE's record
parsed_ref = ncbi_dict['APOE GENBANK #']

# Extract the sequence and save it as a string
ref = parsed_ref.seq.tostring()
print "ApoE has been loaded as the reference gene"


# PART 3: Input other sequences for comparison

# How many sequences to import?
x = int(raw_input("How many sequences would you like to upload for comparison? "))

# Now import them into a list...
comparison_seqs = []
for i in range(0, x):
    # The ID is kept as a string so that IDs containing letters are not rejected
    new_id = raw_input("Please enter the GenBank ID for a sequence of interest ").strip()
    parsed_entry = ncbi_dict[new_id]
    entry_seq = parsed_entry.seq.tostring()
    comparison_seqs.append(entry_seq)
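One possible variation on the input step is to accept all of the IDs in a single prompt rather than asking for a count first. The helper below is only a sketch of that idea: the name `parse_id_list` is hypothetical, it does no GenBank lookup itself, and it simply splits the typed text into unique ID strings that could then be passed to the download loop.

```python
def parse_id_list(text):
    """Split a comma- or whitespace-separated string into unique IDs.

    The first-seen order is preserved and repeated IDs are dropped.
    """
    ids = []
    seen = set()
    for token in text.replace(",", " ").split():
        if token not in seen:
            seen.add(token)
            ids.append(token)
    return ids


if __name__ == "__main__":
    # Placeholder IDs for illustration only, not real GenBank entries
    print(parse_id_list("id1, id2 id1"))
```

Dropping duplicates here would avoid downloading the same record twice.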


# PART 4: Check for sequence alignments: what specifically should we be looking for?
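One candidate answer to the question above, offered only as a sketch, is to measure percent identity against the reference and to list the alignment columns that are conserved across every sequence. The code below assumes the sequences have already been aligned (for example by Clustal) so that they share one length and use `-` for gaps. The function names `percent_identity` and `conserved_columns` are hypothetical, and the code is plain Python with no Biopython dependency.

```python
def percent_identity(ref_aligned, other_aligned, gap="-"):
    """Percent of compared columns in which two aligned sequences agree.

    Columns where either sequence has a gap are skipped.
    """
    if len(ref_aligned) != len(other_aligned):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(ref_aligned.upper(), other_aligned.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def conserved_columns(aligned_seqs, gap="-"):
    """Return 0-based column indices where all sequences share one non-gap residue."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")
    conserved = []
    for i in range(length):
        column = set(s[i].upper() for s in aligned_seqs)
        if len(column) == 1 and gap not in column:
            conserved.append(i)
    return conserved


if __name__ == "__main__":
    # Toy aligned sequences for illustration only, not ApoE data
    toy = ["ATG-CA", "ATGGCT", "ATGACA"]
    print(percent_identity(toy[0], toy[1]))
    print(conserved_columns(toy))
```

Skipping gapped columns is one convention among several; counting gaps as mismatches would give lower identity values for the same alignment.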


# PART 5: Print the results
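Since the comparison metric is still undecided, the printing step can only be sketched. The example below assumes each comparison yields one numeric score per sequence, such as a percent identity, and formats those scores as a plain-text report sorted from most to least similar. The function name `format_results` and the sequence labels are hypothetical.

```python
def format_results(ref_name, scores):
    """Build a plain-text report from (sequence_id, score) pairs, highest score first."""
    lines = ["Percent identity to %s" % ref_name]
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    for seq_id, score in ranked:
        # Left-justify the ID in 12 characters, then the score with one decimal place
        lines.append("%-12s %6.1f%%" % (seq_id, score))
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up scores for illustration only
    print(format_results("ApoE", [("seqA", 80.0), ("seqB", 95.5)]))
```

Returning the report as a string, rather than printing inside the function, keeps it usable under either the Python 2 print statement or the Python 3 print function.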