BE.180:Assignment4

BE.180 Homework Assignment #4 - Due on Tuesday, April 25 at 5pm


 * Download Assignment 4: [[Media:PSETfour_final.pdf]]
 * You will need these three input files:
    * Dictionary with the genetic code: [[Media:GeneticCode.dict]]
    * Text file with protein sequence: [[Media:Protein.txt]]
    * Text file with genome: [[Media:NC_003418.txt]]
 * When saving these files, make sure the file names keep the capitalization shown above (GeneticCode.dict, Protein.txt, NC_003418.txt).

Submit your single Python file to be180hw@gmail.com. Please do not send anything other than this file to that Gmail account (all questions can be sent to spencers@mit.edu or sontag@mit.edu). Please note that we will post the solutions to this assignment at 5pm on April 25 and will not accept any homework after that time.

Corrections to problem set

 * The sentence on page one should read "If the first positions of all codons encoding an amino acid are the same, write that nucleotide’s letter to the mRNA string (i.e., A, C, U, G)", not (A, C, T, G)!
 * From a student: "I was wondering if you could help me with the decoding rules for problem 1. I don't know if I'm interpreting them wrong, but why does it seem like the rules don't cover all possible cases? Such as for arginine, which can only be A/C in the first position, and serine, which can only be C/G in the second position.  Are we supposed to assign these nucleotides with H and N respectively, even if substituting U in arginine and A/U in serine will result in a different amino acid codon?"
 * Yes, there was an oversight in the rule-making. However, please still follow the rules outlined in the PDF for the purposes of this assignment. We will not test your code with a protein containing arginine or serine.
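
The position-by-position rule discussed above can be sketched roughly as follows. This is only an illustration, not the posted solution. It assumes the standard genetic code (hard-coded here, because the format of GeneticCode.dict is not shown on this page) and an assumed set of ambiguity letters (R for A/G, Y for C/U, H for A/C/U, N for any base) chosen to agree with the graded example listed under Solutions; the authoritative rules are the ones in the PDF. The names `CODONS_FOR`, `AMBIGUITY`, and `reverse_translate` are hypothetical. Nucleotide sets the assumed rules do not cover (the arginine and serine cases raised in the student question) raise an error instead of guessing.

```python
from itertools import product

# Standard genetic code in the conventional U, C, A, G ordering
# (assumption: GeneticCode.dict encodes the same table).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid letter to the list of mRNA codons that encode it.
CODONS_FOR = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_FOR.setdefault(aa, []).append("".join(codon))

# Assumed ambiguity letters; the real rule set is defined in the PDF.
AMBIGUITY = {
    frozenset("AG"): "R",
    frozenset("CU"): "Y",
    frozenset("ACU"): "H",
    frozenset("ACGU"): "N",
}


def reverse_translate(protein):
    """Build a degenerate mRNA string for a one-letter protein sequence."""
    letters = []
    for aa in protein:
        for pos in range(3):
            # Collect the nucleotides seen at this codon position.
            seen = frozenset(codon[pos] for codon in CODONS_FOR[aa])
            if len(seen) == 1:
                # All codons agree here: write that nucleotide itself.
                letters.append(next(iter(seen)))
            elif seen in AMBIGUITY:
                letters.append(AMBIGUITY[seen])
            else:
                # Not covered by the assumed rules (e.g. arginine, serine).
                raise ValueError(
                    "no rule for %s at position %d of %s"
                    % ("/".join(sorted(seen)), pos + 1, aa)
                )
    return "".join(letters)


print(reverse_translate("LAND"))  # YUNGCNAAYGAY
```

Raising an error for uncovered sets is a design choice made for this sketch only; a submission should instead follow whatever the PDF prescribes for those positions.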

Solutions
[[Media:spencers_4.txt]]


 * The code was graded using a Protein.txt input file containing the protein "LAND" and an NC_003418.txt file containing the genome "AAAACCCCGGGGTTT".
 * The correct outputs are RT=YUNGCNAAYGAY, bestScore=0, listOfBestMers=['GTTTAAAA', 'TTTTAAAA', 'GTTTTAAA', 'TTTTTAAA'].
 * Point breakdown: Q1: 25 points, Q2a: 10 points, Q2b: 15 points, Q2c: 25 points, Q3: 25 points
 * Sample point deductions:
    * -5 for an RT that is incorrect but close
    * -5 for a listOfBestMers that contains only some of the 4 correct mers, or that contains all of them plus a few extras
    * -10 for a listOfBestMers that is further off, for example one with 1120 mers instead of 4, or one with no correct mers
    * -5 for an incorrect bestScore
    * -5 total for naming variables incorrectly
 * If you have specific questions, please email Sabrina.