User:Justinhlo

DNA Melting Project: IAP 2007

 * There are three main areas of experimentation listed here, but I think that some will yield more interesting results than others. Maybe test all three, but only write about the 2 that would be of most interest to others hoping to implement this module?
 * Also, perhaps make an estimate of the costs for the materials, such as in the AFM paper? Only if it's an appealing price, of course ..
 * I will record not only set-ups, but also an approximate timeline - how many hours must be spent at each stage, etc.; maybe highlight spots where doing things precisely is very important. It might be good to give an estimate for how long the module would take?
 * While MATLAB/Python code probably won't be included in the actual paper, I've seen papers that say "if you want the code, just e-mail the author" or something like that - maybe that would be a good idea in case teachers wish to provide analysis hints to students?


 * There are not many things that have to be changed in the set-up, but there may be some ways to reduce the need for post-collection processing. For instance, the Wheatstone bridge had a very bumpy output more due to limitations of incoming voltage sensitivity than noise.  So maybe making the Wheatstone bridge have greater dV/dT behavior might alleviate this?  [Easiest way to increase the resolution is to restrict the input range in LabVIEW, the way we did for the PMT lab.  Increasing the swing of the Wheatstone output voltage is also possible, but takes more doing for a smaller improvement. -MS] I need to review the settings on that to see if there is room for improvement.  The fluorescence signal itself seemed to be roughly noise-limited instead, so the gain on the op-amp may be okay.  Maybe some tweaks to the low-pass filter may be desirable, though.
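To put a rough number on the bracketed suggestion about restricting the input range, here is a minimal sketch of how the smallest resolvable voltage step scales with the configured range. The 16-bit converter and the three ranges are assumptions for illustration, not the actual DAQ settings used in the lab.

```python
def adc_step_volts(v_min, v_max, bits=16):
    """Smallest voltage step an ideal ADC resolves over [v_min, v_max]."""
    return (v_max - v_min) / (2 ** bits)

# Hypothetical input ranges; the 16-bit depth is an assumption.
for v_min, v_max in [(-10.0, 10.0), (-1.0, 1.0), (-0.2, 0.2)]:
    step_uV = adc_step_volts(v_min, v_max) * 1e6
    print(f"range {v_min:+.1f} to {v_max:+.1f} V -> step {step_uV:.1f} uV")
```

Under these assumptions, narrowing the range by a factor of ten shrinks the step by the same factor, which is why the range setting is the cheapest thing to try first.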

Graphs
Here's a 19-bp perfect match melting curve. The empirical melting temperature is something like 52 C, while the consensus for several methods I modeled is around 52.5 C. I want to see if this result is repeatable, though.
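One possible way to read a melting temperature off a curve is to take the temperature at which the normalized duplex fraction crosses 0.5. The sketch below does that by linear interpolation. The half-point definition and the synthetic sigmoid are assumptions for illustration, not the method or data behind the 52 C figure.

```python
import math

def tm_at_half(temps, frac_duplex):
    """Temperature where the duplex fraction crosses 0.5.

    Assumes temps is increasing and frac_duplex falls from ~1 to ~0.
    Uses linear interpolation between the two bracketing samples.
    """
    pairs = list(zip(temps, frac_duplex))
    for (t0, f0), (t1, f1) in zip(pairs, pairs[1:]):
        if f0 >= 0.5 > f1:
            return t0 + (f0 - 0.5) * (t1 - t0) / (f0 - f1)
    raise ValueError("curve never crosses 0.5")

# Synthetic sigmoid centered at 52 C, standing in for normalized fluorescence.
temps = [30 + 0.5 * i for i in range(101)]  # 30 C to 80 C in 0.5 C steps
curve = [1 / (1 + math.exp((t - 52.0) / 3.0)) for t in temps]
print(round(tm_at_half(temps, curve), 2))
```

Real data would first need baseline subtraction and normalization, and running the same function on repeated melts would give a quick check of repeatability.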



Ionic Strength
Using either NaCl or KCl (is there an advantage to one over the other? Does Na+ or K+ interact more with O-?). Original 19-bp sequence is fine ..

The module originally used 33uM as the concentration. I do not know why this was used - it seems rather high. Perhaps this was meant so that the fluorescence signal would be larger than background noise?  [Yes, I'm pretty sure this is the case. -MS ]

Goal: show how the set-up can be used to investigate the significant effects of ionic particle concentrations on DNA melting parameters.

Control and Experimental Groups:
 * 0 mM (control)
 * 1 mM
 * 10 mM (original module condition)
 * 150 mM (physiological conditions, see http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=2703503&dopt=Abstract)
 * 1000 mM (SantaLucia paper’s conditions – I’m curious to see if we get similar results)

Apart from the 0 mM control, these conditions are approximately logarithmically spaced between 1 mM and 1 M. This is justified because the projected dependence of ΔS and ΔH on ionic concentration is also logarithmic. The 150 mM condition may be replaced by 100 mM if that is deemed more desirable.

It may be worth looking into the 50 mM concentration, because this is what PCRs are run at (50 mM KCl, specifically), and it is also what many of the models have actually been designed for.
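As a rough guide to how far apart these conditions might sit, the sketch below evaluates a 16.6*log10([Na+]) salt term, as used in the %GC-type formulas, relative to 1 M. Applying that term on its own to this 19-mer is an assumption. The numbers are only a first estimate of the expected spread, not a prediction.

```python
import math

def salt_shift_c(conc_mM, ref_mM=1000.0):
    """Tm shift (deg C) relative to ref_mM from a 16.6*log10([Na+]) term."""
    if conc_mM <= 0:
        raise ValueError("log term is undefined at zero salt")
    return 16.6 * math.log10(conc_mM / ref_mM)

for c in [1, 10, 50, 150, 1000]:
    print(f"{c:>5} mM: {salt_shift_c(c):+.1f} C relative to 1 M")
```

The 0 mM control is left out because the logarithm is undefined there. In practice the buffer and the DNA itself contribute some counterions, so the control would not be truly salt-free.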

Mismatches
 * C, T are pyrimidines (small)
 * A, G are purines (large)

The original 19-bp sequences that we tested were:

19bp perfect match:
 5'- ATCAA GCAGC CATGC AAAT -3'
 3'- TAGTT CGTCG GTACG TTTA -5'

19bp single-base mismatch (SNP) [T-C mismatch]:
 5'- ATCAA GCATC CATGC AAAT -3'
 3'- TAGTT CGTCG GTACG TTTA -5'
              ^

The mismatch is a pyrimidine-pyrimidine type. I would be curious to see whether an SNP of purine-purine type would increase the differences, so for instance:

19bp single-base mismatch (SNP) [A-G mismatch]:
 5'- ATCAA GCAGC CATGC AAAT -3'
 3'- TAGTT CGTAG GTACG TTTA -5'
              ^

The original sequence has 8 G-C pairs, or 42%, which is in the middle of what seems to be the normal 20-60% range (for proks, anyway .. http://insilico.ehu.es/oligoweb/index2.php?m=all). I think this is a good representative sequence, and it may be redundant to bother with different G-C contents, since it's pretty well-established that the more G-C, the higher the melting temperature (presumably due to hydrogen bonding effects?). One thing I do wonder about is the relationship of G-C content to ionic strength sensitivity of the Tm values, since the triple H-bond format could interact differently with Na/K as compared to the 2-H-bond format. Anyway, that isn't that important, probably.
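The G-C count quoted above is easy to check mechanically. This is a small helper (the name `gc_content` is just for illustration) applied to the original 19-bp top strand:

```python
def gc_content(seq):
    """Return (number of G/C bases, fraction G/C), ignoring spaces."""
    s = seq.replace(" ", "").upper()
    gc = s.count("G") + s.count("C")
    return gc, gc / len(s)

n_gc, frac = gc_content("ATCAA GCAGC CATGC AAAT")
print(n_gc, f"{100 * frac:.1f}%")  # 8 42.1%
```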

Between the 1-mismatch and many-mismatch cases, it would be nice to have a somewhat-mismatched case that could simulate the actual relationship between different alleles at homologous chromosomes' loci or a random construct trying to integrate into DNA.

 5'- ATCAA GCAGC CATGC AAAT -3'
 3'- TACTC CGTCT GAACG TCTA -5'
       ^ ^     ^  ^     ^
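Mismatch positions in strand pairs like this one can be checked mechanically rather than by eye. The sketch below assumes the top strand is written 5' to 3' and the bottom strand 3' to 5', as in the sequences on this page. The function name is only for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def mismatch_positions(top_5to3, bottom_3to5):
    """1-based positions where the bottom base is not the complement of the top."""
    top = top_5to3.replace(" ", "").upper()
    bottom = bottom_3to5.replace(" ", "").upper()
    if len(top) != len(bottom):
        raise ValueError("strands must have equal length")
    return [i + 1 for i, (t, b) in enumerate(zip(top, bottom))
            if COMPLEMENT[t] != b]

print(mismatch_positions("ATCAA GCAGC CATGC AAAT", "TACTC CGTCT GAACG TCTA"))
# [3, 5, 10, 12, 17]
```

The length check also flags pairs where the two strands differ in length, which would need a decision about alignment before positions mean anything.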

The original total mismatch case used the sequence:

19bp total mismatch
 5'- ATCAA GCAGC CATGC AAA -3'
 3'- TATTC TGTTC CTGGT TTCC -5'
 ^^^ ^^ ^^ ^ ^^^  ^^

I do not know if the top line is a typo or not .. it doesn't actually have 19 bp ... [These are the sequences that were shipped, and I was also surprised that this one was an 18bp, but that seems to be accurate -- we can order an actual 19bp for our experiments. -MS ]

This is not really a "total mismatch," but even if it hybridizes (at the ends), it will probably always show up as ssDNA due to the behavior of the intercalating dye. Is it significant that the ends are where there is a pair of matching bases on either side? Why are all the matching bases A/T? Could this lead to self-annealing?

Length
Most models for calculating the Tm values of DNA are only designed to be accurate for relatively short strands of DNA, ~15-50 bp. The Wallace formula, which has T proportional to length, obviously cannot hold up for long strands of DNA. The GC methods max out at around 80-some degrees C, which is much more realistic. However, for long strands of DNA, most people are not interested in a melting temperature, since all DNA melts at 90 degrees C. Thus, the region of interest corresponds to the region in which people design primers and other short segments of DNA.
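To illustrate how the Wallace rule runs away with length while a %GC formula levels off, here is a short sketch that applies both to prefixes of the 50 bp sequence proposed for the length series. The two formulas are the Wallace rule and the basic %GC equation as written in the models list. The function names are just for illustration.

```python
def wallace_tm(seq):
    """Wallace rule: 2 C per A/T plus 4 C per G/C (assumes only A, C, G, T)."""
    s = seq.replace(" ", "").upper()
    gc = s.count("G") + s.count("C")
    return 2 * (len(s) - gc) + 4 * gc

def basic_gc_tm(seq):
    """Basic %GC formula: 64.9 + 41*(GC - 16.4)/N."""
    s = seq.replace(" ", "").upper()
    gc = s.count("G") + s.count("C")
    return 64.9 + 41 * (gc - 16.4) / len(s)

full = "ATCAAGCAGCCATCGAAACTTAGGCTTACAACCAGTGACTACAGATTGCA"
for n in (20, 30, 40, 50):
    s = full[:n]
    print(n, wallace_tm(s), round(basic_gc_tm(s), 1))
```

For these sequences the Wallace value already exceeds 100 C at 40 bp, while the %GC value stays below 70 C.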

The original lengths of 19 and 40 are okay, but it would be instructive if other lengths such as 30 or 50 were included in order to extract more H and S values and thus help with N-N validation analysis.

 [Sure, how about our full series being 20, 30, 40, 50bp? Or, if we want five data points, 18, 26, 34, 42, 50? -MS ]

Here are sequences that could be used for the 20, 30, 40, 50 series. I tried to make them have similar GC content, and I checked for hairpins/loops by eye (using the computer-generated reverse complements). The Tm estimates should really be taken with a grain of salt.

20 bp

 5' ATCAA GCAGC CATCG AAACT 3'    45% GC. Tm = 51.4 C
 3' TAGTT CGTCG GTAGC TTTGA 5'

rev comp AGTTT CGATG GCTGC TTGAT

30 bp

 5' ATCAA GCAGC CATCG AAACT TAGGC TTACA 3'    43.3% GC. Tm = 62.2 C
 3' TAGTT CGTCG GTAGC TTTGA ATCCG AATGT 5'

rev comp TGTAA GCCTA AGTTT CGATG GCTGC TTGAT

40 bp

 5' ATCAA GCAGC CATCG AAACT TAGGC TTACA ACCAG TGACT 3'    45% GC. Tm = 56.7 C
 3' TAGTT CGTCG GTAGC TTTGA ATCCG AATGT TGGTC ACTGA 5'

rev comp AGTCA CTGGT TGTAA GCCTA AGTTT CGATG GCTGC TTGAT

50 bp

 5' ATCAA GCAGC CATCG AAACT TAGGC TTACA ACCAG TGACT ACAGA TTGCA 3'    44% GC. Tm = 58.9 C
 3' TAGTT CGTCG GTAGC TTTGA ATCCG AATGT TGGTC ACTGA TGTCT AACGT 5'

rev comp TGCAA TCTGT AGTCA CTGGT TGTAA GCCTA AGTTT CGATG GCTGC TTGAT
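The reverse complements and the by-eye hairpin check can be backed up with a few lines of code. This is a minimal sketch: `longest_self_complement` is a deliberately crude, hypothetical screen that only reports the longest stretch whose reverse complement also occurs somewhere in the same strand. It ignores loop length and thermodynamics, and a palindromic stretch counts as matching itself.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence, ignoring spaces."""
    comp = str.maketrans("ACGT", "TGCA")
    return seq.replace(" ", "").upper().translate(comp)[::-1]

def longest_self_complement(seq, k_max=10):
    """Longest k (2..k_max) for which some k-mer and its reverse
    complement both occur in seq; 0 if there is none."""
    s = seq.replace(" ", "").upper()
    best = 0
    for k in range(2, k_max + 1):
        kmers = {s[i:i + k] for i in range(len(s) - k + 1)}
        if any(reverse_complement(m) in kmers for m in kmers):
            best = k
    return best

print(reverse_complement("ATCAA GCAGC CATCG AAACT"))  # AGTTTCGATGGCTGCTTGAT
print(longest_self_complement("ATCAA GCAGC CATCG AAACT"))
```

A large value from the second function would only be a prompt to look more closely, for example with a proper secondary-structure tool.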

Potential Models for DNA modeling
This interesting paper lists a good number of methods (including the one we used in the module): http://bioinformatics.oxfordjournals.org/cgi/content/full/21/6/711. However, while it compares the methods thoroughly, it does not run empirical experiments to see which one is actually right.

Here are a few models worth investigating:
 * 1) The very basic equation used for very short sequences (w, x, y, z are the numbers of A, T, G, C bases):
   * Tm = (wA+xT) * 2 + (yG+zC) * 4
   * Probably only interesting for length <20, or else the temperatures predicted are quite bad.
 * 2) The standard G-C content equation:
   * Tm = 64.9 + 41*(yG+zC-16.4)/(wA+xT+yG+zC)
 * 3) The modified G-C content equation, with more length-dependent consideration:
   * Tm = 100.5 + 41*(yG+zC-36.4)/(wA+xT+yG+zC) + 16.6*log10([Na+])
   * This is the same idea as the previous one unless the length changes. So unless we are comparing lengths, this will give roughly the same result as above.
 * 4) The Wetmur variant of the G-C method includes a % mismatch term:
   * (see http://www-nmr.cabm.rutgers.edu/bioinformatics/cogs/Tm_predict.html)
   * Tm = 81.5 + 16.6*log10([Na+]/(1.0+0.7*[Na+])) + 0.41*(%GC) - 500/(wA+xT+yG+zC) - P
   * P = % mismatch (unclear if this is used as a direct number or as a decimal).
 * 5) The NN model (with H and S ..):
   * Tm = H/(S - R ln(4/concentration of DNA))
   * It seems that the last one here is probably the only one that actually tries to include entropic and enthalpic terms. That means we have limited options for the fitting of the actual curve.
   * This one is interesting in all three cases, as the ionic stuff affects entropy, the mismatching affects enthalpy (and entropy, I guess), and length affects both.
   * Maybe there are other equations for extracting the H and S from the curve. Will look into this.
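One candidate route for getting H and S out of a measured curve is to fit a two-state model. The sketch below shows the forward direction only: it turns H and S into a duplex fraction at each temperature, and at f = 0.5 it reduces to the NN formula Tm = H/(S - R ln(4/C)). The H and S values are made up for illustration. "Concentration of DNA" is assumed to mean the total strand concentration, with the two different strands in equal amounts.

```python
import math

R = 1.987  # gas constant, cal/(mol K)

def nn_tm_kelvin(dH, dS, c_total):
    """Tm = H / (S - R ln(4/C)); H in cal/mol, S in cal/(mol K), C in M."""
    return dH / (dS - R * math.log(4 / c_total))

def duplex_fraction(temp_k, dH, dS, c_total):
    """Two-state duplex fraction f for A + B <-> AB at equal strand amounts.

    With K the association constant and a = K*C, the mass balance is
    a*(1 - f)**2 = 2*f; the root between 0 and 1 is returned in a form
    that avoids cancellation when a is small.
    """
    K = math.exp(-(dH - temp_k * dS) / (R * temp_k))
    a = K * c_total
    return a / ((a + 1) + math.sqrt(2 * a + 1))

# Illustrative values only (not measured): H in cal/mol, S in cal/(mol K).
dH, dS, conc = -150000.0, -420.0, 33e-6
tm = nn_tm_kelvin(dH, dS, conc)
for t in (tm - 10, tm, tm + 10):
    print(f"{t - 273.15:6.1f} C  f = {duplex_fraction(t, dH, dS, conc):.3f}")
```

Fitting would then mean adjusting H and S until this curve matches the normalized fluorescence, for example by least squares.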

While these models do include salt-correction terms and sequence-based terms, I have not seen any adjustments for mismatches (correction: found the Wetmur one above). Perhaps we could derive an approximate "mismatch cost" term? The first mismatch is the most costly, and after that, the subsequent temperature drop per mismatch is less until some point where there are so many mismatches that no hybridization is possible.
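To see how much the ambiguity in the Wetmur P term matters, the sketch below evaluates that formula with P read either as a percentage or as a decimal fraction. The function name, the `p_as_percent` flag, and the 50 mM default salt are assumptions for illustration. Which reading the formula intends is exactly the open question.

```python
import math

def wetmur_tm(seq, n_mismatch=0, na_molar=0.05, p_as_percent=True):
    """Wetmur-style %GC estimate with a mismatch penalty P.

    p_as_percent=True subtracts P as a percentage (e.g. 5.26),
    p_as_percent=False subtracts it as a decimal fraction (e.g. 0.0526).
    """
    s = seq.replace(" ", "").upper()
    n = len(s)
    pct_gc = 100 * (s.count("G") + s.count("C")) / n
    frac_mm = n_mismatch / n
    p = 100 * frac_mm if p_as_percent else frac_mm
    return (81.5 + 16.6 * math.log10(na_molar / (1.0 + 0.7 * na_molar))
            + 0.41 * pct_gc - 500 / n - p)

seq = "ATCAA GCAGC CATGC AAAT"
for as_pct in (True, False):
    drop = wetmur_tm(seq, 0) - wetmur_tm(seq, 1, p_as_percent=as_pct)
    print(f"P as percent={as_pct}: one mismatch lowers Tm by {drop:.2f} C")
```

Read as a percentage, one mismatch in 19 bases costs about 5.3 C. Read as a decimal, it costs about 0.05 C, so a measured SNP curve might be enough to tell the two readings apart. Either way the penalty is linear in the number of mismatches, so this term alone cannot capture a first mismatch being the most costly.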

Here is a very basic Python program for using these five methods. I have not yet included any considerations for mismatches, and the salt concentration is fixed at 50 mM.


    # Justin Lo
    # January 8, 2007
    # Code for predicting the Tm values for an arbitrary oligo sequence (Python 3)
    import math
    from NN import NNEnergy  # local nearest-neighbor module, not shown here
    R = 1.987  # gas constant, cal/(mol K)
    done = False
    while not done:
        seq = input("Please type in your sequence >>> ").strip()
        seq = seq.upper()            # convert to uniform state
        seq = seq.replace(' ', '')   # get rid of any spaces
        nA, nC, nG, nT = (seq.count(base) for base in "ACGT")
        n = nA + nC + nG + nT        # total number of bases
        gc = nC + nG                 # number of G and C bases
        salt = 50                    # salt concentration in mM
        conc = 50e-6                 # DNA concentration in M
        TmOne = (nA + nT) * 2 + gc * 4
        TmTwo = 64.9 + 41 * (gc - 16.4) / n
        TmThree = 100.5 + 41 * (gc - 36.4) / n + 16.6 * math.log10(salt / 1000)
        # HS[0] is H and HS[1] is S; the formula below implies kcal/mol and cal/(mol K)
        HS = NNEnergy(seq, salt / 1000)
        TmFour = (HS[0] * 1000) / (HS[1] - R * math.log(4 / conc)) - 273.15
        TmFive = (81.5 + 16.6 * math.log10((salt / 1000) / (1 + 0.7 * salt / 1000))
                  + 41 * gc / n - 500 / n)
        sym = ' oC'
        print('Wallace Method: ' + str(TmOne) + sym)
        print('Basic %GC Method: ' + str(TmTwo) + sym)
        print('%GC Method: ' + str(TmThree) + sym)
        print('NN Method: ' + str(TmFour) + sym)
        print('Wetmur %GC Method: ' + str(TmFive) + sym)
        query = input("Do you want to try another sequence? y/n >>> ")
        if query == "n":
            done = True