CH391L/S13/DNA Computing

[[Category:CH391L_S12]][[Category:CH391L_S13]]
=Introduction=
DNA computing is the performance of computations using biological molecules, specifically DNA, instead of traditional silicon.


The advantages of DNA over silicon include:
* Massive parallelism: many operations can be performed simultaneously, rather than sequentially as in silicon computers.
* Size: DNA computers are many times smaller than silicon computers.
* Storage density: much, much more information can be stored in the same amount of space.


The disadvantages of DNA over silicon include:
* Impracticality: the quantities of DNA required to take advantage of its massive parallelism are unmanageable.
* Error prone: getting biochemistry to perform as desired is still more difficult than working with silicon and electricity.
* Speed: it still takes a long time to get your results, from start to finish, even though the computational step may be relatively rapid.


The concept of using molecules for computation dates back to 1959, when American physicist [http://en.wikipedia.org/wiki/Richard_Feynman Richard Feynman] presented his ideas on nanotechnology, but it took until 1994 for the first successful molecular computation experiment to be carried out.


=History=
[[Image:Adleman_salesman.png | "The principle of Leonard Adleman's DNA computer to solve the 'Travelling salesman' problem.  [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1315819/figure/f1/ Source]" | thumb|right|240px]]
A computer scientist at the University of Southern California named [http://en.wikipedia.org/wiki/Leonard_Adleman Leonard Adleman] first came up with the idea of using DNA for computing purposes after reading "Molecular Biology of the Gene," a book written by James Watson (co-discoverer of the structure of DNA).


In 1994, Adleman published an article in the journal Science<cite>#Adleman</cite> that demonstrated how DNA could be used to solve a well-known mathematical problem, the directed [http://en.wikipedia.org/wiki/Hamiltonian_path_problem Hamiltonian path problem], a close relative of the "traveling salesman" problem. The goal is to find a path that starts and ends at designated cities and passes through each of the other cities exactly once. This experiment was significant because it showed that information could indeed be processed using the interactions between strands of synthetic DNA.


''"He represented each of the seven cities as separate, singlestranded DNA molecules, 20 nucleotides long, and all possible paths between cities as DNA molecules composed of the last ten nucleotides of the departure city and the first ten nucleotides of the arrival city. Mixing the DNA strands with DNA ligase and adenosine triphosphate (ATP) resulted in the generation of all possible random paths through the cities. However, the majority of these paths were not applicable to the situation–they were either too long or too short, or they did not start or finish in the right city. Adleman then filtered out all the paths that neither started nor ended with the correct molecule and those that did not have the correct length and composition. Any remaining DNA molecules represented a solution to the problem"''<cite>#Parker</cite>.


'''Here is a simpler rephrasing''' of how Adleman selected only the applicable DNA strands: First, he used PCR amplification to amplify only strands that start and end at the correct cities, A and G. Then he used gel electrophoresis to filter out all strands of the wrong length (a path that visits each of the seven towns exactly once contains exactly six links, and therefore has a fixed length). Finally, he used affinity purification to filter out strands that do not visit every town (any strands that do not contain town A are discarded first, then those that do not contain B, and so on). Any strands left over must represent a valid route. In this case, the only valid route was encoded by the strand ABBCCDDEEFFG: A to B to C to D to E to F to G.
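
The selection procedure amounts to a brute-force "generate every candidate, then filter" strategy, which can be mirrored in ordinary software. The following Python sketch illustrates that logic for a hypothetical seven-city directed graph (the edge list is invented for illustration and is not Adleman's actual graph); in the DNA version, the "generate" step is performed by hybridization and ligation, and the filtering steps by PCR, gel electrophoresis and affinity purification.

<syntaxhighlight lang="python">
import random

cities = ["A", "B", "C", "D", "E", "F", "G"]
# Hypothetical directed edge list (illustrative only, not Adleman's actual graph).
edges = [("A", "B"), ("A", "D"), ("B", "C"), ("B", "E"), ("C", "D"),
         ("C", "F"), ("D", "E"), ("E", "F"), ("F", "G"), ("D", "G")]
start, end = "A", "G"

def random_path(max_len=10):
    """'Ligation' step: randomly concatenate compatible edges into a path of assorted length."""
    path = [random.choice(cities)]
    while len(path) < max_len:
        options = [b for (a, b) in edges if a == path[-1]]
        if not options or random.random() < 0.3:
            break
        path.append(random.choice(options))
    return path

# The "test tube": a large soup of random candidate paths.
soup = [random_path() for _ in range(200_000)]

# Filters, mirroring the wet-lab steps described above:
keep = [p for p in soup
        if p[0] == start and p[-1] == end      # PCR: correct start and end city
        and len(p) == len(cities)              # gel electrophoresis: correct length
        and set(p) == set(cities)]             # affinity purification: every city present

print(sorted(set(map(tuple, keep))))           # the surviving route(s), e.g. A-B-C-D-E-F-G
</syntaxhighlight>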


=Challenges=
Adleman's experiment tackled an instance of an [http://en.wikipedia.org/wiki/NP-complete NP-complete problem] (NP stands for "nondeterministic polynomial time"), a class of problems for which no efficient (i.e. polynomial-time) algorithm is known. It has been proposed that these types of problems are ideal for the massive parallelism of biology. Whereas traditional computers are designed to perform one calculation very fast, DNA strands can produce billions of potential answers simultaneously.


However, because in most difficult problems the number of possible solutions grows exponentially with the size of the problem, even relatively small problems would require massive amounts of DNA to represent all possible answers. For example, it has been estimated that scaling up the Hamilton Path Problem to 200 cities would require a weight of DNA that exceeds that of the earth<cite>#NCBI</cite>. For this reason and others, the field has changed focus to other areas.
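
A rough back-of-the-envelope calculation shows why. The Python sketch below assumes a brute-force encoding (one DNA strand per candidate ordering of the cities, 20 nucleotides per city as in Adleman's experiment, and an average nucleotide mass of about 330 daltons); the exact numbers are assumptions, but the conclusion is insensitive to them.

<syntaxhighlight lang="python">
import math

n_cities = 200
nt_per_city = 20           # assumed encoding length per city, as in the 7-city experiment
da_per_nt = 330.0          # approximate average mass of a nucleotide, in daltons
avogadro = 6.022e23        # molecules per mole
earth_mass_g = 5.97e27     # mass of the Earth, in grams

# Candidate paths with fixed start and end cities: (n - 2)! orderings of the rest.
log10_paths = math.lgamma(n_cities - 1) / math.log(10)    # log10((n - 2)!)

# Mass in grams = paths * nucleotides per path * grams per nucleotide.
log10_grams = (log10_paths
               + math.log10(n_cities * nt_per_city)
               + math.log10(da_per_nt)
               - math.log10(avogadro))

print(f"~10^{log10_grams:.0f} g of DNA needed, vs ~10^{math.log10(earth_mass_g):.0f} g for the Earth")
</syntaxhighlight>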


=DNA-based Logic=
[[Image:Strand_displacement.png | A toehold-mediated strand displacement reaction. The red strand binds to the green strand, displacing and freeing the blue strand. [http://en.wikipedia.org/wiki/File:Strand_displacement.svg source] | thumb|right|240px]]
The fundamental building blocks of computing are logic circuits, or "gates", such as AND, OR, and NOT. Building on Adleman's work, other researchers have been developing DNA-based logic circuits that are useful in a variety of ways, such as performing mathematical and logical operations and recognizing patterns. Progress is even being made in in-cell programming, opening the door to, for example, advanced cancer treatments.


==Strand Displacement==
Perhaps the best-known advances in the field come from a group of researchers at Caltech led by Erik Winfree. In strand displacement, the inputs are free-floating single strands of DNA or RNA, and [http://en.wikipedia.org/wiki/Logic_gate logic gates] are made from two or more strands, one of which is the future output signal. If an input strand has a "toehold" complementary to a logic gate, it binds and then displaces the output strand, which detaches and can in turn serve as an input to another logic gate, allowing signals to propagate. In a 2011 paper, Winfree and colleague Lulu Qian used this technique to build a circuit, made of 74 different DNA strands, capable of calculating the square roots of four-digit binary numbers (i.e. 0 to 15)<cite>#Seelig</cite>.
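
At the logical level, a strand-displacement circuit can be abstracted as a set of gates that each release an output strand once the required input strands are present; the released strand can then trigger downstream gates. The toy Python sketch below captures only that abstraction (the strand names are invented, and this is not the seesaw-gate scheme of the actual square-root circuit).

<syntaxhighlight lang="python">
# Toy abstraction of strand-displacement logic: a gate releases its output strand
# once the required input strands are present in the "test tube" (a set of names).
class Gate:
    def __init__(self, inputs, output, kind="AND"):
        self.inputs, self.output, self.kind = set(inputs), output, kind

    def try_fire(self, tube):
        present = self.inputs & tube
        fires = (present == self.inputs) if self.kind == "AND" else bool(present)
        if fires and self.output not in tube:
            tube.add(self.output)          # the released strand can feed downstream gates
            return True
        return False

def run(gates, tube):
    # Keep letting gates fire until no new strands are released (the cascade settles).
    while any(g.try_fire(tube) for g in gates):
        pass
    return tube

# (x1 AND x2) OR x3, built from two gates wired together by an intermediate strand "w".
gates = [Gate({"x1", "x2"}, "w", "AND"), Gate({"w", "x3"}, "out", "OR")]
print("out" in run(gates, {"x1", "x2"}))   # True: the AND gate fires, then the OR gate
print("out" in run(gates, {"x3"}))         # True: the OR gate fires directly on x3
print("out" in run(gates, {"x1"}))         # False: neither gate can fire
</syntaxhighlight>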


==DNAzymes==
At Columbia University, Milan Stojanovic and his group have built circuits using an approach related to strand displacement, based on deoxyribozymes or DNAzymes (synthetically made, single-stranded DNA sequences that cut other DNA strands in specific places). They make logic gates by adding "stem loops" to the DNAzyme, which prevent it from working. When input strands bind to complementary sequences on the loops, the loops open, allowing the DNAzyme to become active and turning the gate on. It then interacts with other strands in a cascading fashion, allowing the creation of more complex logic. Stojanovic's group used this method to build circuits able to play tic-tac-toe<cite>#Stojanovic</cite>.
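
The same kind of abstraction applies here. A commonly described building block is a gate that is active only when its activating inputs are present and an inhibitory input is absent (i1 AND i2 AND NOT i3). The Python sketch below is a truth-table-level model of such a gate with placeholder input names; it does not model the DNAzyme chemistry itself.

<syntaxhighlight lang="python">
# Toy abstraction of a deoxyribozyme (DNAzyme) logic gate: the enzyme cleaves its
# substrate (the readable output) only when every stem loop has been opened by its
# complementary input strand and no inhibitory strand is bound.
def dnazyme_gate(loop_inputs, inhibitor, tube):
    loops_open = all(i in tube for i in loop_inputs)     # each loop needs its own input
    inhibited = inhibitor is not None and inhibitor in tube
    return loops_open and not inhibited                  # True = substrate cleaved, gate "on"

# An (i1 AND i2 AND NOT i3) gate with placeholder input strand names.
print(dnazyme_gate(["i1", "i2"], "i3", {"i1", "i2"}))         # True  (active)
print(dnazyme_gate(["i1", "i2"], "i3", {"i1", "i2", "i3"}))   # False (inhibited)
print(dnazyme_gate(["i1", "i2"], "i3", {"i1"}))               # False (one loop still closed)
</syntaxhighlight>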


==Enzymes==
Logic can also be created using methods that more closely resemble natural operations inside living cells. A group led by Yaakov Benenson at the Swiss Federal Institute of Technology (ETH Zurich) and Ron Weiss of MIT is creating circuits using common enzymes, such as polymerases, nucleases and exonucleases, that operate inside cells and make use of existing cellular infrastructure. In 2011, the team created a circuit capable of recognizing a cervical cancer cell and then destroying it. The circuit works by looking for five microRNAs specific to cervical cancer; when all five are at the right levels, it activates, producing a protein that causes the cell to self-destruct<cite>#Xie</cite>.
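
Conceptually, the classifier is a multi-input AND over microRNA levels: markers that should be high must exceed a threshold, markers that should be low must stay below one, and only then is the kill signal produced. The Python sketch below illustrates that decision logic only; the microRNA names and threshold values are hypothetical placeholders, not those of the published circuit.

<syntaxhighlight lang="python">
# Toy model of a cell-classifier circuit: produce the "self-destruct" output only when
# the measured microRNA profile matches the target cell type.
HIGH_MARKERS = {"miR-A": 10.0, "miR-B": 5.0}                # must be ABOVE these levels
LOW_MARKERS = {"miR-C": 2.0, "miR-D": 2.0, "miR-E": 1.0}    # must be BELOW these levels

def classifier(profile):
    high_ok = all(profile.get(m, 0.0) > t for m, t in HIGH_MARKERS.items())
    low_ok = all(profile.get(m, 0.0) < t for m, t in LOW_MARKERS.items())
    return high_ok and low_ok        # True -> express the apoptosis-inducing protein

target_cell = {"miR-A": 25.0, "miR-B": 8.0, "miR-C": 0.5, "miR-D": 0.1, "miR-E": 0.2}
healthy_cell = {"miR-A": 25.0, "miR-B": 8.0, "miR-C": 6.0, "miR-D": 0.1, "miR-E": 0.2}

print(classifier(target_cell))       # True: all five conditions met, cell is killed
print(classifier(healthy_cell))      # False: miR-C is too high, cell is spared
</syntaxhighlight>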


==Algorithmic self-assembly==
[[Image:Sierpinski_gasket.png | DNA arrays that display a representation of the Sierpinski gasket on their surfaces. [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC534809/ source] | thumb|right|240px]]
DNA computing has benefitted from the separate and distinct field of [http://en.wikipedia.org/wiki/DNA_nanotechnology DNA nanotechnology].
The field's founder, American biochemist Nadrian Seeman, has worked with Erik Winfree to show how two-dimensional sheets of DNA "tiles" can self-assemble into larger structures. Winfree and his student Paul Rothemund then demonstrated how these structures could be used for computing in a 2004 paper describing the self-assembly of a fractal called the Sierpinski gasket, work that won them the 2006 Feynman Prize in Nanotechnology. Winfree's major insight was that the tiles could act as "Wang tiles", meaning the assembly process itself is capable of computation<cite>#Rothemund</cite>.
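
The computation the Sierpinski tiles carry out is simple: each new tile displays the XOR of the two tiles above it, which is the rule that generates Pascal's triangle modulo 2. The short Python sketch below reproduces the abstract pattern (it models only the rule, not the tile chemistry).

<syntaxhighlight lang="python">
# Each row is the XOR of adjacent cells in the row above (Pascal's triangle mod 2),
# the rule computed by the self-assembling tiles; the 1-tiles trace a Sierpinski gasket.
def sierpinski(rows=16):
    row = [1]
    for _ in range(rows):
        print("".join("#" if x else " " for x in row).center(2 * rows))
        row = [1] + [a ^ b for a, b in zip(row, row[1:])] + [1]

sierpinski()
</syntaxhighlight>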


=Storage=
[[Image:Church_book.png | Church's methodology for encoding and decoding 700 terabytes of information in DNA. [http://www.extremetech.com/wp-content/uploads/2012/08/coding-decoding-dna-storage.jpg source] | thumb|right|240px]]
DNA can be treated as a highly stable and compact storage medium. In 2012, George Church, a bioengineer and geneticist at Harvard, succeeded in storing 1,000 times more data in DNA than had previously been achieved. Church encoded 700 terabytes of information in a single gram of DNA. For comparison, storing the same amount of information on the densest electronic medium would require about 330 pounds of hard drives.


His method was to synthesize [http://en.wikipedia.org/wiki/Word_(computer_architecture) 96-bit] strands, with each base representing a binary value (e.g. T and G = 1; A and C = 0). The "data" in question was 70 billion copies of his latest book, [http://www.perseusbooksgroup.com/basic/book_detail.jsp?isbn=0465021751 "Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves"].


To read the DNA, Church simply sequenced it using next-generation sequencing, converting each base back into binary. Each strand of DNA had a 19-bit address block at the start, so the DNA could be sequenced out of order and then sorted back into usable data by following the addresses.
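
The scheme lends itself to a simple software analogy: map each bit to one of its two allowed bases, prepend an address so fragments can be reordered after sequencing, then invert the mapping to read. The Python sketch below follows the bit-to-base mapping described above, but the block sizes and address layout are simplified placeholders rather than Church's exact format.

<syntaxhighlight lang="python">
import random

# One-bit-per-base mapping described above: 0 -> A or C, 1 -> T or G.
BASE_FOR_BIT = {0: "AC", 1: "TG"}
BIT_FOR_BASE = {"A": 0, "C": 0, "T": 1, "G": 1}

ADDR_BITS, DATA_BITS = 19, 96        # address block + data payload per strand (simplified)

def encode(bits):
    """Split a bit string into addressed DNA strands."""
    strands = []
    for addr, i in enumerate(range(0, len(bits), DATA_BITS)):
        block = f"{addr:0{ADDR_BITS}b}" + bits[i:i + DATA_BITS]
        strands.append("".join(random.choice(BASE_FOR_BIT[int(b)]) for b in block))
    return strands

def decode(strands):
    """'Sequence' in any order, sort by the fixed-width address prefix, then strip it."""
    blocks = sorted("".join(str(BIT_FOR_BASE[base]) for base in s) for s in strands)
    return "".join(b[ADDR_BITS:] for b in blocks)

message = "".join(random.choice("01") for _ in range(300))
dna = encode(message)
random.shuffle(dna)                  # strands come back from the sequencer unordered
assert decode(dna) == message
print(len(dna), "strands, e.g.", dna[0][:24], "...")
</syntaxhighlight>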


=iGEM Connection=
[[Image:BIL_gates.png | Drew Endy's "Boolean Integrase Logic" (BIL) gates. [http://youtu.be/ahYZBeP_r5U source] | thumb|right|240px]]
On March 28, 2013, Drew Endy and his team of researchers at Stanford University published a paper<cite>#Bonnet</cite> in Science that outlines how to create single-layer, transcriptor-based logic gates utilizing a three-terminal device architecture to control the flow of RNA polymerase along DNA. Endy has open-sourced these transcriptors through the Registry of Standard Biological Parts, and an informational video is available [http://youtu.be/ahYZBeP_r5U here].
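
In abstract terms, a transcriptor gate places recombinase-controlled elements (such as terminators) between a promoter and an output gene, so that RNA polymerase can only flow through and transcribe the output when the right combination of integrase inputs has flipped those elements. The Python sketch below models only that control-flow idea; the gate layouts are simplified illustrations, not the specific BIL gate designs from the paper.

<syntaxhighlight lang="python">
# Toy model of transcriptor-style logic: RNA polymerase flows from promoter to output
# gene only if no terminator blocks the DNA "wire". Each terminator is disabled (flipped)
# by its own integrase input. Integrase names are placeholders.
def rnap_flows(terminators, integrase_inputs):
    """One path blocked by several terminators (AND-style): clear only if every
    controlling integrase input is present."""
    return all(i in integrase_inputs for i in terminators)

def or_gate(paths, integrase_inputs):
    """Several parallel promoter-to-output paths (OR-style): output if any path is clear."""
    return any(rnap_flows(p, integrase_inputs) for p in paths)

# AND gate: one path with terminators controlled by intA and intB.
print(rnap_flows(["intA", "intB"], {"intA", "intB"}))    # True
print(rnap_flows(["intA", "intB"], {"intA"}))            # False

# OR gate: two independent paths, each gated by a single terminator.
print(or_gate([["intA"], ["intB"]], {"intB"}))           # True
print(or_gate([["intA"], ["intB"]], set()))              # False
</syntaxhighlight>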
 
A complementary development, published in Nature Biotechnology in 2013 by MIT scientist Timothy Lu and colleagues, details how to store information about a cell's lifecycle events, such as key moments in a cell's ancestry<cite>#Siuti</cite>.
 
There is a great animated infographic at the end of [http://www.npr.org/2013/03/29/175604770/tiny-dna-switches-aim-to-revolutionize-cellular-computing this article] that helps explain the concepts.
 
=Future Directions=
The most popular narrative surrounding the future of DNA computing and DNA-based logic is that one day the technology could be used to program living cells inside our bodies. For example, they could "detect disease, warn of toxic threats and, where danger lurked, even self-destruct cells gone rogue"<cite>#MercuryNews</cite>. To quote Drew Endy: "We're going to be able to put computers inside any living cell you want... Any place you want a little bit of logic, a little bit of computation, a little bit of memory -- we're going to be able to do that"<cite>#MercuryNews</cite>. Whether or not this prediction will come to fruition is up for debate, as the technology has its detractors. For example, many who participated in this year's synthetic biology class (Spring 2013) believed that natural evolutionary processes will always confound the efforts of those trying to bound the behavior of a living cell in vivo. However, no published critiques in the public domain could be found at the time of this writing.  


==References==
<biblio>
#Ugalde [http://www.sciencemag.org/content/305/5689/1433.full Ugalde, J. a, Chang, B. S. W., & Matz, M. V. (2004). Evolution of coral pigments recreated. Science (New York, N.Y.), 305(5689), 1433. doi:10.1126/science.1099597]
#Adleman [http://www.usc.edu/dept/molecular-science/papers/fp-sci94.pdf Adleman, L. M. (1994). "Molecular computation of solutions to combinatorial problems". Science 266 (5187): 1021–1024. doi:10.1126/science.7973651. PMID 7973651.]
#Gaucher1 [http://www.nature.com/nature/journal/v451/n7179/full/nature06510.html Gaucher, E. a, Govindarajan, S., & Ganesh, O. K. (2008). Palaeotemperature trend for Precambrian life inferred from resurrected proteins. Nature, 451(7179), 704–7. doi:10.1038/nature06510]
#Parker [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1315819/ Parker, J. "Computing with DNA", EMBO Rep. 2003 Jan;4(1):7-10., PMID: 12524509]
#Thornton [http://www.nature.com/nrg/journal/v5/n5/full/nrg1324.html Thornton, J. W. (2004). Resurrecting ancient genes: experimental analysis of extinct molecules. Nature reviews. Genetics, 5(5), 366–75. doi:10.1038/nrg1324]
#Seelig [http://www.sciencemag.org/content/314/5805/1585 Seelig, G.; Soloveichik, D.; Zhang, D. Y.; Winfree, E. (8 December 2006). "Enzyme-free nucleic acid logic circuits". Science 314 (5805): 1585–1588. doi:10.1126/science.1132493. PMID 17158324]
#Pauling Pauling, L. and E. Zuckerkandl. 1963. Chemical paleogenetics: molecular restoration studies of extinct forms of life. Acta Chem. Scand 17:89. CrossRef doi=10.3891
#Stojanovic [http://pubs.acs.org/doi/full/10.1021/ja016756v# Milan N. Stojanovic, Tiffany Elizabeth Mitchell, and Darko Stefanovic (2002) Deoxyribozyme-Based Logic Gates Journal of the American Chemical Society 124(14), 3555-3561]
#Risso [http://pubs.acs.org/doi/abs/10.1021/ja311630a Risso, V. a, Gavira, J. a, Mejia-Carmona, D. F., Gaucher, E. a, & Sanchez-Ruiz, J. M. (2013). Hyperstability and Substrate Promiscuity in Laboratory Resurrections of Precambrian β-Lactamases. Journal of the American Chemical Society. doi:10.1021/ja311630a]
#Xie [http://www.sciencemag.org/content/333/6047/1307 Xie, Z et al., "Multi-Input RNAi-Based Logic Circuit for Identification of Specific Cancer Cells", Science 2 September 2011: Vol. 333 no. 6047 pp. 1307-1311 DOI: 10.1126/science.1205527]
#Gaucher2 [http://www.nature.com/nature/journal/v425/n6955/full/nature01977.html Gaucher, E. a, Thomson, J. M., Burgan, M. F., & Benner, S. a. (2003). Inferring the palaeoenvironment of ancient bacteria on the basis of resurrected proteins. Nature, 425(6955), 285–8. doi:10.1038/nature01977]
#Rothemund [http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.0020424 Rothemund, P. W. K.; Papadakis, N.; Winfree, E. (2004). "Algorithmic Self-Assembly of DNA Sierpinski Triangles". PLoS Biology 2 (12): e424. doi:10.1371/journal.pbio.0020424. PMC 534809. PMID 15583715]
#Perez [http://www.nature.com/nsmb/journal/v18/n5/full/nsmb.2020.html Perez-Jimenez, R., Inglés-Prieto, A., Zhao, Z.-M., Sanchez-Romero, I., Alegre-Cebollada, J., Kosuri, P., Garcia-Manyes, S., et al. (2011). Single-molecule paleoenzymology probes the chemistry of resurrected enzymes. Nature structural & molecular biology, 18(5), 592–6. doi:10.1038/nsmb.2020]
#Bonnet [http://www.sciencemag.org/content/early/2013/03/27/science.1232758  Bonnet et. al., "Amplifying Genetic Logic Gates", Published Online March 28 2013, Science DOI: 10.1126/science.1232758]
#Kaganman [http://www.nature.com/nmeth/journal/v8/n6/full/nmeth0611-452.html Kaganman, I. (2011). Resurrected enzymes. Nature Methods, 8(6), 452–452. doi:10.1038/nmeth0611-452]
#Siuti [http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.2510.html Siuti, P., Yazbek, J. & Lu, T. K. Nature Biotech. advance online publication, http://dx.doi.org/10.1038/nbt.2510 (2013)]
#NCBI [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1315819/ http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1315819/]
#MercuryNews [http://www.mercurynews.com/science/ci_22891433/biological-computer-created-at-stanford http://www.mercurynews.com/science/ci_22891433/biological-computer-created-at-stanford]
 
</biblio>
