CH391L/S13/DNA Computing


Introduction

History

Leonard Adleman, a computer scientist at the University of Southern California, first came up with the idea of using DNA for computing after reading "Molecular Biology of the Gene," a book written by James Watson (co-discoverer of the structure of DNA).

"The principle of Leonard Adleman's DNA computer to solve the 'Travelling salesman' problem."

In 1994 Adleman published an article in the journal Science[1] showing how DNA could be used to solve a well-known mathematical problem, the directed Hamiltonian path problem, a close relative of the "traveling salesman" problem. The goal is to find a route that starts and ends at specified cities and passes through each of the remaining cities exactly once. The experiment was significant because it showed that information could indeed be processed using the interactions between strands of synthetic DNA.

"He represented each of the seven cities as separate, singlestranded DNA molecules, 20 nucleotides long, and all possible paths between cities as DNA molecules composed of the last ten nucleotides of the departure city and the first ten nucleotides of the arrival city. Mixing the DNA strands with DNA ligase and adenosine triphosphate (ATP) resulted in the generation of all possible random paths through the cities. However, the majority of these paths were not applicable to the situation–they were either too long or too short, or they did not start or finish in the right city. Adleman then filtered out all the paths that neither started nor ended with the correct molecule and those that did not have the correct length and composition. Any remaining DNA molecules represented a solution to the problem."[2]

This is an example of an NP-complete problem (NP stands for "nondeterministic polynomial time"), a class for which no efficient (i.e., polynomial-time) algorithm is known. It has been proposed that these types of problems are ideal for the massive parallelism of biology. Whereas traditional computers are designed to perform one calculation very fast, DNA strands can produce billions of potential answers simultaneously.

However, because in most difficult problems the number of possible solutions grows exponentially with the size of the problem, even relatively small instances would require enormous amounts of DNA to represent all possible answers. For example, it has been estimated that scaling the Hamiltonian path problem up to 200 cities would require a mass of DNA exceeding that of the Earth. For this and other reasons, the field has shifted its focus to other areas.
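A back-of-the-envelope calculation illustrates the scaling problem. The sketch below assumes a brute-force encoding that keeps one strand per candidate ordering of the cities, 20 nt per city, and roughly 330 g/mol per nucleotide of single-stranded DNA; the exact figures are illustrative assumptions, but the conclusion is insensitive to them.

```python
import math

n_cities = 200
orderings = math.factorial(n_cities - 2)       # candidate orderings between fixed endpoints
nt_per_strand = 20 * n_cities                  # 20-nt block per city
grams_per_mole = 330 * nt_per_strand           # ~330 g/mol per nucleotide of ssDNA
avogadro = 602_214_076_000_000_000_000_000     # molecules per mole, as an exact integer

# Integer arithmetic throughout: these numbers are far too large for floating point.
total_grams = orderings * grams_per_mole // avogadro
earth_grams = 5_972 * 10**24                   # Earth's mass, ~5.972e27 g

print("DNA mass is on the order of 10^%d g" % (len(str(total_grams)) - 1))
print("Earth's mass is on the order of 10^%d g" % (len(str(earth_grams)) - 1))
```

Even with only one molecule per candidate path, the required mass comes out hundreds of orders of magnitude beyond the Earth's.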

Pipeline for Generating Ancestral Genes

  1. Sequences of the gene of interest from extant species, together with outgroup sequences, are aligned.
  2. The ancestral gene is inferred based on evolutionary models (typically maximum parsimony or maximum likelihood).
  3. Ancestral genes are synthesized via overlapping oligonucleotides and PCR assembly (see the sketch after this list).
  4. Ancestral genes are cloned and tested for function.
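The oligonucleotide design in step 3 amounts to splitting the synthetic gene into overlapping segments. The sketch below shows only the windowing step, with hypothetical oligo and overlap lengths; a real design would alternate top- and bottom-strand oligos (reverse-complementing every other segment) and tune the overlaps for uniform melting temperatures.

```python
def overlapping_oligos(gene, oligo_len=60, overlap=20):
    """Split a gene into overlapping windows for PCR-based assembly.
    Illustrative only: real designs alternate strands and balance Tm."""
    step = oligo_len - overlap
    return [gene[i:i + oligo_len] for i in range(0, len(gene) - overlap, step)]

# Toy 126-nt "ancestral" open reading frame (made-up sequence).
gene = "ATG" + "GCATCGGTTACAGCTTGAAC" * 6 + "TAA"

for i, oligo in enumerate(overlapping_oligos(gene)):
    print(f"oligo {i}: {len(oligo)} nt  {oligo}")
```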

Methods of Inferring Ancient Sequences

  • Consensus Sequence - the most frequently occurring residue among the extant sequences at each position is assumed to be the ancestral state (see the sketch after this list).
  • Maximum Parsimony - ancestral states are chosen to minimize the total number of changes required to account for the terminal sequences.
  • Maximum Likelihood - ancestral states are evaluated at each internal node in the tree based on the likelihood of each mutation. This approach uses a statistical model of molecular evolution that weights different types of mutations differently. The reconstruction also reports, for each residue, the probability that it has been predicted correctly.
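Of the three, the consensus approach is the simplest to illustrate. The following sketch applies it to a toy, pre-aligned set of hypothetical sequences; parsimony and likelihood methods additionally require a phylogenetic tree and a substitution model, which are omitted here.

```python
from collections import Counter

# Toy alignment of extant protein sequences (hypothetical data, already aligned).
alignment = [
    "MKVLTA",
    "MKILSA",
    "MRVLTA",
    "MKVLSG",
]

# Consensus inference: the most frequent residue in each alignment column is
# taken as the ancestral state (ties are broken by first occurrence here).
consensus = "".join(Counter(column).most_common(1)[0][0] for column in zip(*alignment))
print(consensus)   # -> MKVLTA for this toy alignment
```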


Storage

Figure: Church's methodology for encoding and decoding 700 terabytes of information in DNA.

DNA can be treated as a highly stable and compact storage medium. In 2012, George Church, a bioengineer and geneticist at Harvard, stored roughly 1,000 times more data in DNA than had previously been achieved, encoding 700 terabytes of information in a single gram of DNA. For comparison, storing the same amount of information on the densest electronic medium would require about 330 pounds of hard drives.

His method was to synthesize strands carrying 96 bits each, with every base representing one binary value (e.g., T and G = 1; A and C = 0). The "data" in question was 70 billion copies of his latest book, "Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves".

To read the data back, Church simply sequenced the DNA using next-generation sequencing and converted each base back into binary. Each strand carried a 19-bit address block at the start, so the strands could be sequenced out of order and then sorted back into usable data using their addresses.
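A minimal sketch of this addressing-and-encoding scheme, following the description above: 96 data bits per strand, a 19-bit address prefix, and one bit per base. The byte-level framing and the arbitrary choice between the two bases that encode each bit value are illustrative simplifications, not Church's exact protocol (which also chose bases to avoid homopolymer runs and extreme GC content).

```python
import random

ZERO, ONE = "AC", "GT"   # either base in each pair encodes that bit value

def bits_to_bases(bits):
    # The choice between the two bases for a given bit is arbitrary in this sketch;
    # in practice it would be used to avoid homopolymer runs.
    return "".join(random.choice(ONE if b else ZERO) for b in bits)

def bases_to_bits(bases):
    return [1 if base in ONE else 0 for base in bases]

def encode(data, block_bits=96, addr_bits=19):
    bits = [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]
    strands = []
    for addr, start in enumerate(range(0, len(bits), block_bits)):
        addr_field = [(addr >> i) & 1 for i in range(addr_bits - 1, -1, -1)]
        strands.append(bits_to_bases(addr_field + bits[start:start + block_bits]))
    return strands

def decode(strands, addr_bits=19):
    blocks = {}
    for strand in strands:
        bits = bases_to_bits(strand)
        blocks[int("".join(map(str, bits[:addr_bits])), 2)] = bits[addr_bits:]
    bits = [b for addr in sorted(blocks) for b in blocks[addr]]
    return bytes(int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8))

message = b"Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves"
strands = encode(message)
random.shuffle(strands)        # strands may be read in any order...
print(decode(strands))         # ...and are reassembled via their address blocks
```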

References

  1. Adleman, L. M. (1994). "Molecular computation of solutions to combinatorial problems". Science 266 (5187): 1021–1024. doi:10.1126/science.7973651. PMID 7973651. http://www.usc.edu/dept/molecular-science/papers/fp-sci94.pdf
  2. Parker, J. (2003). "Computing with DNA". EMBO Reports 4 (1): 7–10. PMID 12524509. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1315819/