# Biomod/2011/NUS/DNAmazing

(Difference between revisions)
Revision as of 16:46, 30 October 2011, compared with the current revision of 01:43, 3 November 2011 (11 intermediate revisions not shown).

The Computational Chemistry toolkits (general description here)

The GUI will provide users with basic interface with the program to (general description here)

=== The Design ===

==== Basic Dogmas of Design ====

The design of 2D DNA Origami in DNAmazing follows the principles laid out in Rothemund's first paper in 2006. The basic idea of DNA Origami is to fold a DNA helix into a desired shape. One strand of the helix is a long, continuous DNA strand called the scaffold strand; the other consists of several short DNA fragments, the staple strands. Together, the staple strands are complementary to the scaffold and form the DNA helix with it. Crossovers of the staple strands hold the scaffold strand in the desired shape.

[[Image:terms.jpg|700px]]

For the purpose of design, the folded DNA helix is conceptually divided into small helices, each corresponding to one turn of the folded helix. Each of these turns is represented as one square in the program, and each square is given a number. Labeling runs from left to right and from the bottom row to the top row. The non-integer number of base pairs per turn, 10.67, is approximated as 11 base pairs. The DNA helix is folded by forming crossovers in the staple strands; a crossover marks a position where a staple strand switches to a helix in a different row. Such switching occurs only where the DNA twist places the strand at the tangent point between helices, at positions separated by an odd number of half-turns. In this project, we use a spacing of 1.5 turns.
Throughout the software, two coordinate systems are used to refer to a specific square in a DNA Origami structure. The labeling described in this part is the matrix coordinate. The other is the scaffold coordinate, which is described later in [[6.2.3 Generation of scaffolding pathways.]]

==== Inputting parameters ====

Recognizing that the conventional input of existing programs may not be convenient for large and complex structures, DNAmazing adopts a very different, lithography-like approach. Instead of drawing the scaffold way, which may be painful or even impossible for complicated designs, users input the dimensions of a rectangle that encloses their desired structure. The dimensions are given as the number of helices/squares per row and per column. Users then obtain their final shape by eliminating the unwanted squares, which is done by entering the numbers of those squares (null squares).

[[Image:Input process.jpg]]

In the above example, the desired DNA Origami shape is enclosed by a rectangular frame of 6 squares x 6 squares. There are 8 null squares in total: 12, 18, 24, 30, 17, 23, 29, 35.
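The lithography-like input can be illustrated with a minimal Python sketch. It is not DNAmazing's own code: the function names `origami_grid` and `render` are hypothetical, and the sketch assumes that square labels start at 0 in the bottom-left corner, which the text does not state explicitly.

```python
def origami_grid(columns, rows, null_squares):
    """Return the frame as a list of rows, top row first.

    Each kept square holds its label; each null square holds None.
    Labels run left to right, bottom row to top row, starting at 0
    (zero-based numbering is an assumption of this sketch).
    """
    nulls = set(null_squares)
    out_of_frame = sorted(s for s in nulls if not 0 <= s < columns * rows)
    if out_of_frame:
        raise ValueError(f"null squares outside the frame: {out_of_frame}")
    grid = []
    for row in range(rows - 1, -1, -1):  # top row first, for display
        labels = (row * columns + col for col in range(columns))
        grid.append([None if label in nulls else label for label in labels])
    return grid


def render(grid):
    """Draw kept squares as '#' and null squares as '.'."""
    return "\n".join(
        "".join("." if square is None else "#" for square in row) for row in grid
    )


# The 6 x 6 frame from the example, with its 8 null squares removed.
example = origami_grid(6, 6, [12, 18, 24, 30, 17, 23, 29, 35])
print(render(example))
```

Under the zero-based assumption, the eight null squares trim the left and right columns of the top four rows, leaving 28 squares for the scaffold.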
==== Generation of scaffolding pathways ====

One of the unique features of DNAmazing is its ability to generate the scaffolding pathways automatically. With existing programs, users have to design manually how a scaffold strand folds into the desired shape. This process can be tedious for complex structures such as the smiling faces in Rothemund's paper. In DNAmazing, users only have to conceptualize the DNA Origami structure as a series of squares, as described in the previous part, which is far less laborious.

The algorithm that determines the crossover positions starts by creating an ArrayList, which is essentially an array of flexible size. We named it PosCros. The PosCros ArrayList collects the squares that contain a crossover position. The first element of PosCros is always the first element in the scaffold way. Each subsequent element is determined by the category of the previous element; the categorization is based on the relative distance between the element and the closest turning point of the scaffold.

==== Generation of Sticky ends' Sequence ====

Sticky ends, which serve as extra ends of staples, should not interfere with the scaffold folding during the formation of DNA Origami. Sticky end sequences must therefore not bind stably to any sequence in the scaffold. To generate sticky end sequences, DNA sequences of a defined length are generated randomly. Each newly generated sequence is then examined for its ability to bind to the scaffold. Sequences that bind stably to any position in the scaffold are discarded; only those without stable scaffold binding are kept and can be used as sticky end sequences.

To determine whether a sticky end would bind stably to the scaffold, one needs to know the binding energy of the sticky end to every sequence in the scaffold. In addition, a threshold below which the binding is considered stable is required.

===== Calculation of binding energy =====

The given sticky end sequence is mapped along the length of the scaffold, and the binding energy ($\Delta G^o$) is calculated for each match/mismatch binding. The calculation uses the formula and the complete thermodynamic database for internal single mismatches discussed in SantaLucia's studies (2006) (1). The formula and parameters are shown below:

[[Image:CodeCogsEqn.gif]]

Nearest-neighbor $\Delta G^o$ increments (kcal/mol) for internal single mismatches next to Watson-Crick pairs in 1 M NaCl

[[Image:TableMis.jpg]]

For example, consider the total binding energy of the following DNA duplex. The mismatched base pair is '''bold''':

[[Image:Cal1.gif]]
===== Set up a threshold =====

To decide whether a mismatched pairing between a sticky end and the scaffold is stable or unstable, a threshold on the binding energy ($\Delta G^o$) is required. A binding energy less than or equal to this threshold is considered stable. A single absolute threshold cannot serve DNA sequences of different lengths, because longer DNA sequences require a lower $\Delta G^o$ for stable binding. The threshold is therefore a variable calculated from the sequence length. Let n be the length of the DNA sequence. If n is even, the threshold is calculated as follows:

[[Image:thesholdEven.png]]

If n is odd, the formula for the threshold is:

[[Image:thesholdOdd.png]]

In the equations, -0.58 is the binding energy of 5'-TA-3'/3'-AT-5', and -0.88 is the binding energy of 5'-AT-3'/3'-TA-5'. The right side of each equation therefore equals the binding energy of the complementary duplex 5'-(AT)n-3'/3'-(TA)n-5' of the same length. In other words, no binding between the sticky end and the scaffold may be more stable than the least stable fully complementary Watson-Crick DNA duplex.

==== Merging Process ====

=== The prediction of the thermal stability of the duplex produced from sticky end ===

This part predicts the thermal stability of the short DNA duplex formed when a sticky end binds its complementary single strand.

The capability to estimate thermal stability aids numerous applications, such as (i) predicting the stability of a local sequence on a DNA duplex or of a probe-gene complex, (ii) calculating the melting temperatures of short sequences in hybridization experiments, and (iii) determining the optimal length of the probe oligomer to produce stable duplexes with the sticky ends. Recently, the order-disorder transition of a sticky end with its complementary single strand has also become important in controlling the dynamic movement of nanomotors made from DNA strands (reference?).

Research has shown that the thermal stability of a duplex is affected by sequence information and base composition. However, the sequence of the DNA strand is the major determinant of $\Delta H^o$, $\Delta S^o$, and $\Delta G^o$. We apply the nearest-neighbor (NN) method to determine the transition enthalpy, entropy, free energy, and melting point of a short DNA duplex. This method calculates those thermodynamic values from the stacking interactions between neighboring Watson-Crick base pairs in the DNA strands.
The DNAmazing program not only assists in randomly generating sticky ends attached to pre-determined positions on the DNA Origami, but also allows users to input their preferred sticky end sequences. Since different sequences have different thermal stability upon binding (represented by $\Delta H^o$, $\Delta S^o$, and $\Delta G^o$), knowing those thermodynamic values is crucial for studying the function and applications of the sticky ends.

In addition, the DNAmazing program helps to determine whether a sticky end sequence entered by the user is complementary to the scaffold strand or to other staple strands.

Many groups have used the NN method to determine $\Delta H^o$, $\Delta S^o$, $\Delta G^o$, and $T_m$ of short DNA oligomers and have arrived at the same formula, shown below. However, since different studies used different starting materials (short DNA oligomers, polymers, etc.), the values of individual parameters vary slightly. We have chosen the latest results obtained by SantaLucia et al. to incorporate into our software.

:$\Delta H^o = \Delta H^o_{ini} + \Delta H^o_{sym} + \Delta H^o_{AT term.} + \Sigma \Delta H^o_{stacking} \qquad (1)$

where $\Delta H^o_{ini}$ is the helix initiation enthalpy of the transition process; $\Delta H^o_{sym}$ is the symmetry term, which applies only to self-complementary duplexes and accounts for the enthalpy difference between a duplex formed from a self-complementary sequence and a duplex formed from 2 complementary strands; $\Delta H^o_{AT term.}$ is applied for each end of a duplex that has a terminal AT and accounts for the end-fraying caused by the AT base pair; and $\Sigma \Delta H^o_{stacking}$ is the total enthalpy of the propagation steps in the sequence.

For example:

:$\begin{align}
\Delta H^o (5'-CGTTGA-3') & = \Delta H^o_{ini} + \Delta H^o_{sym} + \Delta H^o_{AT term.} + \Sigma \Delta H^o_{stacking} \\
& = 0.2 + 0.0 + 2.2 + ( - 10.6 - 8.4 - 7.6 - 8.5 - 8.2) \\
& = -40.9 \text{ (kcal/mol)}
\end{align}$
$\Delta S^o$ and $\Delta G^o$ are calculated using the same formula (1) above.

There are 10 propagation steps, 1 initiation, and 1 terminal AT correction to make up a total of 12 NN parameters shown in Table 1. These values are obtained via multiple linear regression of the results from differential scanning calorimetry (DSC) of 108 short DNA sequences.

{| class="wikitable" align="center" border="1" cellpadding="5" cellspacing="0"
|+ Table 1
! style="background: #efefef;" |Propagation step
! style="background: #efefef;" |$\Delta H^o$ (kcal/mol)
! style="background: #efefef;" |$\Delta S^o$ (e.u.)
! style="background: #efefef;" |$\Delta G^o$ (kcal/mol)
|-
|align=center|AA/TT ||align=center|-7.6 ||align=center|-21.3 ||align=center|-1.00
|-
|align=center|AT/TA ||align=center|-7.2 ||align=center|-20.4 ||align=center|-0.88
|-
|align=center|TA/AT ||align=center|-7.2 ||align=center|-21.3 ||align=center|-0.58
|-
|align=center|CA/GT ||align=center|-8.5 ||align=center|-22.7 ||align=center|-1.45
|-
|align=center|GT/CA ||align=center|-8.4 ||align=center|-22.4 ||align=center|-1.44
|-
|align=center|CT/GA ||align=center|-7.8 ||align=center|-21.0 ||align=center|-1.28
|-
|align=center|GA/CT ||align=center|-8.2 ||align=center|-22.2 ||align=center|-1.30
|-
|align=center|CG/GC ||align=center|-10.6 ||align=center|-27.2 ||align=center|-2.17
|-
|align=center|GC/CG ||align=center|-9.8 ||align=center|-24.4 ||align=center|-2.24
|-
|align=center|GG/CC ||align=center|-8.0 ||align=center|-19.9 ||align=center|-1.84
|-
|align=center|Initiation ||align=center|+0.2 ||align=center|-5.7 ||align=center|+1.96
|-
|align=center|Terminal AT penalty ||align=center|+2.2 ||align=center|+6.9 ||align=center|+0.05
|-
|align=center|Symmetry correction ||align=center|0.0 ||align=center|-1.4 ||align=center|+0.43
|}

The melting point of a short DNA chain, defined as the temperature at which half of the double-stranded DNA sequences have dissociated, is calculated as follows:

:$T_m = \frac{\Delta H^o \times 1000}{\Delta S^o + R \times \ln( \frac{C_t}{x} )} - 273.15$

where $C_t$ is the total molar strand concentration and R is the gas constant (1.987 cal/(K·mol)). For non-self-complementary duplexes x=4, and for self-complementary duplexes x=1. The factor of 1000 converts $\Delta H^o$ from kcal/mol to cal/mol, and subtracting 273.15 gives $T_m$ in degrees Celsius.

The NN method is only an approximation because it neglects secondary interactions in the DNA duplexes (we assume that the duplexes undergo a two-state transition) and assumes that the heat capacity $C_p$ is constant over temperature. To reduce this inaccuracy, short DNA oligomers (fewer than 30 base pairs) were used, which minimizes secondary interactions within the DNA molecule.

Sodium dependence of $\Delta S^o$ and $\Delta G^o$

The entropy and free energy calculated from formula (1) above apply at 37 $^o$C and 1 M NaCl. To extend the results to other salt conditions, the following correction formulae have been derived by (***):
:$\Delta S^o [Na^+] = \Delta S^o [1M NaCl] + 0.368 \times N/2 \times \ln[Na^+]$

:$\Delta G^o [Na^+] = \Delta G^o [1M NaCl] - 0.114 \times N/2 \times \ln[Na^+]$

where N is the total number of phosphates in the duplex and $[Na^+]$ is the total concentration of monovalent cations ($Na^+$, $K^+$, $NH_4^+$) in the solution. $\Delta H^o$ is assumed to be sodium-independent.

To calculate the value of $\Delta G^o$ at a temperature other than 37 $^o$C, the following equation is used:

:$\Delta G^o = \Delta H^o - T\Delta S^o$

in which T is in Kelvin, $\Delta H^o$ is in cal/mol, and $\Delta S^o$ is in entropy units (e.u.). $\Delta H^o$ and $\Delta S^o$ are assumed to be independent of temperature.

=== The User Interface (GUI) ===

The GUI, or graphical user interface, provides a friendly environment in which users construct their DNA origami. Our GUI was built as a Windows Forms application in Visual Studio 2010. The software has three main components that support DNA Origami design with sticky end addition and the thermodynamic analysis of sticky ends. The source code is provided in the attachments.

==== Generate DNAO ====

The first component generates the staple sequences needed for the correct folding of a DNA Origami with sticky ends. Users define the size and shape of the structure they want to design by first entering the frame size and then choosing the null squares (the locations that will not be occupied by the scaffold). This tells the program what the DNA Origami design looks like.
[[Image:GUI_2new.png]]

After obtaining the required parameters, the program generates the possible scaffold ways and asks users to choose the one they want.

[[Image:GUI_3.png]]

Users can also add sticky ends by entering the number of sticky ends they need and specifying the sequence and location of each sticky end in the scaffold.

[[Image:GUI_4.png]]

The final staple sequences are generated and appear in the result window.

[[Image:GUI_5.png]]

==== Generate sticky end sequence ====

To support the generation of sticky ends, and to ensure that a sticky end will not affect the scaffold folding, an additional component is provided. Users can enter a DNA sequence manually, and the program checks for the most stabilizing binding position in the scaffold. The binding energy is also calculated for the user's reference.

[[Image:GUI_6.png]]

Users can also ask the program to generate sticky end sequences of a defined length. DNA sequences whose binding energy is higher than a defined limit are returned. The image below illustrates the output of sticky end sequence generation.

[[Image:GUI_7.png]]
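The generate-and-filter procedure behind this component can be sketched in Python. This is a simplified illustration and not DNAmazing's implementation: all function names are hypothetical. The scoring counts only Watson-Crick nearest-neighbor stacks between two adjacent matched positions, using the Watson-Crick $\Delta G^o$ values tabulated on this page, and omits the mismatch increments, which are available here only as an image. The threshold is an assumed reading of the length-dependent rule: the summed stacks of an alternating 5'-ATAT...-3' duplex of the same length.

```python
import random

# Watson-Crick nearest-neighbor dG (kcal/mol, 37 C, 1 M NaCl), keyed by the
# 5'->3' dinucleotide on one strand; AA/TT and TT/AA share one value, etc.
NN_DG = {
    "AA": -1.00, "TT": -1.00, "AT": -0.88, "TA": -0.58,
    "CA": -1.45, "TG": -1.45, "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28, "GA": -1.30, "TC": -1.30,
    "CG": -2.17, "GC": -2.24, "GG": -1.84, "CC": -1.84,
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def window_energy(sticky, window):
    """Simplified dG of a sticky end against one scaffold window.

    Only stacks between two adjacent matched positions are summed;
    mismatch increments are left out (a simplification of this sketch).
    """
    target = reverse_complement(sticky)  # the perfect binding partner
    paired = [a == b for a, b in zip(target, window)]
    return sum(
        NN_DG[window[i:i + 2]]
        for i in range(len(window) - 1)
        if paired[i] and paired[i + 1]
    )


def most_stable_binding(sticky, scaffold):
    """Slide the sticky end along the scaffold; return (lowest dG, position)."""
    n = len(sticky)
    if len(scaffold) < n:
        raise ValueError("scaffold is shorter than the sticky end")
    return min(
        (window_energy(sticky, scaffold[i:i + n]), i)
        for i in range(len(scaffold) - n + 1)
    )


def at_reference_threshold(n):
    """Assumed threshold: summed stacks of 5'-ATAT...-3' with n bases."""
    seq = ("AT" * n)[:n]
    return sum(NN_DG[seq[i:i + 2]] for i in range(n - 1))


def generate_sticky_ends(scaffold, length, count, rng, max_tries=10000):
    """Keep random sequences whose most stable binding stays above the threshold."""
    threshold = at_reference_threshold(length)
    kept = []
    for _ in range(max_tries):
        if len(kept) == count:
            break
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        energy, _position = most_stable_binding(candidate, scaffold)
        if energy > threshold:  # at or below the threshold counts as stable
            kept.append(candidate)
    return kept
```

Passing a seeded generator such as `random.Random(0)` as `rng` makes a run repeatable; the `max_tries` cap is a safeguard added in this sketch so that the loop ends even when few candidates pass.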
==== Thermodynamic analysis ====

The remaining component also supports sticky end analysis: it calculates the thermodynamic values of a sequence. Users enter the sequence they want to analyze together with the conditions under which they would test the DNA (total DNA strand concentration, Na+ concentration, and temperature). The thermodynamic values, including $\Delta G^o$, $\Delta S^o$, $\Delta H^o$, and $T_m$, are provided on the results page.

[[Image:GUI_8new.png]]
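The values reported by this component can be reproduced with a short Python sketch of the nearest-neighbor sums. It uses the enthalpy and entropy parameters tabulated on this page; the function names are hypothetical, the sodium correction takes the phosphate count N as an input rather than deriving it, and the code illustrates the formulas rather than DNAmazing's own implementation.

```python
import math

# (dH in kcal/mol, dS in e.u.) per Watson-Crick stack, keyed by the 5'->3'
# dinucleotide on one strand; AA/TT and TT/AA share one entry, and so on.
NN = {
    "AA": (-7.6, -21.3), "TT": (-7.6, -21.3),
    "AT": (-7.2, -20.4), "TA": (-7.2, -21.3),
    "CA": (-8.5, -22.7), "TG": (-8.5, -22.7),
    "GT": (-8.4, -22.4), "AC": (-8.4, -22.4),
    "CT": (-7.8, -21.0), "AG": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "TC": (-8.2, -22.2),
    "CG": (-10.6, -27.2), "GC": (-9.8, -24.4),
    "GG": (-8.0, -19.9), "CC": (-8.0, -19.9),
}
INITIATION = (0.2, -5.7)
TERMINAL_AT = (2.2, 6.9)
SYMMETRY = (0.0, -1.4)
R = 1.987  # gas constant in cal/(K mol)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_self_complementary(seq):
    return seq == seq.translate(COMPLEMENT)[::-1]


def duplex_thermo(seq):
    """Return (dH kcal/mol, dS e.u.) at 1 M NaCl for seq and its complement."""
    dh, ds = INITIATION
    for i in range(len(seq) - 1):
        step_h, step_s = NN[seq[i:i + 2]]
        dh += step_h
        ds += step_s
    for end in (seq[0], seq[-1]):  # one penalty per terminal A-T pair
        if end in "AT":
            dh += TERMINAL_AT[0]
            ds += TERMINAL_AT[1]
    if is_self_complementary(seq):
        dh += SYMMETRY[0]
        ds += SYMMETRY[1]
    return dh, ds


def salt_corrected_entropy(ds, n_phosphates, sodium_molar):
    """dS at the given monovalent cation concentration (mol/L)."""
    return ds + 0.368 * (n_phosphates / 2) * math.log(sodium_molar)


def free_energy(dh, ds, celsius=37.0):
    """dG in kcal/mol from dH (kcal/mol) and dS (e.u.)."""
    return dh - (celsius + 273.15) * ds / 1000.0


def melting_temperature(dh, ds, total_strand_molar, self_complementary):
    """Tm in degrees Celsius for total strand concentration in mol/L."""
    x = 1 if self_complementary else 4
    return dh * 1000.0 / (ds + R * math.log(total_strand_molar / x)) - 273.15


dh, ds = duplex_thermo("CGTTGA")
print(round(dh, 1))  # -40.9, the enthalpy of the worked example
```

Computing $\Delta G^o$ from $\Delta H^o$ and $\Delta S^o$, rather than summing the $\Delta G^o$ column directly, keeps a single code path for any temperature.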

# Everything you need to know about DNAmazing in 5 minutes

## 1. What is DNAmazing?

DNAmazing is software dedicated to the design of DNA Origami structures. Like other CAD programs for DNA Origami, such as caDNAno and Nanoengineer, DNAmazing helps designers determine the sequences of DNA strands.

## 2. Why another CAD program while there are several good ones?

Besides the basic functions of a CAD program for DNA Origami, DNAmazing was built on a vision of the future applications of DNA Origami: DNA motors operating on a complicated traffic system made of DNA Origami, complex nanoelectronic circuits built from DNA Origami molds, and the organization of metal nanoparticles on DNA Origami platforms. These future designs are usually too complex for the footprint to be worked out manually (the generation of the folding path, the determination of crossover positions, and the merging of staple sequences). Furthermore, these systems must find a way to interact with external devices, particles, and the environment. The best way may be the fabrication of sticky ends that act as connection sites. To the best of our knowledge, current CAD programs pay little attention to these features, though we believe they are crucial to the future development of DNA Origami. DNAmazing is expected to fill these gaps.

## 3. Why are we interested in creating DNAmazing?

First, we believe that a good DNA Origami structure must start from a good design, in other words from a good CAD program. Furthermore, our group initially had the idea of creating a complex traffic system on which DNA "motorcycles" operate. As mentioned above, this idea requires both a complex system and sticky ends designed as steps for the DNA motorcycles. Hence, we decided to develop a CAD program from scratch for this purpose, as Rothemund did in 3 months before he invented the technique of DNA Origami. This is a real challenge, as our approach to designing the CAD program differs from existing ones and none of our team members has been trained in computer science.

## 4. What are the goals and our achievements in this project?

Goals: DNAmazing was expected to

1. Automatically generate the folding path
2. Automatically determine crossover positions
3. Automatically generate DNA staple sequences
4. Design sticky ends
5. Provide basic chemical toolkits to determine the chemical properties of sticky ends

Achievements: DNAmazing is able to

1. Automatically generate the folding of small and medium DNA Origami structures with arbitrary holes and shapes. Larger structures may take longer, because more possible folding paths have to be computed.
2. Select the most suitable folding paths for the DNA Origami
3. Automatically determine crossover positions
4. Automatically generate DNA staple sequences
5. Design sticky ends
6. Provide basic chemical toolkits to determine the chemical properties of sticky ends
7. Bonus: provide a basic user-friendly interface

## 5. What are DNAmazing's features and how do they compare to other CAD programs?

• Lithography-like Inputting
• "One-click" to results (Automatic generation of folding path, crossover positions, and staple sequence)
• Sticky ends generation
• Computational Toolkits
• User-friendly interface
• No programming required
• Free and Open Source

If you are interested in how we developed the program, visit DNAmazingProcess.

## 6. How can I run DNAmazing?

It is easy to run and experience the program. Download it here.

## 7. What are current limitations and future developments?

Problems:

• The first version of DNAmazing may need a long processing time for complicated or large structures.
• There may be some exceptional structures that are not handled properly.
• The program currently supports only 2D Origami.
• Owing to time limits, the results of DNAmazing, which have been compared with those of other CAD programs, have not been tested in wet-lab experiments.

Future developments:

• Parallel computing or improvements to the algorithm may be considered to reduce the processing time.
• As the principles and layout have been developed for 2D, DNAmazing can be extended to 3D Origami with curved structures.
• More rigorous computational tools can be included to analyze the chemical properties of sticky ends.