Guide to read this report efficiently
DNA Origami has emerged as one of the most promising tools for building self-assembled DNA nanostructures. The technique was introduced by Rothemund in 2006 and has been extensively exploited since. Thanks to the efforts of scientists in the field, DNA Origami now enables the construction of custom-shaped DNA nanostructures with a high degree of accuracy and complexity. DNA Origami structures have been used to study single-molecule chemical reactions and to assemble water-soluble probe tiles for label-free RNA hybridization.
DNA Origami future applications and developments
Potential future applications of DNA Origami include the modeling of complex protein assemblies, the building of molecular electronics or plasmonic circuits on DNA Origami boards, and even molecular factories with nanomachines operating on complex networks constructed by the DNA Origami technique. Some of these applications may require connections between DNA structures and other novel materials or devices such as carbon nanotubes, nanowires, gold nanoparticles, and DNA nanomachines. Sticky ends have been suggested as the best solution so far. A sticky end is a double-helical DNA in which one strand is longer than the other, leaving a single-stranded overhang of unpaired bases. The unpaired bases of a sticky end can link DNA Origami to other DNA or biological devices through the hydrogen bonds of complementary base pairs. In addition, the ends of sticky strands can be functionalized with chemical groups.
Another requirement for future applications of DNA Origami is the ability to handle design complexity. Using DNA Origami as molds for large-scale nanoelectronic circuits or as platforms for DNA machines demands software that reduces the tedious work of building the scaffold and determining the crossover positions.
Traditional CAD programs for DNA Origami
Since the introduction of DNA Origami to the field of biomolecular design, some good software has been dedicated to the design of both two-dimensional and three-dimensional DNA Origami structures. Most of this software has a GUI that lets users define a DNA Origami structure by manually drawing the scaffold pathway. This process may be too tedious for the large and complex structures that will inevitably be studied and exploited in the near future. Furthermore, we noticed that very few programs fully support the addition of sticky ends, which limits the fields of application of the DNA Origami technique. We also found a lack of fully documented instructions on how a DNA Origami design program is developed. Such documentation may be useful as a platform for further development of DNA Origami design tools. As a result of these observations, we decided to develop a new program from scratch to overcome these limits.
The program developed in this project has the following functions:
- Providing all the possible scaffold pathways for a given structure. A filter is implemented to select the best choices.
- Automatically determining the positions of crossovers and allowing the addition of sticky ends.
- Estimating the thermal stability of a sticky end by calculating the enthalpy, entropy, Gibbs free energy, and melting temperature of its duplex form.
The Project's Description
The Overview of DNAmazing Program
The DNAmazing program consists of three main modules: Design, Chemistry, and Graphical User Interface (GUI). The programming language is C# and the integrated development environment (IDE) is Microsoft Visual Studio 2010.
The Design is the backbone of the program which provides basic functions of a DNA Origami design tool: receiving information about the DNA Origami structures (shapes, sticky ends) and returning the necessary information of staple and scaffold sequences to synthesize the structures in labs.
The Chemistry module implements the nearest-neighbor method: the thermal stability of a DNA sticky end is estimated by calculating ΔH°, ΔS°, ΔG°, and the melting temperature Tm. In addition, these thermodynamic values assist the sticky end generation process by ensuring that a proposed sticky end does not bind stably to the scaffold strand.
The GUI allows users to work with the Design and Chemistry modules through a user-friendly, interactive interface.
Basic Dogmas of Design
The design of 2D DNA Origami in DNAmazing follows the principles laid out in Rothemund's first paper in 2006. The basic idea of DNA Origami is to fold a DNA helix into a desired shape. The DNA helix in DNA Origami is made from 2 different kinds of single strands: a single, long, continuous strand that makes up the pattern of the origami, called the scaffold strand, and a second strand formed by several short, single DNA fragments placed end to end, called staple strands. The staple strands bind to the scaffold through complementary base pairing, building the structure of the DNA Origami. In addition, the staple strands form crossovers that keep the scaffold strand in the desired shape.
For design purposes, the desired DNA Origami is conceptually divided into small helical segments, each equivalent to one turn of the folded helix. Each segment is represented as one square in the program and is numbered for convenience. The labeling runs from left to right, and from the bottom row to the top. The non-integer number of base pairs per turn, 10.67, is approximated as 11 base pairs (reference?). The DNA helix is folded by forming crossovers in the staple strands; a crossover marks the position where a staple strand switches to binding another segment of the scaffold strand. Crossovers can only occur where the twist of the DNA places the strand at a tangent point between adjacent helices, and successive crossovers must be spaced an odd number of half-turns apart. In this project, we use a spacing of 1.5 turns (Rothemund, 2006).
Throughout the software, 2 coordinate systems are used to refer to a specific square in the DNA Origami structure. The labeling described in this part is the matrix coordinate. The other is the scaffold coordinate, which is described later in 6.2.3 Generation of scaffolding pathways.
Recognizing that the conventional input method of existing programs may not be convenient for large and complex structures, DNAmazing adopts a very different, lithography-like approach. Instead of drawing the scaffold pathway, which may be painful or even impossible for complicated designs, users input the dimensions of a rectangle that encloses their desired structure. The dimensions are given as the number of squares (helical segments) per row and per column. Users then obtain their final shape by eliminating the unwanted squares, which is done by entering the numbers of the unwanted squares (null squares).
In the above example, the desired DNA Origami shape is enclosed by a rectangular frame of 6 squares x 6 squares. There are 8 null squares in total: 12, 18, 24, 30, 17, 23, 29, 35.
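A minimal sketch of this lithography-like input, written in Python for brevity rather than the C# used in DNAmazing. The function names `active_squares` and `render` are hypothetical, and the sketch assumes that squares are labeled from 0, left to right and from the bottom row upward.

```python
def active_squares(cols, rows, null_squares):
    """Return the labels of the squares kept in the design, in ascending order."""
    nulls = set(null_squares)
    total = cols * rows
    outside = sorted(s for s in nulls if not 0 <= s < total)
    if outside:
        raise ValueError(f"null squares outside the frame: {outside}")
    return [s for s in range(total) if s not in nulls]


def render(cols, rows, null_squares):
    """Draw the frame as text, top row first, with '.' marking a null square."""
    nulls = set(null_squares)
    lines = []
    for r in reversed(range(rows)):  # labels grow from the bottom row upward
        cells = []
        for c in range(cols):
            label = r * cols + c
            cells.append(" ." if label in nulls else f"{label:2d}")
        lines.append(" ".join(cells))
    return "\n".join(lines)


# The 6 x 6 frame with the 8 null squares listed in the text.
NULLS = [12, 18, 24, 30, 17, 23, 29, 35]
print(render(6, 6, NULLS))
print(len(active_squares(6, 6, NULLS)), "squares remain")
```

Keeping the null squares in a set makes each membership check constant-time, which matters little for a 6 x 6 frame but keeps the same code usable for large designs.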
Generation of scaffolding pathways
One of the unique features of DNAmazing is its ability to generate the scaffolding pathways automatically. In existing programs, users have to design the folding of the scaffold into the desired shape manually. This process is tedious for complex structures such as the smiling faces in Rothemund's paper. In DNAmazing, users only have to conceptualize the DNA Origami structure as a series of squares, as described in the previous part. This is less time-consuming and much easier.
In essence, generating a scaffold folding pathway means threading the scaffold strand through all the squares so that each square is visited exactly once. This is the Hamiltonian path problem. In graph theory, a Hamiltonian path is a path in an undirected graph that visits each vertex exactly once; a Hamiltonian circuit is such a path that also returns to its starting vertex. A familiar related example is the problem of a salesman who must visit every city exactly once to deliver goods.
Each square in the DNA Origami is modeled as a vertex that can be linked to its 4 adjacent neighbors (left, right, above, below), but not to its diagonal neighbors. Null squares are isolated: no links lead to them, and they are not included in the folding pathway. The scaffolding path starts with the first square and is extended by adding one of the neighbors of the last square in the path. This step is repeated until the path can no longer be extended, either because no unvisited neighbor is left or because the path has passed through all squares. In the latter case, a scaffolding pathway has been successfully generated. In the former case, the program takes one step back and explores the other choices. Exploring all possible paths in this way finds every pathway that satisfies the condition.
By using this Hamiltonian-path search, DNAmazing is able to find all the possible scaffold pathways. However, not all of these pathways are reasonable for DNA Origami. A filter must be included to select the paths that are suitable. Below are some rules which we use in the filtering process:
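The extend-and-step-back search described here can be sketched as a small recursive backtracking generator. This is an illustrative Python sketch, not the C# code of DNAmazing; the names `neighbours` and `hamiltonian_paths` are hypothetical, and squares are assumed to be labeled from 0, row by row.

```python
def neighbours(square, cols, rows, active):
    """Active squares directly left, right, above or below `square`."""
    r, c = divmod(square, cols)
    found = []
    for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            n = nr * cols + nc
            if n in active:
                found.append(n)
    return found


def hamiltonian_paths(cols, rows, null_squares, start):
    """Yield every path from `start` that visits each non-null square once."""
    active = set(range(cols * rows)) - set(null_squares)
    if start not in active:
        raise ValueError("the start square must not be a null square")
    path = [start]
    visited = {start}

    def extend():
        if len(path) == len(active):      # every square threaded: success
            yield list(path)
            return
        for n in neighbours(path[-1], cols, rows, active):
            if n not in visited:
                path.append(n)
                visited.add(n)
                yield from extend()
                path.pop()                # step back and try another choice
                visited.remove(n)

    yield from extend()


for p in hamiltonian_paths(2, 2, [], 0):
    print(p)
```

Because the function is a generator, a caller can stop at the first acceptable pathway instead of enumerating all of them, which is useful since the number of candidate paths grows quickly with the size of the frame.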
- The first square is either square 0 or the square at the middle of the first row
- If the first square is square 0, the scaffold should run continuously and only turn over to the next row at either of the two ends. If the first square is the middle one of the first row, the last square in the scaffold pathway must be on the right of the first square.
- The scaffold should not run in the vertical direction.
The result of this stage is a 1D matrix containing the ordinal numbers of the squares that the scaffold passes through. For instance, the scaffold pathway in the above figure is represented as C = [2, 1, 0, 6, 7, 8, 14, 13, 19, 20, 26, 25, 31, 32, 33, 34, 28, 27, 21, 22, 16, 15, 9, 10, 11, 5, 4, 3].
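As a sanity check on this representation, the sketch below (Python, with hypothetical helper names) verifies that the example array C threads every non-null square of the 6 x 6 frame exactly once through adjacent squares. The function `passes_start_rule` encodes only one reading of the start and end rules above, taking "the middle of the first row" to be square `(cols - 1) // 2`; that reading is an assumption, and the rule about turning only at the row ends is not checked.

```python
def is_scaffold_path(path, cols, rows, null_squares):
    """True if `path` visits every non-null square once via adjacent squares."""
    active = set(range(cols * rows)) - set(null_squares)
    if len(path) != len(active) or set(path) != active:
        return False
    for a, b in zip(path, path[1:]):
        (ra, ca), (rb, cb) = divmod(a, cols), divmod(b, cols)
        if abs(ra - rb) + abs(ca - cb) != 1:   # must share an edge
            return False
    return True


def passes_start_rule(path, cols):
    """Assumed reading of the start/end rules: start at square 0, or start at
    the middle of the first row and finish on the square to its right."""
    middle = (cols - 1) // 2
    if path[0] == 0:
        return True
    return path[0] == middle and path[-1] == path[0] + 1


C = [2, 1, 0, 6, 7, 8, 14, 13, 19, 20, 26, 25, 31, 32, 33, 34,
     28, 27, 21, 22, 16, 15, 9, 10, 11, 5, 4, 3]
NULLS = [12, 18, 24, 30, 17, 23, 29, 35]

print(is_scaffold_path(C, 6, 6, NULLS), passes_start_rule(C, 6))
```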
Determination of crossover positions
The next step in the Design part is the determination of crossover positions. Crossovers are the positions where a staple strand switches to bind to another segment of the scaffold located on the next row. The crossovers are crucial to the folding of the scaffold strand; in fact, they are the only linkages that prevent the helix from unfolding. The basic principle for determining the positions of crossovers was laid out by Rothemund: the spacing between crossovers in 2D DNA Origami structures must be an odd number of half-turns. In other words, 2 vertically adjacent helices meet at their tangent points every odd number of half-turns, so the staples are in their least strained state at the crossovers. In this project, we use 1.5 turns as the unit of crossover spacing.
The algorithm for determining the crossover positions starts by creating an ArrayList, a list whose size can grow as elements are added. We named it PosCros. The PosCros ArrayList collects the squares that contain a crossover position. The first element of PosCros is always the first element of the scaffolding pathway. Each subsequent element is determined by the category of the previous element, where the categorization is based on the distance between that element and the closest turning point of the scaffold.
Generation of Sticky ends' Sequence
Sticky ends, which serve as extra ends of staples, should not interfere with the scaffold folding and must not bind to the scaffold during the formation of the DNA Origami. Sticky end sequences must therefore not bind stably to any segment of the scaffold. To generate sticky end sequences, DNA sequences of a defined length are generated randomly. Each newly generated sequence is then examined for its binding ability by means of its binding free energy. Sequences that bind rather stably at any position in the scaffold are discarded. Only those without stable scaffold binding are kept and can be used as sticky end sequences.
To determine whether a sticky end would bind stably to the scaffold, one needs to know the binding energy of the sticky end at every position in the scaffold. In addition, a threshold below which the binding is considered stable is also required.
Calculation of binding energy
The given sticky end sequence is aligned at each position along the length of the scaffold, and the binding energy ΔG° is calculated for each match/mismatch pairing. The calculation uses the formula and the complete thermodynamic database for internal single mismatches discussed by SantaLucia and Hicks (2004) (1). The formula and parameters are shown below:
Nearest-neighbor ΔG° increments (kcal/mol) for internal single mismatches next to Watson-Crick pairs in 1 M NaCl
For example, consider the total binding energy of the following DNA duplex. The mismatched base pair is in bold:
In our program, ΔGsym and ΔGini were not included in the ΔG calculation, as sticky ends are merged with the staple sequences.
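The scan along the scaffold can be sketched as follows. This Python sketch deliberately swaps in a simpler scoring than the one described above: it sums Watson-Crick nearest-neighbor ΔG° increments only over adjacent matched positions and gives mismatched steps a contribution of zero, because the internal-mismatch parameter table is not reproduced here. The function names are hypothetical, and the ΔG° values (kcal/mol, 37 °C, 1 M NaCl) are assumed to be the unified nearest-neighbor set of reference (1); they should be checked against the published table. As in the program, no initiation or symmetry term is added.

```python
# Assumed Watson-Crick nearest-neighbor dG values (kcal/mol); verify before use.
NN_DG = {"AA": -1.00, "AT": -0.88, "TA": -0.58, "CA": -1.45, "GT": -1.44,
         "CT": -1.28, "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def step_dg(step):
    """dG of a dinucleotide step, looked up directly or via its reverse complement."""
    return NN_DG[step] if step in NN_DG else NN_DG[revcomp(step)]


def window_dg(sticky, window):
    """Simplified binding energy of `sticky` against one scaffold window."""
    target = revcomp(sticky)  # scaffold sequence that would pair perfectly
    paired = [a == b for a, b in zip(target, window)]
    total = 0.0
    for i in range(len(window) - 1):
        if paired[i] and paired[i + 1]:      # two adjacent Watson-Crick pairs
            total += step_dg(window[i:i + 2])
    return total


def most_stable_site(sticky, scaffold):
    """Return (position, dG) of the most stabilizing window, or None."""
    best = None
    for pos in range(len(scaffold) - len(sticky) + 1):
        dg = window_dg(sticky, scaffold[pos:pos + len(sticky)])
        if best is None or dg < best[1]:
            best = (pos, dg)
    return best


print(most_stable_site("TCAACG", "AAAACGTTGAAAAA"))
```

Treating mismatches as neutral tends to overestimate or underestimate stability depending on the mismatch, so this scoring is only a placeholder for the mismatch-aware calculation.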
Set up a threshold
To determine whether a mismatched pairing between a sticky end and the scaffold is stable or unstable, a threshold binding energy (ΔG°) is required. A binding energy less than or equal to this threshold is considered stable. A single absolute threshold cannot serve DNA sequences of different lengths, because longer DNA sequences require a lower ΔG° for stable binding. Therefore, the threshold is a variable calculated from the sequence length. Let n be the length of the DNA sequence. If n is even, the threshold is calculated as follows:
If n is odd, the formula to calculate the threshold is:
In the above equation, -0.58 is the binding energy (kcal/mol) of 5’-TA-3’/3’-AT-5’, and -0.88 is the binding energy of 5’-AT-3’/3’-TA-5’. The right-hand side of the equation therefore equals the binding energy of the fully complementary duplex 5’-(AT)n-3’/3’-(TA)n-5’ of the same length. In other words, no binding between the sticky end and the scaffold may be more stable than the least stable fully complementary Watson-Crick DNA duplex.
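One way to read this length-dependent threshold is sketched below in Python. It is a reconstruction from the description only, not the original formula: it assumes the reference duplex is the alternating sequence ATAT... starting with A, counts its n - 1 nearest-neighbor steps, and adds no initiation or symmetry term. The names `stability_threshold` and `is_stable` are hypothetical.

```python
DG_AT = -0.88  # 5'-AT-3'/3'-TA-5', kcal/mol (value given in the text)
DG_TA = -0.58  # 5'-TA-3'/3'-AT-5', kcal/mol (value given in the text)


def stability_threshold(n):
    """Assumed threshold: stacking energy of the alternating AT duplex of length n."""
    if n < 2:
        raise ValueError("a duplex needs at least two base pairs")
    steps = n - 1
    at_steps = (steps + 1) // 2   # steps alternate AT, TA, AT, ... starting with AT
    ta_steps = steps // 2
    return at_steps * DG_AT + ta_steps * DG_TA


def is_stable(binding_dg, n):
    """Binding counts as stable when it is at or below the threshold."""
    return binding_dg <= stability_threshold(n)


for length in (4, 5, 8):
    print(length, round(stability_threshold(length), 2))
```

Under this reading, an even n gives n/2 AT steps and n/2 - 1 TA steps, while an odd n gives (n - 1)/2 of each, which is why the text needs two separate formulas.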
Generation of staples sequence
This final step provides the most meaningful and necessary information for users: the staple sequences. Together with the scaffold sequence, the staple sequences are the most important information needed to produce DNA Origami structures in wet lab experiments. The program first obtains the sequence of the scaffold strand, which is taken from the genome of the M13 virus. The number of squares in the DNA Origami gives the number of bases required (11 per square), and that many bases are read from the beginning of the genome sequence. The sequence of the strand complementary to the scaffold is then determined and divided into smaller fragments of 11 bases each. For the purpose of merging, each fragment is cut into 2 smaller fragments, one containing 6 bases and the other 5 bases. Thus we have 2 types of staple fragments: A with 6 bases and B with 5 bases. A is always at the 3’ end while B is at the 5’ end. The next step is to append the sticky ends to the staples; a sticky end can be appended at either the beginning or the end of a square. Finally, the staple fragments (including any sticky ends) are merged to form the crossovers.
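The fragmentation step can be illustrated with a short Python sketch. It covers only the first half of the procedure: reading 11 bases per square from the start of the scaffold, taking the complementary strand, and cutting each 11-base fragment into a 5-base B part (5’ end) and a 6-base A part (3’ end). The function name is hypothetical, sequences are assumed to be written 5’ to 3’, squares are assumed to be taken in scaffold order, and the merging across crossovers and the sticky end attachment are not shown.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
BASES_PER_SQUARE = 11  # one helical turn, as approximated in the design


def staple_fragments(scaffold, n_squares):
    """Return one (A, B) pair per square: A is the 6-base 3' part and
    B the 5-base 5' part of the strand complementary to that square."""
    needed = n_squares * BASES_PER_SQUARE
    if len(scaffold) < needed:
        raise ValueError("scaffold sequence is too short for this design")
    fragments = []
    for k in range(n_squares):
        segment = scaffold[k * BASES_PER_SQUARE:(k + 1) * BASES_PER_SQUARE]
        # The complementary strand runs antiparallel, so reverse it to read 5'->3'.
        complement = segment.translate(COMPLEMENT)[::-1]
        part_b, part_a = complement[:5], complement[5:]
        fragments.append((part_a, part_b))
    return fragments


# Toy scaffold of two squares (22 bases); a real design would use the M13 sequence.
for a, b in staple_fragments("ACGTACGTACGGGGTTTAAACC", 2):
    print("B (5' end):", b, " A (3' end):", a)
```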
The prediction of the thermal stability of the duplex produced from sticky end
This module predicts the thermal stability of the short DNA duplex that forms when a sticky end binds its complementary single strand.
The capability to estimate thermal stability will aid numerous applications, such as (i) predicting the stability of a local sequence in a DNA duplex or of a probe-gene complex, (ii) calculating the melting temperatures of short sequences in hybridization experiments, and (iii) determining the optimal length of a probe oligomer that forms stable duplexes with the sticky ends. Recently, the order-disorder transition of a sticky end with its complementary single strand has also become important in controlling the dynamic movement of nanomotors made from DNA strands (reference?).
Research has shown that the thermal stability of a duplex is affected by both its sequence and its base composition, with the sequence of the DNA strand being the major determinant of ΔH°, ΔS°, and ΔG°. We apply the nearest-neighbor (NN) method to determine the transition enthalpy, entropy, free energy, and melting temperature of a short DNA duplex. This method calculates those thermodynamic values from the stacking interactions between neighboring Watson-Crick base pairs in the DNA strands.
The DNAmazing program not only assists in randomly generating sticky ends attached to predetermined positions on the DNA Origami, but also allows users to input their preferred sticky end sequences. Since different sequences have different thermal stabilities (represented by ΔH°, ΔS°, and ΔG°) upon binding, knowing those thermodynamic values is crucial for studying the function and applications of the sticky ends.
In addition, the DNAmazing program helps to determine whether a sticky end sequence entered by the user is complementary to the scaffold strand or to other staple strands.
Many groups have worked on the NN method for determining ΔH°, ΔS°, ΔG°, and Tm of short DNA oligomers and have arrived at the same formula, shown below. However, since different studies used different starting materials (short DNA oligomers, polymers, etc.), the values reported for a given parameter vary slightly. We have chosen the latest results obtained by SantaLucia et al. to incorporate into our software.

$$\Delta H^\circ_{\text{total}} = \Delta H^\circ_{\text{ini}} + \Delta H^\circ_{\text{sym}} + \Delta H^\circ_{\text{AT}} + \sum \Delta H^\circ_{\text{stack}} \qquad (1)$$

where $\Delta H^\circ_{\text{ini}}$ is the helix initiation enthalpy of the transition process; $\Delta H^\circ_{\text{sym}}$ is the symmetry term, which applies only to self-complementary duplexes and accounts for the enthalpy difference between a duplex formed from a self-complementary sequence and a duplex formed from 2 complementary strands; $\Delta H^\circ_{\text{AT}}$ is applied for each end of a duplex that has a terminal AT pair, accounting for the end fraying caused by the AT base pair; and $\sum \Delta H^\circ_{\text{stack}}$ is the total enthalpy of the propagation steps in the sequence.
ΔS° and ΔG° are calculated using the same form as formula (1) above.
There are 10 propagation steps, 1 initiation, and 1 terminal AT correction to make up a total of 12 NN parameters shown in Table 1. These values are obtained via multiple linear regression of the results from differential scanning calorimetry (DSC) of 108 short DNA sequences.
|Propagation step||ΔH° (kcal/mol)||ΔS° (e.u.)||ΔG° (kcal/mol)|
|Terminal AT penalty||+2.2||+6.9||+0.05|
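To make the bookkeeping of formula (1) concrete, the Python sketch below applies it to the ΔG° column: initiation, plus a terminal AT penalty for each end that is an AT pair, plus a symmetry term for self-complementary sequences, plus the sum over propagation steps. Only the terminal AT penalty (+0.05) and the AT and TA steps are quoted in this report; the remaining numbers are assumed to be the unified nearest-neighbor ΔG° values of reference (1) at 37 °C and 1 M NaCl and should be checked against the published table. The function name `duplex_dg37` is hypothetical.

```python
# Assumed unified nearest-neighbor dG values, kcal/mol at 37 C, 1 M NaCl.
NN_DG = {"AA": -1.00, "AT": -0.88, "TA": -0.58, "CA": -1.45, "GT": -1.44,
         "CT": -1.28, "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84}
DG_INIT = 1.96          # helix initiation (assumed value)
DG_TERMINAL_AT = 0.05   # per terminal AT pair (value from the table above)
DG_SYMMETRY = 0.43      # self-complementary duplexes only (assumed value)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def duplex_dg37(seq):
    """dG of the duplex formed by `seq` (5'->3') and its perfect complement."""
    seq = seq.upper()
    if len(seq) < 2 or set(seq) - set("ACGT"):
        raise ValueError("expected a DNA sequence of at least 2 bases")
    total = DG_INIT
    for i in range(len(seq) - 1):            # propagation steps
        step = seq[i:i + 2]
        total += NN_DG[step] if step in NN_DG else NN_DG[revcomp(step)]
    for end in (seq[0], seq[-1]):            # terminal AT correction
        if end in "AT":
            total += DG_TERMINAL_AT
    if seq == revcomp(seq):                  # symmetry correction
        total += DG_SYMMETRY
    return total


print(round(duplex_dg37("CGTTGA"), 2))
```

The same function shape works for ΔH° and ΔS° once the corresponding parameter columns are supplied.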
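The melting temperature of a short DNA duplex, defined as the temperature at which half of the double-stranded DNA molecules have dissociated, is calculated as follows:

$$T_m = \frac{\Delta H^\circ}{\Delta S^\circ + R \ln(C_t / x)}$$

where R is the gas constant, Ct is the total molar strand concentration, ΔH° is in cal/mol, ΔS° is in e.u., and Tm is in kelvin. For non-self-complementary duplexes x = 4, and for self-complementary duplexes x = 1.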
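A small Python sketch of this calculation, assuming the usual two-state expression Tm = ΔH° / (ΔS° + R ln(Ct/x)) with ΔH° converted to cal/mol and the result converted from kelvin to degrees Celsius. The function name is hypothetical and the input values in the example are illustrative, not measured.

```python
import math

R = 1.9872  # gas constant in cal/(K*mol)


def melting_temperature_c(dh_kcal, ds_eu, ct_molar, self_complementary=False):
    """Two-state melting temperature in degrees Celsius.

    dh_kcal  : duplex enthalpy in kcal/mol (negative for duplex formation)
    ds_eu    : duplex entropy in cal/(K*mol), i.e. entropy units
    ct_molar : total strand concentration in mol/L
    """
    if ct_molar <= 0:
        raise ValueError("strand concentration must be positive")
    x = 1 if self_complementary else 4
    tm_kelvin = dh_kcal * 1000.0 / (ds_eu + R * math.log(ct_molar / x))
    return tm_kelvin - 273.15


# Illustrative values only: dH = -50 kcal/mol, dS = -140 e.u., Ct = 0.1 mM.
print(round(melting_temperature_c(-50.0, -140.0, 1e-4), 1))
```

Because Ct enters through a logarithm, a tenfold change in strand concentration shifts Tm by only a few degrees for a duplex of this size.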
The NN method is only an approximation: it assumes that the DNA duplex undergoes a two-state transition, neglecting secondary interactions within the duplex, and that the heat capacity Cp is constant over different temperatures. To limit the resulting inaccuracy, the method is applied to short DNA oligomers (fewer than 30 base pairs), for which secondary interactions within the DNA molecule are minimal.
Sodium dependence of ΔS° and ΔG°
The entropy and free energy calculated from formula (1) above apply at 37 °C and 1 M NaCl. To extend the results to other salt conditions, the following correction formulae have been derived by (***)
where N is the total number of phosphates in the duplex and [Na+] is the total concentration of monovalent cations (Na+, K+, NH4+) in the solution. ΔH° is assumed to be sodium-independent.
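For illustration, the Python sketch below applies a salt correction of the kind described here. The specific form and coefficients are an assumption, since the correction formulae themselves are not reproduced in this report: the sketch uses the commonly quoted SantaLucia-type correction ΔG°37([Na+]) = ΔG°37(1 M) - 0.114 (N/2) ln[Na+] and ΔS°([Na+]) = ΔS°(1 M) + 0.368 (N/2) ln[Na+], and should be checked against the cited source. The function name is hypothetical.

```python
import math


def salt_corrected(dg37_1m, ds_1m, n_phosphates, na_molar):
    """Return (dG37, dS) adjusted from 1 M NaCl to `na_molar` mol/L.

    dg37_1m      : dG at 37 C and 1 M NaCl, kcal/mol
    ds_1m        : dS at 1 M NaCl, cal/(K*mol)
    n_phosphates : total number of phosphates in the duplex (N in the text)
    """
    if na_molar <= 0:
        raise ValueError("cation concentration must be positive")
    ln_na = math.log(na_molar)
    dg = dg37_1m - 0.114 * (n_phosphates / 2) * ln_na  # assumed coefficient
    ds = ds_1m + 0.368 * (n_phosphates / 2) * ln_na    # assumed coefficient
    return dg, ds


# A duplex with 10 phosphates moved from 1 M to 0.1 M monovalent cations.
print(salt_corrected(-5.35, -140.0, 10, 0.1))
```

At 1 M the logarithm vanishes and both values are returned unchanged; below 1 M the free energy becomes less negative, reflecting the weaker screening of the phosphate backbone.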
To calculate the value of ΔG° at a temperature other than 37 °C, the following equation is used:
ΔG° = ΔH° − TΔS°
in which T is in kelvin, ΔH° is in cal/mol (the tabulated kcal/mol value multiplied by 1000), and ΔS° is in entropy units (e.u.). ΔH° and ΔS° are assumed to be independent of temperature.
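The unit handling in this equation is an easy place to slip, so a short Python sketch may help. It takes ΔH° in kcal/mol and ΔS° in e.u., as tabulated, and returns ΔG° in kcal/mol; the function name and the input values are illustrative assumptions.

```python
def delta_g_at(dh_kcal, ds_eu, temp_celsius):
    """dG in kcal/mol at the given temperature, from dG = dH - T*dS.

    dh_kcal : enthalpy in kcal/mol
    ds_eu   : entropy in cal/(K*mol); divided by 1000 to match kcal
    """
    t_kelvin = temp_celsius + 273.15
    return dh_kcal - t_kelvin * ds_eu / 1000.0


# Illustrative values only: dH = -50 kcal/mol, dS = -140 e.u.
for t in (25.0, 37.0, 50.0):
    print(t, round(delta_g_at(-50.0, -140.0, t), 2))
```

With a negative ΔS°, raising the temperature makes ΔG° less negative, so the duplex becomes progressively less stable as it approaches its melting temperature.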
The User Interface (GUI)
The GUI, or graphical user interface, is built to give users a friendly environment in which to design their DNA Origami. Our GUI is created as a Windows Forms application in Visual Studio 2010. Our software has three main tools: one to design DNA Origami with sticky ends, one to generate sticky end sequences, and one to analyze the thermodynamic properties of sticky ends. The source code is provided in the attachments.
The first component generates the staple sequences needed for the correct folding of a DNA Origami with sticky ends. Users are required to define the size and shape of the structure they want to design by first entering the frame size and then choosing the null squares (the locations that will not be occupied by the scaffold). This tells the program what the DNA Origami design looks like.
After obtaining the required parameters, the program generates the possible scaffold pathways and asks users to choose the one they are interested in.
Users can also choose to add sticky ends by entering the number of sticky ends they need and specifying the sequence and location of each sticky end in the scaffold.
The final staple sequences are generated and appear in the result window.
Generate sticky end sequence
To support the generation of sticky ends, and to ensure that a sticky end will not affect the scaffold folding, an additional component is provided. Users can choose to input a DNA sequence manually, and the program checks for the most stabilizing binding position in the scaffold. The binding energy is also calculated for the users’ reference.
Users can also ask the program to generate sticky end sequences of a defined length. DNA sequences whose binding energy is higher than a defined limit are returned. The image below illustrates the output of sticky end sequence generation.
The third component of the software supports sticky end analysis, in which the thermodynamic values of a sequence are calculated. Users need to enter the sequence they want to analyze, together with the conditions under which they would test the DNA (total DNA strand concentration, Na+ concentration, and temperature). Thermodynamic values including ΔG°, ΔS°, ΔH°, and Tm are provided on the results page.
Results and Discussion
1. SantaLucia, J. and D. Hicks (2004). "The Thermodynamics of DNA Structural Motifs." Annual Review of Biophysics and Biomolecular Structure 33(1): 415-440.
2. SantaLucia, J., H. T. Allawi, et al. (1996). "Improved Nearest-Neighbor Parameters for Predicting DNA Duplex Stability." Biochemistry 35(11): 3555-3562.
3. Breslauer, K. J., R. Frank, et al. (1986). "Predicting DNA duplex stability from the base sequence." Proceedings of the National Academy of Sciences 83(11): 3746-3750.
4. Marky, L. A. and K. J. Breslauer (1982). "Calorimetric determination of base-stacking enthalpies in double-helical DNA molecules." Biopolymers 21(11): 2185-2194.
5. Sugimoto, N., S.-i. Nakano, et al. (1996). "Improved Thermodynamic Parameters and Helix Initiation Factor to Predict Stability of DNA Duplexes." Nucleic Acids Research 24(22): 4501-4505.