# Nucleic acid structure

## General

- Different observed double helical structures are called A, A', B, α-B', β-B', C, C', C'', D, E, and Z
- The letters denote structural differences, the α and β are associated with packing differences, and primes indicate small variations

- The symmetries of the various double helices are represented by two numbers, [math]\displaystyle{ N_m }[/math] (from crystallographic nomenclature)
- N is the number of nucleotides to reach the exact same point along the helix axis
- m is the number of helical turns to reach the exact same point along the helix axis

- The **axial rise** is the distance along the helical axis between consecutive nucleotides
- If all bases were coplanar and the pairs perpendicular to the helix axis, the rise would equal the van der Waals distance of 3.4 Å

- The **pitch** is the distance along the helix axis for one complete helical turn
- The pitch equals the number of nucleotides in one turn multiplied by the axial rise

- The **helix twist** is the rotation between neighboring nucleotides; it equals 360° divided by the number of nucleotides in one turn
- The base-pair **tilt** is the angle by which the base-pair plane deviates from perpendicular to the helical axis
- 0° means the plane is perpendicular to the helical axis
- Tilt is defined by viewing the base-pair plane from the side of the C1'-N glycosidic linkage: tilting this plane clockwise is positive tilt, and counterclockwise is negative tilt
- There is a linear relationship between the tilt of an individual base and the axial rise per nucleotide
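The pitch and twist definitions above can be checked numerically. A minimal Python sketch; the function names are illustrative, the B-DNA numbers are the fiber values from the DNA table, and 10.5 bp per turn is an assumed representative value for DNA in solution:

```python
def helix_twist(per_turn: float) -> float:
    """Twist between neighboring nucleotides (degrees) = 360° / nucleotides per turn."""
    return 360.0 / per_turn


def helix_pitch(per_turn: float, axial_rise: float) -> float:
    """Pitch = nucleotides per turn x axial rise (same length unit as axial_rise)."""
    return per_turn * axial_rise


# B-DNA fiber values: 10 nucleotides per turn, 3.38 Å rise
print(round(helix_twist(10), 1), round(helix_pitch(10, 3.38), 1))  # 36.0 33.8

# B-DNA in solution, assuming 10.5 bp per turn as a representative value
print(round(helix_twist(10.5), 1))  # 34.3
```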

**Sugar puckering** is the deviation from planarity of the five atoms of the sugar ring, which are never observed to be coplanar. In the envelope form, four atoms lie in a plane and the fifth is out of it by about 0.5 Å; in the twist form, two adjacent atoms lie on opposite sides of the plane defined by the other three atoms. Atoms on the same side as C5' are called endo, and those on the opposite side are called exo.

- The **minor groove** is on the side of the base pair where the sugars are attached (C1'), and the major groove is on the other side
- The width of either groove is the shortest distance between phosphates across the groove minus 5.8 Å (the sum of the van der Waals radii of the two phosphate groups)
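The groove-width rule is simple arithmetic, but running it backwards shows what phosphate-phosphate distances the tabulated widths imply. A small Python sketch; the constant and function names are hypothetical, and the example widths are the B-DNA values from the DNA table:

```python
PHOSPHATE_VDW_SUM = 5.8  # Å, sum of the van der Waals radii of two phosphate groups


def groove_width(shortest_pp_distance: float) -> float:
    """Groove width (Å) = shortest P-P distance across the groove minus 5.8 Å."""
    return shortest_pp_distance - PHOSPHATE_VDW_SUM


def pp_distance_from_width(width: float) -> float:
    """Inverse of groove_width: the P-P distance (Å) implied by a groove width."""
    return width + PHOSPHATE_VDW_SUM


# B-DNA minor and major groove widths: 5.7 Å and 11.7 Å
print(round(pp_distance_from_width(5.7), 1), round(pp_distance_from_width(11.7), 1))  # 11.5 17.5
```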

- The **x-displacement** (dx) is the perpendicular distance from the long axis of the base pair to the helix axis
- It is positive if the helix axis passes on the major-groove side of the base pair and negative if it passes on the minor-groove side

## DNA

DNA can form a wide range of double helical structures. Except for the left-handed S/Z helices, the structures are broadly classified into A and B families. The essential distinction between A- and B-type helices is the sugar puckering: A helices have C3'-endo puckering and B-type helices have C2'-endo puckering. This changes the distance between adjacent phosphates on the same strand, from 5.9 Å in A-type to 7.0 Å in B-type helices.

Random sequences are found in the A, B, and C forms. Designed repetitive sequences can form the D, E, and Z forms. Z-form DNA is left-handed and occurs with alternating purine-pyrimidine sequences, mainly GC. Native DNA adopts the right-handed B-form with 10 base pairs per turn in its crystalline state, but in solution the helix is slightly underwound, with 10.3 to 10.6 base pairs per turn. The roll and tilt angles vary by a few degrees depending on the base pairs, and the dinucleotide AA (or TT) causes significant variations in both.

In A-type double helices, the axial rise can vary from 2.59 to 3.29 Å, but the rotation varies only slightly, from 30.0° to 32.7°. In B-type helices, the axial rise varies only from 3.03 to 3.37 Å, but the rotation varies from 36° to 45°.

In B-DNA, the x-displacement is slightly toward the minor groove, but the bases lie essentially on the helix axis. In A-DNA, the helix axis is pushed far into the major groove. The x-displacement values do not overlap for the A, B, and Z types, so this is a good parameter for defining a helix family.

Typical parameters for DNA helices:

Structure | Pitch (Å) | Helical symmetry | Axial rise (Å) | Twist (°) | Bases/turn | Base tilt (°) | x-displacement (Å) | Minor groove width (Å) | Major groove width (Å) | Minor groove depth (Å) | Major groove depth (Å) |
---|---|---|---|---|---|---|---|---|---|---|---|
A | 28.2 | [math]\displaystyle{ 11_1 }[/math] | 2.56 | 32.7 | 11 | 12 | 4.1 | 11.0 | 2.7 | 2.8 | 13.5 |
B | 33.8 | [math]\displaystyle{ 10_1 }[/math] | 3.38 | 36.0 | 10 | 2.4 | -0.8 | 5.7 | 11.7 | 7.5 | 8.5 |
C | 31.0 | [math]\displaystyle{ 9.33_1 }[/math] | 3.32 | 38.6 | 9.3 | | | 4.8 | 10.5 | 7.9 | 7.5 |
B' | 32.9 | [math]\displaystyle{ 10_1 }[/math] | 3.29 | 36 | 10 | | | | | | |
C' | 29.5 | [math]\displaystyle{ 9_1 }[/math] | 3.28 | 40 | 9 | | | | | | |
C'' | 29.1 | [math]\displaystyle{ 9_1 }[/math] | 3.23 | 40 | 9 | | | | | | |
D | 24.3 | [math]\displaystyle{ 8_1 }[/math] | 3.04 | 45 | 8 | | | 1.3 | 8.9 | 6.7 | 5.8 |
E | 24.35 | [math]\displaystyle{ 7.5_1 }[/math] | 3.25 | 48 | 7.5 | | | | | | |
S | 43.4 | [math]\displaystyle{ 6_5 }[/math] | 3.63 | -30.0 | 12 | | | | | | |
Z | 45 | [math]\displaystyle{ 6_5 }[/math] | 3.9, 3.5 | -10, -50 | 12 | -6.2 | -3.0 | 4 | 9 | convex | |
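Because the x-displacement ranges of the A, B, and Z families do not overlap, a measured dx can be tentatively assigned to the nearest family. A minimal Python sketch of nearest-reference assignment; using the single A, B, and Z table values as reference points is an illustrative assumption, not a published classification rule:

```python
# Reference x-displacements (Å) taken from the table above. Sign convention as
# defined earlier: positive = helix axis passes on the major-groove side of the base pair.
REFERENCE_DX = {"A": 4.1, "B": -0.8, "Z": -3.0}


def nearest_family(dx: float) -> str:
    """Return the helix family whose reference x-displacement is closest to dx."""
    return min(REFERENCE_DX, key=lambda family: abs(REFERENCE_DX[family] - dx))


for measured in (3.5, 0.2, -2.5):
    print(measured, nearest_family(measured))
# prints: 3.5 A / 0.2 B / -2.5 Z (one per line)
```

A real assignment would also weigh other parameters (sugar pucker, handedness); dx alone is used here only because the text singles it out.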

## RNA

The extra 2'-OH usually prevents formation of the B-form helix found in DNA. Double-helical RNA is usually of the A or A' form. At low ionic strength, the A-RNA double helix dominates, but with > 20% salt, A'-RNA is formed. Both are right-handed antiparallel helices. Some key parameters and differences are:

Type | Pitch (Å) | Bases/turn | Axial rise (Å) | Base-pair tilt (°) |
---|---|---|---|---|
A-type | 30 | 11 | 2.79 | 16.7 |
A'-type | 36 | 12 | 3.0 | 10 |

### Thermodynamics

[math]\displaystyle{ \Delta G^0 = -RT \ln K = \Delta H^0 - T\cdot\Delta S^0 }[/math] where, for a duplex formed from two identical (self-complementary) strands, [math]\displaystyle{ K=\frac{\rm [duplex]}{\rm [single-strand]^2} }[/math]

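At the melting temperature, [math]\displaystyle{ T_m }[/math], half of the strands are in duplex, so [math]\displaystyle{ 2[{\rm duplex}] = [{\rm single-strand}] }[/math]. From conservation of total RNA, [math]\displaystyle{ 2[{\rm duplex}] + [{\rm single-strand}] = [{\rm RNA}]_{total} }[/math], so [math]\displaystyle{ [{\rm single-strand}] = [{\rm RNA}]_{total}/2 }[/math] and [math]\displaystyle{ [{\rm duplex}] = [{\rm RNA}]_{total}/4 }[/math], which gives [math]\displaystyle{ K = 1/[{\rm RNA}]_{total} }[/math] at [math]\displaystyle{ T_m }[/math]. Substituting into [math]\displaystyle{ \Delta H^0 - T_m\cdot\Delta S^0 = -RT_m \ln K = RT_m \ln[{\rm RNA}]_{total} }[/math] and solving for [math]\displaystyle{ T_m }[/math] gives:

[math]\displaystyle{ T_m = \frac{\Delta H^0}{\Delta S^0 + R\cdot \ln[{\rm RNA}]_{total}} }[/math]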
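The melting-temperature formula can be evaluated directly. A minimal Python sketch; the ΔH° and ΔS° values are hypothetical numbers chosen only for illustration, and the formula assumes the self-complementary case derived above:

```python
import math

R = 1.987  # gas constant, cal/(mol·K)


def melting_temperature(dH: float, dS: float, total_conc: float) -> float:
    """T_m in kelvin for a self-complementary duplex.

    dH in cal/mol, dS in cal/(mol·K), total_conc = [RNA]_total in M.
    """
    return dH / (dS + R * math.log(total_conc))


# Hypothetical duplex: dH = -60 kcal/mol, dS = -165 cal/(mol·K), 0.1 mM total strands
tm = melting_temperature(dH=-60_000.0, dS=-165.0, total_conc=1e-4)
print(f"Tm = {tm:.1f} K ({tm - 273.15:.1f} °C)")
```

Because [math]\displaystyle{ \Delta S^0 }[/math] is negative, raising [math]\displaystyle{ [{\rm RNA}]_{total} }[/math] shrinks the magnitude of the denominator, so the sketch reproduces the expected rise in [math]\displaystyle{ T_m }[/math] with strand concentration.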

You can measure the melting curve experimentally and extract the values of [math]\displaystyle{ \Delta H^0 }[/math] and [math]\displaystyle{ \Delta S^0 }[/math], from which [math]\displaystyle{ \Delta G^0 }[/math] follows. The Freier-Turner rules give the incremental [math]\displaystyle{ \Delta G^0 }[/math] of stacking one base pair onto the end of another. In the table below, the top row gives the 5' base pair, the left column gives the 3' base pair, and the values are in kcal/mol. For example, a GC base pair followed by a CG base pair contributes -3.4 kcal/mol. These values apply to RNA folding at 37 °C.

3' pair \ 5' pair | GU | UG | AU | UA | CG | GC |
---|---|---|---|---|---|---|
GU | -0.5 | -0.6 | -0.5 | -0.7 | -1.5 | -1.3 |
UG | -0.5 | -0.5 | -0.7 | -0.5 | -1.5 | -0.9 |
AU | -0.5 | -0.7 | -0.9 | -1.1 | -1.8 | -2.3 |
UA | -0.7 | -0.5 | -0.9 | -0.9 | -1.7 | -2.1 |
CG | -1.9 | -1.3 | -2.1 | -2.3 | -2.9 | -3.4 |
GC | -1.5 | -1.5 | -1.7 | -1.8 | -2.0 | -2.9 |

To calculate the total free energy of an RNA duplex, sum the contribution of each stack of neighboring base pairs and add a nucleation term for the first pair, which has been determined experimentally to be +3.4 kcal/mol. This term is positive because of the entropy lost when two strands associate.
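The summation rule can be sketched in Python using the stacking table above. Writing each base pair as two letters (top-strand base, then bottom-strand base, with the top strand read 5' to 3') is an assumed encoding, chosen so that `["GC", "CG"]` reproduces the -3.4 kcal/mol example in the text. The sketch deliberately ignores loops and dangling ends:

```python
# Freier-Turner stacking free energies (kcal/mol, 37 °C) from the table above,
# indexed as STACK[three_prime_pair][five_prime_pair].
PAIRS = ["GU", "UG", "AU", "UA", "CG", "GC"]  # column order (5' pair)
_ROWS = {  # row = 3' pair
    "GU": [-0.5, -0.6, -0.5, -0.7, -1.5, -1.3],
    "UG": [-0.5, -0.5, -0.7, -0.5, -1.5, -0.9],
    "AU": [-0.5, -0.7, -0.9, -1.1, -1.8, -2.3],
    "UA": [-0.7, -0.5, -0.9, -0.9, -1.7, -2.1],
    "CG": [-1.9, -1.3, -2.1, -2.3, -2.9, -3.4],
    "GC": [-1.5, -1.5, -1.7, -1.8, -2.0, -2.9],
}
STACK = {row: dict(zip(PAIRS, values)) for row, values in _ROWS.items()}
NUCLEATION = 3.4  # kcal/mol, initiation term for the first pair


def duplex_free_energy(pairs: list) -> float:
    """Nucleation term plus every nearest-neighbor stack, walking the duplex 5'->3'."""
    total = NUCLEATION
    for five_prime, three_prime in zip(pairs, pairs[1:]):
        total += STACK[three_prime][five_prime]
    return total


# Top strand 5'-GCG-3' paired with 3'-CGC-5': 3.4 + (-3.4) + (-2.0)
print(round(duplex_free_energy(["GC", "CG", "GC"]), 2))  # -2.0
```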

Loops can be analyzed similarly. The Freier-Turner free-energy increments (kcal/mol) for loops, by loop length in nucleotides, are:

Length (nt) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 12 | 14 | 16 | 18 | 20 | 25 | 30 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Bulges | 3.3 | 5.2 | 6.0 | 6.7 | 7.4 | 8.2 | 9.1 | 10.0 | 10.5 | 11.0 | 11.8 | 12.5 | 13.0 | 13.6 | 14.0 | 15.0 | 15.8 |
Hairpin loops | ∞ | ∞ | 7.4 | 5.9 | 4.4 | 4.3 | 4.1 | 4.1 | 4.2 | 4.3 | 4.9 | 5.6 | 6.1 | 6.7 | 7.1 | 8.1 | 8.9 |
Internal loops | -- | 0.8 | 1.3 | 1.7 | 2.1 | 2.5 | 2.6 | 2.8 | 3.1 | 3.6 | 4.4 | 5.1 | 5.6 | 6.2 | 6.6 | 7.6 | 8.4 |
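The loop table can be turned into a lookup. A minimal Python sketch with hypothetical names; it raises an error for lengths the table does not list (11, 13, and so on) rather than interpolating, and treats the "--" entry as undefined:

```python
import math

# Loop lengths (nucleotides) that have tabulated values
LENGTHS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 25, 30]

# Freier-Turner loop increments (kcal/mol) from the table above.
# None marks the "--" entry; math.inf marks hairpins too short to form.
LOOP_DG = {
    "bulge": [3.3, 5.2, 6.0, 6.7, 7.4, 8.2, 9.1, 10.0, 10.5, 11.0,
              11.8, 12.5, 13.0, 13.6, 14.0, 15.0, 15.8],
    "hairpin": [math.inf, math.inf, 7.4, 5.9, 4.4, 4.3, 4.1, 4.1, 4.2, 4.3,
                4.9, 5.6, 6.1, 6.7, 7.1, 8.1, 8.9],
    "internal": [None, 0.8, 1.3, 1.7, 2.1, 2.5, 2.6, 2.8, 3.1, 3.6,
                 4.4, 5.1, 5.6, 6.2, 6.6, 7.6, 8.4],
}


def loop_penalty(kind: str, length: int) -> float:
    """Tabulated increment for a 'bulge', 'hairpin', or 'internal' loop."""
    if length not in LENGTHS:
        raise ValueError(f"no tabulated value for a loop of length {length}")
    value = LOOP_DG[kind][LENGTHS.index(length)]
    if value is None:
        raise ValueError(f"{kind} loops of length {length} are undefined")
    return value


print(loop_penalty("hairpin", 4), loop_penalty("bulge", 1))  # 5.9 3.3
```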

A bulge loop has unpaired nucleotides on only one strand of a helix. In natural RNA, most bulges consist of a single nucleotide. Thermodynamic analysis shows that bulges destabilize the helix, with little dependence on the identity of the bulged base.

An internal loop has unpaired nucleotides on both strands. The size of an internal loop is the total number of unpaired nucleotides. Asymmetric internal loops, where the two strands have different numbers of unpaired nucleotides, are less stable than symmetric internal loops.

A hairpin loop forms when a single strand folds back on itself to make a helical stem. Its stability depends on the loop size, the loop sequence, and the pair closing the loop. Some four-nucleotide hairpin loops (tetraloops) are more stable than predicted. These include the sequences GNRA, UNCG, and CUYG (N = any nucleotide, R = purine, Y = pyrimidine).
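The tetraloop motifs use IUPAC ambiguity codes, so checking whether a four-nucleotide loop belongs to one of these families is a pattern match. A minimal Python sketch (function names are hypothetical); it examines only the four loop nucleotides, not the closing base pair:

```python
import re
from typing import Optional

# IUPAC codes used in the motifs: N = any nucleotide, R = purine, Y = pyrimidine
IUPAC = {"N": "[ACGU]", "R": "[AG]", "Y": "[CU]"}
TETRALOOP_MOTIFS = ["GNRA", "UNCG", "CUYG"]


def motif_to_regex(motif: str) -> str:
    """Expand IUPAC codes into regex character classes; plain bases stay as-is."""
    return "".join(IUPAC.get(base, base) for base in motif)


def stable_tetraloop_family(loop: str) -> Optional[str]:
    """Return the motif that a 4-nt loop sequence matches, or None."""
    for motif in TETRALOOP_MOTIFS:
        if re.fullmatch(motif_to_regex(motif), loop.upper()):
            return motif
    return None


print(stable_tetraloop_family("GAAA"), stable_tetraloop_family("UUCG"),
      stable_tetraloop_family("AAAA"))  # GNRA UNCG None
```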

Loops larger than 12 nucleotides are rarely seen in natural structures.

Dangling ends (unpaired nucleotides at the ends of a double helix) can stabilize the helix, depending on sequence. The first unpaired nucleotide 5' to a helix has little effect on the free energy, whereas a 3' dangling end can stabilize a helix by -0.1 to -1.7 kcal/mol (i.e. more than some base pairs). 3' dangling purines add more stability than 3' dangling pyrimidines.

### Pseudoknots

Folding algorithms normally assume that RNA folds without pseudoknots. In bracket notation, a non-pseudoknotted structure closes every pair in nested order, e.g. `[()]`. A pseudoknot has crossing pairs, e.g. `[(])`. In a pseudoknot, the knotted pairing (the `()` region) cannot exceed 9 or 10 base pairs. This constraint comes from the helical structure of RNA, which has 10 or 11 base pairs per turn: with a full turn, the two strands of the pseudoknot would form a true knot, which is physically and biologically unrealistic.
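Whether a bracket string is pseudoknotted can be decided by recovering the pairs and testing for crossings: pairs (i, j) and (k, l) cross when i < k < j < l. A minimal Python sketch; the `()` and `[]` brackets come from the text, while also accepting `{}` and `<>` is an assumed extension for deeper knots:

```python
from typing import List, Tuple

OPENERS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSERS = {close: open_ for open_, close in OPENERS.items()}


def base_pairs(structure: str) -> List[Tuple[int, int]]:
    """Match brackets with one stack per bracket type; '.' marks unpaired bases."""
    stacks = {open_: [] for open_ in OPENERS}
    pairs = []
    for pos, char in enumerate(structure):
        if char in OPENERS:
            stacks[char].append(pos)
        elif char in CLOSERS:
            stack = stacks[CLOSERS[char]]
            if not stack:
                raise ValueError(f"unmatched {char!r} at position {pos}")
            pairs.append((stack.pop(), pos))
    leftover = sorted(p for s in stacks.values() for p in s)
    if leftover:
        raise ValueError(f"unmatched opening bracket(s) at {leftover}")
    return pairs


def is_pseudoknotted(structure: str) -> bool:
    """True if any two pairs (i, j) and (k, l) cross, i.e. i < k < j < l."""
    pairs = sorted(base_pairs(structure))
    for a, (i, j) in enumerate(pairs):
        for k, l in pairs[a + 1:]:
            if i < k < j < l:
                return True
    return False


print(is_pseudoknotted("[()]"), is_pseudoknotted("[(])"))  # False True
```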

### Duplex formation

One model of helix formation is that the rate-limiting step is forming a nucleus of a few base pairs, after which additional pairs form. Data indicate a nucleus of about 5±1 base pairs for oligonucleotides containing only AU pairs and 2±1 base pairs for oligonucleotides with at least 2 GC pairs. Once the nucleus has formed, the next base pair is added at a rate of roughly [math]\displaystyle{ 10^7 }[/math]/s.

Pairing kinetics can also be estimated. At high sodium concentration or with magnesium, the forward rate constant for duplex formation is almost always around [math]\displaystyle{ k_{on} = 10^6 M^{-1}s^{-1} }[/math]. The [math]\displaystyle{ \Delta G^0 }[/math] can be estimated as above, the equilibrium constant follows from [math]\displaystyle{ K=e^{-\Delta G^0/RT} }[/math], and the off rate from [math]\displaystyle{ k_{off} = k_{on}/K }[/math]. The half-life for helix dissociation is then [math]\displaystyle{ t_{1/2} = \ln(2)/k_{off} }[/math].
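The chain from [math]\displaystyle{ \Delta G^0 }[/math] to half-life can be sketched in Python. The default `k_on` is the typical value quoted above; the -10 kcal/mol input is a hypothetical example, and the function name is illustrative:

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol·K)


def dissociation_half_life(dG: float, temp_c: float = 37.0, k_on: float = 1e6) -> float:
    """Half-life (s) of a duplex with standard free energy dG (kcal/mol, negative = stable).

    K = exp(-dG / RT) in M^-1, k_off = k_on / K in s^-1, t_1/2 = ln 2 / k_off.
    """
    temp_k = temp_c + 273.15
    K = math.exp(-dG / (R * temp_k))
    k_off = k_on / K
    return math.log(2) / k_off


# Hypothetical duplex with dG = -10 kcal/mol at 37 °C
print(f"t1/2 = {dissociation_half_life(-10.0):.1f} s")
```

Because the half-life scales with [math]\displaystyle{ e^{-\Delta G^0/RT} }[/math], making [math]\displaystyle{ \Delta G^0 }[/math] about 1.42 kcal/mol more negative (RT ln 10 at 37 °C) lengthens the half-life roughly tenfold.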

### Triple helices

Purines have a second face (the Hoogsteen face) that can hydrogen bond with a pyrimidine (A with U and G with C). In Hoogsteen pairing, the two strands are parallel; in reverse Hoogsteen pairing, they are antiparallel. When one strand of a Watson-Crick paired helix contains a *homopurine region*, a third, homopyrimidine strand inserted into the major groove of the duplex can form Hoogsteen or reverse Hoogsteen pairs with that region, producing a triple helix.

### Tetraloop-receptor interactions

Tetraloops of the GNRA family can interact with specific helical structures. Different loops interact with different receptors.

GNAA interacts with two consecutive C:G pairs (5'-CC:GG) and GNGA interacts with 5'-CU:GA. The last base of the loop (always A) binds to the second G of the helix and the third residue of the loop makes a G:A base pair. GAAA binds the 11 nt motif [CCUAAG...UAUGG] extremely well (over 31x better than other interactions) (PMID 7720718). This motif is very common in group I and group II introns.