Wikiomics:RNA secondary structure prediction

From OpenWetWare

Single sequence structure prediction

A common problem for researchers working with RNA is determining the three-dimensional structure of the molecule. In RNA, however, much of the final structure is determined by the secondary structure, that is, the intramolecular base-pairing interactions of the molecule. This is shown by the high conservation of base pairs across diverse species.

One of the first attempts to predict RNA secondary structure was made by Ruth Nussinov and co-workers, who used a dynamic programming method to maximise the number of base pairs [1]. However, there are several issues with this approach; most importantly, the solution is not unique. In 1980, Nussinov and Jacobson published an adaptation of this approach that uses a simple nearest-neighbour energy model [2]. In 1981, Michael Zuker and Patrick Stiegler proposed a slightly refined dynamic programming approach that models nearest-neighbour energy interactions and directly incorporates stacking into the prediction [3]. The energies minimised by the recursion are derived from empirical calorimetric experiments; a widely used parameter set was published in 1999 [4]. There has also been progress in estimating "energy" parameters directly from known structures [5]. Another approach is to sample structures from the Boltzmann ensemble [6, 7].
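The base-pair maximisation idea can be sketched in a few lines of Python. This is a minimal illustration, not the published algorithm verbatim: the set of allowed pairs (Watson-Crick plus G-U wobble), the minimum hairpin loop length `min_loop`, and the function names are all assumptions made for the example. The traceback returns only one of the possibly many optimal structures, which illustrates the non-uniqueness mentioned above.

```python
# Allowed pairs are an assumption of this sketch: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    return (a, b) in PAIRS


def nussinov(seq, min_loop=3):
    """Return (max number of nested base pairs, one optimal dot-bracket string)."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # dp[i][j] = maximum number of pairs in the subsequence seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if can_pair(seq[k], seq[j]):
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best

    # Traceback: recover one optimal structure (others may exist).
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # too short to contain a pair
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if can_pair(seq[k], seq[j]):
                left = dp[i][k - 1] if k > i else 0
                if dp[i][j] == left + 1 + dp[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return dp[0][n - 1], "".join(structure)


if __name__ == "__main__":
    count, structure = nussinov("GGGAAAUCCC")
    print(count, structure)
```

The recursion runs in cubic time and quadratic memory in the sequence length. Energy-based methods keep the same overall dynamic programming shape but replace the pair count with loop-dependent free energies.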

One of the issues when predicting RNA secondary structure is that the standard recursions (e.g. Nussinov/Zuker-Stiegler) exclude pseudoknots. Elena Rivas and Sean Eddy published a dynamic programming algorithm that can handle pseudoknots [8]. However, the time and memory requirements of the method are prohibitive. This has prompted several researchers to implement versions of the algorithm that restrict the classes of pseudoknots considered, resulting in gains in performance.
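What "excluding pseudoknots" means can be made concrete: the standard recursions only build nested structures, whereas a pseudoknot contains at least two base pairs (i, j) and (k, l) that cross, i.e. i < k < j < l. The following sketch checks a list of base pairs for such crossings; the function name and the representation of a structure as a list of index pairs are assumptions made for illustration.

```python
def has_pseudoknot(pairs):
    """Return True if any two base pairs cross (i < k < j < l)."""
    # Normalise each pair so that the 5' index comes first, then sort by it.
    norm = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for a, (i, j) in enumerate(norm):
        for k, l in norm[a + 1:]:
            if i < k < j < l:
                return True
    return False


if __name__ == "__main__":
    print(has_pseudoknot([(0, 5), (1, 4)]))  # nested pairs
    print(has_pseudoknot([(0, 5), (3, 8)]))  # crossing pairs
```

This pairwise check is quadratic in the number of base pairs, which is adequate for inspecting individual predicted structures.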

Another issue when predicting RNA secondary structure is that non-canonical base-pairing interactions, such as AA, AC, AG, CC, CU, etc., are not modelled. To remedy this problem, Parisien and Major proposed a radically different approach in which 19 fundamental nucleotide cyclic motifs are systematically assigned and scored, rather than the classical tandems of base pairs [9].

S. cerevisiae tRNA-PHE structure space: the energies and structures were calculated using RNAsubopt and the structure distances computed using RNAdistance.

References

  1. Nussinov R, Pieczenik G, Griggs JR, and Kleitman DJ. Algorithms for loop matchings. SIAM Journal on Applied Mathematics. 1978. [NUSS78]
  2. Nussinov R and Jacobson AB. Fast algorithm for predicting the secondary structure of single-stranded RNA. Proc Natl Acad Sci U S A. 1980 Nov;77(11):6309-13. DOI:10.1073/pnas.77.11.6309 | PubMed ID:6161375 | HubMed [NUSS80]
  3. Zuker M and Stiegler P. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 1981 Jan 10;9(1):133-48. DOI:10.1093/nar/9.1.133 | PubMed ID:6163133 | HubMed [ZUKE81]
  4. Mathews DH, Sabina J, Zuker M, and Turner DH. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol. 1999 May 21;288(5):911-40. DOI:10.1006/jmbi.1999.2700 | PubMed ID:10329189 | HubMed [MATH99]
  5. Do CB, Woods DA, and Batzoglou S. CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics. 2006 Jul 15;22(14):e90-8. DOI:10.1093/bioinformatics/btl246 | PubMed ID:16873527 | HubMed [DO06]
  6. McCaskill JS. The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers. 1990 May-Jun;29(6-7):1105-19. DOI:10.1002/bip.360290621 | PubMed ID:1695107 | HubMed [MCCA90]
  7. Ding Y and Lawrence CE. A statistical sampling algorithm for RNA secondary structure prediction. Nucleic Acids Res. 2003 Dec 15;31(24):7280-301. DOI:10.1093/nar/gkg938 | PubMed ID:14654704 | HubMed [DING03]
  8. Rivas E and Eddy SR. A dynamic programming algorithm for RNA structure prediction including pseudoknots. J Mol Biol. 1999 Feb 5;285(5):2053-68. DOI:10.1006/jmbi.1998.2436 | PubMed ID:9925784 | HubMed [RIVA99]
  9. Parisien M and Major F. The MC-Fold and MC-Sym pipeline infers RNA structure from sequence data. Nature. 2008 Mar 6;452(7183):51-5. DOI:10.1038/nature06684 | PubMed ID:18322526 | HubMed [PARI08]


External links to RNA folding software

  • Afold Analysis of internal loops within the RNA secondary structure in almost quadratic time
  • alteRNA: RNA Density Fold. Minimizes a linear combination of energy density and the total free energy for a given RNA sequence.
  • CONTRAfold a secondary structure prediction method based on conditional log-linear models (CLLMs), a flexible class of probabilistic models which generalize upon SCFGs by using discriminative training and feature-rich scoring.
  • Kinfold Simulates the stochastic folding kinetics of RNA sequences into secondary structures. The algorithm operates on the basis of the formation, dissociation, and the shifting of individual base pairs.
  • Mfold MFE RNA structure prediction algorithm.
  • RDfolder RNA folding by energy weighted Monte Carlo simulation.
  • RNAfold MFE RNA structure prediction algorithm.
  • RNA Kinetics models the dynamics of RNA secondary structure by the means of kinetic analysis of folding transitions of a growing RNA molecule. The result of the modeling is a kinetic ensemble, i.e. a collection of RNA structures that are endowed with probabilities, which depend on time. This approach gives comprehensive probabilistic description of RNA folding pathways, revealing important kinetic details that are not captured by the traditional structure prediction methods.
  • RNAstructure A Windows implementation of the Zuker algorithm for RNA secondary structure prediction based on free energy minimization. Includes a sequence editor, an integrated drawing tool, the OligoWalk program, OligoScreen, Dynalign, and can compute the partition function.
  • SARNA-Predict A heuristic algorithm based on Simulated Annealing for RNA secondary structure prediction.
  • Sfold Statistical sampling of all possible structures. The sampling is weighted by partition function probabilities.
  • Vsfold4 folds single RNA sequences using an extended energy model.
  • MC-Fold folds single RNA sequences producing secondary structures that include Watson-Crick and non-canonical base pairs.

with pseudoknots

  • HotKnots A heuristic algorithm for the prediction of RNA secondary structures including pseudoknots.
  • HPknotter A Heuristic Approach for Detecting RNA H-type Pseudoknots.
  • KineFold Folding kinetics of RNA sequences including pseudoknots.
  • McQFold MCMC-sampling secondary structures with pseudoknots for a given RNA sequence.
  • NUPACK A dynamic programming algorithm based on the partition function for the prediction of a restricted class of RNA pseudoknots. Uses fewer resources than Pknots-SE.
  • Pknots-RG A dynamic programming algorithm for the prediction of a restricted class of RNA pseudoknots. Uses the fewest resources to date but, in exchange, handles the most restrictive class of knotted structures.
  • Pknots-SE A dynamic programming algorithm for optimal RNA pseudoknot prediction using the nearest neighbour energy model.
  • PLMM-DPSS High sensitivity RNA pseudoknot prediction using "Pseudoknot Local Motif Model and Dynamic Partner Sequence Stacking".
  • SARNA-Predict-pk A heuristic algorithm based on Simulated Annealing for RNA secondary structure prediction including pseudoknots.
  • MC-Fold includes an option for H-type pseudo-knots.

with suboptimal predictions

  • RNAsubopt Reads RNA sequences from stdin and calculates all suboptimal secondary structures within a user defined energy range above the minimum free energy (mfe).
  • Barriers Reads an energy-sorted list of conformations of a landscape, and computes local minima and energy barriers of the landscape. For RNA secondary structures, suitable input is produced by RNAsubopt. For each local minimum found it prints to stdout the conformation of the minimum, its energy, the number of the "parent" minimum it merges with, and the height of the energy barrier.
  • MPGAfold A massively parallel genetic algorithm for predicting RNA secondary structures with pseudoknots.
  • RNAshapes Unique suboptimal structures (shapes) are selected based on an abstract representation of RNA secondary structure which is inspired by the dot bracket representation known from the Vienna RNA package. The user can choose from 5 different types of shape resolution corresponding to different abstraction levels.
  • RNALOSS locally optimal secondary structure computation. RNALOSS computes the number of k-locally optimal secondary structures for the input RNA, along with relative density of states and minimum free energy of a sample k-locally optimal secondary structure.
  • MC-Fold produces all suboptimal secondary structures within a user defined energy percent above the minimum free energy (MFE).
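Given a list of suboptimal structures and their free energies (for example, as reported by one of the tools above), the relative weight of each structure can be estimated from the Boltzmann distribution, p = exp(-E/RT) / Z. The sketch below is an illustration under stated assumptions: the function name and the example energies are invented, a temperature of 37 °C is assumed, and Z is summed only over the structures supplied, so the result is a probability within that list rather than within the full ensemble.

```python
from math import exp

# Gas constant in kcal/(mol*K) times an assumed temperature of 37 degrees C.
RT = 0.0019872 * 310.15


def boltzmann_probabilities(energies, rt=RT):
    """Normalised Boltzmann weights for free energies given in kcal/mol."""
    if not energies:
        raise ValueError("at least one energy is required")
    # Shift by the minimum energy for numerical stability; this cancels in the ratio.
    e_min = min(energies)
    weights = [exp(-(e - e_min) / rt) for e in energies]
    z = sum(weights)  # partition function restricted to the supplied structures
    return [w / z for w in weights]


if __name__ == "__main__":
    # Hypothetical energies (kcal/mol) of three suboptimal structures.
    for energy, p in zip([-20.0, -19.0, -19.0], boltzmann_probabilities([-20.0, -19.0, -19.0])):
        print(energy, round(p, 3))
```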

Comparative structure prediction

Frequently researchers have more than one RNA sequence which they suspect are homologous (or analogous) and therefore may share a common structure. There are three main approaches to this problem: (1) alignment folding, (2) simultaneous alignment and folding, and (3) aligning secondary structures.

For the alignment folding approach, the mutual information content is frequently used [10]. Generally the alignment is inferred using one of the standard multiple alignment tools (e.g. ClustalW, MAFFT, MUSCLE). Some SCFG-based alignment folding algorithms have also been developed, such as Pfold [11]. Alternatively, a sum of energy and covariation terms can be used, as in RNAalifold [12].
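The mutual information between two alignment columns i and j measures how strongly the nucleotides at the two positions co-vary: MI(i, j) = Σ f(x, y) log2( f(x, y) / (f_i(x) f_j(y)) ), where f(x, y) is the frequency of the nucleotide pair and f_i, f_j are the single-column frequencies. The sketch below is a minimal illustration; the function name, the toy alignment, and the choice to skip rows with a gap in either column are assumptions, not the method of any particular tool.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment."""
    # Assumption of this sketch: rows with a gap in either column are ignored.
    rows = [(s[i], s[j]) for s in alignment if s[i] != "-" and s[j] != "-"]
    n = len(rows)
    if n == 0:
        return 0.0
    f_i = Counter(a for a, _ in rows)   # single-column counts for column i
    f_j = Counter(b for _, b in rows)   # single-column counts for column j
    f_ij = Counter(rows)                # joint counts for the column pair
    return sum(
        (c / n) * log2((c / n) / ((f_i[a] / n) * (f_j[b] / n)))
        for (a, b), c in f_ij.items()
    )


if __name__ == "__main__":
    # Toy alignment: columns 0 and 4 co-vary perfectly, column 1 is invariant.
    toy = ["GAAAC", "CAAAG", "AAAAU", "UAAAA"]
    print(mutual_information(toy, 0, 4))
    print(mutual_information(toy, 0, 1))
```

Columns that co-vary to maintain complementarity score high, while perfectly conserved columns score zero even if they are paired, which is one reason covariation is often combined with an energy term.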

References

  9. Parisien M and Major F. The MC-Fold and MC-Sym pipeline infers RNA structure from sequence data. Nature. 2008 Mar 6;452(7183):51-5. DOI:10.1038/nature06684 | PubMed ID:18322526 | HubMed [PARI08]
  10. Chiu DK and Kolodziejczak T. Inferring consensus structure from nucleic acid sequences. Comput Appl Biosci. 1991 Jul;7(3):347-52. DOI:10.1093/bioinformatics/7.3.347 | PubMed ID:1913217 | HubMed [CHIU91]
  11. Knudsen B and Hein J. Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Res. 2003 Jul 1;31(13):3423-8. DOI:10.1093/nar/gkg614 | PubMed ID:12824339 | HubMed [KNUD03]
  12. Hofacker IL, Fekete M, and Stadler PF. Secondary structure prediction for aligned RNA sequences. J Mol Biol. 2002 Jun 21;319(5):1059-66. DOI:10.1016/S0022-2836(02)00308-X | PubMed ID:12079347 | HubMed [HOFA02]


Structures from alignments

  • BayesFold Finds, ranks, and draws the likeliest structures for a sequence alignment. Foldings are based on the predictions of the Bayesian statistical method. Your browser must be Internet Explorer 5+ for Windows with the Adobe Scalable Vector Graphics Viewer plugin installed.
  • ConStruct A tool for thermodynamic controlled prediction of conserved secondary structure. Also allows iterative manual alignment refinement (new version).
  • GArna Prediction of secondary structures of RNAs by genetic algorithm.
  • GPRM finding common secondary structure elements, not a global alignment, in a sufficiently large family (e.g. more than 15 members) of unaligned RNA sequences.
  • Genebee RNA alignment folding using a combination of free-energy and mutual information.
  • Pfold Folds alignments using a SCFG trained on rRNA alignments. The alignment length limit is 500.
  • RNAalifold Folds alignments using a combination of free-energy and a covariation measure. Ships with the Vienna package. Also a web-server.
  • RNAlishapes is a tool for RNA structure analysis based on aligned RNAs.
  • RNA-Decoder and CORSmodel Programmes for comparative predictions of RNA secondary structure in regions which are also protein coding.
  • RNAGA Prediction of common secondary structures of RNAs by genetic algorithm.
  • X2s An X windows program for analyzing and editing an alignment of RNA sequences and for predicting RNA secondary structure. The original server appears to be dead. Files provided c/o A. Wilm.

with pseudoknots

  • Circles Circles is an experimental Windows 95/98/NT program for inferring RNA secondary structure using the comparative method. It provides a user-friendly interface to Jack Tabaska's maximum weight matching programs. The user can read in an alignment in FASTA, Clustal, or NEXUS format, compute a maximum weight matching, and export one or more secondary structures in standard formats.
  • HXMATCH Hxmatch computes the consensus structure including pseudoknots based on an alignment of a few RNA sequences. The algorithm combines thermodynamic and covariation information to assign scores to all possible base pairs; the base pairs are then chosen with the help of the maximum weighted matching algorithm.
  • ILM Iterated Loop Matching. Evaluates stems in an alignment using a combination of free-energy and mutual information. Iteratively selects high scoring stems.
  • KNetFold Computes a consensus RNA secondary structure from an RNA sequence alignment based on machine learning.
  • MIfold Matlab package for investigating mutual information content of RNA alignments.

Simultaneous alignment and structure prediction (Sankoff-like methods)

  • caRNAc Comparative analysis combined with MFE folding.
  • Consan Pairwise RNA structural alignment, both unconstrained and constrained on alignment pins. Consan uses a constrained version of a pairSCFG structural alignment algorithm which assumes knowledge of a few confidently aligned positions (pins). Pins are selected based on the posterior probabilities of a probabilistic pairwise sequence alignment.
  • CMfinder An RNA motif prediction tool. It is an expectation maximization algorithm using covariance models for motif description, carefully crafted heuristics for effective motif search, and a novel Bayesian framework for structure prediction combining folding energy and sequence covariation. This tool performs well on unaligned sequences with long extraneous flanking regions, and in cases when the motif is only present in a subset of sequences. CMfinder also integrates directly with genome-scale homology search, and can be used for automatic refinement and expansion of RNA families.
  • COVE COVE is an implementation of stochastic context free grammar methods for RNA sequence/structure analysis.
  • Dynalign Uses a "full energy model" and comparative information to align and fold 2 sequences. Restricts the 'span' of base-pairs to improve CPU time.
  • PMmatch A variant of the Sankoff algorithm from the Vienna group.
  • Foldalign1 Predicts conserved local sequence and hair-pin structures using CONSENSUS and CLUSTAL-like heuristics. Primarily used to infer cis-regulatory elements.
  • Foldalign2 Structurally align two sequences using a light weight energy model in combination with RIBOSUM like score matrices.
  • RNAcast RNA consensus abstract shapes technique: an alternative to the Sankoff algorithm for multiple RNA folding.
  • RNAmine is a software tool to extract the structural motifs from a set of RNA sequences. The potential secondary structures of the RNA sequences are represented by directed graphs, and the common secondary structures are extracted by using graph mining technique.
  • scaRNA Stem Candidate Aligner for RNA: aligns two RNA sequences and calculates similarities, based upon estimated common secondary structures. Scarna is a fast, convenient tool for clustering RNA sequences and for similarity search in long sequences. It works even for pseudoknotted secondary structures. The web server (www.scarna.org) currently only works with Internet Explorer.
  • SEED Suffix arrays are used to enumerate complementary regions, possibly containing interior loops, as well as for matching RNA secondary structure expressions.
  • Slash A combination of COVE and Foldalign1.
  • Stemloc Comparative RNA structure-finder using accelerated pairwise stochastic context-free grammars. Ships with the 'dart' package by Ian Holmes.
  • T-LARA Produce a global fold and alignment of ncRNA families using integer linear programming and Lagrangian relaxation.

with pseudoknots

  • comRNA comRNA predicts common RNA secondary structure motifs in a group of related sequences.

Aligning RNA structures

  • LGSFAligner Local Gapped Subforest Aligner, aligns two RNA secondary structures locally.
  • MARNA MARNA considers both primary sequence and the secondary structure to align RNAs. Based on pairwise comparisons using costs of edit operations. The edit operations can be divided into edit operations on arcs and edit operations on bases.
  • MiGaL Compare RNA secondary structures and build phylogenetic trees.
  • RNA_align Aligns two RNA structures using an edit distance model.
  • RNAdistance reads RNA secondary structures from stdin and calculates one or more measures for their dissimilarity, based on tree or string editing (alignment). In addition it calculates a "base pair distance" given by the number of base pairs present in one structure, but not the other.
  • RNAforester Compare and align RNA secondary structures via a "forest alignment" approach.
  • RNAshapes the "consensus shapes" method independently enumerates the near-optimal abstract shape space, and predicts as the consensus an abstract shape common to all sequences.
  • RSmatch provides a light-weight approach to compare RNA structures, thereby uncovering functional structure elements. Compared with other tools for RNA structure comparison, RSmatch is fast, requiring quadratic time determined by the sizes of two given structures.
  • MC-Cons assigns to each sequence one of its suboptimal predictions that globally optimizes the sum of pair-wise similarities, resulting in a global and structural consensus assignment (that may include more than one structure).
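The "base pair distance" mentioned for RNAdistance above, the number of base pairs present in one structure but not the other, is simple enough to sketch directly. The following is an independent illustration on dot-bracket strings, not the RNAdistance implementation; the function names are assumptions, and only nested structures written with "(", ")" and "." are handled.

```python
def parse_dot_bracket(structure):
    """Return the set of base pairs (i, j) encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pairs


def base_pair_distance(a, b):
    """Number of base pairs found in exactly one of the two structures."""
    if len(a) != len(b):
        raise ValueError("structures must have the same length")
    # Symmetric difference of the two pair sets.
    return len(parse_dot_bracket(a) ^ parse_dot_bracket(b))


if __name__ == "__main__":
    print(base_pair_distance("((..))", "(....)"))
    print(base_pair_distance("(())", "()()"))
```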

with pseudoknots

  • PSTAG Pair stochastic tree adjoining grammars (PSTAGs) for aligning and predicting RNA secondary structures, including a simple type of pseudoknot that can represent most known pseudoknot structures.

Miscellaneous

  • StrAl is an alignment tool designed to provide multiple alignments of non-coding RNAs following a fast progressive strategy. It combines the thermodynamic base pairing information derived from RNAfold calculations in the form of base pairing probability vectors with the information of the primary sequence.
  • MC-Sym produces RNA 3-D structures from an input script generated by MC-Fold (above).


See also