The RF Fold module is designed to allow transcriptome-wide reconstruction of RNA structures, starting from XML files generated using the RF Norm tool.
This tool can process a single XML file, or an entire directory of XML files, and produces the inferred secondary structures (either in dot-bracket notation, or CT format) and their graphical representation (either in PostScript, or SVG format).
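Folding inference can be performed using 2 different algorithms, ViennaRNA (method 1) and RNAstructure (method 2), selected through the -m (or --folding-method) parameter.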
Prediction can be performed either on the whole transcript, or through a windowed approach (see next paragraph).
The windowed folding approach is based on the original method described in Siegfried et al., 2014 (PMID: 25028896), and consists of 3 main steps, outlined below:
In step I (optional), a window is slid along the RNA, and pseudoknotted structures are detected using the same approach employed by the ShapeKnots algorithm (Hajdin et al., 2013 (PMID: 23503844)). Our implementation of the ShapeKnots algorithm relies on the ViennaRNA package (instead of RNAstructure, as the original implementation did), and is thus much faster.
Nonetheless, both implementations are single-threaded. Alternatively, the multi-threaded implementation ShapeKnots-smp, shipped with the latest RNAstructure version, can be used.
If constraints from structure probing experiments are provided, these are incorporated in the form of soft-constraints. Predicted pseudoknotted base-pairs are retained if they appear in >50% of analyzed windows. In case constraints are provided, pseudoknots are retained only if the average reactivity of bases on both sides of the helices is below a certain reactivity cutoff.
In step II, a window is slid along the RNA, and the partition function is calculated. If provided, soft-constraints are applied. If step I has been performed, pseudoknotted bases are hard-constrained to be single-stranded. Predicted base-pair probabilities are averaged across all windows in which they have appeared, and base-pairs with >99% probability are retained, and hard-constrained to be paired in step III.
In step III, a window is slid along the RNA, and MFE folding is performed, including (where present) soft-constraints from probing data, and hard-constraints from steps I and II. Predicted base-pairs are retained if they appear in >50% of analyzed windows.
At all stages, increased sampling is performed at the 5'/3'-ends to avoid end biases.
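The >50% retention rule used in steps I and III can be sketched in a few lines of Python. This is a minimal illustration, not RF Fold's actual code: `windows`, `consensus_pairs`, and the `fold_window` callback are hypothetical names, the frequency of a base-pair is assumed to be counted over the windows that span both of its bases, and the increased sampling at the 5'/3'-ends is not modeled.

```python
from collections import Counter
from typing import Callable, Iterable

Pair = tuple[int, int]  # 0-based (i, j), i < j, in transcript coordinates


def windows(length: int, size: int = 600, offset: int = 200) -> list[tuple[int, int]]:
    """Return half-open (start, end) windows covering a transcript."""
    if length <= size:
        return [(0, length)]
    starts = list(range(0, length - size + 1, offset))
    if starts[-1] + size < length:
        # Assumption: add a last window flush with the 3' end.
        starts.append(length - size)
    return [(s, s + size) for s in starts]


def consensus_pairs(
    length: int,
    fold_window: Callable[[int, int], Iterable[Pair]],  # hypothetical folding callback
    size: int = 600,
    offset: int = 200,
    min_fraction: float = 0.5,
) -> set[Pair]:
    """Keep base-pairs predicted in more than min_fraction of the windows spanning them."""
    spans = windows(length, size, offset)
    seen: Counter = Counter()
    for start, end in spans:
        for pair in fold_window(start, end):
            seen[pair] += 1

    kept = set()
    for (i, j), count in seen.items():
        # Only windows containing both bases could have predicted this pair.
        covering = sum(1 for s, e in spans if s <= i and j < e)
        if count / covering > min_fraction:
            kept.add((i, j))
    return kept
```

Here `fold_window(start, end)` stands in for whatever folding engine is used on the window, and is expected to return base-pairs in transcript coordinates. The default window size and offset mirror the 600 nt and 200 nt defaults listed for the windowed parameters.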
After step III, if step I has been performed, pseudoknotted base-pairs are added back to the structure, and the free energy is computed. Along with the predicted structure, the windowed method also produces a WIGGLE track file containing per-base Shannon entropies.
Regions with higher Shannon entropies are likely to form alternative structures, while those with low Shannon entropies correspond to regions with well-defined RNA structures, or persistent single-strandedness (Siegfried et al., 2014).
Shannon entropy is calculated as:
where p_i is the probability of base i being base-paired.
Since version 2.5, RF Fold generates vector graphical reports (SVG format) for each structure, reporting the per-base reactivity, the MEA structure, the per-base Shannon entropy, and the base-pairing probabilities.
The calculation of Shannon entropy and base-pairing probabilities requires the partition function to be computed. Since this is a very slow step, partition function folding is performed only in windowed mode, or if the -dp (dot-plot) or -sh (Shannon entropy) parameters are explicitly specified.
To list the required parameters, simply type:
$ rf-fold -h
|-o or --output-dir||string||Output directory for writing inferred structures (Default: rf_fold/)|
|-ow or --overwrite||Overwrites the output directory if it already exists|
|-ct or --connectivity-table||Writes predicted structures in CT format (Default: Dot-bracket notation)|
|-m or --folding-method||int||Folding method (1-2, Default: 1):
1. ViennaRNA
2. RNAstructure
|-p or --processors||int||Number of processors (threads) to use (Default: 1)|
|-g or --img||Enables generation of graphical reports|
|-t or --temperature||float||Temperature in degrees Celsius (Default: 37.0)|
|-sl or --slope||float||Sets the slope used with structure probing data restraints (Default: 1.8 [kcal/mol])|
|-in or --intercept||float||Sets the intercept used with structure probing data restraints (Default: -0.6 [kcal/mol])|
|-md or --maximum-distance||int||Maximum pairing distance (in nt) between transcript's residues (Default: 0 [no limit])|
|-nlp or --no-lonelypairs||Disallows lonely base-pairs (1 bp helices) inside predicted structures|
|-i or --ignore-reactivity||Ignores XML reactivity data when performing folding (MFE unconstrained prediction)|
|-hc or --hard-constraint||Besides performing soft-constraint folding, allows specifying a reactivity cutoff (specified by
|-f or --cutoff||float||Reactivity cutoff for constraining a position as unpaired (>0, Default: 0.7)|
|-w or --windowed||Enables windowed folding|
|-pt or --partition||string||Path to RNAstructure
Note: by default,
|-pp or --probabilityplot||string||Path to RNAstructure
|-fw or --fold-window||int||Window size (in nt) for performing MFE folding (>=50, Default: 600)|
|-fo or --fold-offset||int||Offset (in nt) for MFE folding window sliding (Default: 200)|
|-pw or --partition-window||int||Window size (in nt) for performing partition function (>=50, Default: 600)|
|-po or --partition-offset||int||Offset (in nt) for partition function window sliding (Default: 200)|
|-wt or --window-trim||int||Number of bases to trim from both ends of the partition windows to avoid end biases (Default: 100)|
|-dp or --dotplot||Enables generation of dot-plots of base-pairing probabilities|
|-sh or --shannon-entropy||Enables generation of a WIGGLE track file with per-base Shannon entropies|
|-pk or --pseudoknots||Enables detection of pseudoknots (computationally intensive)|
|-kp1 or --pseudoknot-penality1||float||Pseudoknot penalty P1 (Default: 0.35)|
|-kp2 or --pseudoknot-penality2||float||Pseudoknot penalty P2 (Default: 0.65)|
|-kt or --pseudoknot-tollerance||float||Maximum tolerated deviation of suboptimal structures energy from MFE (>0-1, Default: 0.25 [25%])|
|-kh or --pseudoknot-helices||int||Number of candidate pseudoknotted helices to evaluate (>0, Default: 100)|
|-kw or --pseudoknot-window||int||Window size (in nt) for performing pseudoknots detection (>=50, Default: 600)|
|-ko or --pseudoknot-offset||int||Offset (in nt) for pseudoknots detection window sliding (Default: 200)|
|-kc or --pseudoknot-cutoff||float||Reactivity cutoff for retaining a pseudoknotted helix (0-1, Default: 0.5)|
|-km or --pseudoknot-method||int||Algorithm for pseudoknots prediction (1-2, Default: 1):
1. RNA Framework
2. ShapeKnots
Note: the chosen folding method (specified by
|RNA Framework pseudoknots detection algorithm options|
|-vrs or --vienna-rnasubopt||string||Path to ViennaRNA
|-ks or --pseudoknot-suboptimal||int||Number of suboptimal structures to evaluate for pseudoknots prediction (>0, Default: 1000)|
|ShapeKnots pseudoknots detection algorithm options|
|-sk or --shapeknots||string||Path to
Note: by default,
|Folding method #1 options (ViennaRNA)|
|-vrf or --vienna-rnafold||string||Path to ViennaRNA
|-ngu or --no-closing-gu||Disallows G:U wobbles at the end of helices|
|-cm or --constraint-method||int||Method for converting provided reactivities into pseudo-energies (1-2, Default: 1):
1. Deigan et al., 2009
2. Zarringhalam et al., 2012
|Zarringhalam et al., 2012 method options|
|-cc or --constraint-conversion||int||Method for converting
1. Skip normalization step (reactivities are treated as pairing probabilities)
2. Linear mapping according to Zarringhalam et al., 2012
3. Use a cutoff to divide nucleotides into paired, and unpaired
4. Linear model for converting reactivities into probabilities of being unpaired
5. Linear model for converting the logarithm of reactivities into probabilities of being unpaired
|-bf or --beta-factor||float||Sets the magnitude of penalties for deviations from the observed pairing probabilities (Default: 0.5)|
|-ms or --model-slope||float||Sets the slope used by the linear model (Default: 0.68 [Method #4], or 1.6 [Method #5]; requires
|-mi or --model-intercept||float||Sets the intercept used by the linear model (Default: 0.2 [Method #4], or -2.29 [Method #5]; requires
|Folding method #2 options (RNAstructure)|
|-rs or --rnastructure||string||Path to RNAstructure
Note: by default,
|-d or --data-path||string||Path to RNAstructure data tables (Default: assumes DATAPATH environment variable is already set)|
For additional details on the ShapeKnots pseudoknots detection parameters, please refer to Hajdin et al., 2013 (PMID: 23503844).
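As an illustration of how the slope (-sl) and intercept (-in) parameters act on reactivities, the sketch below applies the pseudo-free energy formula published by Deigan et al., 2009, ΔG(i) = slope × ln(reactivity(i) + 1) + intercept, using the defaults listed in the table above. It assumes that constraint method 1 follows this formula as published; the function name is hypothetical, and treating missing (NaN) reactivities as contributing no pseudo-energy is an assumption made only for this example.

```python
import math


def deigan_pseudo_energy(reactivity: float, slope: float = 1.8, intercept: float = -0.6) -> float:
    """Pseudo-free energy (kcal/mol) for one base, per Deigan et al., 2009."""
    if math.isnan(reactivity):
        # Assumption for this sketch: bases without data get no pseudo-energy term.
        return 0.0
    return slope * math.log(reactivity + 1.0) + intercept


# Unreactive bases are rewarded for pairing, reactive ones are penalized.
for r in (0.0, 0.3, 1.0):
    print(f"reactivity={r:.1f}  pseudo-energy={deigan_pseudo_energy(r):+.3f} kcal/mol")
```

With these defaults, a reactivity of 0 yields the intercept itself (-0.6 kcal/mol), and the term becomes positive once the reactivity exceeds roughly 0.4.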
Output dot-plot files
When -dp is provided, RF Fold produces a dot-plot file for each transcript being analyzed, with the following structure:
    1549                              # RNA's length
    i     j     -log10(Probability)   # Header
    8     254   0.459355416499312
    9     253   0.446335563943221
    10    252   0.456738523239413
    11    251   0.454733421725068
    12    250   0.46965667808714
    13    249   0.47837140333524
    21    35    0.268192200569539
    22    34    0.0183400615262171
    23    33    0.0166665677814708
    24    32    0.0128927546134575
    25    31    0.0148601207296645
    26    30    0.0252017532628297
    -- cut --
    1497  1510  0.0147874890078331
    1498  1509  0.0102803152157546
    1499  1508  0.0137510190884233
    1500  1507  0.0402352346970943
where i and j are the positions (1-based) of the bases involved in a given base-pair, followed by the -log10 of their base-pairing probability.
These files can be easily viewed using the Integrative Genomics Viewer (IGV) (for additional details, please refer to the official Broad Institute's IGV page).
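For downstream analysis outside IGV, a dot-plot file with the layout described above can be read back into probabilities with a short script. The following is a minimal sketch: `parse_dotplot` is a hypothetical helper, and it assumes whitespace-separated fields, a first line holding the RNA length, and an optional `i j ...` header line. Trailing `#` annotations and the `-- cut --` marker used in the example above are skipped.

```python
def parse_dotplot(text: str) -> tuple[int, dict[tuple[int, int], float]]:
    """Return (RNA length, {(i, j): base-pairing probability}) from dot-plot text."""
    length = None
    probabilities = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing annotations
        if not line or line.startswith("--"):
            continue
        fields = line.split()
        if length is None and len(fields) == 1:
            length = int(fields[0])  # first line: RNA length
            continue
        if fields[0] == "i":
            continue  # column header
        i, j, neg_log10 = int(fields[0]), int(fields[1]), float(fields[2])
        # The third column stores -log10(p), so p = 10 ** (-value).
        probabilities[(i, j)] = 10.0 ** (-neg_log10)
    if length is None:
        raise ValueError("missing RNA length line")
    return length, probabilities


example = """\
1549  # RNA's length
i j -log10(Probability)  # Header
8 254 0.459355416499312
22 34 0.0183400615262171
"""
length, probs = parse_dotplot(example)
print(length, {pair: round(p, 3) for pair, p in probs.items()})
```

Positions are kept 1-based, as in the file. Smaller values in the third column correspond to higher pairing probabilities.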