The RF Fold module is designed to allow transcriptome-wide reconstruction of RNA structures, starting from XML files generated using the RF Norm tool.
This tool can process a single XML file or an entire directory of XML files, and produces the inferred secondary structures (in either dot-bracket notation or CT format) and their graphical representation (in either PostScript or SVG format).
Folding inference can be performed using two different algorithms:
1. ViennaRNA
2. RNAstructure
Prediction can be performed either on the whole transcript, or through a windowed approach (see next paragraph).
Windowed folding
The windowed folding approach is based on the original method described in Siegfried et al., 2014 (PMID: 25028896), and consists of 3 main steps, outlined below:
In step I (optional), a window is slid along the RNA, and pseudoknotted structures are detected using the same approach employed by the ShapeKnots algorithm (Hajdin et al., 2013 (PMID: 23503844)). Our implementation of the ShapeKnots algorithm relies on the ViennaRNA package (instead of RNAstructure, as the original implementation did), and is thus much faster.
Nonetheless, both algorithms run in a single thread. Alternatively, the multithreaded implementation ShapeKnots-smp, shipped with the latest RNAstructure version, can be used.
If constraints from structure probing experiments are provided, these are incorporated in the form of soft constraints. Predicted pseudoknotted base-pairs are retained if they appear in >50% of analyzed windows. When constraints are provided, pseudoknots are retained only if the average reactivity of bases on both sides of the helices is below a certain reactivity cutoff.
In step II, a window is slid along the RNA, and the partition function is calculated. If provided, soft constraints are applied. If step I has been performed, pseudoknotted bases are hard-constrained to be single-stranded. Predicted base-pair probabilities are averaged across all windows in which they have appeared, and base-pairs with >99% probability are retained and hard-constrained to be paired in step III.
In step III, a window is slid along the RNA, and MFE folding is performed, including (where present) soft constraints from probing data and hard constraints from steps I and II. Predicted base-pairs are retained if they appear in >50% of analyzed windows.
Note
At all stages, increased sampling is performed at the 5'/3' ends to avoid end biases.
At this stage, if step I has been performed, pseudoknotted base-pairs are added back to the structure, and the free energy is computed. Along with the predicted structure, the windowed method also produces a WIGGLE track file containing per-base Shannon entropies.
Regions with higher Shannon entropies are likely to form alternative structures, while those with low Shannon entropies correspond to regions with well-defined RNA structures, or persistent single-strandedness (Siegfried et al., 2014).
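The sliding-window bookkeeping described above can be sketched in Python. This is only an illustration under stated assumptions, not RF Fold's actual code: the function names are hypothetical, the placement of the final window is an assumption, and "analyzed windows" is interpreted here as the windows that span both bases of a pair. The increased sampling at the 5'/3' ends is not modeled.

```python
from collections import Counter


def window_starts(length, window=600, offset=200):
    """0-based start positions of windows slid along an RNA of `length` nt."""
    if length <= window:
        return [0]
    starts = list(range(0, length - window + 1, offset))
    if starts[-1] != length - window:
        # Assumption: add a last window flush with the 3' end.
        starts.append(length - window)
    return starts


def consensus_pairs(pairs_per_window, window=600, min_fraction=0.5):
    """Keep base-pairs predicted in more than `min_fraction` of the windows
    spanning them.

    `pairs_per_window` maps a window start to the set of (i, j) pairs
    (0-based, i < j) predicted within that window.
    """
    hits = Counter()
    for pairs in pairs_per_window.values():
        hits.update(pairs)
    kept = set()
    for (i, j), count in hits.items():
        # Windows that contain both bases of the pair (assumed denominator).
        spanning = sum(1 for s in pairs_per_window if s <= i and j < s + window)
        if count / spanning > min_fraction:
            kept.add((i, j))
    return kept
```

With the default 600 nt window and 200 nt offset, a 1100 nt transcript would be covered by windows starting at 0, 200, 400 and 500 in this sketch.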
Shannon entropy is calculated as:
where p_{i} is the probability of base i being base-paired.
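For reference, the per-base definition used by Siegfried et al., 2014 sums over all potential pairing partners j of base i. The double index p_{i,j} (the probability of base i pairing with base j) is an assumption taken from that paper, not from the text above:

$$H_i = -\sum_{j} p_{i,j} \log_{10} p_{i,j}$$

A minimal Python sketch of that definition follows. The function name and the input layout (a dictionary of 1-based pairs) are hypothetical and do not reflect RF Fold's internals.

```python
import math


def shannon_entropy(pair_probs, length):
    """Per-base Shannon entropy from base-pairing probabilities.

    `pair_probs` maps (i, j) pairs (1-based) to probabilities in [0, 1].
    Returns a list of `length` entropies; index 0 corresponds to base 1.
    """
    entropy = [0.0] * length
    for (i, j), p in pair_probs.items():
        if p <= 0.0:
            continue  # p * log10(p) tends to 0 as p approaches 0
        term = -p * math.log10(p)
        # A pair contributes to the entropy of both of its bases.
        entropy[i - 1] += term
        entropy[j - 1] += term
    return entropy
```

A base that pairs with a single partner at probability 1 gets an entropy of 0, while a base split evenly between two partners gets log10(2), about 0.30.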
Since version 2.5, RF Fold generates vector graphical reports (SVG format) for each structure, reporting the per-base reactivity, the MEA structure, the per-base Shannon entropy, and the base-pairing probabilities.
Note
The calculation of Shannon entropy and base-pairing probabilities requires the partition function to be computed. Since this is a very slow step, partition function folding is performed only in windowed mode, or if parameters -dp (or --dotplot) or -sh (or --shannon-entropy) are explicitly specified.
Usage
To list the required parameters, simply type:
$ rf-fold -h
| Parameter | Type | Description |
| --- | --- | --- |
| -o or --output-dir | string | Output directory for writing inferred structures (Default: rf_fold/) |
| -ow or --overwrite |  | Overwrites the output directory if it already exists |
| -ct or --connectivity-table |  | Writes predicted structures in CT format (Default: dot-bracket notation) |
| -m or --folding-method | int | Folding method (1-2, Default: 1): 1. ViennaRNA; 2. RNAstructure |
| -p or --processors | int | Number of processors (threads) to use (Default: 1) |
| -g or --img |  | Enables generation of graphical reports |
| -t or --temperature | float | Temperature in degrees Celsius (Default: 37.0) |
| -sl or --slope | float | Sets the slope used with structure probing data restraints (Default: 1.8 [kcal/mol]) |
| -in or --intercept | float | Sets the intercept used with structure probing data restraints (Default: -0.6 [kcal/mol]) |
| -md or --maximum-distance | int | Maximum pairing distance (in nt) between transcript's residues (Default: 0 [no limit]) |
| -nlp or --no-lonelypairs |  | Disallows lonely base-pairs (1 bp helices) inside predicted structures |
| -i or --ignore-reactivity |  | Ignores XML reactivity data when performing folding (MFE unconstrained prediction) |
| -hc or --hard-constraint |  | In addition to soft-constraint folding, allows specifying a reactivity cutoff (set by -f) for hard-constraining a base to be single-stranded |
| -f or --cutoff | float | Reactivity cutoff for constraining a position as unpaired (>0, Default: 0.7) |
| -w or --windowed |  | Enables windowed folding |
| -pt or --partition | string | Path to RNAstructure partition executable (Default: assumes partition is in PATH). Note: by default, partition-smp will be used (if available) |
| -pp or --probability-plot | string | Path to RNAstructure ProbabilityPlot executable (Default: assumes ProbabilityPlot is in PATH) |
| -fw or --fold-window | int | Window size (in nt) for performing MFE folding (>=50, Default: 600) |
| -fo or --fold-offset | int | Offset (in nt) for MFE folding window sliding (Default: 200) |
| -pw or --partition-window | int | Window size (in nt) for performing partition function (>=50, Default: 600) |
| -po or --partition-offset | int | Offset (in nt) for partition function window sliding (Default: 200) |
| -wt or --window-trim | int | Number of bases to trim from both ends of the partition windows to avoid end biases (Default: 100) |
| -dp or --dotplot |  | Enables generation of dotplots of base-pairing probabilities |
| -sh or --shannon-entropy |  | Enables generation of a WIGGLE track file with per-base Shannon entropies |
| -pk or --pseudoknots |  | Enables detection of pseudoknots (computationally intensive) |
| -kp1 or --pseudoknot-penality1 | float | Pseudoknot penalty P1 (Default: 0.35) |
| -kp2 or --pseudoknot-penality2 | float | Pseudoknot penalty P2 (Default: 0.65) |
| -kt or --pseudoknot-tollerance | float | Maximum tolerated deviation of suboptimal structures' energy from the MFE (>0-1, Default: 0.25 [25%]) |
| -kh or --pseudoknot-helices | int | Number of candidate pseudoknotted helices to evaluate (>0, Default: 100) |
| -kw or --pseudoknot-window | int | Window size (in nt) for performing pseudoknot detection (>=50, Default: 600) |
| -ko or --pseudoknot-offset | int | Offset (in nt) for pseudoknot detection window sliding (Default: 200) |
| -kc or --pseudoknot-cutoff | float | Reactivity cutoff for retaining a pseudoknotted helix (0-1, Default: 0.5) |
| -km or --pseudoknot-method | int | Algorithm for pseudoknot prediction (1-2, Default: 1): 1. RNA Framework; 2. ShapeKnots. Note: the chosen folding method (specified by -m) affects the algorithm used by RNA Framework (pseudoknot detection method #1) to define the initial MFE structure |
| RNA Framework pseudoknots detection algorithm options |  |  |
| -vrs or --vienna-rnasubopt | string | Path to ViennaRNA RNAsubopt executable (Default: assumes RNAsubopt is in PATH) |
| -ks or --pseudoknot-suboptimal | int | Number of suboptimal structures to evaluate for pseudoknot prediction (>0, Default: 1000) |
| -nz or --no-zuker |  | Disables the inclusion of Zuker suboptimal structures (reduces the sampled folding space) |
| -zs or --zuker-suboptimal | int | Number of Zuker suboptimal structures to include (>0, Default: 1000) |
| ShapeKnots pseudoknots detection algorithm options |  |  |
| -sk or --shape-knots | string | Path to ShapeKnots executable (Default: assumes ShapeKnots is in PATH). Note: by default, ShapeKnots-smp will be used (if available) |
| Folding method #1 options (ViennaRNA) |  |  |
| -vrf or --vienna-rnafold | string | Path to ViennaRNA RNAfold executable (Default: assumes RNAfold is in PATH) |
| -ngu or --no-closing-gu |  | Disallows G:U wobbles at the end of helices |
| -cm or --constraint-method | int | Method for converting provided reactivities into pseudo-energies (1-2, Default: 1): 1. Deigan et al., 2009; 2. Zarringhalam et al., 2012 |
| Zarringhalam et al., 2012 method options |  |  |
| -cc or --constraint-conversion | int | Method for converting rf-norm reactivities into pairing probabilities (1-5, Default: 1): 1. Skip normalization step (reactivities are treated as pairing probabilities); 2. Linear mapping according to Zarringhalam et al., 2012; 3. Use a cutoff to divide nucleotides into paired and unpaired; 4. Linear model for converting reactivities into probabilities of being unpaired; 5. Linear model for converting the logarithm of reactivities into probabilities of being unpaired |
| -bf or --beta-factor | float | Sets the magnitude of penalties for deviations from the observed pairing probabilities (Default: 0.5) |
| -ms or --model-slope | float | Sets the slope used by the linear model (Default: 0.68 [Method #4], or 1.6 [Method #5]; requires -cc 4 or -cc 5) |
| -mi or --model-intercept | float | Sets the intercept used by the linear model (Default: 0.2 [Method #4], or -2.29 [Method #5]; requires -cc 4 or -cc 5) |
| Folding method #2 options (RNAstructure) |  |  |
| -rs or --rnastructure | string | Path to RNAstructure Fold executable (Default: assumes Fold is in PATH). Note: by default, Fold-smp will be used (if available) |
| -d or --data-path | string | Path to RNAstructure data tables (Default: assumes DATAPATH environment variable is already set) |
Information
For additional details on ViennaRNA soft-constraint prediction methods, please refer to the ViennaRNA documentation, or to Lorenz et al., 2016 (PMID: 26353838).
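To make the role of the slope and intercept parameters concrete, the pseudo-energy term of Deigan et al., 2009 can be written as m * ln(reactivity + 1) + b, with slope m and intercept b. The Python sketch below is illustrative only: the function name is hypothetical, the defaults are the values published by Deigan et al. (m = 1.8, b = -0.6 kcal/mol), and treating missing reactivities as contributing no pseudo-energy is an assumption rather than documented RF Fold behavior.

```python
import math


def deigan_pseudo_energy(reactivity, slope=1.8, intercept=-0.6):
    """Pseudo-free energy (kcal/mol) for one base: m * ln(reactivity + 1) + b."""
    if reactivity is None or math.isnan(reactivity):
        return 0.0  # assumption: bases without data are left unconstrained
    return slope * math.log(reactivity + 1.0) + intercept
```

Under these defaults, an unreactive base (reactivity 0) receives a bonus of -0.6 kcal/mol when paired, while a base with reactivity 1 receives a penalty of roughly +0.65 kcal/mol.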
Information
For additional details on ShapeKnots pseudoknot detection parameters, please refer to Hajdin et al., 2013 (PMID: 23503844).
Output dotplot files
When option -dp is provided, RF Fold produces a dotplot file for each transcript being analyzed, with the following structure:
1549 # RNA's length
i j -log10(Probability) # Header
8 254 0.459355416499312
9 253 0.446335563943221
10 252 0.456738523239413
11 251 0.454733421725068
12 250 0.46965667808714
13 249 0.47837140333524
21 35 0.268192200569539
22 34 0.0183400615262171
23 33 0.0166665677814708
24 32 0.0128927546134575
25 31 0.0148601207296645
26 30 0.0252017532628297
--- cut ---
1497 1510 0.0147874890078331
1498 1509 0.0102803152157546
1499 1508 0.0137510190884233
1500 1507 0.0402352346970943
where i and j are the positions (1-based) of the bases involved in a given base-pair, followed by the -log_{10} of their base-pairing probability (the listed values are positive because probabilities are at most 1).
These files can be easily viewed using the Integrative Genomics Viewer (IGV) (for additional details, please refer to the official Broad Institute's IGV page).
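For downstream analysis outside IGV, a dotplot file can be read with a few lines of Python. This is a minimal sketch, assuming the layout shown above (a length line, a header line, then whitespace-separated rows) and assuming the third column stores the negative log_{10} of the probability; the function name is hypothetical.

```python
def read_dotplot(lines):
    """Parse dotplot lines into (RNA length, {(i, j): probability})."""
    rows = iter(lines)
    length = int(next(rows).split()[0])  # first line: RNA length
    next(rows)                           # second line: column header
    pairs = {}
    for line in rows:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        try:
            i, j, neg_log10_p = int(fields[0]), int(fields[1]), float(fields[2])
        except ValueError:
            continue  # skip non-numeric rows
        # Convert -log10(p) back to a probability.
        pairs[(i, j)] = 10.0 ** (-neg_log10_p)
    return length, pairs
```

For instance, calling `read_dotplot(open(path))` on a file would return the transcript length and a dictionary in which a row such as `8 254 0.459355416499312` becomes a pairing probability of about 0.35 for bases 8 and 254; `path` is a placeholder for the actual file location.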