The RF Norm module takes one (Rouskin and Zubradt methods), two (Ding method), or three (Siegfried method) RC files generated by the RF Count module, and performs normalization to obtain transcriptome-wide per-base reactivities.
Reactivity scores can be computed using 4 methods:

Scoring of RT-stops/nuclease cuts-based methods

[1] Ding et al., 2014 (PMID: 24270811)

Per-base signal is calculated as the ratio between the natural log (ln) of the raw count of RT-stops/nuclease cuts at a given position of a transcript and the average of the ln of the RT-stop/nuclease cut counts along the whole transcript:

$$U_i = \frac{\ln(n_{1i} + p)}{\left(\sum_{j=0}^{l} \ln(n_{1j} + p)\right) / l}$$

$$T_i = \frac{\ln(n_{2i} + p)}{\left(\sum_{j=0}^{l} \ln(n_{2j} + p)\right) / l}$$

where $n_{1i}$ and $n_{2i}$ are respectively the raw read counts in the untreated (or RNase V1) and treated (DMS, CMCT, SHAPE, or Nuclease S1) samples at position $i$ of the transcript, $l$ is the transcript's length, and $p$ is a pseudocount added to deal with non-covered regions. $U_i$ and $T_i$ are respectively the normalized number of RT-stops/nuclease cuts at position $i$ in the untreated and treated samples.
Score at position i is then calculated as:

$$S_i = \max(0, T_i - U_i)$$

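As a rough illustration, the scoring above can be sketched in Python. The function name `ding_scores` and its arguments are hypothetical and not part of RF Norm. Returning zeros when the mean of the logs is 0 is an assumption made here to avoid a division by zero, and the capping controlled by the maximum score option is left out.

```python
import math

def ding_scores(untreated, treated, pseudocount=1.0):
    """Illustrative sketch of the Ding et al. scoring (not RF Norm's own code)."""
    def normalize(counts):
        # ln of each pseudocounted raw count
        logs = [math.log(n + pseudocount) for n in counts]
        # Average of the ln values along the whole transcript
        mean_log = sum(logs) / len(logs)
        if mean_log == 0:
            # Assumption: a transcript with no signal scores 0 at every position
            return [0.0] * len(counts)
        return [value / mean_log for value in logs]

    u = normalize(untreated)  # U_i
    t = normalize(treated)    # T_i
    # S_i = max(0, T_i - U_i)
    return [max(0.0, ti - ui) for ti, ui in zip(t, u)]
```

Both input lists are expected to hold one raw count per position of the same transcript.
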
[2] Rouskin et al., 2014 (PMID: 24336214)

The untreated sample is not considered. Per-base RT-stops/nuclease cuts are used as a direct measure of the raw signal.

Warning

Data scored with the Rouskin method can only be normalized using the 90% Winsorising approach.


Scoring of mutational profiling-based methods

[3] Siegfried et al., 2014 (PMID: 25028896)

This method takes into account both an untreated sample and a denatured control sample.
Per-base raw signal is calculated as:

$$S_i = \frac{\dfrac{n_{Ti}}{c_{Ti}} - \dfrac{n_{Ui}}{c_{Ui}}}{\dfrac{n_{Di}}{c_{Di}}}$$

where $n_{Ti}$, $n_{Ui}$, and $n_{Di}$ are respectively the mutation counts in the treated, untreated, and denatured samples at position $i$ of the transcript, while $c_{Ti}$, $c_{Ui}$, and $c_{Di}$ are respectively the reads covering position $i$ of the transcript in the treated, untreated, and denatured samples.

[4] Zubradt et al., 2016 (PMID: 27819661)

The untreated sample is not considered. Per-base raw signal is calculated as:

$$S_i = \frac{n_{Ti}}{c_{Ti}}$$

where $n_{Ti}$ and $c_{Ti}$ are respectively the mutation count and the read coverage at position $i$ of the transcript.
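The two mutational profiling formulas can be sketched in Python as below. The function names are hypothetical, and returning NaN for zero coverage or for a zero denatured mutation rate is an assumption made here to avoid dividing by zero; it is not taken from RF Norm itself.

```python
import math

def mutation_rate(mutations, coverage):
    """Mutations divided by coverage; NaN when the position is not covered (assumption)."""
    return mutations / coverage if coverage > 0 else math.nan

def zubradt_score(n_t, c_t):
    """Raw signal from the treated sample only."""
    return mutation_rate(n_t, c_t)

def siegfried_score(n_t, c_t, n_u, c_u, n_d, c_d):
    """Background-subtracted treated rate, scaled by the denatured rate."""
    denatured = mutation_rate(n_d, c_d)
    if math.isnan(denatured) or denatured == 0:
        # Assumption: the score is undefined without a usable denatured control
        return math.nan
    return (mutation_rate(n_t, c_t) - mutation_rate(n_u, c_u)) / denatured
```

For example, 10 mutations over 100 reads in the treated sample, 2 over 100 in the untreated sample, and 20 over 100 in the denatured sample give (0.10 - 0.02) / 0.20 = 0.4.
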

Normalization of raw reactivities

Raw reactivity scores can be normalized using 3 different approaches:

| Method | Description |
| --- | --- |
| 2-8% Normalization | From the top 10% of values, the top 2% is ignored, then every reactivity value along the entire transcript is divided by the average of the remaining 8% |
| 90% Winsorising | Each reactivity value above the 95th percentile is set to the 95th percentile, and the reactivity at each position of the transcript is divided by the value of the 95th percentile |
| Box-plot Normalization | Values greater than 1.5x the interquartile range (numerical distance between the 25th and 75th percentiles) above the 75th percentile are removed. After excluding these outliers, the next 10% of reactivities are averaged, and all reactivities (including outliers) are divided by this value |
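
The three approaches can be sketched in Python as follows. This is a minimal illustration of the descriptions above, not RF Norm's implementation: all function names are hypothetical, the percentile definition (linear interpolation) and the rounding used to pick the top 2% and 10% of values are assumptions, and NaN handling and windowing are ignored.

```python
import math

def percentile(values, q):
    """Percentile with linear interpolation (one of several common definitions)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def _scale(values, factor):
    if factor == 0:
        raise ValueError("normalization factor is 0")
    return [v / factor for v in values]

def norm_2_8(values):
    """Divide by the mean of the top 10% of values after dropping the top 2%."""
    ordered = sorted(values, reverse=True)
    window = ordered[round(len(ordered) * 0.02):round(len(ordered) * 0.10)]
    if not window:
        raise ValueError("too few values for 2-8% normalization")
    return _scale(values, sum(window) / len(window))

def winsorise_90(values):
    """Cap values at the 95th percentile, then divide by that percentile."""
    cap = percentile(values, 95)
    return _scale([min(v, cap) for v in values], cap)

def norm_boxplot(values):
    """Divide by the mean of the top 10% of values left after outlier removal."""
    q1, q3 = percentile(values, 25), percentile(values, 75)
    threshold = q3 + 1.5 * (q3 - q1)
    kept = sorted((v for v in values if v <= threshold), reverse=True)
    top = kept[:max(1, round(len(kept) * 0.10))]
    # Outliers are excluded from the factor but are still divided by it
    return _scale(values, sum(top) / len(top))
```

With Winsorising the largest normalized value is always 1, whereas the other two approaches can yield values above 1.
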


Normalized reactivities can be further remapped to values ranging from 0 to 1 according to Zarringhalam et al., 2012 (PMID: 23091593). In this approach, values < 0.25 are linearly mapped to [0, 0.35), values ≥ 0.25 and < 0.3 are linearly mapped to [0.35, 0.55), values ≥ 0.3 and < 0.7 are linearly mapped to [0.55, 0.85), and values ≥ 0.7 are linearly mapped to [0.85, 1].
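
The piecewise-linear remapping can be sketched in Python as below. The function name `remap_reactivity` is hypothetical. The text does not state which input value maps to 1 in the last interval, nor how negative values are treated, so the `upper` parameter (default 1.0) and the clamping to [0, `upper`] are assumptions of this sketch.

```python
def remap_reactivity(value, upper=1.0):
    """Map a normalized reactivity onto [0, 1] with four linear segments."""
    if upper <= 0.7:
        raise ValueError("upper must be greater than 0.7")
    # (input start, input end, output start, output end) of each segment
    segments = [
        (0.0, 0.25, 0.0, 0.35),
        (0.25, 0.3, 0.35, 0.55),
        (0.3, 0.7, 0.55, 0.85),
        (0.7, upper, 0.85, 1.0),  # Assumption: `upper` is mapped to 1
    ]
    # Assumption: values outside [0, upper] are clamped before remapping
    clamped = min(max(value, 0.0), upper)
    for in_lo, in_hi, out_lo, out_hi in segments:
        # The last segment also accepts its upper bound
        if clamped < in_hi or in_hi == upper:
            fraction = (clamped - in_lo) / (in_hi - in_lo)
            return out_lo + fraction * (out_hi - out_lo)
```

For instance, a value of 0.125 sits halfway through the first segment and is mapped to 0.175.
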

Usage

To list the required parameters, simply type:

$ rf-norm -h

| Parameter | Type | Description |
| --- | --- | --- |
| -u or --untreated | string | Path to the RC file for the non-treated (or RNase V1) sample (required by Ding/Siegfried scoring methods) |
| -t or --treated | string | Path to the RC file for the treated (or Nuclease S1) sample |
| -d or --denatured | string | Path to the RC file for the denatured sample (required by Siegfried scoring method) |
| -i or --index | string[,string] | A comma-separated (no spaces) list of RCI index files for the provided RC files |
| | | Note #1: RCI files must be provided in the order 1. Untreated, 2. Denatured, 3. Treated |
| | | Note #2: If a single RCI file is specified, it will be used for all RC files |
| | | Note #3: If no RCI index is provided, it will be generated at runtime, and stored in the same folder as the untreated/denatured/treated samples |
| -p or --processors | int | Number of processors (threads) to use (Default: 1) |
| -o or --output-dir | string | Output directory for writing normalized data in XML format (Default: <treated>_vs_<untreated>_norm/ for Ding method, <treated>_norm/ for Rouskin/Zubradt methods, <treated>_vs_<untreated>_<denatured>_norm/ for Siegfried method) |
| -ow or --overwrite | | Overwrites the output directory if it already exists |
| -c or --config-file | string | Path to a configuration file with normalization parameters (see Configuration files paragraph) |
| | | Note #1: If the provided file exists, the loaded configuration will override any parameter specified on the command line |
| | | Note #2: If the provided file doesn't exist, it will be generated using the specified command-line (or default) parameters |
| -sm or --scoring-method | int | Method for score calculation (1-4, Default: 1): |
| | | 1. Ding et al., 2014 |
| | | 2. Rouskin et al., 2014 |
| | | 3. Siegfried et al., 2014 |
| | | 4. Zubradt et al., 2016 |
| -nm or --norm-method | int | Method for signal normalization (1-3, Default: 1): |
| | | 1. 2-8% Normalization |
| | | 2. 90% Winsorising |
| | | 3. Box-plot Normalization |
| -r or --raw | | Reports raw reactivities (skips data normalization) |
| -rm or --remap-reactivities | | Remaps normalized reactivities to values ranging from 0 to 1 according to Zarringhalam et al., 2012 |
| -rb or --reactive-bases | string | Reactive bases to consider for signal normalization (Default: N [ACGT]) |
| | | Note: This parameter accepts any IUPAC code, or a combination of codes (e.g. -rb M, or -rb AC). Any other base will be reported as NaN |
| -ni or --norm-independent | | Each one of the reactive bases will be normalized independently (e.g. -rb AC -ni will independently normalize A and C residues) |
| -mc or --mean-coverage | float | Discards any transcript with mean coverage below this threshold (≥0, Default: 0) |
| -ec or --median-coverage | float | Discards any transcript with median coverage below this threshold (≥0, Default: 0) |
| -nw or --norm-window | int | Window size (in nt) for signal normalization (≥3, Default: whole transcript [Ding; Siegfried], 50 [Rouskin; Zubradt]) |
| -wo or --window-offset | int | Offset (in nt) for window sliding during normalization (Default: none [Ding; Siegfried], 50 [Rouskin; Zubradt]) |
| -D or --decimals | int | Number of decimals for reporting reactivities (1-10, Default: 3) |
| -n or --nan | int | Transcript positions with read coverage below this threshold will be reported as NaN in the reactivity profile (>0, Default: 10) |
| Scoring method #1 options (Ding et al., 2014) | | |
| -pc or --pseudocount | float | Pseudocount added to reactivities to avoid division by 0 (>0, Default: 1) |
| -s or --max-score | float | Score threshold for capping raw reactivities (>0, Default: 10) |
| Scoring method #3 options (Siegfried et al., 2014) | | |
| -mu or --max-untreated-mut | float | Maximum per-base mutation rate in untreated sample (≤1, Default: 0.05 [5%]) |


Configuration files

RF Norm configuration files are used to provide normalization parameters for the analysis, without the need to manually specify them from the command-line.
Configuration files are composed of a list of key/value pairs, separated by the equal sign (=), or by the colon punctuation mark (:). Keys and values are case-insensitive.
Accepted key/value pairs are:

| Parameter | Accepted values | Default value |
| --- | --- | --- |
| scoreMethod | "Ding" (or 1); "Rouskin" (or 2); "Siegfried" (or 3); "Zubradt" (or 4) | Ding |
| normMethod | "2-8%" (or 1); "90% Winsorising" (or 2); "Box-plot" (or 3) | 2-8% |
| reactiveBases | [ACGTURYSWKMBDHVN] (or "all") | all |
| normIndependent | TRUE/FALSE; Yes/No; 1/0 | FALSE |
| normWindow | Positive integer ≥ 3 | 1e9 [Ding; Siegfried], 50 [Rouskin; Zubradt] |
| windowOffset | Positive integer > 0 | 1e9 [Ding; Siegfried], 50 [Rouskin; Zubradt] |
| meanCoverage | Positive float ≥ 0 | 0 |
| medianCoverage | Positive float ≥ 0 | 0 |
| remapReactivities | TRUE/FALSE; Yes/No; 1/0 | FALSE |
| Scoring method #1 options | | |
| maxScore | Positive float > 0 | 10 |
| pseudoCount | Positive float > 0 | 1 |
| Scoring method #3 options | | |
| maxUntreatedMut | Positive float ≤ 1 | 0.05 |

# A sample configuration file

scoreMethod=Ding
normMethod=2-8%
maxScore=10
pseudoCount=1
reactiveBases=N
normIndependent=FALSE
normWindow=1e9
windowOffset=1e9
meanCoverage=1
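
For scripts that need to inspect such a file, the key/value format can be read with a few lines of Python. This is a sketch under stated assumptions rather than the parser RF Norm uses: the function names are hypothetical, and treating lines starting with `#` as comments is inferred from the sample above.

```python
import re

TRUE_VALUES = {"true", "yes", "1"}
FALSE_VALUES = {"false", "no", "0"}

def parse_config(text):
    """Return a dict of lowercased keys to raw string values."""
    config = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # Assumption: blank lines and '#' comment lines are skipped
        # Key and value are separated by the first '=' or ':'
        match = re.match(r"^([^=:]+)[=:](.*)$", line)
        if match is None:
            raise ValueError(f"Malformed configuration line: {raw_line!r}")
        # Keys are case-insensitive, so store them lowercased
        config[match.group(1).strip().lower()] = match.group(2).strip()
    return config

def parse_bool(value):
    """Interpret TRUE/FALSE, Yes/No, or 1/0 (case-insensitive)."""
    lowered = value.strip().lower()
    if lowered in TRUE_VALUES:
        return True
    if lowered in FALSE_VALUES:
        return False
    raise ValueError(f"Not a boolean value: {value!r}")
```

Values are kept as strings, so a caller would convert entries such as `normWindow=1e9` with `int(float(value))` where a number is needed.
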


Output XML files

RF Norm produces an XML file for each analyzed transcript, with the following structure:

<?xml version="1.0" encoding="UTF-8"?>
<data [attributes]>
    <transcript id="Transcript ID" length="Transcript length">
        <sequence>
            Transcript sequence
        </sequence>
        <reactivity>
            Comma-separated list of reactivity values
        </reactivity>
    </transcript>
</data>

The data tag’s attributes allow keeping track of the analysis performed:

| Attribute | Possible values | Description |
| --- | --- | --- |
| tool | rf-norm | The tool that generated this XML file |
| scoring | Ding, Rouskin, Siegfried, or Zubradt | Scoring method |
| norm | 2-8%, Winsorising 90%, or Box-plot | Normalization method |
| reactive | [ACGT] | Reactive bases |
| win | Positive integer ≥ 3 | Normalization window's size (in nt) |
| offset | Positive integer ≥ 1 | Offset for normalization window sliding |
| remap | TRUE/FALSE | Whether normalized reactivities have been remapped according to Zarringhalam et al., 2012 |
| Scoring method #1 (Ding et al., 2014) | | |
| max | Positive float > 0 | Score threshold for capping raw reactivities |
| pseudo | Positive float > 0 | Pseudocount added to avoid division by 0 during reactivity calculation |
| Scoring method #3 (Siegfried et al., 2014) | | |
| maxumut | Positive float ≤ 1 | Maximum per-base mutation rate in untreated sample |
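
A file with this structure can be loaded with Python's standard `xml.etree.ElementTree` module. The sketch below assumes the layout shown above; the function name `read_transcripts` is hypothetical, and stripping all whitespace from the sequence and reactivity fields is a choice made for this example.

```python
import xml.etree.ElementTree as ET

def read_transcripts(xml_text):
    """Return (data tag attributes, {transcript id: (sequence, reactivities)})."""
    root = ET.fromstring(xml_text)
    transcripts = {}
    for transcript in root.iter("transcript"):
        # Drop the newlines and indentation surrounding the text content
        sequence = "".join(transcript.findtext("sequence", default="").split())
        raw = "".join(transcript.findtext("reactivity", default="").split())
        # float() also accepts "NaN", which marks positions without a value
        values = [float(item) for item in raw.split(",")] if raw else []
        transcripts[transcript.get("id")] = (sequence, values)
    return dict(root.attrib), transcripts
```

To read from disk, the same loop can be applied to `ET.parse(path).getroot()`, where `path` points to one of the XML files in the output directory.
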