The RF PeakCall module takes the RC files generated by the RF Count module for an immunoprecipitated (IP) sample and, optionally, a control sample, and performs transcriptome-wide peak calling from RNA immunoprecipitation (IP) experiments.
Analysis is performed by sliding a window of length w along the transcript, and calculating the signal enrichment in the IP sample versus control sample as:

${E}_{\left(i..i+w\right)}={\mathrm{log}}_{2}\left(\frac{\frac{{\mu }_{IP\left(i..i+w\right)}+p}{M{d}_{IP}+p}}{\frac{{\mu }_{Ctrl\left(i..i+w\right)}+p}{M{d}_{Ctrl}+p}}\right)$

where i and i+w are the start and end position of the window, μIP(i..i+w) and μCtrl(i..i+w) are respectively the mean coverage within the analyzed window in the IP and control samples, MdIP and MdCtrl are respectively the median coverage on the whole transcript in the IP and control samples, and p is a pseudocount added to deal with non-covered regions/transcripts.
When a control sample is not provided, the signal enrichment is simply calculated as:

${E}_{\left(i..i+w\right)}={\mathrm{log}}_{2}\left(\frac{{\mu }_{IP\left(i..i+w\right)}+p}{M{d}_{IP}+p}\right)$
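The two formulas above can be sketched in a few lines of Python. This is only an illustration, not the module's actual code: the function name `window_enrichment`, the per-base coverage lists, and the choice to treat the window as the half-open slice `[start, start + w)` are all assumptions made for the example.

```python
import math
from statistics import mean, median

def window_enrichment(ip_cov, start, w, p=1.0, ctrl_cov=None):
    """log2 enrichment of one window (hypothetical helper, not part of RF PeakCall).

    ip_cov / ctrl_cov: per-base coverage of a single transcript.
    The window is assumed to span positions start .. start + w - 1.
    """
    # Window mean normalized by the transcript-wide median, with pseudocount p
    ip_ratio = (mean(ip_cov[start:start + w]) + p) / (median(ip_cov) + p)
    if ctrl_cov is None:
        # No control sample: enrichment relative to the IP median only
        return math.log2(ip_ratio)
    ctrl_ratio = (mean(ctrl_cov[start:start + w]) + p) / (median(ctrl_cov) + p)
    return math.log2(ip_ratio / ctrl_ratio)

# Made-up coverage: a 10-nt bump in the IP sample, flat control
ip = [0] * 10 + [30] * 10 + [0] * 10
ctrl = [2] * 30
print(round(window_enrichment(ip, 10, 10, p=1.0, ctrl_cov=ctrl), 3))
```

Note how the pseudocount keeps the ratio defined even though the median IP coverage of this toy transcript is 0.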

A p-value is then calculated for each window with detected enrichment above a defined cutoff, using a Fisher's exact test. Thus, the following 2x2 contingency matrix is defined for each cutoff-passing window:

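| | Mean coverage in window | Median coverage on transcript |
| --- | --- | --- |
| IP | n11 = μIP(i..i+w) | n12 = MdIP |
| Control | n21 = μCtrl(i..i+w) | n22 = MdCtrl |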

If no control sample is provided, the contingency matrix is instead defined as:

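| | Mean coverage | Median coverage on transcript |
| --- | --- | --- |
| IP, analyzed window | n11 = μIP(i..i+w) | n12 = MdIP |
| IP, all windows | n21 = μIP windows | n22 = MdIP |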

where μIP windows is the average of the mean coverage values of all possible windows in the IP sample.
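To make the test concrete, here is a minimal pure-Python sketch of a Fisher's exact test on such a 2x2 matrix. Several points are assumptions for illustration only: the source does not say whether the test is one- or two-sided (a one-sided "greater" test is used here), nor how the non-integer mean and median values are turned into counts (they are simply rounded here). The function names are hypothetical.

```python
from math import comb

def fisher_exact_greater(n11, n12, n21, n22):
    """One-sided Fisher's exact test: P(X >= n11) under the hypergeometric null."""
    row1, row2, col1 = n11 + n12, n21 + n22, n11 + n21
    total = comb(row1 + row2, col1)
    upper = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one
    tail = sum(comb(row1, x) * comb(row2, col1 - x) for x in range(n11, upper + 1))
    return tail / total

def window_p_value(mu_window, md_ip, mu_ref, md_ref):
    """Build the 2x2 matrix from coverage summaries (rounded to integers, an assumption)."""
    return fisher_exact_greater(round(mu_window), round(md_ip),
                                round(mu_ref), round(md_ref))

# Made-up values: window mean 30 vs median 2 in IP, window mean 3 vs median 2 in control
print(window_p_value(30.0, 2.0, 3.0, 2.0))
```

For the no-control case, the same helper would be called with the average window mean and the IP median in place of the control values.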
P-values are then subjected to Benjamini-Hochberg correction. Consecutive significantly enriched windows are then merged together, and p-values are combined using Stouffer's method.
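The correction and merging steps can be sketched as follows. This is an illustrative reading of the description, not the module's implementation: the function names are hypothetical, unweighted Stouffer combination is assumed, and so is the rule that two windows are merged when the gap between them does not exceed a given distance.

```python
from statistics import NormalDist

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        k = order[rank - 1]
        running_min = min(running_min, pvalues[k] * m / rank)
        adjusted[k] = running_min
    return adjusted

def stouffer(pvalues):
    """Combine p-values with the unweighted Stouffer Z-score method."""
    nd = NormalDist()
    # Clamp to keep inv_cdf inside its open (0, 1) domain
    clamped = [min(max(p, 1e-300), 1 - 1e-15) for p in pvalues]
    z = sum(-nd.inv_cdf(p) for p in clamped) / len(clamped) ** 0.5
    return nd.cdf(-z)

def merge_windows(windows, merge_distance=0):
    """Merge (start, end, p) windows whose gap is <= merge_distance."""
    peaks = []
    for start, end, p in sorted(windows):
        if peaks and start - peaks[-1][1] <= merge_distance:
            peaks[-1][1] = max(peaks[-1][1], end)
            peaks[-1][2].append(p)
        else:
            peaks.append([start, end, [p]])
    return [(s, e, stouffer(ps)) for s, e, ps in peaks]

# Made-up significant windows on one transcript
print(merge_windows([(0, 150, 0.01), (75, 225, 0.02), (500, 650, 0.03)], 50))
```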

# Usage

To list the required parameters, simply type:

```
$ rf-peakcall -h
```
| Parameter | Type | Description |
| --- | --- | --- |
| **-c** or **--control** | string | Path to the RC file for the control sample |
| **-I** or **--IP** | string | Path to the RC file for the immunoprecipitated (IP) sample |
| **-i** or **--index** | string[,string] | A comma-separated (no spaces) list of RCI index files for the provided RC files<br/>Note #1: RCI files must be provided in the order 1. Control, 2. IP<br/>Note #2: If a single RCI file is specified, it will be used for all RC files<br/>Note #3: If no RCI index is provided, it will be generated at runtime and stored in the same folder as the control/IP samples |
| **-p** or **--processors** | int | Number of processors (threads) to use (Default: 1) |
| **-o** or **--output** | string | Output peaks BED-like file (Default: `<IP>_vs_<Control>.bed`) |
| **-ow** or **--overwrite** | | Overwrites the output file if it already exists |
| **-w** or **--window** | int | Window size (in nt) for peak calling (≥10, Default: 150) |
| **-f** or **--offset** | int | Offset (in nt) for window sliding (≥1, Default: window / 2) |
| **-md** or **--merge-distance** | int | Maximum distance (in nt) for merging non-overlapping windows (≥0, Default: 50) |
| **-e** or **--enrichment** | float | Minimum log2 enrichment in IP vs. Control for reporting a peak (≥1, Default: 3) |
| **-v** or **--p-value** | float | P-value cutoff for reporting a peak (0 ≤ p ≤ 1, Default: 0.05) |
| **-pc** or **--pseudocount** | float | Pseudocount added to read counts to avoid division by 0 (>0, Default: 1) |
| **-mc** or **--mean-coverage** | float | Discards any transcript with mean coverage in the control sample below this threshold (≥0, Default: 0) |
| **-ec** or **--median-coverage** | float | Discards any transcript with median coverage in the control sample below this threshold (≥0, Default: 0) |
| **-D** or **--decimals** | int | Number of decimals for reporting enrichment/p-value (1-10, Default: 3) |
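As an illustration of how the parameters above combine, the following Python sketch assembles an `rf-peakcall` command line from a few of the documented flags. The helper name `build_peakcall_command` and the file names are hypothetical; the snippet only builds and prints the command, it does not run it.

```python
import shlex

def build_peakcall_command(ip_rc, control_rc=None, output=None, window=150,
                           offset=None, enrichment=3.0, p_value=0.05,
                           processors=1):
    """Assemble an rf-peakcall argument list (hypothetical helper)."""
    cmd = ["rf-peakcall", "-I", ip_rc]
    if control_rc is not None:
        cmd += ["-c", control_rc]          # control sample is optional
    cmd += ["-w", str(window)]
    if offset is not None:
        cmd += ["-f", str(offset)]         # otherwise the tool's default (window / 2) applies
    cmd += ["-e", str(enrichment), "-v", str(p_value), "-p", str(processors)]
    if output is not None:
        cmd += ["-o", output]
    return cmd

# Placeholder file names, for illustration only
cmd = build_peakcall_command("IP.rc", control_rc="Control.rc",
                             output="peaks.bed", processors=4)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` if the tool is installed and on the `PATH`.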

## Output BED-like files

RF PeakCall produces a BED-like file with 5 tab-delimited fields:

| Field | Description |
| --- | --- |
| Transcript ID | ID of the transcript |
| Start | Start coordinate of the peak (0-based) |
| End | End coordinate of the peak (0-based) |
| Enrichment | Fold enrichment of the IP signal versus Control signal |
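A minimal sketch for reading such a file in Python is shown below. It interprets only the fields described above and keeps any remaining columns as raw strings; the function name `read_peaks` and the example line are made up for illustration.

```python
import csv
import io

def read_peaks(handle):
    """Parse a tab-delimited BED-like peaks file (hypothetical helper)."""
    peaks = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue                      # skip blank and comment lines
        peaks.append({
            "transcript": row[0],
            "start": int(row[1]),
            "end": int(row[2]),
            "enrichment": float(row[3]),
            "extra": row[4:],             # remaining columns, left uninterpreted
        })
    return peaks

# Made-up example line, not real RF PeakCall output
example = io.StringIO("TX_1\t120\t345\t4.217\t0.001\n")
for peak in read_peaks(example):
    print(peak["transcript"], peak["end"] - peak["start"], peak["enrichment"])
```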