The RF Count module is the core component of the framework. It can process any number of SAM/BAM files to calculate per-base RT-stops/mutations and read coverage on each transcript.


To list the required parameters, simply type:

$ rf-count -h
Parameter Type Description
-p or --processors int Number of processors (threads) to use (Default: 1)
-wt or --working-threads int Number of working threads to use for each instance of SAMTools/Bowtie (Default: 1).
Note: RT Counter executes 1 instance of SAMTools for each processor specified by -p. At least -p <processors> * -wt <threads> processors are required.
-o or --output-dir string Output directory for writing counts in RC (RNA Count) format (Default: rf_count/)
-ow or --overwrite Overwrites the output directory if already exists
-t or --tmp-dir string Path to a directory for temporary files creation (Default: /tmp)
Note: If the provided directory does not exist, it will be created
-nm or --no-mapped-count Disables counting of total mapped reads
Note: This option must be avoided when processing SAM/BAM files from Ψ-seq/Pseudo-seq and 2OMe-seq experiments.
-s or --samtools string Path to samtools executable (Default: assumes samtools is in PATH)
-r or --sorted In case SAM/BAM files are passed, assumes that they are already sorted lexicographically by transcript ID, and numerically by position
-t5 or --trim-5prime int[,int] Comma separated list (no spaces) of values indicating the number of bases trimmed from the 5'-end of reads in the respective sample SAM/BAM files (Default: 0)
Note #1: Values must be provided in the same order as the input files (e.g. rf-count -t5 0,5 file1.bam file2.bam, will consider 0 bases trimmed from file1 reads, and 5 bases trimmed from file2 reads)
Note #2: If a single value is specified along with multiple SAM/BAM files, it will be used for all files
-fh or --from-header Instead of providing the number of bases trimmed from 5'-end of reads through the -t5 (or --trim-5prime) parameter, RF Count will try to guess it automatically from the header of the provided SAM/BAM files
-f or --fasta string Path to a FASTA file containing the reference transcripts
Note #1: Transcripts in this file must match transcripts in SAM/BAM file headers
Note #2: This can be omitted if a Bowtie index is specified by -bi (or --bowtie-index)
-po or --paired-only When processing SAM/BAM files from paired-end experiments, only those reads for which both mates are mapped will be considered
-pp or --properly-paired When processing SAM/BAM files from paired-end experiments, only those reads mapped in a proper pair will be considered
-i or --include-clipped Include reads that have been soft/hard-clipped at their 5'-end when calculating RT-stops
Note: The default behavior is to exclude soft/hard-clipped reads. When this option is active, the RT-stop position is considered to be the position preceding the clipped bases. This option has no effect when -m (or --count-mutations) is enabled.
-m or --count-mutations Enables mutations count instead of RT-stops count (for SHAPE-MaP/DMS-MaPseq)
-mq or --min-quality int Minimum quality score value to consider a mutation (Phred+33, Default: 20)
-nd or --no-deletions Disables counting unambiguously mapped deletions as mutations (requires -m)
-md or --max-deletion-len int Ignores deletions longer than this number of nucleotides (requires -m, Default: 3)
-me or --min-edit-distance int Discards reads with less than this number of mutations/deletions (Default: 0)
-co or --coverage-only Only calculates per-base coverage (disables RT-stops/mutations count)

Deletions re-alignment in mutational profiling-based methods

Mutational profiling (MaP) methods for RNA structure analysis are based on the ability of certain reverse transcriptase enzymes to read-through the sites of SHAPE/DMS modification under specific reaction conditions. Some of them (e.g. SuperScript II) can introduce deletions when encountering a SHAPE/DMS-modified residue. When performing reads mapping, the aligner often reports a single possible alignment of the deletion, although many equally-scoring alignments are possible.
To avoid counting of ambiguously aligned deletions, that can introduce noise in the measured structural signal, RF Count performs a deletion re-alignment step to detect and discard these ambiguously aligned deletions:


ATTACGCGGATCTACGA|AGCTTTACGGACGGTAC     # Sequence surrounding deletion

# Slide the deletion along sequence     # Extract surrounding sequence

# Compare surrounding sequence from sled deletion to that from the original alignment

# Concatenate surrounding sequences

# Deletion is discarded because it is NOT unambiguously aligned

For more information, please refer to Smola et al., 2015 (PMID: 26426499).

RC (RNA Count) format

RF Count produces a RC (RNA Count) file for each analyzed sample. RC files are proprietary binary files, that store transcript’s sequence, per-base RT-stop/mutation counts, per-base read coverage, and total number of mapped reads. These files can be indexed for fast random access.
Each entry in a RC file is structured as follows:

Field Description Type
len_transcript_id Length of the transcript ID (plus 1, including NULL) uint32_t
transcript_id Transcript ID (NULL terminated) char[len_transcript_id]
len_seq Length of sequence uint32_t
seq 4-bit encoded sequence: 'ACGTN' -> [0,4] (High nybble first) uint8_t[(len_seq+1)/2]
counts Transcript's per base RT-stops (or mutations) uint32_t[len_seq]
coverage Transcript's per base coverage uint32_t[len_seq]
nt Transcript's mapped reads unint64_t

RC files EOF stores the number of total mapped reads, and is structured as follows:

Field Description Type
n Total experiment mapped reads uint64_t
version RC file version uint16_t
marker EOF marker (\x5b\x65\x6f\x66\x72\x63\x5d) char[7]

The current RC standard's version is 1.
RCI (RC Index) files enable random access to transcript data within RC files.
The RCI index is structured as follows:

Field Description Type
len_transcript_id Length of the transcript ID (plus 1, including NULL) uint32_t
transcript_id Transcript ID (NULL terminated) char[len_transcript_id]
offset Offset of transcript in the RC file uint64_t


All values are forced to be in little-endian byte-order.