The RF Index tool automatically generates a Bowtie reference index, which the RF Count module then uses for read mapping.
This tool requires an internet connection, since it queries the UCSC Genome database to obtain the transcript annotation and the reference genome sequence. Alternatively, RF Index can retrieve prebuilt indexes from RNAFramework.com.

Usage

To list the required parameters, simply type:

$ rf-index -h
| Parameter | Type | Description |
| --- | --- | --- |
| -b2 or --bowtie2 | | Generates/retrieves a Bowtie v2 index (Default: Bowtie v1) |
| -o or --output-dir | string | Bowtie index output directory (Default: automatically defined in index retrieval mode, <assembly>_<annotation>_<bowtie version> in index building mode) |
| -ow or --overwrite | | Overwrites the output directory if it already exists |
| Prebuilt indexes retrieval mode | | |
| -l or --list | | Lists available RNA Framework prebuilt reference indexes |
| -pb or --prebuilt | int | Retrieves the prebuilt reference index with the given ID (>=1, Default: none). Note: to obtain a list of available prebuilt indexes, use -l (or --list) |
| Reference building mode | | |
| -g or --genome-assembly | string | Genome assembly for the species of interest (Default: mm9). For a complete list of UCSC available assemblies, please refer to the UCSC website (https://genome.ucsc.edu/FAQ/FAQreleases.html) |
| -a or --annotation | string | Name of the UCSC table containing the gene annotation (Default: refFlat). For a complete list of tables available for the chosen assembly, please refer to the UCSC website (https://genome.ucsc.edu/cgi-bin/hgTables) |
| -n or --alt-name | | Uses alternative gene names (see UCSC tables' "name2" column) |
| -co or --coding-only | | Builds reference index using only protein-coding transcripts |
| -no or --noncoding-only | | Builds reference index using only non-coding transcripts |
| -t or --timeout | int | Connection timeout in seconds (Default: 180) |
| -r or --reference | string | Path to a FASTA file containing chromosome (or scaffold) sequences for the chosen genome assembly. Note: if no file is specified, RF Index will try to obtain sequences from the UCSC DAS server. This process may take several hours, depending on your connection's speed. |
| -b or --bowtie-build | string | Path to bowtie-build (or bowtie2-build) executable (Default: assumes bowtie-build/bowtie2-build is in PATH) |
| -e or --bedtools | string | Path to bedtools executable (Default: assumes bedtools is in PATH) |
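
Since the defaults for -b and -e assume that the executables are in PATH, a quick pre-flight check can save a failed run. The following is a minimal sketch, not part of RNA Framework: `check_tools` is a hypothetical helper, and the commented invocations at the end only combine parameters listed above (the index ID `1` and the path `/path/to/mm9.fa` are placeholders).

```shell
# Hypothetical helper (not part of RNA Framework): report which of the given
# executables can be found in PATH; returns non-zero if any of them is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool ($(command -v "$tool"))"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Tools assumed for a Bowtie v1 build; swap in bowtie2-build when using -b2
check_tools rf-index bowtie-build bedtools ||
  echo "Install the missing tools, or pass their paths with -b / -e"

# With the tools in place, invocations might look like this
# (index ID 1 and the FASTA path are placeholders):
#   rf-index -l                                    # list prebuilt indexes
#   rf-index -pb 1                                 # retrieve prebuilt index 1
#   rf-index -g mm9 -a refFlat -r /path/to/mm9.fa  # build from a local FASTA
#   rf-index -g mm9 -a refFlat -b2 -co             # Bowtie v2, coding-only
```

Passing -r with a local FASTA file avoids the slow download of sequences from the UCSC DAS server described above.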

Note

For experiments conducted on synthetic RNAs (or custom RNA pools), a reference can be generated by invoking the bowtie-build command directly on a FASTA file that is lexicographically sorted by sequence ID.

To prepare a custom Bowtie index, simply do:

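# Sort reference FASTA file by sequence ID
# (each record is flattened to one tab-separated line, sorted on its header,
#  then unfolded again with the sequence written on a single line)
$ awk 'BEGIN{RS=">"} NR>1 {gsub("\n", "\t"); print ">"$0}' reference_unsorted.fa | \
  LC_ALL=C sort -t "$(printf '\t')" -k 1,1 | \
  awk '{sub("\t", "\n"); gsub("\t", ""); print $0}' > reference_sorted.fa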

# Build a Bowtie index
$ bowtie-build reference_sorted.fa reference_sorted

# Alternatively, build a Bowtie v2 index
$ bowtie2-build reference_sorted.fa reference_sorted

$ ls -l

  -rwxrwxrwx 1 danny epigenetics  96041105 5 mar 10.50 reference_sorted.1.ebwt
  -rwxrwxrwx 1 danny epigenetics  37313744 5 mar 10.50 reference_sorted.2.ebwt
  -rwxrwxrwx 1 danny epigenetics   1844468 5 mar 10.28 reference_sorted.3.ebwt
  -rwxrwxrwx 1 danny epigenetics  74627475 5 mar 10.28 reference_sorted.4.ebwt
  -rwxrwxrwx 1 danny epigenetics 302198817 5 mar 10.28 reference_sorted.fa
  -rwxrwxrwx 1 danny epigenetics  96041105 5 mar 11.11 reference_sorted.rev.1.ebwt
  -rwxrwxrwx 1 danny epigenetics  37313744 5 mar 11.11 reference_sorted.rev.2.ebwt
  -rwxrwxrwx 1 danny epigenetics 302198817 5 mar 10.28 reference_unsorted.fa
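
Because the custom reference is expected to be lexicographically sorted by sequence ID, it may be worth confirming the order before building the index. The sketch below is an illustration rather than part of RNA Framework: `fasta_ids_sorted` is a hypothetical helper, the toy FASTA files contain made-up sequences, and the ID is assumed to be the text between ">" and the first whitespace of each header.

```shell
# Hypothetical helper (not part of RNA Framework): exits 0 only if the
# sequence IDs of a FASTA file are in strict byte-wise ascending order.
fasta_ids_sorted() {
  # Keep the ID of each header line (text between ">" and the first
  # whitespace); sort -c checks the order, -u also rejects duplicate IDs.
  awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | LC_ALL=C sort -c -u 2>/dev/null
}

# Toy files with made-up sequences, written to a temporary directory
tmpdir=$(mktemp -d)
printf '>tx_b\nACGT\n>tx_a\nGGCC\n>tx_c\nTTAA\n' > "$tmpdir/toy_unsorted.fa"
printf '>tx_a\nGGCC\n>tx_b\nACGT\n>tx_c\nTTAA\n' > "$tmpdir/toy_sorted.fa"

for fa in "$tmpdir/toy_unsorted.fa" "$tmpdir/toy_sorted.fa"; do
  if fasta_ids_sorted "$fa"; then
    echo "$(basename "$fa"): sorted"
  else
    echo "$(basename "$fa"): NOT sorted"
  fi
done
# prints:
#   toy_unsorted.fa: NOT sorted
#   toy_sorted.fa: sorted
```

On a real reference, the same check can gate the build, for example: `fasta_ids_sorted reference_sorted.fa && bowtie-build reference_sorted.fa reference_sorted`.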