The RF Index tool automatically generates a Bowtie reference index, which is then used by the RF Map module for read mapping.
This tool requires an internet connection, since it queries the UCSC Genome database to obtain transcript annotations and the reference genome sequence. Alternatively, RF Index can retrieve prebuilt indexes from RNAFramework.com.
Usage
To list the required parameters, simply type:
$ rf-index -h
Parameter | Type | Description |
---|---|---|
-b2 or --bowtie2 | | Generates/retrieves a Bowtie v2 index (Default: Bowtie v1) |
-o or --output-dir | string | Bowtie index output directory (Default: automatically defined in index retrieval mode, <assembly>_<annotation>_<bowtie version> in index building mode) |
-ow or --overwrite | | Overwrites the output directory if it already exists |
Prebuilt indexes retrieval mode | | |
-l or --list | | Lists available RNA Framework prebuilt reference indexes |
-pb or --prebuilt | int | Retrieves the prebuilt reference index with the given ID (>=1, Default: none). Note: to obtain a list of available prebuilt indexes, use -l (or --list) |
Reference building mode | | |
-g or --genome-assembly | string | Genome assembly for the species of interest (Default: mm9). For a complete list of assemblies available from UCSC, please refer to the UCSC website (https://genome.ucsc.edu/FAQ/FAQreleases.html) |
-a or --annotation | string | Name of the UCSC table containing the gene annotation (Default: refFlat). For a complete list of tables available for the chosen assembly, please refer to the UCSC website (https://genome.ucsc.edu/cgi-bin/hgTables) |
-n or --alt-name | | Uses alternative gene names (see the "name2" column of UCSC tables) |
-co or --coding-only | | Builds the reference index using only protein-coding transcripts |
-no or --noncoding-only | | Builds the reference index using only non-coding transcripts |
-t or --timeout | int | Connection timeout in seconds (Default: 180) |
-r or --reference | string | Path to a FASTA file containing chromosome (or scaffold) sequences for the chosen genome assembly. Note: if no file is specified, RF Index will try to obtain sequences from the UCSC DAS server. This process may take up to several hours, depending on your connection speed. |
-b or --bowtie-build | string | Path to the bowtie-build (or bowtie2-build) executable (Default: assumes bowtie-build/bowtie2-build is in PATH) |
-e or --bedtools | string | Path to the bedtools executable (Default: assumes bedtools is in PATH) |
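
The two modes described in the table might be invoked as in the sketch below. It uses only the options listed above, but several values are assumptions made for illustration: the prebuilt index ID (1), the FASTA path (mm9.fa) and the output directory names are placeholders, and `run` is a hypothetical dry-run helper (not part of RF Index) that prints each command unless RUN=1 is set.

```shell
# Hypothetical dry-run helper: prints the command line by default,
# and executes it only when RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# List the prebuilt indexes available for retrieval
run rf-index -l

# Retrieve the prebuilt index with ID 1 (placeholder ID) as a Bowtie v2 index
run rf-index -pb 1 -b2 -o prebuilt_index

# Build a Bowtie v2 index from protein-coding mm9 refFlat transcripts,
# reading chromosome sequences from a local FASTA file (placeholder path)
# instead of downloading them from UCSC
run rf-index -g mm9 -a refFlat -co -b2 -r mm9.fa -o mm9_refFlat_bt2
```

Supplying a local FASTA file with -r avoids the potentially slow download of genome sequences mentioned in the table.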
Note
For experiments conducted on synthetic RNAs (or custom RNA pools), a reference can be generated by directly invoking the bowtie-build command on a FASTA file lexicographically sorted by sequence ID.
To prepare a custom Bowtie index, run:
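# Sort reference FASTA file by sequence ID:
#  1) awk joins each record onto one line (header and sequence lines separated by tabs)
#  2) sort orders the records byte-wise (LC_ALL=C) on the header's first space-delimited field
#  3) awk restores the header line and writes each sequence as a single line
$ awk 'BEGIN{RS=">"} NR>1 {gsub("\n", "\t"); print ">"$0}' reference_unsorted.fa | \
  LC_ALL=C sort -t ' ' -k 1,1 | \
  awk '{sub("\t", "\n"); gsub("\t", ""); print $0}' > reference_sorted.fa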
# Build a Bowtie index
$ bowtie-build reference_sorted.fa reference_sorted
# Alternatively, build a Bowtie v2 index
$ bowtie2-build reference_sorted.fa reference_sorted
# List the resulting files (the listing below shows a Bowtie v1 index, i.e. .ebwt files)
$ ls -l
-rwxrwxrwx 1 danny epigenetics 96041105 5 mar 10.50 reference_sorted.1.ebwt
-rwxrwxrwx 1 danny epigenetics 37313744 5 mar 10.50 reference_sorted.2.ebwt
-rwxrwxrwx 1 danny epigenetics 1844468 5 mar 10.28 reference_sorted.3.ebwt
-rwxrwxrwx 1 danny epigenetics 74627475 5 mar 10.28 reference_sorted.4.ebwt
-rwxrwxrwx 1 danny epigenetics 302198817 5 mar 10.28 reference_sorted.fa
-rwxrwxrwx 1 danny epigenetics 96041105 5 mar 11.11 reference_sorted.rev.1.ebwt
-rwxrwxrwx 1 danny epigenetics 37313744 5 mar 11.11 reference_sorted.rev.2.ebwt
-rwxrwxrwx 1 danny epigenetics 302198817 5 mar 10.28 reference_unsorted.fa
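
Before building the index, it can be useful to confirm that a FASTA file is actually sorted by sequence ID. The following is a minimal sketch, not part of RF Index: `fasta_is_sorted` is a hypothetical helper, it assumes the sequence ID is the header text up to the first whitespace, and it assumes the same byte-wise (LC_ALL=C) ordering used in the sorting step above.

```shell
# Succeeds (exit status 0) when the sequence IDs of a FASTA file are in
# byte-wise lexicographic order, i.e. the order produced by LC_ALL=C sort.
fasta_is_sorted() {
    # Keep header lines only, strip the leading ">" and any description
    # after the first whitespace, then let sort check the order (-c)
    # without actually sorting anything.
    awk '/^>/ { print substr($1, 2) }' "$1" | LC_ALL=C sort -c 2>/dev/null
}

# Small demonstration on two throwaway files
tmpdir=$(mktemp -d)
printf '>rna_B\nACGT\n>rna_A\nGGCC\n' > "$tmpdir/unsorted.fa"
printf '>rna_A\nGGCC\n>rna_B\nACGT\n' > "$tmpdir/sorted.fa"

for fasta in "$tmpdir/unsorted.fa" "$tmpdir/sorted.fa"; do
    if fasta_is_sorted "$fasta"; then
        echo "$(basename "$fasta"): sorted"
    else
        echo "$(basename "$fasta"): NOT sorted"
    fi
done
rm -rf "$tmpdir"
```

In this demonstration the first file is reported as not sorted and the second as sorted. On real data, the same check could be run as `fasta_is_sorted reference_sorted.fa` before invoking bowtie-build.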