What exactly are the "built-in references" in Galaxy's HISAT2?

The reference genome built into HISAT2 is just that: the genome index only, with no annotation. Reference annotation can optionally be supplied to HISAT2 as well (for splice-site identification and filtering); see the tool’s advanced options if you want to incorporate annotation during mapping. Alternatively, you can supply it to downstream tools (including FeatureCounts).

For reference annotation, you’ll need to provide a GTF dataset from the history that is based on the same genome/build as the one used for mapping. UCSC’s version of rn6 is what is indexed at most public Galaxy servers (and what @marten shared links to).

This prior Q&A was about human, but the same instructions for getting the rat data from iGenomes will apply in your case, too. Pick the “UCSC rn6” data.

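If you want to use annotation from another source, first compare its chromosome identifiers with those in your mapping results. An easy way to summarize the reference names in a BAM header is the tool “Samtools: IdxStats reports stats of the BAM index file”: the first column of its output lists the reference sequence names, which must match the first column of the GTF.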
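If you download both the IdxStats output and the GTF, the identifier comparison can be scripted. The following is only a minimal sketch: the function names and the toy input strings are made up for illustration, and it assumes the standard tab-separated IdxStats layout (reference name in the first column, with a final `*` row for unplaced reads) and a GTF whose first column holds the chromosome name.

```python
def idxstats_names(text):
    """Collect reference names from IdxStats output (first tab-separated column)."""
    names = set()
    for line in text.splitlines():
        if not line.strip():
            continue
        name = line.split("\t")[0]
        if name != "*":  # the "*" row counts reads with no reference; not a chromosome
            names.add(name)
    return names


def gtf_names(text):
    """Collect chromosome names from a GTF (first column), skipping comment lines."""
    names = set()
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t")[0])
    return names


def compare_names(bam_names, annotation_names):
    """Report which identifiers are shared and which appear on one side only."""
    return {
        "shared": sorted(bam_names & annotation_names),
        "only_in_bam": sorted(bam_names - annotation_names),
        "only_in_gtf": sorted(annotation_names - bam_names),
    }


if __name__ == "__main__":
    # Toy data (not real rn6 values): UCSC-style names in the BAM,
    # one non-UCSC-style name ("1") in the annotation.
    toy_idxstats = "chr1\t1000\t90\t0\nchrM\t100\t10\t0\n*\t0\t0\t5\n"
    toy_gtf = (
        '#comment line\n'
        '1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
        'chrM\tsrc\texon\t1\t50\t.\t+\t.\tgene_id "G2"; transcript_id "T2";\n'
    )
    print(compare_names(idxstats_names(toy_idxstats), gtf_names(toy_gtf)))
```

Anything listed under `only_in_gtf` would not line up with the mapped reads, so features on those sequences would get no counts.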

Note: Avoid the GTF generated by the UCSC Table Browser. The “gene_id” and “transcript_id” attributes in the 9th (attribute) field are both populated with the transcript ID, so all counts/summaries produced with it end up “by transcript” (not summarized at the gene level).
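If you are unsure where a GTF came from, you can check for this problem before counting. This is a rough sketch, not a validated tool: the function names and the example IDs are hypothetical, and it assumes attributes are written in the usual `key "value";` form in the 9th column.

```python
import re

# Matches attribute pairs such as: gene_id "ABC";
ATTRIBUTE_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Turn the 9th GTF column into a dict of attribute name -> value."""
    return dict(ATTRIBUTE_RE.findall(field))


def same_id_fraction(gtf_text):
    """Fraction of records whose gene_id is identical to their transcript_id."""
    total = 0
    same = 0
    for line in gtf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) < 9:
            continue  # not a complete GTF record
        attributes = parse_attributes(columns[8])
        if "gene_id" in attributes and "transcript_id" in attributes:
            total += 1
            if attributes["gene_id"] == attributes["transcript_id"]:
                same += 1
    return same / total if total else 0.0


if __name__ == "__main__":
    # Toy records with made-up IDs: the first mimics the problem described
    # above, the second has a distinct gene-level identifier.
    toy_gtf = (
        'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "TX_1"; transcript_id "TX_1";\n'
        'chr1\tsrc\texon\t200\t300\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_2";\n'
    )
    print(same_id_fraction(toy_gtf))
```

A fraction at or near 1.0 suggests the file has no usable gene-level IDs, so gene-level summaries from it would really be per-transcript.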

FAQ: