What exactly are the "built-in references" in Galaxy's HISAT2?

Where can I find info on the built-in reference options for HISAT2 in Galaxy? I want to make sure the reference matches my input for featureCounts. I aligned using HISAT2’s built-in rat rn6/2014 option.

Essentially, I am curious to know if that built-in reference contained transcript predictions (or not), which will help me interpret what I’m seeing in this data. I just can’t find anything that lists what ref file is the built-in one.

fwiw if you are familiar with Galaxy tools the indexes are created using this tool: https://github.com/galaxyproject/tools-iuc/tree/master/data_managers/data_manager_hisat2_index_builder/data_manager

edit: and you can browse them here: http://datacache.galaxyproject.org/managed/hisat2_index/

The built-in reference genome for HISAT2 is just that: the genome index only, with no annotation. Reference annotation can optionally be supplied to HISAT2 (for splice site identification and filtering). See the tool’s advanced options if you want to incorporate annotation during mapping. Or, you can incorporate it with downstream tools (including featureCounts).

For reference annotation, you’ll need to provide a GTF dataset from the history that is based on the same genome/build as the one used for mapping. UCSC’s version of rn6 is what is indexed at most public Galaxy servers (and is what the links @marten shared point to).

This prior Q&A was about human, but the same instructions for getting the rat data from iGenomes will apply in your case, too. Pick the “UCSC rn6” data.

If you want to use annotation from another source and compare the chromosome identifiers, it is easy to get a summary of the sequence names a BAM was mapped against: try the tool Samtools: IdxStats reports stats of the BAM index file.
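If you download the IdxStats output and the GTF, the comparison can also be scripted. This is only a minimal sketch, not an official Galaxy tool: it assumes the usual tab-separated IdxStats layout (name, length, mapped, unmapped) and a standard GTF with the sequence name in column 1. The function names and the toy rows are made up for illustration.

```python
def idxstats_names(lines):
    """Collect reference sequence names from Samtools IdxStats output.

    Assumes tab-separated rows: name, length, mapped reads, unmapped reads.
    The trailing '*' row (unplaced reads) is skipped.
    """
    names = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] and fields[0] != "*":
            names.add(fields[0])
    return names


def gtf_seqnames(lines):
    """Collect sequence names from column 1 of a GTF, skipping comments."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_names(bam_names, gtf_names):
    """Report which identifiers are shared and which appear on one side only."""
    return {
        "shared": sorted(bam_names & gtf_names),
        "only_in_bam": sorted(bam_names - gtf_names),
        "only_in_gtf": sorted(gtf_names - bam_names),
    }


# Toy rows (hypothetical values): a UCSC-style BAM vs. a GTF without "chr" prefixes.
toy_idxstats = [
    "chr1\t1000\t50\t0\n",
    "chr2\t900\t40\t0\n",
    "chrM\t100\t5\t0\n",
    "*\t0\t0\t7\n",
]
toy_gtf = [
    "#!toy annotation\n",
    '1\tsrc\texon\t10\t20\t.\t+\t.\tgene_id "GeneA"; transcript_id "TX1";\n',
    '2\tsrc\texon\t30\t40\t.\t-\t.\tgene_id "GeneB"; transcript_id "TX2";\n',
    'chrM\tsrc\texon\t1\t9\t.\t+\t.\tgene_id "GeneC"; transcript_id "TX3";\n',
]

report = compare_names(idxstats_names(toy_idxstats), gtf_seqnames(toy_gtf))
print(report)
```

With real files, pass open file handles instead of the toy lists. Anything listed under "only_in_bam" or "only_in_gtf" suggests the annotation and the mapped genome do not use the same identifiers.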

Note: Avoid the GTF generated by the UCSC Table Browser. In that file, the “gene_id” and “transcript_id” attributes in the 9th column are both populated with the transcript ID, so any counts or summaries produced with it are effectively “by transcript” (not summarized at the gene level).
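One way to check a GTF for this problem before counting is to measure how often the two attributes carry the same value. The sketch below is only an illustration: the function names and the toy IDs are hypothetical, and it assumes the usual `key "value";` attribute syntax in column 9.

```python
import re

# Matches attribute pairs written as: key "value";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def parse_attributes(attr_field):
    """Turn the 9th GTF column into a dict of attribute name -> value."""
    return dict(ATTR_RE.findall(attr_field))


def fraction_gene_equals_transcript(lines):
    """Fraction of rows where gene_id and transcript_id are identical.

    Rows lacking either attribute are ignored. A value near 1.0 suggests
    the gene_id field really holds transcript IDs.
    """
    total = same = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = parse_attributes(cols[8])
        if "gene_id" in attrs and "transcript_id" in attrs:
            total += 1
            if attrs["gene_id"] == attrs["transcript_id"]:
                same += 1
    return same / total if total else 0.0


# Toy rows with made-up IDs: one file repeats the transcript ID as gene_id,
# the other keeps distinct gene-level IDs.
transcript_only_gtf = [
    'chr1\tsrc\texon\t10\t20\t.\t+\t.\tgene_id "TX1"; transcript_id "TX1";\n',
    'chr1\tsrc\texon\t30\t40\t.\t+\t.\tgene_id "TX2"; transcript_id "TX2";\n',
]
gene_level_gtf = [
    'chr1\tsrc\texon\t10\t20\t.\t+\t.\tgene_id "GeneA"; transcript_id "TX1";\n',
    'chr1\tsrc\texon\t30\t40\t.\t+\t.\tgene_id "GeneA"; transcript_id "TX2";\n',
]

print(fraction_gene_equals_transcript(transcript_only_gtf))
print(fraction_gene_equals_transcript(gene_level_gtf))
```

A file where nearly every row has matching values will make featureCounts report per-transcript counts even when you ask it to summarize by “gene_id”.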