What exactly are the "built-in references" in Galaxy's HISAT2?

Lauren_Shunkwiler · March 12, 2019, 6:25pm

Where can I find info on the built-in reference options for HISAT2 in Galaxy? I want to make sure it matches my input for featureCounts, and I aligned using HISAT2's built-in rat rn6/2014 option.

Essentially, I am curious to know whether that built-in reference contained transcript predictions, which will help me interpret what I'm seeing in this data. I just can't find anything that lists which reference file the built-in one is.

marten · March 12, 2019, 6:31pm

FWIW, if you are familiar with Galaxy tools, the indexes are created using this data manager: https://github.com/galaxyproject/tools-iuc/tree/master/data_managers/data_manager_hisat2_index_builder/data_manager

edit: and you can browse them here: http://datacache.galaxyproject.org/managed/hisat2_index/

jennaj · March 12, 2019, 10:35pm

The built-in reference genome offered with HISAT2 is just that: the genome index only. Reference annotation can optionally be supplied to HISAT2 as well (for splice site identification and filtering). See the tool's advanced options if you want to incorporate annotation during mapping. Or, you can incorporate it with downstream tools (including featureCounts).
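To illustrate what "splice site identification" from annotation amounts to: every pair of consecutive exons in an annotated transcript implies an intron whose boundaries are candidate splice sites. The Python sketch below is a minimal illustration under stated assumptions: a tab-separated GTF whose `exon` records carry a `transcript_id` attribute, with intron coordinates reported 1-based and inclusive. The function names are hypothetical, and this is not the extraction script that ships with HISAT2 (which writes its own output format).

```python
from collections import defaultdict


def parse_attributes(field):
    """Parse a GTF attribute column such as 'gene_id "G1"; transcript_id "T1";'.

    Assumption: a simple parser; values must not contain ';' characters.
    """
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def extract_splice_sites(gtf_lines):
    """Return sorted (chrom, intron_start, intron_end, strand) tuples.

    Coordinates are 1-based inclusive (GTF convention), one intron per pair
    of consecutive exons within the same transcript.
    """
    exons = defaultdict(list)  # transcript_id -> [(chrom, start, end, strand)]
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue  # only exon records define intron boundaries
        attrs = parse_attributes(cols[8])
        exons[attrs["transcript_id"]].append(
            (cols[0], int(cols[3]), int(cols[4]), cols[6])
        )

    sites = set()
    for tx_exons in exons.values():
        tx_exons.sort(key=lambda exon: exon[1])  # order exons by start position
        for upstream, downstream in zip(tx_exons, tx_exons[1:]):
            # Intron runs from the base after the upstream exon
            # to the base before the downstream exon.
            sites.add((upstream[0], upstream[2] + 1, downstream[1] - 1, upstream[3]))
    return sorted(sites)


gtf = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t500\t600\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
]
print(extract_splice_sites(gtf))  # [('chr1', 201, 299, '+'), ('chr1', 401, 499, '+')]
```

Without annotation, HISAT2 still discovers splice sites from the reads themselves; supplying a GTF simply adds known junctions such as these.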

For reference annotation, you’ll need to provide a gtf dataset from the history that is based on the same genome/build as used for mapping. UCSC’s version of rn6 is what is indexed at most public Galaxy servers (and what @marten shared links to).

This prior Q&A was about human data, but the same instructions for getting the rat data from iGenomes will apply in your case, too. Pick the "UCSC rn6" data.

If you want to use another source, compare its chromosome identifiers with those in your BAM first. It is easy to summarize the sequences listed in a BAM header: try the tool Samtools IdxStats ("reports stats of the BAM index file").
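Once you have the IdxStats output for your BAM (a tab-separated table of reference name, sequence length, mapped read count and unmapped read count, ending with a `*` row for unplaced reads) and a candidate GTF, the comparison can be scripted. The Python sketch below is a minimal illustration with hypothetical function names; it only compares the first column of each file, and the sample records in the usage line are made up.

```python
def bam_chromosomes(idxstats_lines):
    """Reference names from samtools idxstats output.

    Columns: name, length, mapped, unmapped. The '*' row (unplaced reads)
    is not a chromosome, so it is skipped.
    """
    names = set()
    for line in idxstats_lines:
        cols = line.rstrip("\n").split("\t")
        if cols[0] and cols[0] != "*":
            names.add(cols[0])
    return names


def gtf_chromosomes(gtf_lines):
    """Sequence names (first column) used in a GTF, ignoring comments."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_identifiers(idxstats_lines, gtf_lines):
    """Report which chromosome names the BAM and GTF share or do not share."""
    bam = bam_chromosomes(idxstats_lines)
    gtf = gtf_chromosomes(gtf_lines)
    return {
        "shared": sorted(bam & gtf),
        "only_in_bam": sorted(bam - gtf),
        "only_in_gtf": sorted(gtf - bam),
    }


# Made-up example: UCSC-style "chr1" in the BAM vs. "1" in the GTF.
idxstats = ["chr1\t1000\t10\t0", "*\t0\t0\t5"]
gtf = ['1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";']
print(compare_identifiers(idxstats, gtf))
```

An empty "shared" list, as in this example, means featureCounts would assign no reads at all, even though both files describe the same genome.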

Note: Avoid the GTF generated by the UCSC Table Browser. Its "gene_id" and "transcript_id" values in the 9th (attribute) field are both populated with the transcript ID, so all counts/summaries produced with it are effectively "by transcript" rather than summarized at the gene level.
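A quick way to check whether a GTF has this problem is to count how many records carry identical `gene_id` and `transcript_id` values. The Python sketch below is a minimal heuristic, not a Galaxy tool; the function names and the sample records are hypothetical, and the attribute parser assumes values contain no `;`.

```python
def parse_attributes(field):
    """Parse a GTF attribute column into a dict (assumes no ';' inside values)."""
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition(" ")
            attrs[key] = value.strip().strip('"')
    return attrs


def gene_equals_transcript_counts(gtf_lines):
    """Return (records_checked, records_where_gene_id_equals_transcript_id)."""
    checked = same = 0
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        attrs = parse_attributes(cols[8])
        if "gene_id" in attrs and "transcript_id" in attrs:
            checked += 1
            if attrs["gene_id"] == attrs["transcript_id"]:
                same += 1
    return checked, same


# Made-up records in the problematic style: gene_id repeats the transcript ID.
suspect = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "TX1"; transcript_id "TX1";',
    'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "TX2"; transcript_id "TX2";',
]
print(gene_equals_transcript_counts(suspect))  # (2, 2)
```

If every checked record matches, gene-level counting with that GTF would really be transcript-level counting.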

FAQ:

Mismatched Chromosome identifiers (and how to avoid them)

Related topics:
- How to add a new reference-genome on HISTAT2? I need S. agalactiae BM110 (usegalaxy.eu support; reference-genome) · 5 replies · 209 views · July 1, 2024
- hisat2 and featurecounts (usegalaxy.org support; gtn-tutorial, workflow, galaxy-local, mapping, transcriptomics, featurecounts) · 23 replies · 2059 views · October 28, 2024
- Indexing reference genomes with Data Managers: Resources, tutorials, troubleshooting (galaxy-local, data-manager, picard_markduplicates) · 28 replies · 7601 views · July 7, 2021
- Creating a customized genome index (large) on a private Galaxy server -> Use Data Managers (server-admin, data-manager) · 3 replies · 531 views · August 23, 2022
- FeatureCounts troubleshooting (usegalaxy.org support; troubleshooting, transcriptomics, tool-help, featurecounts) · 4 replies · 469 views · January 29, 2024

