Building an indexed genome file for GATK tools


Does anyone have any pointers for indexing a reference sequence for use with GATK tools? I have indexed reference sequences that I use for BWA and minimap2, but I can’t get the reference sequences to appear with GATK. I receive an error when I use the “GATK-sorted picard indexes builder”.



Hi @jsalem

Nearly all GATK tools, including the Data Manager, as wrapped for Galaxy, are considered deprecated. These were all based on earlier releases of GATK.

Currently, the only GATK4 tool that has been wrapped is:

For native genome fasta files to be accessible to this tool (option: Choose the source for the reference list), adding the genome with the fetch DM is probably enough, although to make new genomes really useful there is a short list of core recommended indexes. All have Data Managers (DMs). You can certainly run more DMs after those (GATK4 Mutect2 won’t need more, but other tools can). See the topic below for help. The same process applies to all genomes you plan to index, not just the one referenced in that particular Q&A.
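For anyone running GATK outside Galaxy, here is a rough sketch of the same idea on the command line. As far as I know, GATK expects two companion files next to the reference fasta: a `.fai` index and a `.dict` sequence dictionary. The helper names below are made up for illustration, and the path is a placeholder; the sketch only works out where those files should sit and which standard commands (`samtools faidx`, `gatk CreateSequenceDictionary`) would create them. It does not run anything.

```python
from pathlib import Path


def gatk_companion_files(fasta):
    """Return the companion file paths GATK looks for beside a fasta.

    Assumption: the index is <fasta>.fai and the dictionary replaces the
    fasta extension with .dict (e.g. hg19.fa -> hg19.fa.fai, hg19.dict).
    """
    fasta = Path(fasta)
    return {
        "fai": fasta.with_name(fasta.name + ".fai"),
        "dict": fasta.with_suffix(".dict"),
    }


def missing_companions(fasta):
    """List which companion files ("fai", "dict") are not on disk yet."""
    return [
        kind
        for kind, path in gatk_companion_files(fasta).items()
        if not path.exists()
    ]


def suggested_commands(fasta):
    """Build the commands that would create the missing files (not executed)."""
    commands = {
        "fai": f"samtools faidx {fasta}",
        "dict": f"gatk CreateSequenceDictionary -R {fasta}",
    }
    return [commands[kind] for kind in missing_companions(fasta)]


if __name__ == "__main__":
    # Placeholder path: point this at your own reference fasta.
    for command in suggested_commands("genome.fa"):
        print(command)
```

Inside Galaxy the Data Managers take care of this, so the sketch is only relevant for a local GATK install.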

Hi @jennaj,

Thanks for getting back to me. I was hoping to use GATK indel realigner, which requires a reference sequence. Do you know of any other good realigner tools?


Hi – Review the LoFreq tools

I am using GATK4 Mutect2 with a cached reference, and every time I get the error “A USER ERROR has occurred: Argument reference was missing: Argument ‘reference’ is required.” No reference is passed to the -R argument.


Hi @BJWiley233

Thanks for sending in the bug report, it helped to spot the problem.

The reads were mapped to hg18, but the genome selected on the GATK4 Mutect2 form was hg19. Your other errored jobs (including Freebayes) also have this conflict.

It is important to use the same reference genome throughout an analysis.
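One way to catch this kind of mismatch early is to compare the contigs in the BAM header against the reference's chrom.sizes (or `.fai`) before calling variants. The sketch below is only an illustration: the function names are hypothetical, the contig lengths are toy values rather than real hg18/hg19 sizes, and it assumes you have already dumped the header as text (for example with `samtools view -H reads.bam`).

```python
def parse_sam_header_contigs(header_text):
    """Collect {contig name: length} from the @SQ lines of a SAM/BAM header."""
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs, e.g. SN:chr1 LN:1000
        fields = dict(
            field.split(":", 1) for field in line.split("\t")[1:] if ":" in field
        )
        contigs[fields["SN"]] = int(fields["LN"])
    return contigs


def parse_chrom_sizes(text):
    """Collect {contig name: length} from chrom.sizes or .fai style text."""
    sizes = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, length = line.split()[:2]
        sizes[name] = int(length)
    return sizes


def find_mismatches(bam_contigs, ref_contigs):
    """Describe every BAM contig that is absent from, or sized differently in, the reference."""
    problems = []
    for name, length in bam_contigs.items():
        if name not in ref_contigs:
            problems.append(f"{name}: not present in the reference")
        elif ref_contigs[name] != length:
            problems.append(
                f"{name}: length {length} in BAM header, "
                f"{ref_contigs[name]} in reference"
            )
    return problems


if __name__ == "__main__":
    # Toy values for illustration only, not real genome build sizes.
    header = "@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chr2\tLN:800"
    sizes = "chr1\t1200\nchr2\t800\n"
    for problem in find_mismatches(
        parse_sam_header_contigs(header), parse_chrom_sizes(sizes)
    ):
        print(problem)
```

An empty result does not prove the builds are identical, but any length difference is a strong sign that the reads and the selected reference come from different genome builds.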

Yes, I think I selected hg18 by accident. I was wondering why the chrom.sizes didn’t match. Rookie mistake.


Glad we could help :woman_technologist:

Everyone does some version of this kind of mixup. Mismatched inputs are one of the first things to check, along with format, whenever errors come up. Often much easier to spot in other people’s work than your own.

Hi Jenna,
After realigning to hg19, the calls worked with Freebayes, but Mutect2 is still missing the --reference flag when I select a locally cached reference. I sent the bug report.