Building an indexed genome file for GATK tools

jsalem · September 28, 2020, 7:49pm

Hi,

Does anyone have any pointers for indexing a reference sequence for use with GATK tools? I have indexed reference sequences that I use for BWA and minimap2, but I can’t get the reference sequences to appear with GATK. I receive an error when I use the “GATK-sorted picard indexes builder”.

Thanks!

jennaj · September 30, 2020, 8:43pm

Hi @jsalem

Nearly all GATK tools, including the Data Manager, as wrapped for Galaxy, are considered deprecated. These were all based on earlier releases of GATK.

Currently, the only GATK4 tool that has been wrapped is:

GATK4 Mutect2 - Call somatic SNVs and indels via local assembly of haplotypes
ToolShed Repository: Galaxy | Tool Shed

For native genome FASTA files to be accessible to this tool (option: Choose the source for the reference list), adding the genome with the fetch DM is probably enough, although to really make new genomes useful there is a short list of core recommended indexes. All have Data Managers (DMs). You can certainly run more DMs after those (GATK4 Mutect2 won’t need more, but other tools can). See the topic below for help; the same process applies to all genomes you plan to index, not just the one referenced in that particular Q&A.
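
For anyone checking a genome outside of the Data Managers: GATK4 tools generally expect a FASTA index (.fai) and a sequence dictionary (.dict) next to the FASTA file. Below is a minimal Python sketch of that check, assuming the usual naming (genome.fa, genome.fa.fai, genome.dict). The function name and file names are hypothetical, and this is not how the Galaxy wrapper or the Data Managers do it.

```python
from pathlib import Path


def missing_gatk_companions(fasta):
    """Return the names of companion files that are absent next to `fasta`.

    Assumes the common layout: <name>.fa.fai for the FASTA index and
    <name>.dict for the sequence dictionary (hypothetical helper).
    """
    fasta = Path(fasta)
    expected = [
        Path(str(fasta) + ".fai"),   # FASTA index, e.g. from `samtools faidx`
        fasta.with_suffix(".dict"),  # dictionary, e.g. from `gatk CreateSequenceDictionary`
    ]
    return [p.name for p in expected if not p.exists()]
```

If either file is reported missing, `samtools faidx genome.fa` and `gatk CreateSequenceDictionary -R genome.fa` are the usual command-line ways to create them.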

jsalem · October 1, 2020, 12:43pm

Hi @jennaj,

Thanks for getting back to me. I was hoping to use GATK indel realigner, which requires a reference sequence. Do you know of any other good realigner tools?

jennaj · October 1, 2020, 7:15pm

Hi – Review the LoFreq tools

BJWiley233 · January 18, 2021, 5:57am

I am using GATK4 Mutect2 with a cached reference selected, and every time I get the error “A USER ERROR has occurred: Argument reference was missing: Argument ‘reference’ is required.” No reference is passed to the -R argument.

jennaj · January 18, 2021, 9:28pm

Hi @BJWiley233

Thanks for sending in the bug report, it helped to spot the problem.

The reads were mapped to hg18, but the genome selected on the GATK4 Mutect2 form was hg19. Your other errored jobs (including Freebayes) also have this conflict.

It is important to use the same reference genome throughout an analysis.
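
One way to catch this kind of mismatch early is to compare the contig lengths recorded in the BAM header (the @SQ lines) with those in the reference's .fai or chrom.sizes file. Below is a rough Python sketch, assuming the header text and the index text have already been read into strings. All function names are hypothetical, and this is only an illustration, not what Galaxy does internally.

```python
def contig_lengths_from_sam_header(header_text):
    """Map contig name -> length from the @SQ lines of a SAM/BAM header."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each @SQ field looks like TAG:VALUE, separated by tabs.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def contig_lengths_from_fai(fai_text):
    """Map contig name -> length from .fai or chrom.sizes text.

    Both formats keep the name and the length in the first two
    tab-separated columns.
    """
    lengths = {}
    for line in fai_text.splitlines():
        if line.strip():
            name, length = line.split("\t")[:2]
            lengths[name] = int(length)
    return lengths


def mismatched_contigs(bam_lengths, ref_lengths):
    """Return contigs from the BAM that are absent or differ in length in the reference."""
    return sorted(
        name for name, length in bam_lengths.items()
        if ref_lengths.get(name) != length
    )
```

The header text can be obtained with `samtools view -H reads.bam`. A non-empty result from `mismatched_contigs` suggests the reads were mapped against a different genome build than the one selected for the tool.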

BJWiley233 · January 18, 2021, 9:55pm

Yes, I think I selected hg18 by accident. I was wondering why the chrom.sizes didn’t match. Rookie mistake.

jennaj · January 18, 2021, 10:19pm

Glad we could help

Everyone does some version of this kind of mixup. Mismatched inputs are one of the first things to check, along with format, whenever errors come up. Often much easier to spot in other people’s work than your own.

BJWiley233 · January 19, 2021, 1:02am

Hi Jenna,
After realigning to hg19, the calls worked with FreeBayes, but Mutect2 is still missing the --reference flag when I select a locally cached genome. I sent a bug report.
Brian

mitchellgc · October 25, 2021, 1:43pm

Same.

Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/corral4/main/jobs/038/551/38551877/_job_tmp -Xmx7g -Xms256m
13:40:56.325 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/cvmfs/main.galaxyproject.org/deps/_conda/pkgs/gatk4-4.1.7.0-py38_0/share/gatk4-4.1.7.0-0/gatk-package-4.1.7.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Oct 25, 2021 1:40:57 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
13:40:57.841 INFO GetSampleName - ------------------------------------------------------------
13:40:57.841 INFO GetSampleName - The Genome Analysis Toolkit (GATK) v4.1.7.0
13:40:57.841 INFO GetSampleName - For support and documentation go to https://software.broadinstitute.org/gatk/
13:40:57.842 INFO GetSampleName - Executing as g2main@galaxy-28.novalocal on Linux v3.10.0-1127.8.2.el7.x86_64 amd64
13:40:57.842 INFO GetSampleName - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_192-b01
13:40:57.842 INFO GetSampleName - Start Date/Time: October 25, 2021 1:40:55 PM UTC
13:40:57.842 INFO GetSampleName - ------------------------------------------------------------
13:40:57.842 INFO GetSampleName - ------------------------------------------------------------
13:40:57.843 INFO GetSampleName - HTSJDK Version: 2.21.2
13:40:57.843 INFO GetSampleName - Picard Version: 2.21.9
13:40:57.843 INFO GetSampleName - HTSJDK Defaults.COMPRESSION_LEVEL : 2
13:40:57.843 INFO GetSampleName - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:40:57.843 INFO GetSampleName - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
13:40:57.843 INFO GetSampleName - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:40:57.843 INFO GetSampleName - Deflater: IntelDeflater
13:40:57.843 INFO GetSampleName - Inflater: IntelInflater
13:40:57.843 INFO GetSampleName - GCS max retries/reopens: 20
13:40:57.843 INFO GetSampleName - Requester pays: disabled
13:40:57.843 INFO GetSampleName - Initializing engine
13:40:58.847 INFO GetSampleName - Done initializing engine
13:40:58.900 INFO ProgressMeter - Starting traversal
13:40:58.901 INFO ProgressMeter - Current Locus Elapsed Minutes Records Processed Records/Minute
13:40:58.903 INFO ProgressMeter - unmapped 0.0 0 NaN
13:40:58.903 INFO ProgressMeter - Traversal complete. Processed 0 total records in 0.0 minutes.
13:40:58.903 INFO GetSampleName - Shutting down engine
[October 25, 2021 1:40:58 PM UTC] org.broadinstitute.hellbender.tools.GetSampleName done. Elapsed time: 0.06 minutes.
Runtime.totalMemory()=416284672
Using GATK jar /cvmfs/main.galaxyproject.org/deps/_conda/envs/mulled-v1-2d71e7294258e0d6557ab09c36c8336300de41c90ae1e21e30396938913db7d4/share/gatk4-4.1.7.0-0/gatk-package-4.1.7.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /cvmfs/main.galaxyproject.org/deps/_conda/envs/mulled-v1-2d71e7294258e0d6557ab09c36c8336300de41c90ae1e21e30396938913db7d4/share/gatk4-4.1.7.0-0/gatk-package-4.1.7.0-local.jar GetSampleName --input=tumor.bam --output=samplename.txt
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/corral4/main/jobs/038/551/38551877/_job_tmp -Xmx7g -Xms256m
USAGE: Mutect2 [arguments]

Call somatic SNVs and indels via local assembly of haplotypes
Version:4.1.7.0

Required Arguments:

--input,-I:String BAM/SAM/CRAM file containing reads This argument must be specified at least once.
Required.

--output,-O:File File to which variants should be written Required.

--reference,-R:GATKPathSpecifier
Reference sequence file Required.

Optional Arguments:

–add-output-sam-program-record,-add-output-sam-program-record:Boolean
If true, adds a PG tag to created SAM/BAM/CRAM files. Default value: true. Possible
values: {true, false}

–add-output-vcf-command-line,-add-output-vcf-command-line:Boolean
If true, adds a command line header line to created VCF files. Default value: true.
Possible values: {true, false}

–af-of-alleles-not-in-resource,-default-af:Double
Population allele fraction assigned to alleles not found in germline resource. Please see
docs/mutect/mutect2.pdf fora derivation of the default value. Default value: -1.0.

–alleles:FeatureInput The set of alleles to force-call regardless of evidence Default value: null.

–annotation,-A:String One or more specific annotations to add to variant calls This argument may be specified 0
or more times. Default value: null. Possible Values: {AlleleFraction,
AS_BaseQualityRankSumTest, AS_FisherStrand, AS_InbreedingCoeff,
AS_MappingQualityRankSumTest, AS_QualByDepth, AS_ReadPosRankSumTest, AS_RMSMappingQuality,
AS_StrandBiasMutectAnnotation, AS_StrandOddsRatio, BaseQuality, BaseQualityHistogram,
BaseQualityRankSumTest, ChromosomeCounts, ClippingRankSumTest, CountNs, Coverage,
DepthPerAlleleBySample, DepthPerSampleHC, ExcessHet, FisherStrand, FragmentLength,
GenotypeSummaries, InbreedingCoeff, LikelihoodRankSumTest, MappingQuality,
MappingQualityRankSumTest, MappingQualityZero, OrientationBiasReadCounts,
OriginalAlignment, PossibleDeNovo, QualByDepth, ReadPosition, ReadPosRankSumTest,
ReferenceBases, RMSMappingQuality, SampleList, StrandBiasBySample, StrandOddsRatio,
TandemRepeat, UniqueAltReadCount}

–annotation-group,-G:String One or more groups of annotations to apply to variant calls This argument may be
specified 0 or more times. Default value: null. Possible Values:
{AlleleSpecificAnnotation, AS_StandardAnnotation, ReducibleAnnotation, StandardAnnotation,
StandardHCAnnotation, StandardMutectAnnotation}

–annotations-to-exclude,-AX:String
One or more specific annotations to exclude from variant calls This argument may be
specified 0 or more times. Default value: null. Possible Values:
{AS_StrandBiasMutectAnnotation, BaseQuality, Coverage, DepthPerAlleleBySample,
DepthPerSampleHC, FragmentLength, MappingQuality, OrientationBiasReadCounts, ReadPosition,
StrandBiasBySample, TandemRepeat}

–arguments_file:File read one or more arguments files and add them to the command line This argument may be
specified 0 or more times. Default value: null.

–assembly-region-out:String Output the assembly region to this IGV formatted file Default value: null.

–assembly-region-padding:Integer
Number of additional bases of context to include around each assembly region Default
value: 100.

–base-quality-score-threshold:Byte
Base qualities below this threshold will be reduced to the minimum (6) Default value: 18.

–callable-depth:Integer Minimum depth to be considered callable for Mutect stats. Does not affect genotyping.
Default value: 10.

–cloud-index-prefetch-buffer,-CIPB:Integer
Size of the cloud-only prefetch buffer (in MB; 0 to disable). Defaults to
cloudPrefetchBuffer if unset. Default value: -1.

–cloud-prefetch-buffer,-CPB:Integer
Size of the cloud-only prefetch buffer (in MB; 0 to disable). Default value: 40.

–create-output-bam-index,-OBI:Boolean
If true, create a BAM/CRAM index when writing a coordinate-sorted BAM/CRAM file. Default
value: true. Possible values: {true, false}

–create-output-bam-md5,-OBM:Boolean
If true, create a MD5 digest for any BAM/SAM/CRAM file created Default value: false.
Possible values: {true, false}

–create-output-variant-index,-OVI:Boolean
If true, create a VCF index when writing a coordinate-sorted VCF file. Default value:
true. Possible values: {true, false}

–create-output-variant-md5,-OVM:Boolean
If true, create a a MD5 digest any VCF file created. Default value: false. Possible
values: {true, false}

–disable-bam-index-caching,-DBIC:Boolean
If true, don’t cache bam indexes, this will reduce memory requirements but may harm
performance if many intervals are specified. Caching is automatically disabled if there
are no intervals specified. Default value: false. Possible values: {true, false}

–disable-read-filter,-DF:String
Read filters to be disabled before analysis This argument may be specified 0 or more
times. Default value: null. Possible Values: {GoodCigarReadFilter, MappedReadFilter,
MappingQualityAvailableReadFilter, MappingQualityNotZeroReadFilter,
MappingQualityReadFilter, NonChimericOriginalAlignmentReadFilter,
NonZeroReferenceLengthAlignmentReadFilter, NotDuplicateReadFilter,
NotSecondaryAlignmentReadFilter, PassesVendorQualityCheckReadFilter, ReadLengthReadFilter,
WellformedReadFilter}

–disable-sequence-dictionary-validation,-disable-sequence-dictionary-validation:Boolean
If specified, do not check the sequence dictionaries from our inputs for compatibility.
Use at your own risk! Default value: false. Possible values: {true, false}

–downsampling-stride,-stride:Integer
Downsample a pool of reads starting within a range of one or more bases. Default value:
1.

--exclude-intervals,-XL:String One or more genomic intervals to exclude from processing This argument may be specified 0
or more times. Default value: null.

–f1r2-max-depth:Integer sites with depth higher than this value will be grouped Default value: 200.

–f1r2-median-mq:Integer skip sites with median mapping quality below this value Default value: 50.

–f1r2-min-bq:Integer exclude bases below this quality from pileup Default value: 20.

–f1r2-tar-gz:File If specified, collect F1R2 counts and output files into this tar.gz file Default value:
null.

–founder-id,-founder-id:String
Samples representing the population “founders” This argument may be specified 0 or more
times. Default value: null.

–gatk-config-file:String A configuration file to use with the GATK. Default value: null.

–gcs-max-retries,-gcs-retries:Integer
If the GCS bucket channel errors out, how many times it will attempt to re-initiate the
connection Default value: 20.

–gcs-project-for-requester-pays:String
Project to bill when accessing “requester pays” buckets. If unset, these buckets cannot be
accessed. Default value: .

–genotype-germline-sites:Boolean
(EXPERIMENTAL) Call all apparent germline site even though they will ultimately be
filtered. Default value: false. Possible values: {true, false}

–genotype-pon-sites:Boolean Call sites in the PoN even though they will ultimately be filtered. Default value: false.
Possible values: {true, false}

–germline-resource:FeatureInput
Population vcf of germline sequencing containing allele fractions. Default value: null.

–graph-output,-graph:String Write debug assembly graph information to this file Default value: null.

–help,-h:Boolean display the help message Default value: false. Possible values: {true, false}

--ignore-itr-artifacts:Boolean Turn off read transformer that clips artifacts associated with end repair insertions near
inverted tandem repeats. Default value: false. Possible values: {true, false}

–initial-tumor-lod,-init-lod:Double
Log 10 odds threshold to consider pileup active. Default value: 2.0.

–interval-exclusion-padding,-ixp:Integer
Amount of padding (in bp) to add to each interval you are excluding. Default value: 0.

–interval-merging-rule,-imr:IntervalMergingRule
Interval merging rule for abutting intervals Default value: ALL. Possible values: {ALL,
OVERLAPPING_ONLY}

--interval-padding,-ip:Integer Amount of padding (in bp) to add to each interval you are including. Default value: 0.

–interval-set-rule,-isr:IntervalSetRule
Set merging approach to use for combining interval inputs Default value: UNION. Possible
values: {UNION, INTERSECTION}

–intervals,-L:String One or more genomic intervals over which to operate This argument may be specified 0 or
more times. Default value: null.

–lenient,-LE:Boolean Lenient processing of VCF files Default value: false. Possible values: {true, false}

–max-assembly-region-size:Integer
Maximum size of an assembly region Default value: 300.

–max-population-af,-max-af:Double
Maximum population allele frequency in tumor-only mode. Default value: 0.01.

–max-reads-per-alignment-start:Integer
Maximum number of reads to retain per alignment start position. Reads above this threshold
will be downsampled. Set to 0 to disable. Default value: 50.

–min-assembly-region-size:Integer
Minimum size of an assembly region Default value: 50.

–min-base-quality-score,-mbq:Byte
Minimum base quality required to consider a base for calling Default value: 10.

–mitochondria-mode:Boolean Mitochondria mode sets emission and initial LODs to 0. Default value: false. Possible
values: {true, false}

–native-pair-hmm-threads:Integer
How many threads should a native pairHMM implementation use Default value: 4.

–native-pair-hmm-use-double-precision:Boolean
use double precision in the native pairHmm. This is slower but matches the java
implementation better Default value: false. Possible values: {true, false}

–normal-lod:Double Log 10 odds threshold for calling normal variant non-germline. Default value: 2.2.

--normal-sample,-normal:String BAM sample name of normal(s), if any. May be URL-encoded as output by GetSampleName with
-encode argument. This argument may be specified 0 or more times. Default value: null.

–panel-of-normals,-pon:FeatureInput
VCF file of sites observed in normal. Default value: null.

–pcr-indel-qual:Integer Phred-scaled PCR SNV qual for overlapping fragments Default value: 40.

–pcr-snv-qual:Integer Phred-scaled PCR SNV qual for overlapping fragments Default value: 40.

–pedigree,-ped:File Pedigree file for determining the population “founders” Default value: null.

–QUIET:Boolean Whether to suppress job-summary info on System.err. Default value: false. Possible
values: {true, false}

–read-filter,-RF:String Read filters to be applied before analysis This argument may be specified 0 or more
times. Default value: null. Possible Values: {AlignmentAgreesWithHeaderReadFilter,
AllowAllReadsReadFilter, AmbiguousBaseReadFilter, CigarContainsNoNOperator,
FirstOfPairReadFilter, FragmentLengthReadFilter, GoodCigarReadFilter,
HasReadGroupReadFilter, IntervalOverlapReadFilter, LibraryReadFilter, MappedReadFilter,
MappingQualityAvailableReadFilter, MappingQualityNotZeroReadFilter,
MappingQualityReadFilter, MatchingBasesAndQualsReadFilter, MateDifferentStrandReadFilter,
MateDistantReadFilter, MateOnSameContigOrNoMappedMateReadFilter,
MateUnmappedAndUnmappedReadFilter, MetricsReadFilter,
NonChimericOriginalAlignmentReadFilter, NonZeroFragmentLengthReadFilter,
NonZeroReferenceLengthAlignmentReadFilter, NotDuplicateReadFilter,
NotOpticalDuplicateReadFilter, NotProperlyPairedReadFilter,
NotSecondaryAlignmentReadFilter, NotSupplementaryAlignmentReadFilter,
OverclippedReadFilter, PairedReadFilter, PassesVendorQualityCheckReadFilter,
PlatformReadFilter, PlatformUnitReadFilter, PrimaryLineReadFilter,
ProperlyPairedReadFilter, ReadGroupBlackListReadFilter, ReadGroupReadFilter,
ReadLengthEqualsCigarLengthReadFilter, ReadLengthReadFilter, ReadNameReadFilter,
ReadStrandFilter, SampleReadFilter, SecondOfPairReadFilter, SeqIsStoredReadFilter,
SoftClippedReadFilter, ValidAlignmentEndReadFilter, ValidAlignmentStartReadFilter,
WellformedReadFilter}

–read-index,-read-index:String
Indices to use for the read inputs. If specified, an index must be provided for every read
input and in the same order as the read inputs. If this argument is not specified, the
path to the index for each input will be inferred automatically. This argument may be
specified 0 or more times. Default value: null.

–read-validation-stringency,-VS:ValidationStringency
Validation stringency for all SAM/BAM/CRAM/SRA files read by this program. The default
stringency value SILENT can improve performance when processing a BAM file in which
variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default
value: SILENT. Possible values: {STRICT, LENIENT, SILENT}

–seconds-between-progress-updates,-seconds-between-progress-updates:Double
Output traversal statistics every time this many seconds elapse Default value: 10.0.

–sequence-dictionary,-sequence-dictionary:String
Use the given sequence dictionary as the master/canonical sequence dictionary. Must be a
.dict file. Default value: null.

–sites-only-vcf-output:Boolean
If true, don’t emit genotype fields when writing vcf file output. Default value: false.
Possible values: {true, false}

–tmp-dir:GATKPathSpecifier Temp directory to use. Default value: null.

–tumor-lod-to-emit,-emit-lod:Double
Log 10 odds threshold to emit variant to VCF. Default value: 3.0.

–tumor-sample,-tumor:String BAM sample name of tumor. May be URL-encoded as output by GetSampleName with -encode
argument. Default value: null.

–use-jdk-deflater,-jdk-deflater:Boolean
Whether to use the JdkDeflater (as opposed to IntelDeflater) Default value: false.
Possible values: {true, false}

–use-jdk-inflater,-jdk-inflater:Boolean
Whether to use the JdkInflater (as opposed to IntelInflater) Default value: false.
Possible values: {true, false}

–verbosity,-verbosity:LogLevel
Control verbosity of logging. Default value: INFO. Possible values: {ERROR, WARNING,
INFO, DEBUG}

–version:Boolean display the version number for this tool Default value: false. Possible values: {true,
false}

Advanced Arguments:

–active-probability-threshold:Double
Minimum probability for a locus to be considered active. Default value: 0.002.

–adaptive-pruning-initial-error-rate:Double
Initial base error rate estimate for adaptive pruning Default value: 0.001.

–allele-informative-reads-overlap-margin:Integer
Likelihood and read-based annotations will only take into consideration reads that overlap
the variant or any base no further than this distance expressed in base pairs Default
value: 2.

–allow-non-unique-kmers-in-ref:Boolean
Allow graphs that have non-unique kmers in the reference Default value: false. Possible
values: {true, false}

–bam-output,-bamout:String File to which assembled haplotypes should be written Default value: null.

–bam-writer-type:WriterType Which haplotypes should be written to the BAM Default value: CALLED_HAPLOTYPES. Possible
values: {ALL_POSSIBLE_HAPLOTYPES, CALLED_HAPLOTYPES}

–debug-assembly,-debug:Boolean
Print out verbose debug information about each assembly region Default value: false.
Possible values: {true, false}

–disable-adaptive-pruning:Boolean
Disable the adaptive algorithm for pruning paths in the graph Default value: false.
Possible values: {true, false}

–disable-tool-default-annotations,-disable-tool-default-annotations:Boolean
Disable all tool default annotations Default value: false. Possible values: {true, false}

–disable-tool-default-read-filters,-disable-tool-default-read-filters:Boolean
Disable all tool default read filters (WARNING: many tools will not function correctly
without their default read filters on) Default value: false. Possible values: {true,
false}

–dont-increase-kmer-sizes-for-cycles:Boolean
Disable iterating over kmer sizes when graph cycles are detected Default value: false.
Possible values: {true, false}

–dont-use-soft-clipped-bases:Boolean
Do not analyze soft clipped bases in the reads Default value: false. Possible values:
{true, false}

–emit-ref-confidence,-ERC:ReferenceConfidenceMode
Mode for emitting reference confidence scores (For Mutect2, this is a BETA feature)
Default value: NONE. Possible values: {NONE, BP_RESOLUTION, GVCF}

–enable-all-annotations:Boolean
Use all possible annotations (not for the faint of heart) Default value: false. Possible
values: {true, false}

–force-active:Boolean If provided, all regions will be marked as active Default value: false. Possible values:
{true, false}

–force-call-filtered-alleles,-genotype-filtered-alleles:Boolean
Force-call filtered alleles included in the resource specified by --alleles Default
value: false. Possible values: {true, false}

–gvcf-lod-band,-LODB:Double Exclusive upper bounds for reference confidence LOD bands (must be specified in increasing
order) This argument may be specified 0 or more times. Default value: [-2.5, -2.0, -1.5,
-1.0, -0.5, 0.0, 0.5, 1.0].

–independent-mates:Boolean Allow paired reads to independently support different haplotypes. Useful for validations
with ill-designed synthetic data. Default value: false. Possible values: {true, false}

–kmer-size:Integer Kmer size to use in the read threading assembler This argument may be specified 0 or more
times. Default value: [10, 25].

–max-mnp-distance,-mnp-dist:Integer
Two or more phased substitutions separated by this distance or less are merged into MNPs.
Default value: 1.

–max-num-haplotypes-in-population:Integer
Maximum number of haplotypes to consider for your population Default value: 128.

–max-prob-propagation-distance:Integer
Upper limit on how many bases away probability mass can be moved around when calculating
the boundaries between active and inactive assembly regions Default value: 50.

–max-suspicious-reads-per-alignment-start:Integer
Maximum number of suspicious reads (mediocre mapping quality or too many substitutions)
allowed in a downsampling stride. Set to 0 to disable. Default value: 0.

–max-unpruned-variants:Integer
Maximum number of variants in graph the adaptive pruner will allow Default value: 100.

–min-dangling-branch-length:Integer
Minimum length of a dangling branch to attempt recovery Default value: 4.

–min-pruning:Integer Minimum support to not prune paths in the graph Default value: 2.

–minimum-allele-fraction,-min-AF:Double
Lower bound of variant allele fractions to consider when calculating variant LOD Default
value: 0.0.

–num-pruning-samples:Integer Number of samples that must pass the minPruning threshold Default value: 1.

–pair-hmm-gap-continuation-penalty:Integer
Flat gap continuation penalty for use in the Pair HMM Default value: 10.

–pair-hmm-implementation,-pairHMM:Implementation
The PairHMM implementation to use for genotype likelihood calculations Default value:
FASTEST_AVAILABLE. Possible values: {EXACT, ORIGINAL, LOGLESS_CACHING,
AVX_LOGLESS_CACHING, AVX_LOGLESS_CACHING_OMP, EXPERIMENTAL_FPGA_LOGLESS_CACHING,
FASTEST_AVAILABLE}

–pcr-indel-model:PCRErrorModel
The PCR indel model to use Default value: CONSERVATIVE. Possible values: {NONE, HOSTILE,
AGGRESSIVE, CONSERVATIVE}

–phred-scaled-global-read-mismapping-rate:Integer
The global assumed mismapping rate for reads Default value: 45.

--pruning-lod-threshold:Double Ln likelihood ratio threshold for adaptive pruning algorithm Default value:
2.302585092994046.

–recover-all-dangling-branches:Boolean
Recover all dangling branches Default value: false. Possible values: {true, false}

–showHidden,-showHidden:Boolean
display hidden arguments Default value: false. Possible values: {true, false}

–smith-waterman:Implementation
Which Smith-Waterman implementation to use, generally FASTEST_AVAILABLE is the right
choice Default value: JAVA. Possible values: {FASTEST_AVAILABLE, AVX_ENABLED, JAVA}

Conditional Arguments for readFilter:

Valid only if “MappingQualityReadFilter” is specified:
–maximum-mapping-quality:Integer
Maximum mapping quality to keep (inclusive) Default value: null.

–minimum-mapping-quality:Integer
Minimum mapping quality to keep (inclusive) Default value: 20.

Valid only if “ReadLengthReadFilter” is specified:
–max-read-length:Integer Keep only reads with length at most equal to the specified value Default value:
2147483647.

–min-read-length:Integer Keep only reads with length at least equal to the specified value Default value: 30.

A USER ERROR has occurred: Argument reference was missing: Argument ‘reference’ is required.

Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
Using GATK jar /cvmfs/main.galaxyproject.org/deps/_conda/envs/mulled-v1-2d71e7294258e0d6557ab09c36c8336300de41c90ae1e21e30396938913db7d4/share/gatk4-4.1.7.0-0/gatk-package-4.1.7.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /cvmfs/main.galaxyproject.org/deps/_conda/envs/mulled-v1-2d71e7294258e0d6557ab09c36c8336300de41c90ae1e21e30396938913db7d4/share/gatk4-4.1.7.0-0/gatk-package-4.1.7.0-local.jar Mutect2 --QUIET --tumor-sample Galaxy11-ob__HISAT2_on_97.1.1__cb.bam --input tumor.bam --input normal.bam --output output.vcf.gz

jennaj · October 27, 2021, 12:02am

Do the inputs have exactly the same reference genome assigned as was selected on the GATK4 tool form? Which database (genome) is assigned to each? The reference wasn’t added to the command string, but it should be if one was specified on the form.
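
A quick way to confirm this from a job's log is to look for the reference argument in the "Running:" command line. Below is a small Python sketch, assuming the command string has been copied out of the log. The helper name is hypothetical, and it only checks for the --reference / -R spellings shown in the Mutect2 usage text.

```python
import shlex


def has_reference_argument(command):
    """Return True if a GATK command line passes --reference or -R."""
    tokens = shlex.split(command)
    return any(
        tok in ("--reference", "-R") or tok.startswith("--reference=")
        for tok in tokens
    )
```

Applied to the Mutect2 command in the log above, this returns False, since the line ends with `--input tumor.bam --input normal.bam --output output.vcf.gz` and has no reference argument at all.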

mitchellgc · October 27, 2021, 11:57am

That’s what’s strange. hg19 was selected but wasn’t passed to the command string.

jennaj · October 27, 2021, 7:49pm

Thanks @mitchellgc for the extra feedback.

The version installed at UseGalaxy.org has problems. We are working to fix that now.

Meanwhile, you can run the job at UseGalaxy.eu with this version of the tool. It was updated with some important changes finalized and published on Oct 6, 2021.

GATK4 Mutect2 - Call somatic SNVs and indels via local assembly of haplotypes (Galaxy Version 4.1.7.0+galaxy1)

Any other version of the tool could have problems. The fix was complicated and has been in progress with the tool-dev group since January, but all the changes are now incorporated.

Tracking ticket: Update GATK4 Mutect2 · Issue #345 · galaxyproject/usegalaxy-playbook · GitHub

mitchellgc · October 27, 2021, 8:00pm

Thank you!

