Picard MergeSamFiles: Fatal error: Exit code 1 () merge SAM

Tags: bam, input, picard, mergesamfiles, sam

#1

Hi, when I try to merge filtered SAM files, I keep getting this error:
Fatal error: Exit code 1 ()
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/galaxy-repl/main/jobdir/022/610/22610166/_job_tmp -Xmx7g -Xms256m
09:32:21.483 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/cvmfs/main.galaxyproject.org/d

Any tips on how to solve it would be greatly appreciated.
Thank you!


#2

Hi,

First, try at least one rerun. There were some server issues at Galaxy Main (https://usegalaxy.org) this morning, which may have affected some jobs in odd and unpredictable ways.

If the job fails again, then there is an input problem. Picard is very picky about input formats, and MergeSamFiles is no exception. Use “lenient” validation for all Picard tools unless there is a specific reason not to.

Some tools that can help to find formatting problems:

  • CleanSam (perform SAM/BAM grooming)
  • ValidateSamFile (assess validity of SAM/BAM dataset). Warning: it can produce a LOT of output that can be confusing to act on, and some of it doesn’t need to be acted on, but it is still worth a try!
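For a quick local look at a plain-text SAM file, the Python sketch below runs a tiny subset of the kind of checks ValidateSamFile performs: every alignment line must have the 11 mandatory tab-separated SAM fields, and FLAG, POS and MAPQ must be integers. `basic_sam_problems` is a hypothetical helper, not part of Picard, and it is no substitute for the real tool.

```python
# Hypothetical first-pass check in the spirit of ValidateSamFile (NOT a replacement).
# It only verifies the 11 mandatory SAM fields and that FLAG, POS and MAPQ are integers.
from typing import Iterable, List, Tuple


def basic_sam_problems(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, message) pairs for obviously malformed alignment lines."""
    problems: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        if len(fields) < 11:
            problems.append((lineno, f"expected >= 11 fields, found {len(fields)}"))
            continue
        # Column indices per the SAM spec: 1 = FLAG, 3 = POS, 4 = MAPQ
        for index, name in ((1, "FLAG"), (3, "POS"), (4, "MAPQ")):
            if not fields[index].isdigit():
                problems.append((lineno, f"{name} is not an integer: {fields[index]!r}"))
    return problems
```

Any line it flags is a strong hint that an upstream filtering step damaged the file.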

Common reasons for problems:

  • Sort Problems

    • Inputs are not coordinate sorted, and the form was not set to indicate that state
    • Inputs are not coordinate sorted, and the form was set to indicate that state, but sorting fails for some reason
    • Inputs are coordinate sorted, but not sorted in a way that Picard understands
    • Solution: Pre-sort the inputs by coordinate with Picard’s SortSam (sort SAM/BAM dataset)
  • Header Problems

    • SAM datasets do not have headers
    • Some SAM datasets have a header, some do not
    • The headers are different between the SAM datasets
    • The SAM header lines do not match the data lines (can happen if replaced/manipulated in an upstream step)
    • Solution: The tool ReplaceSamHeader (replace header in a SAM/BAM dataset) is sometimes appropriate, especially if the data was run through a filtering step that removed the original header.
  • Mismatched Sequence Dictionaries (related to header issues)

    • If you set the option “Merge the sequence dictionaries of the datasets being merged” to “True”, more could be going on.
    • This includes missing headers.
    • Solution: First replace or add back the header with ReplaceSamHeader, if needed. Then try running SortSam > CleanSam > MergeSamFiles.
  • Very large Sequence Dictionaries (related to header issues)

    • If the target database the data was mapped against has thousands of “chromosomes” or “transcripts”, tools can run out of memory or fail for odd reasons.
    • This often comes up when NGS reads or a highly fragmented assembly are used as the mapping target.
    • Solution: There isn’t a “one size fits all” way around this except to reduce the number of sequences in the mapping target. Options include: filtering the custom genome FASTA by sequence length, using only the primary chromosomes (removing unplaced sequences and haplotypes), or possibly assembling the reads (if the mapping used an unassembled NGS read dataset as the target).
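
The sort problems listed above come down to two things: the `SO:` tag on the `@HD` header line, and the actual order of the records, which for coordinate sorting follows the order of the `@SQ` lines and then the position. The Python sketch below is a hypothetical helper for plain-text SAM (it is not how Picard implements its own check) that reports both kinds of problem.

```python
# Hypothetical helper: report coordinate-sort problems in plain-text SAM lines.
from typing import Dict, Iterable, List, Optional, Tuple


def check_coordinate_sorted(lines: Iterable[str]) -> List[str]:
    """Return problems found; an empty list means no sort problem was detected."""
    problems: List[str] = []
    sort_order: Optional[str] = None
    ref_rank: Dict[str, int] = {}  # @SQ order defines the reference order
    prev: Optional[Tuple[int, int]] = None  # (reference rank, POS) of previous record
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("@HD"):
            for tag in line.split("\t")[1:]:
                if tag.startswith("SO:"):
                    sort_order = tag[3:]
        elif line.startswith("@SQ"):
            for tag in line.split("\t")[1:]:
                if tag.startswith("SN:"):
                    ref_rank[tag[3:]] = len(ref_rank)
        elif line and not line.startswith("@"):
            fields = line.split("\t")
            rname, pos = fields[2], int(fields[3])
            if rname == "*":
                continue  # unplaced unmapped reads belong at the end; skipped here
            if rname not in ref_rank:
                problems.append(f"{fields[0]}: reference {rname!r} not in @SQ header")
                continue
            key = (ref_rank[rname], pos)
            if prev is not None and key < prev:
                problems.append(f"{fields[0]}: out of coordinate order")
            prev = key
    if sort_order != "coordinate":
        problems.insert(0, f"@HD SO tag is {sort_order!r}, not 'coordinate'")
    return problems
```

A missing or wrong `SO:` tag alone corresponds to the "form was not set to indicate that state" case, while out-of-order records mean the data itself needs SortSam.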
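
Several of the header and sequence dictionary problems above (missing headers, headers that differ between datasets) can be spotted by comparing the `@SQ` lines of each input. The Python sketch below is a hypothetical helper that treats the first dataset with `@SQ` lines as the reference; it is only a rough local check, not what MergeSamFiles does internally.

```python
# Hypothetical helper: compare @SQ sequence dictionaries across SAM datasets.
from typing import Dict, List, Optional, Tuple

SeqDict = List[Tuple[str, int]]


def sequence_dictionary(sam_text: str) -> SeqDict:
    """Return [(name, length), ...] from the @SQ header lines, in file order."""
    entries: SeqDict = []
    for line in sam_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        tags = dict(t.split(":", 1) for t in line.split("\t")[1:] if ":" in t)
        entries.append((tags.get("SN", ""), int(tags.get("LN", "0"))))
    return entries


def compare_headers(datasets: Dict[str, str]) -> List[str]:
    """Report datasets with no @SQ lines, or a dictionary differing from the first one."""
    problems: List[str] = []
    reference_name: Optional[str] = None
    reference_dict: Optional[SeqDict] = None
    for name, text in datasets.items():
        seq_dict = sequence_dictionary(text)
        if not seq_dict:
            problems.append(f"{name}: no @SQ header lines (header missing?)")
            continue
        if reference_dict is None:
            reference_name, reference_dict = name, seq_dict
        elif seq_dict != reference_dict:
            problems.append(f"{name}: sequence dictionary differs from {reference_name}")
    return problems
```

Because the comparison includes order and lengths, it also catches dictionaries that list the same chromosomes in a different order.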
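
Outside Galaxy, the suggested SortSam > CleanSam > MergeSamFiles chain can be run with the Picard command-line tools. The Python sketch below only assembles the commands (it does not execute them). It assumes a Picard release that accepts the newer `-I`/`-O` argument style; `picard.jar` and all file names are placeholders. `--VALIDATION_STRINGENCY LENIENT` is the command-line counterpart of the lenient setting, and `--MERGE_SEQUENCE_DICTIONARIES true` corresponds to the “Merge the sequence dictionaries” form option.

```python
# Sketch only: build (not run) the SortSam -> CleanSam -> MergeSamFiles chain,
# with lenient validation on every step. "picard.jar" and file names are placeholders.
from typing import List

LENIENT = ["--VALIDATION_STRINGENCY", "LENIENT"]


def picard(tool: str, *args: str, jar: str = "picard.jar") -> List[str]:
    """Return an argv list for one Picard tool, with lenient validation appended."""
    return ["java", "-jar", jar, tool, *args, *LENIENT]


def sort_clean_merge(inputs: List[str], merged: str) -> List[List[str]]:
    """Return one command per step: sort and clean each input, then merge them all."""
    commands: List[List[str]] = []
    cleaned: List[str] = []
    for path in inputs:
        stem = path.rsplit(".", 1)[0]
        sorted_path, clean_path = f"{stem}.sorted.bam", f"{stem}.clean.bam"
        commands.append(picard("SortSam", "-I", path, "-O", sorted_path,
                               "--SORT_ORDER", "coordinate"))
        commands.append(picard("CleanSam", "-I", sorted_path, "-O", clean_path))
        cleaned.append(clean_path)
    merge_args: List[str] = []
    for path in cleaned:
        merge_args += ["-I", path]  # one -I per cleaned input
    merge_args += ["-O", merged, "--MERGE_SEQUENCE_DICTIONARIES", "true"]
    commands.append(picard("MergeSamFiles", *merge_args))
    return commands


if __name__ == "__main__":
    for cmd in sort_clean_merge(["a.sam", "b.sam"], "merged.bam"):
        print(" ".join(cmd))
```

Building the argument lists separately from running them makes it easy to inspect each step before handing it to `subprocess.run`.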
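
Filtering a custom genome FASTA by sequence length, one of the options above for shrinking a very large sequence dictionary, can be sketched in a few lines of Python. The helper names and the cutoff are hypothetical; pick a threshold that suits your assembly.

```python
# Hypothetical sketch: keep only FASTA records at least `min_length` bases long.
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA text lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(lines: Iterable[str], min_length: int) -> List[Tuple[str, str]]:
    """Return only the records whose sequence is at least min_length long."""
    return [(h, s) for h, s in read_fasta(lines) if len(s) >= min_length]
```

The filtered genome then has to be used for a fresh mapping run, since the existing alignments still reference the removed sequences.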

Most of this advice matters less when working with BAM data, but it can still apply to SAM data. See also the related topic “Tool error? Try Sorting Your Inputs”.

Give those options a try and see if it resolves the problems. If not, let us know and we can troubleshoot more. Keep your tests in the history (don’t delete them) if you want more review/feedback.

Thanks!


#3

Thank you so much for your reply! I think I figured out the problem: I didn’t assign read groups before the alignment. I know Picard has a tool for this, AddOrReplaceReadGroups, but I am not sure how to use it. Is there a tutorial?


#4

I don’t think that read groups are required by this tool, but you could try adding those in to see if it fixes the problem anyway. Read groups can be important for many downstream analysis tools.

See this tutorial for the how-to (includes a video): https://galaxyproject.org/learn/galaxy-ngs101/#read-groups

Small note: we changed how the “Platform unit (PU)” value is assigned on the tool form, but that fix has not yet been applied to Galaxy Main (details). Until it is, be sure to enter a value on the form: PU is required and will not auto-fill with a default term yet (the default will be “run” once the tool is updated, but you can assign your own term). If PU is left unassigned, the tool will error.
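
To show roughly what adding a read group changes in the data, here is a plain-Python illustration on SAM text (not Picard's implementation; use AddOrReplaceReadGroups for real data). It adds one `@RG` header line carrying `ID`, `SM`, `LB`, `PL` and `PU`, and tags every alignment with `RG:Z:<ID>`. Like the tool form, it refuses to proceed when `PU` or any other value is empty. The helper name and the example values are hypothetical.

```python
# Illustration only: add a read group to plain-text SAM lines.
# Real data should go through AddOrReplaceReadGroups; values passed in are hypothetical.
from typing import Dict, List

REQUIRED = ["ID", "SM", "LB", "PL", "PU"]


def retag(record: str, rg_id: str) -> str:
    """Drop any existing RG:Z tag from the optional fields and append the new one."""
    fields = record.split("\t")
    optional = [f for f in fields[11:] if not f.startswith("RG:Z:")]
    return "\t".join(fields[:11] + optional + [f"RG:Z:{rg_id}"])


def add_read_group(lines: List[str], rg: Dict[str, str]) -> List[str]:
    """Replace any @RG lines with a single new one and tag every alignment."""
    missing = [key for key in REQUIRED if not rg.get(key)]
    if missing:
        raise ValueError(f"missing read group values: {missing}")
    rg_line = "@RG\t" + "\t".join(f"{key}:{rg[key]}" for key in REQUIRED)
    header = [l for l in lines if l.startswith("@") and not l.startswith("@RG")]
    records = [retag(l, rg["ID"]) for l in lines if l and not l.startswith("@")]
    return header + [rg_line] + records
```

For example, `add_read_group(lines, {"ID": "rg1", "SM": "sample1", "LB": "lib1", "PL": "ILLUMINA", "PU": "run"})` returns the original header, the new `@RG` line, and the tagged records.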