HISAT2 crash/failure

LUCA_MOLOGNI · January 11, 2023, 8:42am

Hi all,

I have been using HISAT2 for RNA-seq read alignment for quite a while.
For the last few days, the tool has kept failing with different error messages, even on inputs that I have used successfully before.
Here are a couple of examples of the error messages that I get:

This job failed for reasons that could not be determined

(ERR): hisat2-align died with signal 9 (KILL)
[W::sam_read1_sam] Parse error at line 34126880
samtools sort: truncated file. Aborting
[main_samview] fail to read the header from "-"

This job was terminated because it used more memory than it was allocated

Can anyone help?

Thanks
Luca

jennaj · January 12, 2023, 12:19am

Hi @LUCA_MOLOGNI

The clusters that run this tool have been very busy but these errors look like they are resulting from a job that is too large to execute. I understand that these reads mapped before, but if any of these other items changed, it could impact how “large” a job is:

parameters
reference genome
reference annotation
tool version (always use the most current)

Things to troubleshoot:

Try another rerun to eliminate server side transient issues
Inspect the reads – do these still pass QA tools?
Inspect the target genome if using a custom genome input – is the fasta format correct?
The format/content of any included reference annotation should also be checked, but a problem there usually doesn’t lead to an error: in practice, the reference annotation would simply be ignored. So those tend to be scientific problems, not technical ones.

I added some tags that link to prior Q&A about those items. In short, this root problem could be input issues OR the job is actually too large to process. The middle message suggests that the target reference genome (fasta) contains a lot of sequences – remember that this tool is expecting a somewhat intact assembly: up to ~ 1000 target sequences but no more.
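For a custom genome, the sequence count mentioned above can be checked before submitting the job. The following is only a minimal sketch, not part of Galaxy or HISAT2: the function name `summarize_fasta` and the returned keys are made up for illustration, and the default of 1000 is the rough guideline from this post, not a hard limit enforced by the tool.

```python
from collections import Counter


def summarize_fasta(lines, max_sequences=1000):
    """Count sequences in FASTA text and flag a few common format problems.

    `lines` is any iterable of strings, for example an open file handle.
    `max_sequences` is an assumed guideline value, not a HISAT2 setting.
    """
    seen = Counter()
    problems = []
    n_sequences = 0
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            n_sequences += 1
            # The sequence name is the text up to the first whitespace.
            fields = line[1:].split()
            if fields:
                seen[fields[0]] += 1
            else:
                problems.append(f"line {number}: header without a sequence name")
        elif not line.strip():
            problems.append(f"line {number}: blank line")
        elif n_sequences == 0:
            problems.append(f"line {number}: sequence data before the first '>' header")
    duplicates = sorted(name for name, count in seen.items() if count > 1)
    return {
        "n_sequences": n_sequences,
        "duplicate_names": duplicates,
        "problems": problems,
        "exceeds_guideline": n_sequences > max_sequences,
    }
```

It could be run on a downloaded copy of the genome with something like `summarize_fasta(open("genome.fa"))`, where `genome.fa` is a placeholder file name.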

LUCA_MOLOGNI · January 13, 2023, 4:13pm

Hi Jennifer,
thank you for your kind reply.
Reference is human hg38 and annotation file is the same I’ve always used.
QC is fine.
The only thing that’s changed is the amount of available memory quota on my account.

So I will try and delete some old BAM, FastQ and all large files… let’s see if this helps.
Thank you

Luca

Anna_Bianchi · March 19, 2024, 3:28am

Did you ever find a solution for this problem? I have the same issue and I can’t get around it…

LUCA_MOLOGNI · March 19, 2024, 7:19am

Not really… I switched to STAR aligner

Anna_Bianchi · March 19, 2024, 1:08pm

Hi Luca,
would you mind sharing your next steps?
I also ended up using STAR, and I set the parameters as recommended:
• select XS as a Read alignment tag to include in the BAM output if (and only if) your sequenced reads come from an unstranded library prep
• do not select the jM and jI tags for inclusion
• keep the HI tag selected and
• select “HI tag values should be zero-based”
• exclude “All alignments across non-canonical junctions” under Output filter criteria → Exclude the following records from the BAM output

Then, with the mapped.bam file, I ran featureCounts with an annotation file that I downloaded from ENCODE (in my case mm10). That gave me two output files, a counts file and a summary file, and the counts file is what goes into DESeq2.
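One way to sanity-check a featureCounts counts table before passing it on is to load it and look at the gene IDs and counts. This is a rough sketch under stated assumptions: the table is tab-separated with a `Geneid` first column, comment lines start with `#`, counts are integers, and the annotation columns (`Chr`, `Start`, `End`, `Strand`, `Length`) may or may not be present depending on how the tool was run. The function name `read_featurecounts` and the example data are invented for illustration.

```python
import csv
import io

# Annotation columns written by command-line featureCounts; a simplified
# two-column table (gene ID plus counts) has none of them.
ANNOTATION_COLUMNS = {"Chr", "Start", "End", "Strand", "Length"}


def read_featurecounts(handle):
    """Return (sample_names, {gene_id: [count, ...]}) from a counts table."""
    data_lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(data_lines, delimiter="\t")
    header = next(rows)
    # Keep every column except the gene ID and the annotation columns.
    keep = [i for i, name in enumerate(header)
            if i > 0 and name not in ANNOTATION_COLUMNS]
    samples = [header[i] for i in keep]
    counts = {}
    for row in rows:
        if not row:
            continue  # skip empty lines
        counts[row[0]] = [int(row[i]) for i in keep]
    return samples, counts


# Made-up example data, not real output.
example = io.StringIO(
    "# hypothetical comment line\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tmapped.bam\n"
    "GeneA\tchr1\t1\t100\t+\t100\t12\n"
    "GeneB\tchr1\t200\t300\t-\t101\t0\n"
)
samples, counts = read_featurecounts(example)
print(samples, counts)
```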

is this how you quantified your mapped bam as well?
This is the first time I have used this approach. Before, I used to run HISAT2 and then StringTie/StringTie merge/StringTie, but StringTie did not like my RNA STAR BAM files for some reason… so I had to use featureCounts…

I would appreciate your reply once again! At least then I will know if I am doing it right.

Anna

jennaj · March 19, 2024, 10:13pm

For anyone getting an error message like this one:

This job was terminated because it used more memory than it was allocated.

The problem is unrelated to the amount of available data storage space in an account (the data storage quota) and is instead related to the memory allocated to the tool during job execution (working memory on a cluster node).

See → FAQ: Understanding 'exceeds memory allocation' error messages

The issue is usually some problem with the input files or the parameters used when working at a public Galaxy server. Why? The resources are massive at public sites, so if a job failed with this reason, it would probably fail anywhere. Meaning, the tool is spinning out and wouldn’t produce meaningful results until the input problem is resolved during a rerun.

We can troubleshoot input problems at this forum. See these guides for how to share your work. That can be screenshots, but all the parts involved need to be captured: input data labels, the input file content (headers and at least one data line), the exact tool used and parameters (all is on the job info page), any logs that are available, and the public server involved.

This is one example of a prior Q&A where this message was involved, and it turned out to be a parameter problem → Usage help -- single end versus paired end BAM options

Hope that helps!

LUCA_MOLOGNI · March 20, 2024, 11:44am

Dear Anna,

I used STAR with the default settings on the Galaxy website, selecting the reference genome (in my case, hg38), with
‘Length of the genomic sequence around annotated junctions’ = 36 and, as output, “Per gene read counts (GeneCounts)”.
I also specified the “Gene model (gff3,gtf) file for splice junctions”, indicating the annotation (.gtf) file.
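For anyone reading the GeneCounts output afterwards: STAR writes a tab-separated table with four columns (gene ID, counts for unstranded data, counts for read 1 on the same strand as the RNA, counts for the reverse-stranded case) and starts it with four summary rows (`N_unmapped`, `N_multimapping`, `N_noFeature`, `N_ambiguous`). A minimal sketch of picking the right column follows; the function name and the labels "unstranded", "forward" and "reverse" are my own choices for illustration, not STAR options.

```python
SUMMARY_ROWS = ("N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous")

# Column index to read for each library type (labels are illustrative).
COLUMN_FOR_STRANDEDNESS = {"unstranded": 1, "forward": 2, "reverse": 3}


def read_star_gene_counts(lines, strandedness="unstranded"):
    """Split STAR per-gene counts into (summary, counts) for one library type.

    `lines` is any iterable of strings, for example an open file handle.
    """
    column = COLUMN_FOR_STRANDEDNESS[strandedness]
    summary, counts = {}, {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        target = summary if fields[0] in SUMMARY_ROWS else counts
        target[fields[0]] = int(fields[column])
    return summary, counts
```

If the unstranded column and one of the stranded columns give very different totals, that is a hint to double-check which library type the data really is before running DESeq2.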
