HISAT2 crash/failure

Hi all,

I have been using HISAT2 for RNA-seq read alignment for quite a while.
For the last few days, the tool has kept failing with different error messages, even on inputs that I have used successfully before.
Here are a couple of examples of the error messages that I get:

This job failed for reasons that could not be determined

(ERR): hisat2-align died with signal 9 (KILL)
[W::sam_read1_sam] Parse error at line 34126880
samtools sort: truncated file. Aborting
[main_samview] fail to read the header from "-"

This job was terminated because it used more memory than it was allocated

Can anyone help?

Thanks
Luca

Hi @LUCA_MOLOGNI

The clusters that run this tool have been very busy, but these errors look like they result from a job that is too large to execute. I understand that these reads mapped before, but if any of these other items changed, it could impact how “large” a job is:

  1. parameters
  2. reference genome
  3. reference annotation
  4. tool version (always use the most current)

Things to troubleshoot:

  1. Try another rerun to eliminate transient server-side issues
  2. Inspect the reads: do these still pass QA tools?
  3. Inspect the target genome if using a custom genome input: is the fasta format correct?
  4. The format/content of any included reference annotation should also be checked, but a problem there usually doesn’t lead to an error: in practice the annotation would simply be ignored. So that would be a scientific problem rather than a technical one.
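
For item 2, a QA tool such as FastQC is the usual route, but a quick structural check can also rule out a truncated or malformed file. Below is a minimal Python sketch, assuming plain (uncompressed) four-line FASTQ records; the function name `check_fastq` and the sample reads are made up for illustration and are not part of any Galaxy tool.

```python
import io


def check_fastq(handle, max_records=None):
    """Return (records_checked, problems) for a four-line FASTQ text stream.

    Hypothetical helper for illustration only; it checks structure,
    not read quality.
    """
    problems = []
    count = 0
    while True:
        header = handle.readline()
        if header == "":
            break  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        count += 1
        if not header.startswith("@"):
            problems.append((count, "header does not start with '@'"))
        if not plus.startswith("+"):
            problems.append((count, "third line does not start with '+'"))
        if len(seq) != len(qual):
            problems.append((count, "sequence and quality lengths differ"))
        if max_records is not None and count >= max_records:
            break
    return count, problems


# Made-up example: the second record has a truncated quality string.
sample = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIII\n")
print(check_fastq(sample))
```

A file that was cut off during upload would typically show up here as a final record with mismatched lengths.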

I added some tags that link to prior Q&A about those items. In short, the root problem could be input issues OR the job is actually too large to process. The middle message suggests that the target reference genome (fasta) contains a lot of sequences. Remember that this tool expects a somewhat intact assembly: up to ~1000 target sequences but no more.
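
To check how many sequences a custom reference fasta contains, counting the `>` header lines is enough. Here is a small Python sketch of that idea, assuming an uncompressed text fasta; `summarize_fasta` is a hypothetical helper name, and the ~1000 figure is only the rule of thumb mentioned above, not a hard limit read from the tool.

```python
import io


def summarize_fasta(handle):
    """Count '>' records and flag two basic format problems.

    Hypothetical helper for illustration; it does not validate
    the sequence alphabet or line wrapping.
    """
    n_seqs = 0
    bad_lines = []
    for lineno, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if line.startswith(">"):
            n_seqs += 1
            if len(line.strip()) == 1:
                bad_lines.append((lineno, "empty header"))
        elif n_seqs == 0 and line.strip():
            bad_lines.append((lineno, "sequence data before first '>' header"))
    return n_seqs, bad_lines


# Made-up two-sequence example.
n_seqs, bad_lines = summarize_fasta(io.StringIO(">chr1\nACGT\n>chr2\nGGCC\n"))
print(n_seqs, bad_lines)
print("more than ~1000 sequences:", n_seqs > 1000)
```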

Hi Jennifer,
thank you for your kind reply.
Reference is human hg38 and annotation file is the same I’ve always used.
QC is fine.
The only thing that’s changed is the amount of available memory quota on my account.

So I will try and delete some old BAM, FastQ and all large files… let’s see if this helps.
Thank you

Luca

Did you ever find a solution for this problem? I have the same issue and I can’t get around it…

Not really… I switched to the STAR aligner

Hi Luca,
Would you mind sharing your next steps?
I also ended up using STAR… and I set the parameters as recommended:
• select XS as a Read alignment tag to include in the BAM output if (and only if) your sequenced reads come from an unstranded library prep
• do not select the jM and jI tags for inclusion
• keep the HI tag selected and
• select HI tag values should be zero-based
• exclude All alignments across non-canonical junctions under Output filter criteria → Exclude the following records from the BAM output

Then with the mapped.bam file, I used featureCounts with an annotation file that I downloaded from ENCODE (in my case mm10). That gave me 2 output files: a counts file and a summary file, with the counts intended for DESeq2.
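
If it helps to see what happens with several such counts files before differential expression, here is a rough Python sketch that merges per-sample counts into one gene-by-sample table. It assumes each counts file is a two-column, tab-separated table (gene ID, count) with one header line; that layout, the helper names `read_counts` and `merge_counts`, and the gene and sample names are all assumptions for illustration, so check them against your own files.

```python
import csv
import io


def read_counts(handle):
    """Read a two-column tab-separated counts table, skipping the header."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # header line, e.g. "Geneid<TAB>sample"
    return {row[0]: int(row[1]) for row in reader if row}


def merge_counts(tables):
    """Merge {sample: {gene: count}} into rows of a gene-by-sample matrix.

    Genes missing from a sample are filled with 0.
    """
    samples = list(tables)
    genes = sorted(set().union(*[t.keys() for t in tables.values()]))
    rows = [["Geneid"] + samples]
    for gene in genes:
        rows.append([gene] + [tables[s].get(gene, 0) for s in samples])
    return rows


# Made-up data for two samples.
control = read_counts(io.StringIO("Geneid\tcontrol\nGene1\t10\nGene2\t0\n"))
treated = read_counts(io.StringIO("Geneid\ttreated\nGene1\t3\nGene3\t7\n"))
for row in merge_counts({"control": control, "treated": treated}):
    print("\t".join(str(x) for x in row))
```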

Is this how you quantified your mapped BAM as well?
This is the first time I have used this approach. Before, I used HISAT2 followed by StringTie/StringTie merge/StringTie, but StringTie did not like my RNA STAR BAM files for some reason… so I had to use featureCounts…

I would appreciate your reply once again! At least then I will know whether I am doing it right :)

Anna

For anyone getting an error message like this one:

This job was terminated because it used more memory than it was allocated.

The problem is unrelated to the amount of data storage space available in an account (the storage quota). It is instead related to the memory allocated to the tool during job execution (working memory on a cluster node).

See → FAQ: Understanding 'exceeds memory allocation' error messages

The issue is usually some problem with the input files or the parameters used when working at a public Galaxy server. Why? The resources are massive at public sites, so if a job failed for this reason, it would probably fail anywhere. Meaning, the tool is spinning out and wouldn’t produce meaningful results until the input problem is resolved during a rerun.

We can troubleshoot input problems at this forum. See these guides for how to share your work. That can be screenshots, but all the parts involved need to be captured: input data labels, the input file content (headers and at least one data line), the exact tool used and parameters (all of this is on the job info page), any logs that are available, and the public server involved.

This is one example of a prior Q&A where this message was involved, and it turned out to be a parameter problem → Usage help -- single end versus paired end BAM options

Hope that helps! :)

Dear Anna,

I used STAR with the default settings on the Galaxy website, selecting the reference genome (in my case, hg38);
‘Length of the genomic sequence around annotated junctions’ = 36, and as an output, “Per gene read counts (GeneCounts)”.
I also specified the “Gene model (gff3,gtf) file for splice junctions”, indicating the annotation (.gtf) file.
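
For anyone who wants to use that GeneCounts output directly, here is a small Python sketch for reading it. As far as I know, the STAR per-gene table is tab-separated with four columns (gene ID, unstranded counts, counts for reads on the same strand as the gene, counts for reads on the reverse strand) and starts with four summary rows whose names begin with `N_`. The helper name `parse_reads_per_gene` and the example rows are made up for illustration, so verify the layout against your own file.

```python
import io


def parse_reads_per_gene(handle, column=1):
    """Split a STAR GeneCounts table into summary rows and per-gene counts.

    column: 1 = unstranded, 2 = forward-stranded, 3 = reverse-stranded
    (0-based index into each tab-separated row).
    """
    summary = {}
    counts = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, value = fields[0], int(fields[column])
        if name.startswith("N_"):
            summary[name] = value
        else:
            counts[name] = value
    return summary, counts


# Made-up example rows; the gene ID is a placeholder.
example = (
    "N_unmapped\t5\t5\t5\n"
    "N_multimapping\t2\t2\t2\n"
    "N_noFeature\t1\t3\t4\n"
    "N_ambiguous\t0\t0\t0\n"
    "ENSG0001\t10\t6\t4\n"
)
summary, counts = parse_reads_per_gene(io.StringIO(example))
print(summary)
print(counts)
```

Which column to use depends on the strandedness of the library prep, so column 1 (unstranded) is only the default in this sketch.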
