Meaning of HISAT2 error messages

Do you know the meaning of the following error messages from HISAT2 after mapping?

Error, fewer reads in file specified with -2 than in file specified with -1
terminate called after throwing an instance of ‘int’
(ERR): hisat2-align died with signal 6 (ABRT)
[bam_sort_core] merging from 6 files and 1 in-memory blocks…

The “(ERR)” line worries me in particular: I’m afraid my alignment may not have completed at all.


Hi Mario

This is paired-end data, isn’t it? And you have provided two FASTQ files… Well, the alignment process stopped (i.e. died) because the two FASTQ files contain different numbers of reads. Maybe one file is corrupt (e.g. an incomplete download), or something went wrong during pre-processing?
Check the integrity of the files with FastQC.
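FastQC reports the total number of sequences for each file, so comparing that value between the forward and reverse reports is one quick check. As a rough alternative outside Galaxy, here is a minimal Python sketch (my own, not part of FastQC or Galaxy) that counts the records in two FASTQ files and flags basic structural damage, such as a truncated last record. It assumes plain 4-line FASTQ records (no wrapped sequence lines), optionally gzip-compressed with a `.gz` suffix; the function names are hypothetical.

```python
import gzip
import sys
from typing import Iterator, TextIO, Tuple


def open_fastq(path: str) -> TextIO:
    """Open a FASTQ file as text, handling .gz compression by file suffix."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def iter_records(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) for each 4-line record.

    Raises ValueError on structural problems, e.g. a truncated final record
    (typical of an incomplete download) or sequence/quality length mismatch.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # clean end of file
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        if not qual:
            raise ValueError(f"truncated record starting at {header.strip()!r}")
        header, seq, plus, qual = (x.rstrip("\r\n") for x in (header, seq, plus, qual))
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header, seq, qual


def count_records(path: str) -> int:
    """Count FASTQ records, validating structure along the way."""
    with open_fastq(path) as handle:
        return sum(1 for _ in iter_records(handle))


def compare_pair(path1: str, path2: str) -> None:
    """Print read counts for both mate files and warn if they differ."""
    n1, n2 = count_records(path1), count_records(path2)
    print(f"{path1}: {n1} reads")
    print(f"{path2}: {n2} reads")
    if n1 != n2:
        print("Read counts differ: the files are out of sync or one is damaged.")


if __name__ == "__main__":
    compare_pair(sys.argv[1], sys.argv[2])
```

Usage would be something like `python check_pairs.py reads_1.fastq.gz reads_2.fastq.gz` (file names hypothetical). A `ValueError` points to a damaged file; different counts with no error point to files that fell out of sync during pre-processing.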

Regards, Hans-Rudolf


Thanks! I checked the files and they were indeed corrupt.
Now I have a normal alignment.

I am having the same problem. How can I check whether the files are corrupt? And how should I correct them to get a decent alignment? Could you help me, please?


Hi @aaak

First, try at least one rerun.

Then start troubleshooting with these tutorials:
Galaxy Training!
Galaxy Training!

And FAQs:
Galaxy Support - Galaxy Community Hub

You can also search this forum by keywords like tool names or datatypes.

As hrhotz indicated, the two FASTQ files of my paired-end data no longer contain the same number of reads after filtering with the “Filter by quality” tool. Rerunning has not solved anything. What I did not understand from the thread is what is meant by a “corrupt file” and how to fix it.

I am familiar with all the tutorials in the links. I reviewed them quickly but, to be honest, they gave me no idea how to solve this problem.

I used SRA files and did not modify any of them except for running the “fastp” and then the “Filter by quality” tools. I think something goes wrong after using these two tools… Anyway, thanks for your prompt response and help. I will stop feeding HISAT2 the files produced by the “Filter by quality” tool, since using it alone or in combination with other tools causes the same problem…


Fastp will not sort reads that are still paired after QA from those that are not. Trimmomatic will.

Run the reads produced by fastp QA through the Fastq-interlacer tool and then the Deinterlacer tool. This removes any reads that are no longer paired. The tools in the Seqtk group include similar functions.
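Conceptually, the interlace/deinterlace round trip keeps only reads whose mate is still present in the other file. A minimal Python sketch of that idea (not the Galaxy tools' code) is below. It assumes Illumina/SRA-style read names where both mates share the first whitespace-separated token, optionally ending in `/1` and `/2`, uncompressed 4-line FASTQ, unique read names, and enough memory to hold the whole R2 file. All function names are hypothetical; inside Galaxy, the tools named above remain the way to do this.

```python
import sys
from typing import Dict, Iterable, Iterator, List, Tuple

# (header, sequence, quality) for one FASTQ record
Record = Tuple[str, str, str]


def read_fastq(path: str) -> Iterator[Record]:
    """Yield records from an uncompressed, 4-line-per-record FASTQ file."""
    with open(path) as handle:
        while True:
            header = handle.readline().rstrip("\r\n")
            if not header:
                return
            seq = handle.readline().rstrip("\r\n")
            handle.readline()  # '+' separator line, ignored
            qual = handle.readline().rstrip("\r\n")
            yield header, seq, qual


def write_fastq(path: str, records: List[Record]) -> None:
    """Write records back out as 4-line FASTQ."""
    with open(path, "w") as handle:
        for header, seq, qual in records:
            handle.write(f"{header}\n{seq}\n+\n{qual}\n")


def read_id(header: str) -> str:
    """Assumed naming scheme: first token after '@', minus a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def repair_pairs(records1: Iterable[Record],
                 records2: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Keep only reads whose mate exists in the other file, in R1 order."""
    mates: Dict[str, Record] = {read_id(rec[0]): rec for rec in records2}
    kept1: List[Record] = []
    kept2: List[Record] = []
    for rec in records1:
        mate = mates.get(read_id(rec[0]))
        if mate is not None:  # unpaired R1 reads are simply dropped
            kept1.append(rec)
            kept2.append(mate)
    return kept1, kept2


if __name__ == "__main__":
    in1, in2, out1, out2 = sys.argv[1:5]
    pairs1, pairs2 = repair_pairs(read_fastq(in1), read_fastq(in2))
    write_fastq(out1, pairs1)
    write_fastq(out2, pairs2)
    print(f"kept {len(pairs1)} pairs")
```

Writing both outputs in R1 order is the point: HISAT2 expects the n-th read of the `-1` file to be the mate of the n-th read of the `-2` file, so the counts and the order must match.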

More help for NCBI SRA data: the downloaded format can differ by accession, but this help page explains how to solve the most common issues: NCBI SRA Fastq


To be clear, the steps to ensure that all reads have intact pairs should happen after all other QA actions that edit sequences directly and/or filter reads out entirely.

You can run tools like FastQC at any point. They do not change the reads; they only generate statistics.

There are a bunch of seqtk tools, and I used “seqtk trim FASTQ using the Phred algorithm”. It works OK, though the quality does not change much, probably because I used the default values. Maybe I should experiment with the parameters to learn more… Anyway, thank you so much for your help…
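As far as I know, the “Phred algorithm” in seqtk's trimming tool is a variant of Mott's algorithm: each base scores an error threshold minus its error probability, and the read is cut down to its highest-scoring stretch. Below is a simplified Python sketch of that idea, not seqtk's actual code, assuming Phred+33 qualities and an illustrative threshold of 0.05 (`mott_trim` and `phred_to_error` are hypothetical names). It also hints at why little may change: at this threshold every base above roughly Q13 (error probability below 0.05) scores positive, so mostly high-quality reads are barely trimmed, and a smaller threshold trims more aggressively.

```python
from typing import Tuple


def phred_to_error(qual_char: str, offset: int = 33) -> float:
    """Convert one quality character to an error probability (Phred+offset)."""
    q = ord(qual_char) - offset
    return 10 ** (-q / 10)


def mott_trim(seq: str, qual: str, threshold: float = 0.05,
              offset: int = 33) -> Tuple[str, str]:
    """Keep the single segment maximizing sum(threshold - error_prob).

    This is a maximum-subarray (Kadane) scan: the running score resets to
    zero whenever it drops below zero, and the best segment seen is kept.
    """
    best_score = 0.0
    best_start = best_end = 0
    running = 0.0
    start = 0
    for i, ch in enumerate(qual):
        running += threshold - phred_to_error(ch, offset)
        if running > best_score:
            best_score = running
            best_start, best_end = start, i + 1
        if running < 0:
            running = 0.0
            start = i + 1
    return seq[best_start:best_end], qual[best_start:best_end]


if __name__ == "__main__":
    # '#' is Q2 (error ~0.63), 'I' is Q40 (error 0.0001)
    print(mott_trim("ACGTACGT", "##IIIIII"))
```

Here the two low-quality leading bases are removed and the high-quality remainder `GTACGT` is kept.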

PS. In my experience, trimming is the major issue. Some datasets are hell in terms of quality, and since no adequate metadata is given and the protocol is barely described, I am personally not sure how well I am proceeding. It would be great to have a tutorial about processing such datasets.