HISAT2 reports samples with 0% alignment rates

I aligned about 12 samples against mm39. All samples had good-quality reads and all reads were paired (Trimmomatic gave me empty files for the unpaired reads). Strangely, two of the samples had a 0% alignment rate against the genome. I BLASTed a few of their reads and they seem to match Mus musculus or other rodents. I’m using HISAT2 on Galaxy. Is it a bug?

Hi @Diogo_de_Moraes1

Try a rerun with the data that didn’t align, just to eliminate server/cluster issues from being a factor.

As you do that, double-check that the correct target reference genome is selected on the tool form. At the time of writing, the most current version of the mouse genome is GRCm38 (mm10, sourced from UCSC). Most mapping tools, including HISAT2, will only have built-in indexes available for mm9 or mm10 (not the earlier genome builds).

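Sometimes the wrong genome is selected. HISAT2 is designed to align reads to a reference genome from the same species the reads were sourced from. BLAST+ is different: it also reports cross-species matches, so a BLAST hit to mouse or another rodent does not by itself confirm that the reads will align to the selected genome.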

If that doesn’t work, and the reads are of high quality (tool: FastQC), then something else is going on. Perhaps there was a sample mixup, or the inputs (forward/reverse) were not entered correctly on the form, or possibly the read content doesn’t meet the minimum mapping criteria set on the HISAT2 tool form.

Example of the last case: too much QA (trimming) can generate truncated reads that won’t meet the minimum default mapping criteria. You’ll need to investigate, then make changes depending on what you uncover about the content.
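If you want to check the last two possibilities outside of Galaxy, here is a rough Python sketch that works on downloaded FASTQ files. It is only an illustration, not part of HISAT2 or Galaxy. It assumes plain 4-line FASTQ records (optionally gzip-compressed), and the function names and the length cutoff you pass in are hypothetical choices of mine, not HISAT2 defaults. The first check looks for the first position where the forward and reverse files disagree on read names. The second summarizes read lengths, so you can see how many reads were trimmed down to almost nothing.

```python
import gzip
from collections import Counter
from itertools import zip_longest


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_names(path):
    """Yield read names, assuming 4-line FASTQ records."""
    with open_fastq(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 != 0:
                continue  # only header lines carry the read name
            fields = line.split()
            if not fields:
                continue  # tolerate a stray blank line at a header position
            name = fields[0].lstrip("@")
            # Drop an old-style /1 or /2 mate suffix so mates compare equal.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_pair_mismatch(forward_path, reverse_path):
    """Return (index, forward_name, reverse_name) at the first mismatch,
    or None if both files list the same read names in the same order."""
    pairs = zip_longest(read_names(forward_path), read_names(reverse_path))
    for index, (fwd, rev) in enumerate(pairs):
        if fwd != rev:
            return index, fwd, rev
    return None


def read_length_counts(path):
    """Count how many reads there are of each length."""
    counts = Counter()
    with open_fastq(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the sequence line of each record
                counts[len(line.rstrip("\r\n"))] += 1
    return counts


def fraction_shorter_than(counts, min_len):
    """Fraction of reads shorter than min_len (a cutoff you choose)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    short = sum(n for length, n in counts.items() if length < min_len)
    return short / total
```

A result of `None` from `first_pair_mismatch` suggests the two files are properly paired. A large value from `fraction_shorter_than` for the failing samples, compared with the samples that aligned well, would point toward over-trimming.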

Hope that helps!

Hi jennaj

I did use GRCm38, I mistyped.
I tried simply rerunning with the same parameters, and this time it aligned perfectly. It might have been a server hiccup, maybe because I had queued hundreds of jobs at once.

Thank you very much!

Hi @Diogo_de_Moraes1

Glad that worked!

A small fraction of jobs is expected to fail, despite the administrators’ best efforts. The server is very busy :) At least one rerun is recommended for any job that fails.
