Hisat2 Error on Bowtie2 output

Hisat2 errors out and won’t run on Bowtie2 output. Not sure what the issue is?

Hi @Jon_Colman

The job parameters seem to be creating an intermediate output that exceeds the temporary data storage space on the cluster node processing the job (different from the storage in your account). I know that you are looking for weaker matches, but if the match criteria are overly non-specific, this can happen (with any mapping or comparison tool!).

As a guess, setting the Ambiguous read penalty to 0 is at least contributing to the situation. Reference genomes contain many N (gap) characters! Attempting to bridge over these in a very permissive way could certainly cause the hashing data to grow before the other filters have a chance to be applied.
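If it helps to put a number on that, a quick scan of the reference FASTA shows how much N content is in play. This is only a rough Python sketch, and reference.fa is a placeholder for a local copy of whatever genome build you are mapping against:

```python
# Rough sketch: how much of the reference is ambiguous (N) sequence.
# "reference.fa" is a placeholder path, not a real dataset name.
total = 0
n_count = 0
with open("reference.fa") as fa:
    for line in fa:
        if line.startswith(">"):      # skip FASTA headers
            continue
        seq = line.strip().upper()
        total += len(seq)
        n_count += seq.count("N")

print(f"{n_count:,} of {total:,} bases are N ({100 * n_count / max(total, 1):.2f}%)")
```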

You could try changing parameters or breaking up your query into smaller chunks to see if either of those helps. Maybe you can isolate the reads leading to the ramp-up and eliminate them while you investigate the others. You could also try UseGalaxy.eu, since their clusters allocate/scale resources in a different way, but I think you will still need to experiment with parameters/query size to find a balance.
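If you do go the chunking route outside of Galaxy, a small script like the sketch below is one way to do it (Galaxy also has file-splitting tools that accomplish the same thing). The input name and the 1,000,000-read chunk size are just placeholders to adjust for your data:

```python
# Minimal sketch: split a FASTQ file into fixed-size chunks so each mapping
# job produces a smaller intermediate output. Assumes an uncompressed FASTQ
# with the standard 4 lines per read.
import itertools

CHUNK_READS = 1_000_000            # placeholder: reads per output chunk

with open("reads.fastq") as fq:
    chunk = 0
    while True:
        lines = list(itertools.islice(fq, CHUNK_READS * 4))
        if not lines:
            break
        chunk += 1
        with open(f"reads_chunk{chunk:03d}.fastq", "w") as out:
            out.writelines(lines)
```

Each chunk can then be mapped separately and the results merged afterwards.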

Hope this helps! :slight_smile:

I think I’m getting a better understanding of the issues I’ve been having. Part of the problem may be some of the species in my samples, but I suspect my biggest issue is with adapter removal. I ran a test using reads that mapped 100% to my reference, which should mean no adapters were present. I didn’t test every tool, but fastp was removing a significant part of my sequences, so I’m currently redoing everything yet again! My last attempt with cutadapt seemed better, and I’m now trying the AdapterRemoval tool on the EU site; so far it appears to be working.

What I was finding was that even after removing my sequences of interest with Bowtie2 and HISAT2 (both with soft trimming / local alignment), and making my best effort to remove all human reads with the same settings, I would still generally end up with ~125 MB of forward/reverse reads. Running a de novo assembly on those reads would still assemble a significant portion of them into the very sequences I had just tried to remove. I can only think this was due to residual adapters still being present.
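For what it’s worth, the way I’ve been checking how aggressive a trimmer is being is just to compare read counts and total bases before and after it runs. This is only a rough Python sketch, and the file names are placeholders for my own before/after datasets:

```python
# Rough check: compare read counts and total bases before and after trimming
# (fastp, cutadapt, AdapterRemoval, etc.). Assumes uncompressed FASTQ input.
def fastq_stats(path):
    reads = 0
    bases = 0
    with open(path) as fq:
        for i, line in enumerate(fq):
            if i % 4 == 1:             # sequence line of each 4-line record
                reads += 1
                bases += len(line.strip())
    return reads, bases

for label, path in [("before", "raw_R1.fastq"), ("after", "trimmed_R1.fastq")]:
    reads, bases = fastq_stats(path)
    print(f"{label}: {reads:,} reads, {bases:,} bases")
```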

My other issue is that I have two sample sets that were run on a NovaSeq 6000 and have a TON of poly-G contamination. I can remove it pretty well at the 3’ end, but it also shows up at the 5’ end. My only thought for those is cutadapt, with GGGGGGGGGG assigned as a 5’ adapter.
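As far as I can tell, fastp’s poly-G trimming only touches the 3’ tail, which is why I’m thinking of cutadapt with a 5’ adapter for the leading runs. As a rough stand-in for what I’m after, this sketch just strips long G runs off both ends of each read; the minimum run length of 10 and the file names are my own assumptions:

```python
# Rough sketch: strip runs of >= MIN_RUN G's from the 5' and 3' ends of each
# read in an uncompressed FASTQ. File names and MIN_RUN are placeholders.
import re

MIN_RUN = 10
lead = re.compile(r"^G{%d,}" % MIN_RUN)     # poly-G at the 5' end
trail = re.compile(r"G{%d,}$" % MIN_RUN)    # poly-G at the 3' end

with open("novaseq_R1.fastq") as fq, open("novaseq_R1.polyg_trimmed.fastq", "w") as out:
    while True:
        header = fq.readline()
        if not header:
            break
        seq = fq.readline().rstrip("\n")
        plus = fq.readline()
        qual = fq.readline().rstrip("\n")

        m5 = lead.search(seq)
        m3 = trail.search(seq)
        lo = m5.end() if m5 else 0
        hi = m3.start() if m3 else len(seq)
        if lo >= hi:                         # read is essentially all G; drop it
            continue
        out.write(header)
        out.write(seq[lo:hi] + "\n")
        out.write(plus)
        out.write(qual[lo:hi] + "\n")
```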