Trying to run Kraken2 jobs. I started them yesterday and they still haven't started. Is something wrong??
Hi @Jon_Colman
I see many Kraken2 jobs running at the EU server today. The clusters that can run this tool seem busy! If yours haven’t processed yet, please be sure to leave the jobs queued so you don’t lose your spot.
Hope this helps!
Ok, maybe I’m just not understanding how things work. I wasn’t considering that Kraken2 only runs on certain clusters.
A couple of other issues. Running Hisat2 on usegalaxy.org periodically causes errors: running, say, Bowtie2 after Hisat2 sometimes throws an error about different read counts.
On usegalaxy.eu, one thing I have noticed is that when running a tool that gives multiple outputs, one or two of them won't start. It could be a de-interlace, or a compress, etc. If I rerun the job when it appears to be going nowhere, it will usually complete right away.
Hi @Jon_Colman
Yes, each server connects to multiple large academic clusters. The nodes have different technical specifications, and there are complex rules about which resources a tool needs and which clusters are appropriate. Kraken2 needs a very high-memory node to hold the reference data, and those are busy nodes! Some jobs take a long time to process due to their complexity, so even a small query may queue while waiting for a node to free up.
Errors about read counts with mapping tools usually mean that some pairs of reads are missing one of their ends. Meaning, one mate didn't pass through the upstream step. This is expected when chaining "mapping → more mapping": the first mapping can lose an end.
Do the errors look like this?

> Error, fewer reads in file specified with -2 than in file specified with -1

The `-2` here means the reverse reads file, and `-1` means the forward reads file.
Certain mapping parameters are strict, and both mated ends are required. QA trimming tools can be adjusted to report only intact pairs, but if this happens later in an analysis, you can filter for intact pairs a few ways. The tools below are one option; also see the Seqtk tool suite for many more complex manipulations along these lines.
To check for unmated pairs, this is one tool to use. It also reports mismatches between the compression status and the datatype, plus problems in general (files truncated, or corrupted for some reason):
- FASTQ info validates single or paired fastq files
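For anyone working outside Galaxy, here is a minimal sketch of the same read-count sanity check, assuming plain or gzipped fastqsanger files (the `reads_1.fastq.gz` / `reads_2.fastq.gz` names are just placeholders):

```python
import gzip

def count_fastq_reads(path):
    """Count reads in a (possibly gzipped) FASTQ file: 4 lines per record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for _ in handle) // 4

forward = count_fastq_reads("reads_1.fastq.gz")
reverse = count_fastq_reads("reads_2.fastq.gz")
if forward != reverse:
    print(f"Unmated reads likely: -1 has {forward} reads, -2 has {reverse}")
```

If the two counts differ, some mates were lost upstream and the pairs need to be re-synchronized before a strict paired-end mapping run.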
To correct unmated pairs, this usually involves creating an interleaved file, dropping unmatched reads, then splitting the files out again; a rough standalone equivalent is sketched after the list below. You could put these two tools into a mini workflow for direct reuse, or include it as a subworkflow in another existing workflow. (And, I just saw that you are using this already, but let's leave it here for anyone reading later on!)
- FASTQ interlacer on paired end reads
- then, FASTQ de-interlacer on paired end reads
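As a standalone sketch, the interlace → drop orphans → de-interlace repair can be collapsed into one pass. This assumes uncompressed FASTQ with IDs that match between the two files; the filenames are hypothetical:

```python
def read_fastq(path):
    """Yield (read_id, record) tuples from a FASTQ file (4 lines per record)."""
    with open(path) as handle:
        while True:
            record = [handle.readline() for _ in range(4)]
            if not record[0].strip():
                break
            read_id = record[0].split()[0]
            # Strip a trailing /1 or /2 so forward and reverse IDs compare equal.
            if read_id.endswith("/1") or read_id.endswith("/2"):
                read_id = read_id[:-2]
            yield read_id, record

# Index the reverse reads in memory (fine for modestly sized files), then
# keep only the forward reads whose mate is present, writing pairs in sync.
reverse_reads = dict(read_fastq("reads_2.fastq"))
with open("matched_1.fastq", "w") as out1, open("matched_2.fastq", "w") as out2:
    for read_id, record in read_fastq("reads_1.fastq"):
        mate = reverse_reads.get(read_id)
        if mate is not None:  # orphans that lost their other end are dropped
            out1.writelines(record)
            out2.writelines(mate)
```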
Then, about the jobs at EU where one or two outputs won't start: oh, this is curious.
The EU server automatically reruns failures (just one retry). In the history this can look like a job running, then going back to queued, then finally running again. Either the job happened to hit a bad node, or the original error indicated that the job needs more resources, so it is being dispatched to a larger cluster node.
Running it again yourself would start that process over. If the error happened by chance, the new job has a chance to succeed! My immediate guess would be that the compress/uncompress functions are busy. Some tools understand compressed fastq, and others need it uncompressed. Galaxy will transform the data if needed (you can sometimes see this in the history) with the implicit conversion functions.
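If you ever want to verify what state a dataset is actually in, a tiny sketch that sniffs the gzip magic bytes (the path is just an example):

```python
def is_gzipped(path):
    """True if the file starts with the gzip magic bytes 0x1f 0x8b."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"\x1f\x8b"

# A dataset labelled fastqsanger.gz should report True; a plain
# fastqsanger file should report False.
print(is_gzipped("dataset.fastq.gz"))
```

A mismatch between that answer and the dataset's assigned datatype is exactly the kind of thing FASTQ info flags.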
Does this seem to be what is going on? Because this is happening with a few tools, it makes me think it is at a lower level, and not the tool itself. If not, you are welcome to share a history with the error and I'll review it, bringing in EU people as needed. If those transformations have some issue, it seems worth looking at. It could be related to the data size or something else.
All of these servers crunch through massive amounts of data! We just ran some statistics, and it turns out that the public servers combined process an analysis job somewhere every single second! Tuning this is an ongoing process.
I think the Hisat2 issue is specific to usegalaxy.org. It would run normally at first, then maybe I use Bowtie2 and go back to Hisat2 again, and I would get the "Error, fewer reads in file specified with -2 than in file specified with -1" message, or something similar. Sometimes running Trimmomatic would fix the issue without finding any unpaired reads. The last time, when I couldn't get the files to run, I moved them to .eu and they worked completely fine.
Often the run-failure issue is with very small files. Take a small workflow, say BBMerge with overlap correction only; this gives an interleaved fastqsanger dataset. So the next step would be de-interleave, then compress. Maybe one file would compress and one wouldn't. Generally small things, randomly but often. I don't think it's the tool itself; it's like it just drops one of the steps in the workflow. I can share it the next time it happens.
Hi @Jon_Colman
There are combinations of parameters that tell the HISAT2 tool to require intact paired-end reads. This would be true at any of the servers for the same tool version (as far as I know so far!). However, a different tool version might behave differently.
Would you like to share the compared jobs? I’m concerned about this message being reported while at the same time Trimmomatic doesn’t identify any orphan ends!
Thanks for all the follow up! If there is a bug we want to squash it.
I will have to watch again for the Hisat2 runs that show the mismatch in read counts.
I thought Trimmomatic separates out unpaired reads??? Or is this different from orphan ends?? Like I said, sometimes when Hisat2 showed different counts for R1/R2, I would run Trimmomatic set so that it wouldn't do any trimming, and it would report only paired reads. This would make it possible to run Hisat2 again. I've only had the Hisat2 issues since some of the recent site updates.
Here is one that failed (job terminated for using too much memory), in the deleted datasets #405-409. I moved this over to .eu and it worked fine.
Thanks @Jon_Colman for sharing the history. The parameter choices are creating a job that consumes a lot of memory, so I think the error is correct for this particular case. The EU server was a good choice as an alternative. I grabbed a copy of the example and am running some tests; more soon about this (it will process for about an hour). Maybe we can scale up at ORG to handle it.
Then, for the Trimmomatic question:
Yes, Trimmomatic will sort the reads: still-paired and unpaired, split into four files covering the forward and reverse cases. Using the tool this way should be fine.
Btw: an unpaired end of a read is sometimes called an "orphan" paired-end read (i.e. no longer part of a "mated" intact pair). This distinguishes it from a single-end read that was never part of a pair and couldn't be.
And the detail that this only started after the recent site updates seems like an important clue! This situation is very curious, so I would still be interested in reviewing it.
I might have forgotten to mention it, but when I ran Trimmomatic after getting the mismatched R1/R2 error from Hisat2, Trimmomatic didn't find any orphans, yet it generally ended up correcting the error somehow anyway. This is why I think it's some type of bug. The reason I was using both Hisat2 and Bowtie2 is that each program would map reads the other didn't. For example, I run Bowtie2 on Very Sensitive Local and it may find 45 MB of reads, but then I run Hisat2 in local mode and it may find another 45 MB of reads (which seems odd).
Oh sure, super odd! This is why I thought it was interesting. I am wondering if there is something about the metadata (not the actual data!) that we can smooth out.
I'd need to see an example to confirm, and then translate what is happening for the developers to fix.
Like I said, it only seemed to be happening on the usegalaxy.org site. The occasional bugs I run into over there have me mostly doing things on the .eu site, which wastes a lot of my time redoing things.
Did you see the failure that is in my current data list, from the link I sent you?? I haven't seen that error before; it is also from Hisat2. That same data runs perfectly on .eu.
Do you mean the example of Bowtie2 and Hisat2 in local mode each finding a large number of similar reads??? I could come up with examples of that if you like.