I am having issues with the Kraken2 -> Bracken step of my workflow and am wondering if anyone could provide some assistance. I found a similar topic in the forum circa 2020, but it never got an answer, so I’m hoping to have better luck. The system claims the kraken2 report is empty, but kraken2 completed without errors. Here’s a screenshot of the error:
Have you tried a rerun yet? You can go to your history and use the rerun-icon to bring the form up again. There will be an extra toggle above the submit button to also rerun the downstream jobs. This will start up the workflow again. One rerun is usually a good idea in case there is some small technical problem with the cluster (can’t find a file, that sort of thing).
Then, if it fails again, go back to the upstream steps. I am wondering if the Kraken2 output is actually empty. If you go to your history and look at it, what do you find?
Click on the eye-icon – do you see any content? Maybe just headers but no data lines?
Click on the i-icon – check the inputs here. If you click on the inputs, do they have the content you expect? Were the parameters set as you expected?
On that same view, scroll down to the output logs. Are there any messages in them reporting what may have gone wrong?
You can screenshot any of that for feedback, or share the history itself.
The green/red color of a dataset only tells you whether the tool technically worked, i.e. whether the job failed or not. There are lots of things that can go scientifically “wrong”, trapped by the tool itself or visible only in the output, that will not cause a job failure for technical reasons. Reviewing the upstream results is how to diagnose the problem. The job logs (i-icon) are where most information will be found. If you end up downloading a Kraken2 report to inspect it, the sketch below may help.
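A minimal local Python sketch (this is not a Galaxy tool, and the file name is a placeholder) for checking whether a downloaded Kraken2 report actually contains data; standard reports are tab-separated with six columns and no header (percentage, clade reads, direct reads, rank code, taxid, name):

```python
from pathlib import Path

report = Path("kraken2_report.txt")  # placeholder file name

# Ignore blank lines; a healthy report has one line per taxon.
lines = [line for line in report.read_text().splitlines() if line.strip()]
if not lines:
    print("Report file is completely empty")
else:
    # Column 3 (index 2) counts reads assigned directly to each taxon;
    # summed over all lines (incl. the 'unclassified' one) it equals the
    # total number of reads kraken2 processed.
    total = sum(int(line.split("\t")[2]) for line in lines)
    print(f"{len(lines)} report lines, {total} reads processed")
```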
Yes, I have tried rerunning the workflow several times and I don’t understand what needs to be modified for it to work. I am a very independent person, so if I’m asking for help, I am out of ideas.
I looked at my history. This is a screenshot of the Taxonomic Classification data:
It appears column 1 is empty/-nan. I downloaded one of the Kraken2 outputs and the .txt file is indeed blank.
Again, any guidance you and the community can provide is greatly appreciated. I’m not sure if I need to tweak the settings for Kraken2 and Bracken or if there’s something more nefarious at play. Regardless, I have over 600 shotgun sequences to analyze and I need Galaxy’s collection functionality so I can set it and (eventually) forget it until the results are complete. Easier said than done…
Bonus question: are there any tools for combining Bracken outputs into one report that could be integrated into the workflow? I found Combine Kraken, but to my knowledge Bracken reports are formatted differently.
So somehow it doesn’t even recognize any sequences in its input although that input, produced by bedtools, looks just fine and can be processed with other tools.
Rerunning kraken2 on this same input doesn’t help; it just results in the same empty output again.
Your shared workflow works just fine for me!
I tried to run the kraken2 step with several different databases, including the ones that you used previously in your shared history, and it works every time.
Now, however, I just realized that the data in your shared history is not living in Galaxy Europe’s default storage, but in user storage that you configured?
Given that this issue is reproducible for you, but not for me unless I use your fastq dataset, I’m tempted to assume the storage location is the key here.
Can you provide a bit more detail on what type of storage you’ve configured?
It’s an AWS S3 bucket. I noticed that instead of .html, .txt, .tsv, etc. files, Galaxy sends .dat files to the S3 bucket, which is another frustrating issue I need to resolve, but I felt the empty Kraken2 files were most pressing. What do you think @wm75 ?
My goal is to generate species-level taxonomic classification data for gut bacteria. I have ~600 reads. However, I don’t know how much storage space will be utilized in the process, which is why I switched from .org to .eu so I’d have the ability to store and export files to AWS. I’m relatively new to AWS, though, so if there are resources on the forum that you’d recommend, please point me in that direction. Cheers.
We’ve enabled AWS support quite recently and so far have had no big issues with it, but you never know. I need to run a few more tests over the weekend, but one thing you could try on your end is to run the workflow in a history that uses default storage and report back here whether that works.
I also have a few suggestions to simplify your workflow a bit, but that can wait until we’ve solved the current issue.
Sounds good. I’ll run a small batch with default storage to see if that narrows down the issue. Thanks so much for your time and assistance. I’m looking forward to hearing those suggestions.
Hi @wm75 so I ran a few reads that (a) were not sourced from my S3 bucket and (b) were not exported to my S3 bucket and of course it worked. This however creates another layer of issues for me because I need to be able to use my S3 bucket for storage and export due to the large volume of files my research generates. Otherwise, I’m back to small batches.
I’m almost 100% sure that you are not doing anything wrong here, but that this is some unfortunate interaction between how Galaxy handles the remote storage and how the kraken2 code tries to access its input. Just think of all the other steps in your workflow that are functioning just fine with your AWS-stored data.
I will bring this up as a priority issue in our team meeting tomorrow and hopefully we can fix it fast.
Thanks for reporting and for your patience so far; I will post updates here.
Luckily this was just a configuration issue and easy to fix:
On Galaxy Europe we had simply forgotten to give tools running in containers access to remote storage. Of the tools in your workflow, only kraken2 runs inside a Singularity container, and that’s why this was the step at which the issue surfaced.
Things are configured properly now and you should be able to switch back to using remote storage.
Now coming back to my workflow improvement suggestions that I had promised:
Currently, you have the Bowtie2 option “Reorder output to reflect order of the input file” set to Yes, but the next tool in the workflow requires a position-sorted BAM, which means that Galaxy, under the hood, has to convert the entire output back to the format that bowtie2 would have generated by default.
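In case you ever want to check a BAM dataset’s sort order yourself: it is recorded in the SO tag of the @HD header line, which you could read, for example, with a small pysam sketch like this (the file name is a placeholder):

```python
import pysam

# The SO tag of the @HD header line is 'coordinate' for a position-sorted
# BAM, and 'queryname' or 'unsorted' otherwise.
with pysam.AlignmentFile("bowtie2_output.bam", "rb") as bam:  # placeholder
    sort_order = bam.header.to_dict().get("HD", {}).get("SO", "unknown")
    print(f"Sort order recorded in the header: {sort_order}")
```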
If you haven’t seen it: Bowtie2 has the option to “Write unaligned reads (in fastq format) to separate file(s)”, which can, essentially, replace both your downstream filtering and bedtools conversion steps. Plus, this will (contrary to what it says) produce fastq.gz, i.e. compressed, output and thus save disk space compared to bedtools.
If, for some reason, you really don’t like the additional bowtie2 output, I would recommend “Samtools fastx” over bedtools because that tool supports fastq.gz as an output format.
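Whichever route you choose, a quick record count of the (gzipped) fastq output is enough to convince yourself that it yields the same reads as your current filter + bedtools steps. A small local sketch, with a placeholder file name:

```python
import gzip

# A fastq record is exactly 4 lines, so the line count divided by 4
# gives the number of reads.
with gzip.open("unaligned_reads.fastq.gz", "rt") as fh:  # placeholder
    n_reads = sum(1 for _ in fh) // 4
print(f"{n_reads} reads")
```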
You asked about merging Bracken reports. An exact answer depends a bit on what exactly you imagine the merge to look like, but one way could be to use a tool like “Column join on multiple datasets”.
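If what you are after is a wide table with one row per taxon and one estimated-read-count column per sample, the join would look roughly like this pandas sketch (sample names and file paths are hypothetical; it assumes the default Bracken output columns). If I remember correctly, the Bracken project itself also ships a combine_bracken_outputs.py script for use outside of Galaxy.

```python
import functools
import pandas as pd

# Hypothetical sample names and file paths
samples = {"sample1": "sample1.bracken", "sample2": "sample2.bracken"}

# Keep only the taxon name and the re-estimated read count per sample.
tables = [
    pd.read_csv(path, sep="\t", usecols=["name", "new_est_reads"])
      .rename(columns={"new_est_reads": sample})
    for sample, path in samples.items()
]

# Outer join on taxon name; taxa absent from a sample get a count of 0.
merged = functools.reduce(
    lambda left, right: left.merge(right, on="name", how="outer"), tables
).fillna(0)

merged.to_csv("bracken_combined.tsv", sep="\t", index=False)
```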
You were also asking about remote storage output organisation: remote storage is not meant to replace a data export. In order for Galaxy to be able to work with remotely stored files, it needs to be up to Galaxy how to organize and name them, and this is not going to match what’s best for browsing those files.
Instead, you would download the result files you want, or entire collections, to your local system via the Galaxy user interface. Another option would be to export your entire workflow invocation (“export” tab under “Workflow invocations”). Such an export would include all datasets that are marked as workflow outputs (the checkmark next to the eye icon on workflow steps), so it’s worth giving these some thought.