Hi, I am running into issues running MetaPhlAn on cleaned (fastp) Nanopore FASTQ files from metagenomes.
I have already tried different MetaPhlAn tool versions, lowering the MapQ threshold, a different database, and different files, all to no avail. The tool either crashes with the error “Possible unintended interpolation of @4 in string at” or runs but bins all the reads as unclassified.
What am I missing? Why isn’t it working? What else can I try to debug it?
I haven’t had any issues running Kraken2 on the same files…
As far as I know, and from what I can find in recent topics, the problem you describe occurs because MetaPhlAn was originally designed to process much shorter sequences and hasn’t been updated for Nanopore long reads yet. Please note that this is how the base tool itself works: it would behave this way anywhere, not just in Galaxy, when given query reads that are too long for it to handle well.
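If you want to confirm how long your reads actually are before deciding on a tool, something like the following could help. This is only a minimal sketch, assuming standard four-line FASTQ records; the function names (`open_fastq`, `read_lengths`, `summarize`) and the tiny inline sample are made up for illustration and are not part of MetaPhlAn or Galaxy.

```python
import gzip
import io
import statistics


def open_fastq(path):
    """Open a plain or gzipped FASTQ file as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def read_lengths(handle):
    """Yield the sequence length of each record.

    Assumes strict 4-line FASTQ records (header, sequence, '+', quality).
    """
    for index, line in enumerate(handle):
        if index % 4 == 1:  # second line of each record is the sequence
            yield len(line.strip())


def summarize(lengths):
    """Return read count plus min, median, and max read length."""
    lengths = sorted(lengths)
    if not lengths:
        return {"reads": 0}
    return {
        "reads": len(lengths),
        "min": lengths[0],
        "median": statistics.median(lengths),
        "max": lengths[-1],
    }


# Tiny inline sample so the sketch runs without a real file.
SAMPLE = "@r1\nACGT\n+\nIIII\n@r2\nACGTACGT\n+\nIIIIIIII\n"
summary = summarize(read_lengths(io.StringIO(SAMPLE)))
print(summary)
```

For a real file you would swap the sample for `summarize(read_lengths(open_fastq("reads.fastq.gz")))`, where the file name is a placeholder. A median in the thousands of bases would fit the explanation that the reads are simply much longer than the tool expects.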
You can also review the results of a search I ran at the authors’ support forum.
It seems like the authors are aware that this is an enhancement people would find useful, but I didn’t find anything more about future plans. Maybe I missed that topic, and you could ask them about the status or plans. If the tool is updated, we would then incorporate that into Galaxy.
I think this explains your results: the first case was probably a technical error when parsing a longer string of quality scores (guess!), and the other case is just no hits (maybe a flavour of “non-specific hits” per query read? or gaps that were too difficult to process and the hit dropped? more guesses!). In either case: the tool was overwhelmed and didn’t produce useable results.
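To rule out a malformed input file as the cause of the crash, a quick structural check of the FASTQ can be useful. The sketch below is an assumption-laden illustration, not something taken from MetaPhlAn: `check_fastq` is a hypothetical helper that assumes four-line records and only tests the header marker, the separator line, and matching sequence and quality lengths. Note that `@` is a legitimate quality character in Phred+33 encoding (it encodes a score of 31), so a quality line beginning with `@4` is valid FASTQ on its own.

```python
import io


def check_fastq(handle):
    """Return (record_count, problems) for a 4-line-per-record FASTQ stream.

    Hypothetical helper: each problem is a (record_number, message) tuple.
    """
    problems = []
    record = []
    count = 0
    for line in handle:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            count += 1
            header, seq, plus, qual = record
            if not header.startswith("@"):
                problems.append((count, "header does not start with '@'"))
            if not plus.startswith("+"):
                problems.append((count, "separator does not start with '+'"))
            if len(seq) != len(qual):
                problems.append((count, "sequence and quality lengths differ"))
            record = []
    if record:
        # Leftover lines mean the last record is incomplete.
        problems.append((count + 1, "truncated record"))
    return count, problems


# A valid record whose quality string happens to start with "@4".
good = "@r1\nACGT\n+\n@4II\n"
# A record whose quality string is one character short.
bad = "@r2\nACGT\n+\nIII\n"

good_result = check_fastq(io.StringIO(good))
bad_result = check_fastq(io.StringIO(bad))
print(good_result)
print(bad_result)
```

If a check like this reports no problems, the file itself is probably fine, which would point back to the tool not coping with long reads rather than to the fastp output.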
Kraken2 will work, so I’m glad you found that option already!
Well, that makes sense; thank you for your detailed answer…
Somehow I had convinced myself that MetaPhlAn worked with long reads. Will go back to the drawing board!