Joiner output for MetaPhlAn2

Hello!
I would like to run a metagenomics analysis with the Galaxy tool "MetaPhlAn2 to profile the composition of microbial communities", using as input the output of the "FASTQ joiner on paired end reads" tool, because I would like to use the forward and reverse FASTQ sequences from the same sample at the same time.

However, the job always fails with an error when I try this. It seems to work with only the forward or only the reverse FASTQ file, but not when they're joined.
Could someone help me? Are there specific parameters that I should keep in mind?
Thanks!

I think you should always post exactly what the error says. Without it, it is hard to help you.

But check the output format of FASTQ joiner: it is fastqsanger. Change this to fastq by clicking the pencil icon and then run MetaPhlAn2. I think this will do the trick.

Btw, if you have amplicon data also take a look at FLASH.

Thanks for the suggestion, but it did not work :sob:
Here is the error. Sorry, I hadn't noticed the window where the error is explained (I am new to Galaxy…:S)
The error says:
Original error: usage: metaphlan2.py --input_type
{fastq,fasta,multifasta,multifastq,bowtie2out,sam}
[--mpa_pkl MPA_PKL] [--bowtie2db METAPHLAN_BOWTIE2_DB]
[--bt2_ps BowTie2 presets] [--bowtie2_exe BOWTIE2_E

2nd error (after changing the file's format to fastq with the pencil icon): Error: reads file does not look like a FASTQ file
Error: Encountered exception: 'Unidentified exception'
Command: /usr/local/tools/_conda/envs/mulled-v1-c5867e29ea0fba532ec8dc4a557d8798445dccb6ecf21f67e09143751a79b65d/bin/bowtie2-align-s --wrapper basic-

I expected this to be your first error, and my solution should solve it. At least it did for me when I tested it.

Could you try it again? Did you change the datatype of the right file, i.e. the output of FASTQ joiner? Maybe you can also test it on a public Galaxy server to check whether it is a local problem. When I replied to your question I used https://usegalaxy.eu/

EDIT:

I just read your error more carefully.
Error: reads file does not look like a FASTQ file
Try opening and examining the file. Maybe it has to do with the FASTQ Header Style option.
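
If you download the file, a rough way to examine it is a small Python check like the one below. This is only a sketch: the function name is made up, it assumes the simple four-lines-per-read FASTQ layout, and it is not what bowtie2 itself checks.

```python
from itertools import islice


def looks_like_fastq(lines, max_records=100):
    """Return True if the first records follow the 4-line FASTQ layout.

    `lines` is any iterable of text lines (for example an open file).
    Hypothetical helper for illustration only.
    """
    it = iter(lines)
    records = 0
    while records < max_records:
        chunk = [ln.rstrip("\r\n") for ln in islice(it, 4)]
        if not chunk:
            break  # reached the end of the input
        if len(chunk) < 4:
            return False  # truncated record
        header, seq, plus, qual = chunk
        # Header lines start with '@', separator lines with '+'.
        if not header.startswith("@") or not plus.startswith("+"):
            return False
        # Sequence and quality strings must be non-empty and equally long.
        if not seq or len(seq) != len(qual):
            return False
        records += 1
    return records > 0


# Example use on a downloaded dataset (the file name is a placeholder):
# with open("joined.fastq") as fh:
#     print(looks_like_fastq(fh))
```

If this returns False on the first records, open the file in a text editor and look at the header lines first.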

Hello!
Thanks for all your answers :slight_smile:
I tried again several times and MetaPhlAn2 finally worked with the file produced by FASTQ joiner. I didn't change anything… except that I uncompressed the file. Maybe that was the main problem.
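
For anyone who wants to check the same thing on a downloaded file: gzip files start with the two bytes 0x1f 0x8b, so a short Python sketch can tell whether a dataset is still compressed. The function names are invented for this example, and it only detects gzip, not other compression formats.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip_bytes(head):
    """True if the given leading bytes carry the gzip magic number."""
    return head[:2] == GZIP_MAGIC


def is_gzip_file(path):
    """Read the first two bytes of a file and test them (hypothetical helper)."""
    with open(path, "rb") as fh:
        return is_gzip_bytes(fh.read(2))


def open_maybe_gzip(path):
    """Open a FASTQ file as text whether or not it is gzip-compressed."""
    if is_gzip_file(path):
        return gzip.open(path, "rt")
    return open(path, "r")
```

A compressed file whose Galaxy datatype says plain fastq (or the other way round) is a mismatch between the assigned datatype and the actual content.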

However, the result now shows 100% unclassified. I don't understand why… Any idea?
Thanks again!

I have very limited to no knowledge of MetaPhlAn2, but since I still want to help: I would say that there are simply no known markers in your input. Why they are not there, I don’t know. Because this is a question about the analysis itself and not about Galaxy, you may be able to get an answer at Biostars.

Here is the website of the tool:
https://huttenhower.sph.harvard.edu/metaphlan

Good luck

Thank you very much! :wink:

I have one last question, about the input for the FASTQ joiner tool.
The input FASTQ files must be of type fastqsanger or similar, right? But what about FASTQ files from Illumina? I would like to join forward and reverse FASTQ files from Illumina, so my question is whether I am selecting the correct tool for that.
Maybe there is another tool that I should use, and that is why I am getting so many errors…

Hi @SumTot

@gbbio may have more ideas (is a pro at troubleshooting!) … but I would suggest posting back the first two FASTQ reads from both the forward and the reverse inputs to help with that. Quote the content to preserve the format. Also note the currently assigned datatype (copy/paste to ensure it is exact).
The tool is picky about the input sequence and quality score lines. We can help with any other fix-ups or the appropriate parameters that might work with the data “as-is”.

Illumina reads come in a few different “flavours” – it depends on the version. The end goal will be to get to the point where all is in a standardized fastqsanger datatype variant. And to make sure the assigned datatype/compression is a match for the actual dataset content. Most tools require fastq data to be in a fastqsanger variant format – mostly due to how the quality scores will be interpreted. If the quality score scaling is not what is expected, tools can error in odd ways, or not output data at all, but that is usually easily resolved.

One important part about content is that both inputs contain the same base “read” names. If there are unpaired reads in either, that will cause problems. QA steps can remove one end of a pair. Some (example: Trimmomatic) will split those out – four outputs – 2 for those that remain paired (forward + reverse), 1 for unpaired forward, and 1 for unpaired reverse. Tools in the group Seqtk can also be used to filter out any unpaired reads – some tools are single manipulation functions, and one (tool:Seqtk seq) can apply multiple manipulations all at the same time.
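
As a rough illustration of the read-name point, here is a minimal Python sketch that lists base read names present in only one of the two inputs. The helper names are made up, and it assumes four-line FASTQ records whose names differ only by a trailing /1 or /2 or by a description after the first space. Seqtk and Trimmomatic do not necessarily work this way.

```python
def base_name(header):
    """Reduce a FASTQ header line to its base read name.

    Drops the leading '@', anything after the first whitespace,
    and a trailing '/1' or '/2' mate suffix.
    """
    parts = header.strip()[1:].split()
    name = parts[0] if parts else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def read_names(lines):
    """Collect base names from every 4th line (the header lines)."""
    return [base_name(ln) for i, ln in enumerate(lines)
            if i % 4 == 0 and ln.strip()]


def unpaired(forward_lines, reverse_lines):
    """Return (names only in forward, names only in reverse), both sorted."""
    fwd = set(read_names(forward_lines))
    rev = set(read_names(reverse_lines))
    return sorted(fwd - rev), sorted(rev - fwd)


# Tiny made-up example: r2 has no reverse mate, r3 has no forward mate.
forward = ["@r1/1", "ACGT", "+", "IIII", "@r2/1", "ACGT", "+", "IIII"]
reverse = ["@r1/2", "TTTT", "+", "IIII", "@r3 2:N:0", "TTTT", "+", "IIII"]
print(unpaired(forward, reverse))
```

If either list is non-empty, the inputs contain unpaired reads that should be filtered out before joining.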

Also, be sure that you are using the most current version of the tool. At https://usegalaxy.org, this will be: FASTQ joiner on paired end reads (Galaxy Version 2.0.1.1+galaxy0)

The first few FAQs here may also help. Review if you want to – all of it is good reference info that helps to resolve the most common input problems. https://galaxyproject.org/support/#getting-inputs-right

Thanks!!

Whether joining the paired reads and then running MetaPhlAn2 is the right method, I don’t know. But Illumina data should just work. As far as I know, fastq and fastqsanger are basically the same format. The datatype “fastqsanger” is Galaxy's way of making sure the file has ASCII_BASE 33 quality scores. The Galaxy tool, or in other words the wrapper, is simply written to give its output the Galaxy datatype fastqsanger, but it is just a FASTQ file.

So now you may wonder why it does not work if fastq and fastqsanger are the same format. This is because MetaPhlAn2 has a parameter --input_type, and the wrapper sets it to the Galaxy datatype of the input file. fastqsanger is not one of the values listed in the usage message from your first error (fastq, fasta, multifasta, multifastq, bowtie2out, sam), so MetaPhlAn2 rejects it.

If you are interested you can see it here: https://github.com/galaxyproject/tools-iuc/blob/master/tools/metaphlan2/metaphlan2.xml

In summary, Illumina data just works, but after FASTQ joiner you need to change the Galaxy datatype to fastq because the MetaPhlAn2 Galaxy wrapper is written that way. It has nothing to do with your data itself, only with Galaxy.

If you do have ASCII_BASE 64 quality scores (which I highly doubt), you could use something like https://toolshed.g2.bx.psu.edu/repos/devteam/fastq_groomer

Here is a page explaining the scores:
https://drive5.com/usearch/manual/quality_score.html
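
To get a rough idea of which offset a file uses, you can look at the range of characters in the quality lines. The sketch below is only a heuristic with an invented function name: characters below ';' (ASCII 59) can only occur with offset 33, while a file whose lowest character is '@' (ASCII 64) or higher and whose highest is above 'J' (ASCII 74) is probably offset 64. Anything in between is left undecided.

```python
def guess_phred_offset(quality_lines):
    """Guess the quality offset (33 or 64) from FASTQ quality strings.

    Returns None when the observed characters do not settle the question.
    Heuristic for illustration only; FASTQ Groomer is the proper tool.
    """
    codes = [ord(c) for line in quality_lines for c in line.strip()]
    if not codes:
        return None
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        return 33  # such low characters do not exist in offset-64 encodings
    if lowest >= 64 and highest > 74:
        return 64  # typical of older Illumina pipelines
    return None  # ambiguous, inspect more reads


print(guess_phred_offset(["IIII#5<"]))    # contains '#', so offset 33
print(guess_phred_offset(["hhhhgfdB@"]))  # only '@' and above, up to 'h'
```

Pass it the fourth line of each record from a few hundred reads. A handful of reads may not cover enough of the range to decide.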
