I am trying to run the PICRUSt2 pipeline tool, but I am unable to find documentation on how to use it. I originally ran the DADA2 pipeline to obtain ASV sequence tables for 16S data. I converted the ASV sequence table to FASTA format to use as the “study sequences” input, and I used my original chimera-filtered (removeBimeraDenovo) ASV sequence table as the “sequence abundance table” input. I am receiving error messages with no additional information. I think the lines or headers may not match up between the two files, but I’m unsure what I need to do to fix it.
The Help section of the tool form has a simplified guide and a link out to the author’s wiki site with more detail. The tool works the same way in Galaxy!
PICRUSt2 Full pipeline (Galaxy Version 2.5.3+galaxy0)
Help
What it does
PICRUSt2 (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States) is a tool for predicting functional abundances based only on marker gene sequences.
Runs sequence placement with EPA-NG and GAPPA to place study sequences (i.e. OTUs and ASVs) into a reference tree. Then runs hidden-state prediction with the castor R package to predict the genome for each study sequence. Metagenome profiles are then generated, which can be optionally stratified by the contributing sequence. Finally, pathway abundances are predicted based on metagenome profiles. By default, output files include predictions for Enzyme Commission (EC) numbers, KEGG Orthologs (KOs), and MetaCyc pathway abundances. However, the tool also lets users supply custom reference and trait tables to customize analyses.
Note
The standard pipeline will generate metagenome predictions for 16S rRNA gene data.
Input
A FASTA of amplicon sequence variants (ASVs; i.e. your representative sequences, not your raw reads)
A BIOM table of the abundance of each ASV across each sample.
Output
Output tree with placed study sequences.
Metagenome Predictions
Pathway level predictions
Please give this a review and let us know if you are able to resolve this on your own, or if you would like some help with a history review!
It didn’t really give me any specifics. Just said: “An error occurred while running the tool toolshed.g2.bx.psu.edu/repos/iuc/picrust2_pipeline/picrust2_pipeline/2.5.3+galaxy0.”
Yes, but the message is from the Galaxy wrapper and means that the tool could not complete processing at a lower level. This usually means some issue with the inputs or parameter choices, so you’ll need to review them more closely against the other content I shared. Or we can try to help here, but without seeing the details, our suggestions will only be guesses!
The search I included (also above!) at this forum has some examples of prior troubleshooting. This includes a simple example in a shared history. I just started up another run using a slightly different BIOM input in that same history, too! Maybe that helps? See the other topics for full links and walk-throughs.
I have figured out what the issue is, but I am unsure how to fix it in usegalaxy. PICRUSt2 requires 2 files in a specific format:
A .fasta file with a unique ASV ID and its corresponding sequence like:
>ASV1
TACGAAGGGGG…
>ASV2
TACGTAGGGCG…
And a tabular or BIOM file with just the ASV ID, sample names, and read counts like:
ASV ID    Sample 1    Sample 2
ASV1      6829        3304
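A quick way to check that the IDs in the two files line up is sketched below in Python. This is only an illustration to run outside Galaxy, not part of PICRUSt2; the function names are made up for this example, and it assumes a tab-separated table with one header line and the ASV ID in the first column.

```python
def fasta_ids(fasta_text):
    """Return the ID (first word after '>') of every FASTA header line."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">") and line[1:].strip():
            ids.append(line[1:].split()[0])
    return ids


def table_ids(table_text):
    """Return the first-column value of every non-empty row after the header.

    Assumption: tab-separated text with exactly one header line.
    """
    rows = [line for line in table_text.splitlines() if line.strip()]
    return [row.split("\t")[0] for row in rows[1:]]


def compare_ids(fasta_text, table_text):
    """Return (IDs only in the FASTA, IDs only in the table), both sorted."""
    in_fasta = set(fasta_ids(fasta_text))
    in_table = set(table_ids(table_text))
    return sorted(in_fasta - in_table), sorted(in_table - in_fasta)
```

If both returned lists are empty, every FASTA record has a matching table row and vice versa; anything left over points at the IDs that do not match.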
Currently I just have a row for the sequences, no unique ASV IDs row or an “ASV ID” header.
So I need to be able to do this in Galaxy:
1. Add a column to my tabular file that numbers each row (ASV1, ASV2, etc.).
2. Add a header to this column titled “ASV ID”.
3. Cut out the column containing the sequences. I now have a tabular file I can use.
4. Starting again from the table with IDs, cut out the columns containing the sample names and their associated read counts, and convert only the ASV ID and sequence columns from tabular to FASTA format. I now have a FASTA file I can use.
So how do I do this in Galaxy? More specifically, how do I do step 1, adding sequential row numbers?
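For reference, the steps listed above can also be sketched outside Galaxy in a few lines of Python, and the two resulting files uploaded afterwards. This is a minimal sketch under stated assumptions, not a Galaxy or PICRUSt2 feature: the function name is hypothetical, the input is assumed to be tab-separated with one header row of sample names, and each following row is assumed to hold one ASV with its sequence in the first column. A table in DADA2's native orientation (samples as rows, sequences as column names) would need to be transposed first.

```python
import csv
import io


def split_asv_table(table_text):
    """Build an ID-keyed abundance table and a FASTA from one ASV table.

    Assumptions (not confirmed by the original post): tab-separated input,
    header row of sample names, one ASV per row, sequence in column 1.
    Returns (abundance_tsv, fasta) as strings.
    """
    rows = [r for r in csv.reader(io.StringIO(table_text), delimiter="\t") if r]
    header, body = rows[0], rows[1:]

    # Replace the sequence column header with "ASV ID".
    table_lines = ["\t".join(["ASV ID"] + header[1:])]
    fasta_lines = []
    for number, row in enumerate(body, start=1):
        asv_id = f"ASV{number}"              # sequential IDs: ASV1, ASV2, ...
        sequence, counts = row[0], row[1:]
        table_lines.append("\t".join([asv_id] + counts))  # ID + read counts
        fasta_lines.append(f">{asv_id}\n{sequence}")      # ID + sequence
    return "\n".join(table_lines) + "\n", "\n".join(fasta_lines) + "\n"


if __name__ == "__main__":
    demo = "sequence\tSample1\tSample2\nTACGAAGG\t6829\t3304\nTACGTAGG\t12\t0\n"
    abundance, fasta = split_asv_table(demo)
    print(abundance)
    print(fasta)
```

Because both outputs are generated in the same loop, the ASV IDs in the table and in the FASTA are guaranteed to agree, which is the property the two PICRUSt2 inputs need.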