Help using PICRUSt2 pipeline?

Hello,

I am trying to run the PICRUSt2 pipeline tool, but I cannot find documentation on how to use it. I originally ran the DADA2 pipeline to generate ASV sequence tables for 16S data. I converted the ASV sequence table to FASTA format to use as the “study sequences” input, and I used my original removeBimera ASV sequence table as the “sequence abundance table” input. The job fails with error messages that give no additional information. I suspect that lines or headers do not match up between the two files, but I’m unsure how to fix that.

Thanks

Hello @sesamechicken

For the error here

Reviewing the error and the input data would help us troubleshoot. You can generate a shared history link and paste it back here to get this started!

Then, for this part of your question

The Help section of the tool form has a simplified guide and a link out to the author’s wiki site with more. The tools will work in Galaxy the same way!

PICRUSt2 Full pipeline (Galaxy Version 2.5.3+galaxy0)

Help

What it does

PICRUSt2 (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States) is a tool for predicting functional abundances based only on marker gene sequences.

Read more about the tool: Home · picrust/picrust2 Wiki · GitHub

PICRUSt2 full pipeline

Runs sequence placement with EPA-NG and GAPPA to place study sequences (i.e. OTUs and ASVs) into a reference tree. Then runs hidden-state prediction with the castor R package to predict the genome of each study sequence. Metagenome profiles are then generated, which can optionally be stratified by the contributing sequence. Finally, pathway abundances are predicted based on the metagenome profiles. By default, output files include predictions for Enzyme Commission (EC) numbers, KEGG Orthologs (KOs), and MetaCyc pathway abundances. However, the tool also lets users supply custom reference and trait tables to customize analyses.

Note

The standard pipeline will generate metagenome predictions for 16S rRNA gene data.

Input

  1. A FASTA of amplicon sequence variants (ASVs; i.e. your representative sequences, not your raw reads)
  2. A BIOM table of the abundance of each ASV across each sample.

Output

  1. Output tree with placed study sequences.
  2. Metagenome Predictions
  3. Pathway level predictions

Please give this a review and let us know if you are able to resolve this on your own, or if you would like some help with a history review! :slight_smile:

Related Q&A → Search results for 'picrust2 order:latest' - Galaxy Community Help

It didn’t really give me any specifics. Just said: “An error occurred while running the tool toolshed.g2.bx.psu.edu/repos/iuc/picrust2_pipeline/picrust2_pipeline/2.5.3+galaxy0.”

Hi @sesamechicken

Yes, but that message is from the Galaxy wrapper and means that the tool could not complete processing at a lower level. This usually points to a problem with the inputs or parameter choices, so you’ll need to review those more closely against the other content I shared. Or we can try to help here, but without seeing the details those will only be guesses!

The search I included (also above!) at this forum has some examples of prior troubleshooting. This includes a simple example in a shared history. I just started up another run using a slightly different BIOM input in that same history, too! Maybe that helps? See the other topics for full links and walk-throughs.

Hope this works out! :slight_smile:

I have figured out what the issue is, but I am unsure how to fix it in usegalaxy. PICRUSt2 requires 2 files in a specific format:

  1. A .fasta file with a unique ASV ID and its corresponding sequence like:

>ASV1
TACGAAGGGGG…
>ASV2
TACGTAGGGCG…

  2. A tabular or BIOM file with just the ASV IDs, sample names, and read counts, like:

ASV ID    Sample 1    Sample 2
ASV1      6829        3304
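To see whether the IDs in the two files actually line up, a small check outside Galaxy may help. The following is a minimal sketch, assuming a plain-text FASTA and a tab-separated table with one header row and the ASV IDs in the first column. The function names (`fasta_ids`, `table_ids`, `compare_ids`) and the toy data are made up for illustration and are not part of PICRUSt2 or Galaxy.

```python
def fasta_ids(fasta_text):
    """Return the IDs from FASTA header lines (text after '>' up to the first whitespace)."""
    ids = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            ids.append(parts[0] if parts else "")
    return ids


def table_ids(table_text):
    """Return the first-column values of a tab-separated table, skipping the header row."""
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    return [ln.split("\t")[0].strip() for ln in lines[1:]]


def compare_ids(fasta_text, table_text):
    """Return (IDs only in the FASTA, IDs only in the table), each sorted."""
    in_fasta = set(fasta_ids(fasta_text))
    in_table = set(table_ids(table_text))
    return sorted(in_fasta - in_table), sorted(in_table - in_fasta)


# Toy data (hypothetical): the second table row uses "ASV 2" instead of "ASV2".
toy_fasta = ">ASV1\nTACGAAGG\n>ASV2\nTACGTAGG\n"
toy_table = "ASV ID\tSample1\tSample2\nASV1\t6829\t3304\nASV 2\t12\t0\n"

only_fasta, only_table = compare_ids(toy_fasta, toy_table)
print("Only in FASTA:", only_fasta)
print("Only in table:", only_table)
```

Two empty lists would mean every ID appears in both files. Anything listed is an ID that appears in only one of them, which is the kind of mismatch suspected here.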

Currently I just have a column of sequences, with no column of unique ASV IDs and no “ASV ID” header.

So I need to be able to do this in galaxy:

  1. Add a column to my tabular file that iterates row number (ASV1, ASV2, etc)
  2. Add a header to this column titled “ASV ID”
  3. Cut the column containing the sequences. I now have a tabular file I can use.
  4. Then, starting again from the table with the new ID column, I need to cut the columns containing the sample names and their associated read counts, and convert only the ASV ID and sequence columns from tabular to FASTA format. I now have a FASTA file I can use.
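For reference, the four steps above can be sketched in a few lines of Python outside Galaxy. This is only an illustration under stated assumptions: the input is tab-separated, has one header row, and holds the ASV sequence in the first column with one ASV per row. The function name `split_asv_table` and the toy data are hypothetical, and the “ASV ID” header is simply the one proposed in step 2.

```python
import csv
import io


def split_asv_table(table_text):
    """Split a tab-separated ASV table into (fasta_text, abundance_text).

    Assumes the first row is a header and the first column holds the sequence.
    IDs are generated from the row order: ASV1, ASV2, ...
    """
    rows = [r for r in csv.reader(io.StringIO(table_text), delimiter="\t") if r]
    header, body = rows[0], rows[1:]

    fasta_lines = []
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Steps 2 and 3: new "ASV ID" header, sequence column dropped.
    writer.writerow(["ASV ID"] + header[1:])

    for number, row in enumerate(body, start=1):
        asv_id = f"ASV{number}"  # Step 1: ID based on the row number.
        sequence, counts = row[0], row[1:]
        writer.writerow([asv_id] + counts)
        # Step 4: keep only the ID and the sequence for the FASTA.
        fasta_lines.append(f">{asv_id}")
        fasta_lines.append(sequence)

    return "\n".join(fasta_lines) + "\n", out.getvalue()


# Toy input (hypothetical sequences and counts).
toy_input = (
    "sequence\tSample1\tSample2\n"
    "TACGAAGG\t6829\t3304\n"
    "TACGTAGG\t12\t0\n"
)
fasta_text, abundance_text = split_asv_table(toy_input)
print(fasta_text)
print(abundance_text)
```

Because both outputs are generated in the same loop, each ID in the FASTA refers to the same row as the matching ID in the abundance table.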

So how do I do this in Galaxy? More specifically, how do I do step 1, adding the iterative row numbers?

Hope this makes sense, thanks!