Taxonomy workflows: reads to taxa

Hi Jemma,

I have some samples being sequenced. These are for soil fungi eDNA metabarcoding.

On NCBI I have found and downloaded a FASTQ file (2MB) of the type I will be using.

Sequences will be about 300bp long.

I would like to use this to set up the following bioinformatics workflow so I am ready to go when I get my FASTQ files.

  1. read in Illumina MiSeq FASTQ files (the files will already be demultiplexed)
  2. trim sequences (if needed)
  3. filter them, e.g. reject reads with low quality scores (if necessary)
  4. cluster into OTUs using appropriate parameters
  5. match OTUs to a suitable UK fungi sequence database using appropriate parameters
  6. output the taxa list to a spreadsheet
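To make steps 1 to 3 concrete, here is a minimal Python sketch of what "read, trim, filter" means at the level of individual reads. It is not how Galaxy does it (in Galaxy these steps are normally handled by existing tools such as Cutadapt or Trimmomatic), and it rests on assumptions: Phred+33 quality encoding, plain 4-line FASTQ records, and illustrative thresholds (trim tail bases below Q20, keep reads of at least 50 bp with a mean quality of at least Q25). The function names are hypothetical.

```python
"""Minimal sketch of steps 1-3: read FASTQ, trim low-quality 3' tails, filter reads.

Assumptions (illustrative, not taken from any specific Galaxy tool): Phred+33
quality encoding, plain 4-line FASTQ records, and example thresholds.
"""
import io
from typing import Iterator, List, NamedTuple, TextIO


class Read(NamedTuple):
    header: str
    seq: str
    qual: str


def parse_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield reads from a FASTQ stream with 4 lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield Read(header, seq, qual)


def phred(qual: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]


def trim_tail(read: Read, min_q: int = 20) -> Read:
    """Drop bases from the 3' end until the last base has quality >= min_q."""
    scores = phred(read.qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return Read(read.header, read.seq[:end], read.qual[:end])


def keep(read: Read, min_len: int = 50, min_mean_q: float = 25.0) -> bool:
    """Reject reads that are too short or whose mean quality is too low."""
    if len(read.seq) < min_len:
        return False
    scores = phred(read.qual)
    return sum(scores) / len(scores) >= min_mean_q


def trim_and_filter(handle: TextIO) -> List[Read]:
    """Trim every read, then keep only those that pass the filter."""
    trimmed = (trim_tail(r) for r in parse_fastq(handle))
    return [r for r in trimmed if keep(r)]


if __name__ == "__main__":
    # Tiny in-memory example: r1 survives after trimming; r2 becomes too short
    # after trimming; r3 is long enough but its mean quality (Q20) is below 25.
    demo = (
        "@r1\n" + "ACGT" * 15 + "\n+\n" + "I" * 55 + "#" * 5 + "\n" +
        "@r2\n" + "ACGT" * 15 + "\n+\n" + "I" * 30 + "#" * 30 + "\n" +
        "@r3\n" + "A" * 60 + "\n+\n" + "5" * 60 + "\n"
    )
    for r in trim_and_filter(io.StringIO(demo)):
        print(r.header, len(r.seq))  # prints: @r1 55
```

Trimming before filtering matters here: a read can look fine on length alone but drop below the minimum once its low-quality tail is removed, as `r2` does in the example.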

I would like to generate QC charts at each stage.
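In Galaxy, QC charts like these are usually produced by FastQC, with MultiQC to combine several reports into one. As a rough illustration of what the most familiar of those charts summarises, this hypothetical Python function computes the mean Phred score at each read position across all reads. It assumes Phred+33 encoding and 4-line FASTQ records, and leaves the plotting out.

```python
"""Sketch of the numbers behind a per-position quality chart: the mean Phred
score at each read position across all reads.

Assumes Phred+33 encoding and 4-line FASTQ records; plotting is left out.
"""
import io
from typing import List, TextIO


def mean_quality_per_position(handle: TextIO, offset: int = 33) -> List[float]:
    totals: List[int] = []  # running sum of scores at each position
    counts: List[int] = []  # number of reads long enough to reach each position
    for i, line in enumerate(handle):
        if i % 4 != 3:  # only the 4th line of each record holds qualities
            continue
        for pos, ch in enumerate(line.rstrip("\n")):
            if pos == len(totals):
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(ch) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


if __name__ == "__main__":
    demo = "@a\nACG\n+\nII#\n@b\nAC\n+\nI5\n"
    print(mean_quality_per_position(io.StringIO(demo)))  # [40.0, 30.0, 2.0]
```

Running the same summary before and after trimming and filtering is one simple way to check that each stage is doing what you expect.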

There might be some more steps, but that would be a good start.

This is a pretty standard workflow, so there must already be one like it in Galaxy that I can copy or adapt.

How would I find it?

How do I start building and testing a workflow?

Adam

Hello @Adam_Hillier

Glad to learn you are proceeding with your project! :scientist:

There are a few primary places to source workflows for Galaxy.

IWC (Intergalactic Workflow Commission) – Production-quality workflows

These are curated, so if you can find what you need here, that would be the preferred option! Each has been optimized for processing large batches of data. This catalog is newer, still growing, and held to stricter community standards.

Galaxy Hub – Public Workflows

This is the “meta” search. You’ll find workflows from the GTN (Galaxy Training Network) tutorials, WorkflowHub, and the Public Workflows shared by the communities on the UseGalaxy* servers.

A workflow from either source can be customized further, too. It would be pretty common to split an analysis like yours into two or three distinct modular workflows. Scientists could then run them separately, or nest them as subworkflows in a single master workflow that does everything, with a bit more customization (reference data preparation, intermediate file offloading, workflow reports).

I didn’t find an IWC workflow specifically for eDNA. Of the two training-quality workflows from the GTN, one is probably too simple for your needs (it has no clustering step). The other, which uses Obitools, will work best on one of the servers listed under “Available at these Galaxies” for now; UseGalaxy.eu would be a good choice. :slight_smile:

Hope this helps to get things started! :slight_smile: