I am currently building a workflow on Galaxy to analyze RNA-seq data. The steps include:
- Input dataset collection (always including 4 fastq.gz datasets)
- FastQC on the raw data
- Trimming with Trim Galore!
- FastQC on the trimmed data
- Concatenate on multiple datasets (the trimmed data)
- Mapping with BWA-MEM
- Counting with featureCounts
If I do this with only one input dataset collection, it works (everything before DESeq2, obviously not DESeq2 itself).
If I include a second input dataset collection and add the previously mentioned steps for it, it does not work. If I click the Run Workflow button, nothing happens.
While troubleshooting, I realized that the problem is probably in featureCounts. If my second BWA-MEM tool is connected to featureCounts, nothing happens. If I delete the second featureCounts and only apply the tools up to the mapping step on the second input dataset collection, the workflow runs again.
I want to use featureCounts so that I can connect it to DESeq2 later in my workflow.
Does anyone know a solution to my problem?
I would really appreciate it.
Thanks a lot!
Here are the workflows for better understanding. The first picture shows part of the workflow I want to achieve. The full version would repeat all of this 12 times, to have one DESeq2 analysis with my 12 libraries.
The second picture shows the workflow that actually creates an output. The workflow from the first picture does not run.
In the first workflow screenshot, both featureCounts outputs are input to the same “Factor level” (factor level 1) of the DESeq2 tool. Try sending each to a distinct factor level instead.
Hopefully you already discovered the problem, or this helps!
Thank you for your answer! Sadly, that did not work either.
The DESeq2 tool form also says that you can use multiple count files as input, so I don’t see why it shouldn’t work.
Do you have any other suggestions?
Thanks a lot!
DESeq2 requires at least two factor levels and at least two count inputs per factor level, and the way the factor and factor levels are labeled is very specific.
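The two minimums above can be checked before wiring anything into the workflow. This is a minimal sketch in plain Python, run outside Galaxy. The sample names, the level labels, and the `check_design` helper are all hypothetical and only illustrate the rule; it does not check how the labels themselves are formatted.

```python
from collections import Counter


def check_design(sample_to_level):
    """Return a list of problems with a one-factor design (empty if none found)."""
    # Count how many samples (count files) are assigned to each factor level.
    per_level = Counter(sample_to_level.values())
    problems = []
    if len(per_level) < 2:
        problems.append(f"need at least 2 factor levels, found {len(per_level)}")
    for level, n in sorted(per_level.items()):
        if n < 2:
            problems.append(
                f"factor level {level!r} has {n} count file(s), need at least 2"
            )
    return problems


# Hypothetical assignment of count files to factor levels.
design = {
    "lib01": "treated",
    "lib02": "treated",
    "lib03": "control",
}

for problem in check_design(design):
    print(problem)
```

With the hypothetical design shown, the `control` level is reported because it only has one count file.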
- Do you see any warnings associated with workflow run (User → Workflow Invocations)?
- Are you working at a public server (which one?) or somewhere else?
- Is this reproducible at a UseGalaxy.* public server? This eliminates issues with the server itself.
- If you run the tools directly with the same data, does everything work as expected? Are you inputting extra content that is not yet included in the current version of the workflow? Testing this would eliminate data content issues from contributing to the odd/failed results so you can focus on the technical construction of the workflow.
- The training tutorials here have examples:
Most tutorials have one or more workflows associated. The RNA-seq “end-to-end” workflows are the closest match for what you are building. In the counts-to-genes tutorial’s workflow, the sample/factor/count information is input as a count matrix plus a “cleaned up” factor label input, instead of as individual count files.
The default labels currently being passed are probably part of the problem, and pre-parsing the data into count matrix + sampleinfo inputs standardizes that content. Organizing the data this way would also help to get rid of the current workflow’s redundancy. The tutorial includes a pre-made sample file for simplicity but you could also create one directly from the inputs with some creative data parsing, see here and here and maybe here.
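To make the count matrix + sample info layout concrete, here is a minimal sketch in plain Python, run outside Galaxy. It assumes each per-sample count file is a two-column, tab-separated table (gene ID, count) with a header row. The sample names, gene IDs, condition labels, and helper functions are hypothetical, and this is not the tutorial's own method.

```python
import csv
import io


def read_counts(text):
    """Parse a two-column tab-separated table (gene ID, count) with a header row."""
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    next(rows)  # skip the header line
    return {gene: int(count) for gene, count in rows}


def build_matrix(count_tables):
    """Merge per-sample counts into rows of [gene, count for each sample]."""
    samples = list(count_tables)
    genes = sorted(set().union(*(table.keys() for table in count_tables.values())))
    header = ["Geneid"] + samples
    # A gene missing from one sample's table is given a count of 0 here.
    body = [[g] + [count_tables[s].get(g, 0) for s in samples] for g in genes]
    return [header] + body


def to_tsv(rows):
    """Render rows as tab-separated text."""
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical per-sample count tables and condition labels.
raw = {
    "lib01": "Geneid\tlib01\ngeneA\t10\ngeneB\t0\n",
    "lib02": "Geneid\tlib02\ngeneA\t12\ngeneB\t1\n",
    "lib03": "Geneid\tlib03\ngeneA\t3\ngeneB\t8\n",
    "lib04": "Geneid\tlib04\ngeneA\t4\ngeneB\t9\n",
}
conditions = {"lib01": "control", "lib02": "control",
              "lib03": "treated", "lib04": "treated"}

matrix = build_matrix({sample: read_counts(text) for sample, text in raw.items()})
sample_info = [["sample", "condition"]] + [[s, conditions[s]] for s in raw]

print(to_tsv(matrix))
print(to_tsv(sample_info))
```

The sample info rows are built in the same order as the matrix columns, so the two tables stay aligned, and each sample carries one clean label instead of a default dataset name.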