Creating workflows with Galaxy API (BioBlend)


#1

I am new to Galaxy and I am trying to learn how to use the Galaxy API with the BioBlend package.

When describing the Galaxy API, Galaxy mentions that it is useful to use the API and tools such as BioBlend in the following cases:

  • Running a workflow against multiple datasets (which can be done with the web interface, but is tedious)

  • When the analysis involves complex control, such as looping and branching
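For the first case, BioBlend can already loop one workflow over many datasets. The sketch below assumes the workflow has a single dataset input at step index `"0"` and that the datasets are already in a history. `invoke_on_each` and `build_inputs` are hypothetical helper names. `invoke_workflow` is BioBlend's `WorkflowClient` method.

```python
from typing import Any, Dict, List


def build_inputs(input_step: str, dataset_id: str) -> Dict[str, Dict[str, str]]:
    """Map one workflow input step to one history dataset ('hda' = history dataset)."""
    return {input_step: {"id": dataset_id, "src": "hda"}}


def invoke_on_each(gi: Any, workflow_id: str, history_id: str,
                   dataset_ids: List[str], input_step: str = "0") -> List[Dict]:
    """Invoke the same workflow once per dataset, collecting the invocation records.

    `gi` is expected to be a bioblend.galaxy.GalaxyInstance (or anything exposing
    the same `workflows.invoke_workflow` method).
    """
    invocations = []
    for dataset_id in dataset_ids:
        invocations.append(
            gi.workflows.invoke_workflow(
                workflow_id,
                inputs=build_inputs(input_step, dataset_id),
                history_id=history_id,
            )
        )
    return invocations


# Usage (needs the bioblend package, a Galaxy server and an API key):
# from bioblend.galaxy import GalaxyInstance
# gi = GalaxyInstance(url="https://galaxy.example.org", key="YOUR_API_KEY")
# invoke_on_each(gi, "WORKFLOW_ID", "HISTORY_ID", ["DATASET_ID_1", "DATASET_ID_2"])
```

Each call starts an independent invocation. Branching or looping logic can therefore live in ordinary Python around these calls.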

I was therefore hoping that I would be able to programmatically create workflows with BioBlend, not just invoke them. However, the BioBlend documentation doesn't seem to indicate that I can create workflows. Am I correct in thinking that I cannot create workflows? Is this something that Parsec might be able to do?

Here is an example of what I was hoping to do. I want to automate an RNA-seq differential expression analysis starting from a single count matrix produced with featureCounts. I have 3 groups I need to compare (so 3 inputs for edgeR). However, in the Galaxy GUI, I have been unable to separate my count matrix into 3 noodles (connections) going into edgeR. In other words, if I want edgeR to have 3 input groups, I need to create 3 separate count matrices. See the image below. I was hoping to programmatically create a workflow with BioBlend that would help me get around this problem.

Any help would be deeply appreciated.


#2

Yes, you can create workflows via the API, but this is probably not going to help you with this problem.
To create a workflow, you would need to write a JSON file that specifies the workflow steps and input connections in the Galaxy workflow format, but that format isn't fully described or specified at this point. Generating that JSON is essentially what the workflow editor does for you, as does any other API client (bioblend, parsec (which uses bioblend), blend4j …).
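Here is a rough sketch of what such a file involves: a two-step workflow dictionary in the native (`.ga`) format. The field set is an assumption modelled on workflows exported from the editor, not a specification. The tool ID, the tool input name `input1` and the helper functions are hypothetical. Exporting a workflow built in the editor and comparing it against this is the safer way to get the details right.

```python
import json
from typing import Dict, List, Tuple


def input_step(step_id: int, label: str) -> Dict:
    """A dataset input step; its single output is named 'output'."""
    return {
        "id": step_id,
        "type": "data_input",
        "label": label,
        "name": "Input dataset",
        "tool_id": None,
        "tool_state": json.dumps({"optional": False}),
        "input_connections": {},
        "inputs": [{"name": label, "description": ""}],
        "outputs": [],
        "position": {"left": 0, "top": 150 * step_id},
        "annotation": "",
        "workflow_outputs": [],
    }


def tool_step(step_id: int, tool_id: str, tool_version: str,
              connections: Dict[str, Tuple[int, str]]) -> Dict:
    """A tool step; `connections` maps a tool input name to (source step id, output name)."""
    return {
        "id": step_id,
        "type": "tool",
        "tool_id": tool_id,
        "tool_version": tool_version,
        "tool_state": json.dumps({}),  # empty state, i.e. rely on tool defaults (assumption)
        "input_connections": {
            param: {"id": source, "output_name": output}
            for param, (source, output) in connections.items()
        },
        "inputs": [],
        "outputs": [],
        "position": {"left": 300, "top": 150 * step_id},
        "annotation": "",
        "workflow_outputs": [],
    }


def minimal_workflow(name: str, steps: List[Dict]) -> Dict:
    """Wrap the steps in the top-level keys seen in exported .ga files."""
    return {
        "a_galaxy_workflow": "true",
        "format-version": "0.1",
        "name": name,
        "annotation": "",
        "steps": {str(step["id"]): step for step in steps},
    }
```

The resulting dictionary could then be passed to BioBlend's `gi.workflows.import_workflow_dict(...)`. Even this small example shows how much bookkeeping the editor normally does for you.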
There’s ongoing work to simplify this using format 2 workflows (https://github.com/jmchilton/gxformat2), but again I don’t think this has fully stabilized yet.
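A Format 2 workflow is much terser by comparison. Inputs and steps are keyed by label, and connections are written as `in: {tool_input: source_label}`. The sketch below builds one as a Python dict. It assumes the `class`/`inputs`/`steps`/`tool_id`/`in` keys from the gxformat2 examples. `cat1` (Galaxy's Concatenate tool) and its `input1` parameter are placeholders, and `format2_workflow` is a hypothetical helper.

```python
import json
from typing import Dict, List


def format2_workflow(input_labels: List[str], tool_id: str, tool_input: str) -> Dict:
    """One data input per label, each feeding its own step that runs `tool_id`."""
    return {
        "class": "GalaxyWorkflow",
        "inputs": {label: {"type": "data"} for label in input_labels},
        "steps": {
            f"process_{label}": {"tool_id": tool_id, "in": {tool_input: label}}
            for label in input_labels
        },
    }


# JSON is, for practical purposes, valid YAML, so this output could be saved as a YAML file.
print(json.dumps(format2_workflow(["group_a", "group_b"], "cat1", "input1"), indent=2))
```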

So the attached screenshot looks fine. I assume this works for you, but you would like to generalize it to an arbitrary number of comparisons driven by the input data?

We’ve been working on enabling this with group tags. We’re not entirely there yet, but a big step towards it is tagging the inputs with group tags and then inferring the comparisons from those tags. I was planning to address that last part once https://github.com/galaxyproject/tools-iuc/pull/2167 is merged. Doing the same for edgeR would be trivial.
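The idea of inferring comparisons from group tags can be sketched in plain Python. The sketch collects datasets by their group label and then takes every pair of groups as a contrast, so 3 groups give 3 pairwise comparisons. This only illustrates the concept and is not the code in the PR above. The `group:<label>` tag form and the helper names are assumptions.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def groups_from_tags(dataset_tags: Dict[str, List[str]],
                     prefix: str = "group:") -> Dict[str, List[str]]:
    """Collect datasets by the label of their group tag (assumed form 'group:<label>')."""
    groups: Dict[str, List[str]] = {}
    for dataset, tags in dataset_tags.items():
        for tag in tags:
            if tag.startswith(prefix):
                groups.setdefault(tag[len(prefix):], []).append(dataset)
    return groups


def pairwise_contrasts(groups: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Every unordered pair of groups, in sorted order."""
    return list(combinations(sorted(groups), 2))
```

Datasets without a matching tag are simply ignored. The number of comparisons follows from the data rather than being fixed in the workflow.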


#3

Thanks for your answer.

Could you point me to some examples of BioBlend usage? So far I have only used the following links, but I was wondering where I could find real-life examples (of people also doing RNA-seq, for example). Also, I read that Parsec is a wrapper around BioBlend; does it actually offer more functionality than BioBlend?

https://bioblend.readthedocs.io/en/latest/api_docs/galaxy/all.html#objects-api
https://bioblend.readthedocs.io/en/latest/api_docs/galaxy/docs.html#export-or-import-a-workflow

Regarding my attached screenshot, yes, that is indeed what I would like to do. For now, I am forced to use a separate matrix for each edgeR group, but it sounds as if your group tag feature may be the solution. But this is not yet implemented in Galaxy, is that correct?


#4

bioblend consumes the Galaxy API, so it can only ever do what the API offers. At the moment bioblend covers a significant subset of the API calls, but not all of them. Parsec in turn uses bioblend to provide its functionality; think of it as reusable recipes built on top of bioblend.
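BioBlend is a thin layer over the REST API. For example, listing workflows with `gi.workflows.get_workflows()` boils down to a GET on `/api/workflows`. The sketch below builds, but does not send, a comparable request with the standard library. It assumes the API key goes in Galaxy's `x-api-key` header. The function name and URL are placeholders.

```python
import urllib.request


def list_workflows_request(galaxy_url: str, api_key: str) -> urllib.request.Request:
    """Build (without sending) a GET request for the Galaxy /api/workflows endpoint."""
    return urllib.request.Request(
        galaxy_url.rstrip("/") + "/api/workflows",
        headers={"x-api-key": api_key},
        method="GET",
    )


# Sending it would be: urllib.request.urlopen(list_workflows_request(url, key))
# The BioBlend equivalent:
# from bioblend.galaxy import GalaxyInstance
# GalaxyInstance(url="https://galaxy.example.org", key="YOUR_API_KEY").workflows.get_workflows()
```

Anything missing from BioBlend can therefore still be reached by calling the endpoint directly, as long as the Galaxy API exposes it.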

https://bioblend.readthedocs.io/en/latest/api_docs/galaxy/docs.html#export-or-import-a-workflow is the right link for importing workflows into Galaxy. You could in principle script your analysis to create workflows (by programmatically writing JSON workflows) for whatever scenario you want to enable, but that would be a lot of work and wouldn’t be easy for users.
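A less laborious middle ground than writing JSON from scratch is to build the workflow once in the editor, export it, modify the exported dictionary in code, and import the result. The sketch below uses BioBlend's `export_workflow_dict` and `import_workflow_dict`. The only modification here is a rename, which stands in for whatever edits your scenario needs. `clone_workflow_with_name` is a hypothetical helper.

```python
import copy
from typing import Any, Dict


def clone_workflow_with_name(gi: Any, workflow_id: str, new_name: str) -> Dict:
    """Export an existing workflow, edit a copy of the dictionary, and import it back.

    `gi` is expected to be a bioblend.galaxy.GalaxyInstance.
    """
    exported = gi.workflows.export_workflow_dict(workflow_id)
    template = copy.deepcopy(exported)  # leave the exported dict untouched
    template["name"] = new_name  # stand-in for real edits (steps, connections, ...)
    return gi.workflows.import_workflow_dict(template)
```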

Galaxy can already do everything that is necessary for group tags, the tool just needs to be merged and deployed.

Right, that is the way to go IMO. This generalizes nicely to any datatype (multi-sample tables only work for tabular data). I’m sorry that we haven’t made more progress there, but I’ll try to write this up in more detail soon.


#5

It seems it is now possible to use a single matrix as input, so there is no longer any need to have one matrix per sample.

However, I am having trouble using a single factor file for a multifactorial analysis (but I have posted a separate question about this).
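For the multifactorial case, a single factor file typically has one row per sample and one column per factor. The sketch below writes such a table as tab-separated text. It assumes the first column holds sample names that match the count matrix column headers. Check the edgeR tool's help for the exact layout it expects. `factor_table` and the column names are hypothetical.

```python
import csv
import io
from typing import Dict, List


def factor_table(sample_factors: Dict[str, Dict[str, str]], factor_names: List[str]) -> str:
    """Render a tab-separated factor table: one row per sample, one column per factor."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Sample"] + factor_names)
    for sample, factors in sample_factors.items():
        writer.writerow([sample] + [factors[name] for name in factor_names])
    return out.getvalue()


# Example: two factors (condition and batch) for three samples.
print(factor_table(
    {"s1": {"Condition": "ctrl", "Batch": "b1"},
     "s2": {"Condition": "trt", "Batch": "b1"},
     "s3": {"Condition": "trt", "Batch": "b2"}},
    ["Condition", "Batch"],
))
```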


#6

Hi @m93, yes, creating a count matrix is the way to go here (the group tags functionality, once it’s added to edgeR, would be the other way). You can use the Column Join on Collection tool to easily create a matrix from multiple count collections; see the end of the tutorial here: https://galaxyproject.github.io/training-material/topics/transcriptomics/tutorials/rna-seq-reads-to-counts/tutorial.html
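Conceptually, joining per-sample count tables into one matrix is an outer join on the gene ID. The sketch below shows that idea in plain Python for a quick local check. It is not a reimplementation of Column Join on Collection. Filling missing genes with 0 is this sketch's choice, and `join_counts` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def join_counts(per_sample: Dict[str, Dict[str, int]]) -> Tuple[List[str], List[List]]:
    """Outer-join per-sample {gene: count} tables into one matrix (missing counts -> 0)."""
    samples = list(per_sample)
    genes = sorted({gene for counts in per_sample.values() for gene in counts})
    header = ["Geneid"] + samples
    rows = [[gene] + [per_sample[s].get(gene, 0) for s in samples] for gene in genes]
    return header, rows
```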