My rule based uploader recipe for EBI ENA

The Rule based uploader tutorial by @jmchilton and @hxr provides an example of creating a list of dataset pairs. I find a slightly adapted version of the procedure very useful for loading collections of sequences from EBI’s European Nucleotide Archive into Galaxy. Since one can now save and share “recipes” from the uploader, I thought I’d share my recipe here.

My starting point is a project page in ENA, e.g. the one for PRJNA522942, which is data from this paper by Chen et al. On that page, choose to download the report in TSV format (i.e. click on the TSV - see the image below where that link is highlighted):

In the rule based uploader choose to upload data as a collection and choose to load the tabular data from a pasted table. Paste the TSV report from ENA into the form and click the build button.

In the rule based uploader, click the spanner icon (highlighted in the image below), paste the recipe from here, give your new collection a name and click Upload.


Then sit back and wait. Galaxy will download the files from ENA and create a list of paired dataset collections with the name you gave it. Note that this recipe filters out datasets that are not paired-end: it will not work with single-end data, or with ENA projects that contain a mix of single-end and paired-end data.
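If you want to check beforehand which runs in a report would survive that filter, here is a rough Python sketch of the idea. This is not the recipe itself, just my illustration of the filtering step. It assumes the TSV report has a `run_accession` column and a `fastq_ftp` column holding semicolon-separated file locations, and that a paired-end run lists exactly two files; check the header of your own report, since the columns depend on what you selected in ENA. The function name and the sample rows are made up for the example.

```python
import csv
import io


def paired_end_rows(tsv_text, id_column="run_accession", url_column="fastq_ftp"):
    """Return (run id, forward file, reverse file) for rows that look paired-end.

    Assumption: a paired-end run lists exactly two files in `url_column`,
    separated by a semicolon. Rows with any other number of files are skipped.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    pairs = []
    for row in reader:
        # A short row yields None for missing columns, so fall back to "".
        urls = [u for u in (row.get(url_column) or "").split(";") if u]
        if len(urls) != 2:
            continue  # not a pair under the assumption above
        forward, reverse = urls
        pairs.append((row[id_column], forward, reverse))
    return pairs


# Made-up report: one run with two files, one run with a single file.
SAMPLE_REPORT = (
    "run_accession\tfastq_ftp\n"
    "RUN_A\thost/RUN_A_1.fastq.gz;host/RUN_A_2.fastq.gz\n"
    "RUN_B\thost/RUN_B.fastq.gz\n"
)

for run_id, forward, reverse in paired_end_rows(SAMPLE_REPORT):
    print(run_id, forward, reverse)
```

If the list this returns is shorter than the number of rows in your report, the project contains runs that the recipe will drop.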


BTW, I have noticed that this procedure does not work for all ENA projects because of network problems on the EBI ENA side (variable network speeds and connection problems are common when doing automated retrieval from ENA). Perhaps some additional work on the Galaxy uploader could help fix this.

Super cool, thanks for sharing!