How to link output IDs to original file names

Hello all,

At the risk of something similar having been addressed before: I have downloaded results from Galaxy (not created within a workflow). How can I (batch-) link these basically anonymous (“job on dataset xxx”) results to the original file names that I uploaded?

It would help a lot if the file/job-list under history could easily be exported to a tsv- or csv-file…!

Thank you for your consideration!

Kind regards, Corné

Hi @cornek

I don’t think this has been discussed here yet, or at least not recently!

For an API approach, you would query the original Galaxy history directly instead of parsing a downloaded history archive. Then your mapping can be used to navigate the downloaded content.

Start with the history contents endpoint, then query each dataset for details such as name, tags, state, and related metadata. If you use #nametags for your samples, this will be easier when tracking through tools (outside of a workflow) to summarize by those tags.

The relevant pattern is:

GET /api/histories/{history_id}/contents
GET /api/histories/{history_id}/contents/datasets/{dataset_id}

This could be wrapped in a script that runs against your history and writes a supplementary custom map into the folder where you unpacked (e.g. with tar xvf) the downloaded history archive.
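As a rough illustration of that wrapper idea, here is a minimal Python sketch that reads the history contents endpoint and writes a tab-separated map. Please treat it as a starting point under some assumptions: the server URL, history ID, API key, and output file name are placeholders you supply; the key is sent in an `x-api-key` header; and the field names (`hid`, `id`, `name`, `state`, `tags`) are what I expect the contents listing to return, so adjust them if your server reports something different. The function names are made up for this example.

```python
import csv
import json
import urllib.request


def fetch_json(url, api_key):
    """GET a Galaxy API URL and decode the JSON response."""
    # Assumption: the API key is accepted in the x-api-key header.
    request = urllib.request.Request(url, headers={"x-api-key": api_key})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def contents_to_rows(contents):
    """Flatten history items into [hid, id, name, state, tags] rows."""
    rows = []
    for item in contents:
        # Missing fields become None (or an empty tag string) instead of failing.
        tags = item.get("tags") or []
        rows.append([
            item.get("hid"),
            item.get("id"),
            item.get("name"),
            item.get("state"),
            ";".join(tags),
        ])
    return rows


def write_tsv(rows, path):
    """Write the rows to a tab-separated file with a header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["hid", "dataset_id", "name", "state", "tags"])
        writer.writerows(rows)


def export_history_map(galaxy_url, history_id, api_key, path):
    """Fetch one history's contents and save them as a TSV map."""
    url = f"{galaxy_url}/api/histories/{history_id}/contents"
    write_tsv(contents_to_rows(fetch_json(url, api_key)), path)
```

You would then call something like `export_history_map("https://usegalaxy.org", "<history_id>", "<api_key>", "history_map.tsv")` with your own values. The per-dataset endpoint can be queried the same way with `fetch_json` if you need more detail than the listing provides.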

The alternative I can think of is to parse the [ARCHIVE]/datasets_attrs.txt JSON directly with a custom script.
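For the archive route, a custom script could look something like the sketch below. It makes several assumptions that you should check against your own archive: that datasets_attrs.txt is a JSON list of dataset records, that each record has `hid`, `name`, and `file_name` keys, and that tool outputs kept Galaxy's default "... on data N" naming (renamed datasets, or uploads whose names happen to contain "data N", will not trace correctly). The function names are hypothetical.

```python
import csv
import json
import re

# Picks up the history numbers in names like "Trim on data 3 and data 4".
DATA_REF = re.compile(r"\bdata (\d+)")


def load_attrs(path):
    """Read datasets_attrs.txt (assumed to be a JSON list of records)."""
    with open(path) as handle:
        return json.load(handle)


def trace_originals(hid, names_by_hid, seen=None):
    """Follow "on data N" references back to names that reference nothing."""
    seen = set() if seen is None else seen
    if hid in seen or hid not in names_by_hid:
        return []
    seen.add(hid)
    name = names_by_hid[hid]
    parents = [int(n) for n in DATA_REF.findall(name)]
    if not parents:
        return [name]  # no reference in the name: treat as an original
    originals = []
    for parent in parents:
        for original in trace_originals(parent, names_by_hid, seen):
            if original not in originals:
                originals.append(original)
    return originals


def build_map(datasets):
    """Return [hid, name, file_name, originals] rows sorted by hid."""
    names_by_hid = {
        d["hid"]: d.get("name") or ""
        for d in datasets
        if d.get("hid") is not None
    }
    rows = []
    for d in sorted(datasets, key=lambda d: d.get("hid") or 0):
        originals = trace_originals(d.get("hid"), names_by_hid)
        rows.append([
            d.get("hid"),
            d.get("name") or "",
            d.get("file_name", ""),
            "; ".join(originals),
        ])
    return rows


def write_map(rows, path):
    """Save the map as a tab-separated file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["hid", "name", "file_name", "original_names"])
        writer.writerows(rows)
```

Running `write_map(build_map(load_attrs("datasets_attrs.txt")), "dataset_map.tsv")` from inside the unpacked archive folder would give one row per dataset, with the last column listing the upload names it traces back to. An uploaded file simply lists its own name there.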

Let us know what you think or if this helps! :slight_smile:

Update: I’m still thinking about your idea about a UI navigation or downloadable tabular version of the JSON. Or, maybe a standalone utility to parse it that can be shared. More soon about this. :scientist:

Hi JennaJ,

Thank you for your reply! One of the things I like about Galaxy is that I (as a non-bioinformatician) can perform complicated bioinformatics analyses :slightly_smiling_face: To me, your proposed solution still sounds rather complicated :thinking: I’m thinking there should be an easier way to do this?

Hi @cornek

After thinking a bit more, I decided that while Python is possible, it would be interesting to do this all directly in Galaxy!

Galaxy includes a tool to parse JSON structured metadata. Knowing how to write the jq query isn’t really needed. Instead, knowing what you want is the important part.

I was able to get both ChatGPT and Claude to produce something usable, after 2 and 3 iterations respectively. These tools are very powerful for informatics! You should be able to use my example here to have an AI produce further customizations as wanted.

Tool

  • jq query and transform JSON documents

Process

  1. Download a history archive and uncompress it (either kind).
  2. Upload the datasets_attrs.txt file back up into a history.
  3. Run a jq query to produce a txt version of the desired attribute content.
  4. Use the pencil icon to change to tabular format.
  5. Ready to do anything else you might do with tabular data!

Example history

  • https://usegalaxy.org/u/jen-galaxyproject/h/using-tags-to-track-samples

    Please notice that I made some choices: I included different kinds of original tags, which made it possible to pull those out and better track data outputs across tools and groups of samples. I also decided which metadata to pull out. I chose those that seem the most obvious to me, but they won’t be for everyone. This is the primary reason why pre-parsing into a flat file is tricky: a massive flat file with missing content in some places and redundant content in others isn’t as nice as a JSON with structured metadata.


Thoughts: I’m not sure if it is practical to expose/query this same data until after the action to generate a specific history export archive is completed. Using a workflow instead is the currently supported model for sample tracking, but it is understandable why that can be restrictive. I am going to think about this more and will watch for your feedback.

Update 5/8: I created a ticket for this idea. Let’s see what the developers think and please feel free to add more context about use cases, or comments, on the ticket!

I hope this helps to get it started, and let us know what you think! :rocket: