No SRR IDs in sistr_cmd output

handyer3 · August 21, 2019, 4:32pm

Hello!

I am a relatively new Galaxy user working through GalaxyTrakr. I’ve pulled a number of our Salmonella isolate reads from SRA as collections and have run several analyses on them successfully within Galaxy.

However, when I try to serotype them with sistr_cmd, the output contains all the expected information except the corresponding SRR IDs. The only identifying information I can see in the output to differentiate isolates is the Galaxy History numbers corresponding to each contig assembly from Shovill (e.g., “Shovill on data 877 and data 876: Contigs”).

If I were only dealing with a couple of isolates, I could go through my History and try to match everything up to the original IDs, but I am working with 400+. Does anyone know either how to get sistr_cmd to include identifying information, or whether there is any way to download the History metadata so I could write a script to match History numbers with SRR IDs?

Thank you all!

jennaj · August 21, 2019, 6:20pm

Hello @handyer3

Some public Galaxy servers have tool- or domain-specific analysis tutorials and other help that is not available elsewhere, so check for those. You could also ask that server’s support team for help, as they may have a strategy to share. Contact information is sometimes on the home page of the server and sometimes here: https://galaxyproject.org/use/

The Galaxy wrapped tool was last updated in 6/2007 and does not appear to have an option on the tool form to relabel outputs (an input tabular mapping dataset, etc). That type of option would be one way to get the output labeled the way you need it. You could contact the tool author to ask for an update, either through the ToolShed when logged in (create an account if needed) or by opening an issue ticket at the development repository. This is the ToolShed repository: https://toolshed.g2.bx.psu.edu/view/nml/sistr_cmd/5c8ff92e38a9

Renaming the input datasets is another choice, but there isn’t a good way to do that in batch for 400+ individual datasets. You could try putting these into a Dataset Collection and then using the tool Relabel List Identifiers from contents of a file. The option “Show Structure” from the history menu generates a view that includes all the dataset names. It cannot be downloaded directly, but you could copy and paste it into a file and work with it on the command line to create a mapping file.
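
As a rough illustration of that last step, here is a minimal Python sketch of the matching. It is not a tested recipe, and everything about the input layout is an assumption: it supposes the text copied from the history lists one dataset per line as "number: name", that the read dataset names contain the SRR accession, and that the labels look like the Shovill example quoted in the question. The function names and the accession in the demo are hypothetical. The two-column, tab-separated output (old label, new label) is only one plausible layout for a mapping file, so check the relabel tool's help for what it expects.

```python
import re

# Assumed layout of each copied history line: "<number>: <dataset name>"
HISTORY_LINE = re.compile(r"^\s*(\d+)\s*:\s*(.+?)\s*$")
# History numbers referenced inside a label such as "Shovill on data 877 and data 876: Contigs"
DATA_REF = re.compile(r"\bdata (\d+)\b")
# SRA run accession embedded somewhere in a dataset name
SRR_ID = re.compile(r"SRR\d+")


def parse_history_names(text):
    """Map history number -> dataset name from pasted history text."""
    names = {}
    for line in text.splitlines():
        match = HISTORY_LINE.match(line)
        if match:
            names[int(match.group(1))] = match.group(2)
    return names


def srr_for_label(label, names):
    """Return the first SRR ID found among the datasets a label refers to, else None."""
    for number in DATA_REF.findall(label):
        found = SRR_ID.search(names.get(int(number), ""))
        if found:
            return found.group(0)
    return None


def build_mapping(labels, names):
    """Pair each label with its SRR ID (None when no match was found)."""
    return [(label, srr_for_label(label, names)) for label in labels]


if __name__ == "__main__":
    # Hypothetical pasted history text; real names and numbers will differ.
    pasted = (
        "876: SRR1234567_1\n"
        "877: SRR1234567_2\n"
        "900: Shovill on data 877 and data 876: Contigs\n"
    )
    names = parse_history_names(pasted)
    labels = ["Shovill on data 877 and data 876: Contigs"]
    for old, new in build_mapping(labels, names):
        print(f"{old}\t{new if new else 'UNMATCHED'}")
```

Labels that come back unmatched are worth checking by hand, since they usually mean the dataset names did not carry the accession or the pasted text had a different layout than assumed here.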

How to use collection tools:

Basics: https://galaxyproject.org/learn/ > Dataset collections. Modern studies usually include many samples, and collections are designed to simplify complex, multi-sample analyses, as shown in that tutorial.
Advanced: https://training.galaxyproject.org/ > https://training.galaxyproject.org/training-material/topics/galaxy-data-manipulation/

Hope that helps.