Hi @santatra
If you are concerned about a tool not being able to match up sample and contig identifiers between files, you can adjust the identifiers in one or both files. With computational tools, shorter is usually better anyway! Put all of the mappings into a tabular file you can reference to keep track of this.
Or, sometimes you can use a collection: the collection identifiers hold the sample name, and tools use those instead.
You can also see our workflows for the process, either to use directly or to review how the sample/data labels are handled.
More about each below!
Text Manipulation
Modify the data files, but save the original mappings into a tabular file. Example:
SampleID | ShorterID | LongerID in files from step N | LongerID in files from step Z
You could also add any of the results encoded in the file names to your tabular master sample list, like:
SampleID | ShorterID | LongerIDN (used in files from step N) | ResultN (presence/absence notation) | LongerIDZ (used in files from step Z) | ResultZ (if you want to track anything from these too)
Any text manipulation you want is likely possible. We have tutorials here that go through some examples. Or, search the tool panel with keywords: the tool names will usually contain the common command-line utility name (if you are used to doing it that way). If you will have batches of data, putting your custom manipulations into a mini-workflow will make this go quicker next time, and you can include it with your reproducibility methods if this is for a publication.
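If it helps to see the idea outside of Galaxy, here is a minimal Python sketch of applying a mapping table to swap long identifiers for short ones. It is only an illustration under assumptions: the column name `LongerID_stepN`, the sample rows, and the FASTA content are all made up, and it assumes the identifier is the first token on each FASTA header line. In Galaxy you would do the same thing with text manipulation tools rather than a script.

```python
import csv
import io


def load_mapping(tsv_text, old_col, new_col):
    """Return {old_id: new_id} from tab-separated text with a header row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[old_col]: row[new_col] for row in reader}


def rename_fasta_headers(fasta_text, mapping):
    """Swap the first token of each '>' header line when it is in the mapping."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Split the identifier from any trailing description text.
            ident, sep, rest = line[1:].partition(" ")
            line = ">" + mapping.get(ident, ident) + sep + rest
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical mapping table and FASTA content, for illustration only.
MAPPING_TSV = (
    "SampleID\tShorterID\tLongerID_stepN\n"
    "sample1\tS1\tsample1_run42_contigs_filtered\n"
    "sample2\tS2\tsample2_run42_contigs_filtered\n"
)
FASTA = (
    ">sample1_run42_contigs_filtered length=12\n"
    "ACGTACGTACGT\n"
    ">unlisted_id\n"
    "GGCC\n"
)

mapping = load_mapping(MAPPING_TSV, "LongerID_stepN", "ShorterID")
renamed = rename_fasta_headers(FASTA, mapping)
print(renamed, end="")
```

Identifiers that are not in the table are left unchanged, so nothing is silently dropped, and the original table stays as your record for mapping the short IDs back to the long ones later.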
Collections
If you are using dataset collections, the “name” given to data inside the sample files sometimes doesn’t matter. Instead, the collection identifier can hold the SampleID and that is enough. But this depends on the tool. Which tool do you plan to use next?
Workflows
Are you following a tutorial, or are you currently using (or planning to use) an IWC Workflow? Do you want to? I wasn’t sure if you were looking for a workflow for these steps or for the data manipulations. Arbitrary data manipulations are unlikely to be in a stand-alone community workflow (overly custom!). Instead, those are included as intermediate steps for a specific purpose, to help tools chain together, or are not needed at all (collections are enough).
For the full pathway, this is one suggestion: it starts from reads and produces data similar to what you have generated. Could be a useful comparison!
- IWC Metagenomics Taxonomic and Antibiotic Resistance Gene (ARG) Profiling | IWC
- Filter Microbiome: Intergalactic Workflow Commission
- Filter Metagenomics: Intergalactic Workflow Commission
For training, you could also review here at the Galaxy Training Network (GTN).
- GTN Hands-on: Antibiotic resistance detection / Antibiotic resistance detection / Microbiome
- Microbiome / Tutorial List
- Learning Pathway: Detection of AMR genes in bacterial genomes
I hope this helps to frame what we have! Not all tools are included in a tutorial, but they can be found in the tool panel with the simple and advanced filters. Most will have help and references on the tool form (scroll down!), and you can ask here with any questions. Follow-up questions are welcome.