Bacteria community and Antibiotic Resistance Gene

Hi @santatra

If you are concerned about a tool not being able to match up sample and contig identifiers between files, you can adjust the identifiers in one or both files. With computational tools, shorter is usually better anyway! Put all of the mappings into a tabular file you can reference to keep track of this.

Alternatively, you can sometimes use a collection: the collection identifiers hold the sample name, and tools use those instead.

You can also look at our workflows for this process, either to use directly or to review how the sample/data labels are handled.

More about each below!

Text Manipulation

Modify the data files, but save the original mappings into a tabular file. Example:

| SampleID | ShorterID | LongerID in files from step N | LongerID in files from step Z |
|---|---|---|---|

You could also add any results encoded in the file names to your tabular master sample list, like:

| SampleID | ShorterID | LongerIDN (used in files from step N) | ResultN (presence/absence notation) | LongerIDZ (used in files from step Z) | ResultZ (if you want to track anything from these too) |
|---|---|---|---|---|---|
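If you prefer to script this outside of Galaxy, here is a minimal sketch of building that kind of mapping table. It assumes you already have a list of the long identifiers; the function names, the `S001`-style short IDs, and the two-column layout are only illustrative choices, not anything a particular tool requires.

```python
import csv
import io


def build_mapping(long_ids, prefix="S"):
    """Give each distinct long ID a short, zero-padded ID in first-seen order."""
    mapping = {}
    for long_id in long_ids:
        if long_id not in mapping:
            # Hypothetical naming scheme: prefix + running number (S001, S002, ...)
            mapping[long_id] = f"{prefix}{len(mapping) + 1:03d}"
    return mapping


def mapping_to_tsv(mapping):
    """Render the mapping as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["ShorterID", "LongerID"])
    for long_id, short_id in mapping.items():
        writer.writerow([short_id, long_id])
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up identifiers, only to show the shape of the output
    example = ["sampleA_contigs_stepN", "sampleB_contigs_stepN"]
    print(mapping_to_tsv(build_mapping(example)), end="")
```

Extra columns (the step Z identifiers, result notations) could be added to the same table in the same way.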

Any text data manipulations you want are likely possible. We have tutorials here that go through some examples. Or, search the tool panel with keywords: the tool names usually contain the common command-line utility name (if you are used to doing it that way). If you will have batches of data, putting your custom manipulations into a mini-workflow will make this go quicker next time, and you can include that workflow with your reproducibility methods if this is for a publication.
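As one example of such a manipulation, this is a rough sketch of swapping long identifiers for shorter ones in FASTA header lines. It assumes the identifier is the first whitespace-delimited token after `>` and that you supply the long-to-short mapping yourself; `rename_fasta_headers` is a made-up helper name, not a Galaxy tool.

```python
def rename_fasta_headers(lines, mapping):
    """Replace the identifier on each FASTA header line using `mapping`.

    Identifiers missing from the mapping are left unchanged, and any
    description text after the identifier is kept.
    """
    renamed = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Assumption: the ID is everything up to the first space
            ident, _, description = line[1:].partition(" ")
            new_ident = mapping.get(ident, ident)
            line = ">" + new_ident + (" " + description if description else "")
        renamed.append(line)
    return renamed


if __name__ == "__main__":
    # Made-up records, only to illustrate the behaviour
    records = [">long_contig_identifier_1 len=100", "ACGT", ">unmapped_id", "TTGA"]
    for out_line in rename_fasta_headers(records, {"long_contig_identifier_1": "c1"}):
        print(out_line)
```

Keeping unmapped identifiers unchanged makes it easy to spot anything missing from your mapping table afterwards.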

Collections

If you are using dataset collections, the “name” given to data inside the sample files sometimes doesn’t matter. Instead, the collection identifier can hold the SampleID and that is enough. But this depends on the tool. Which tool do you plan to use next?

Workflows

Are you following a tutorial, or are you currently using (or planning to use) an :gear: IWC Workflow? Do you want to? I wasn’t sure whether you were looking for a workflow for these steps or for the data manipulations. Arbitrary data manipulations are unlikely to be in a stand-alone community workflow (overly custom!). Instead, they are included as intermediate steps that help specific tools chain together, or they are not needed at all (collections are enough).

For the full pathway, this is one suggestion: it starts from reads and produces data similar to what you have generated. Could be a useful comparison!

For training, you could also review here at the :graduation_cap: Galaxy Training Network (GTN).



I hope this helps to frame what we have! Not all tools are included in a tutorial, but they can be found in the tool panel with the simple and advanced filters. Most have help and references on the tool form (scroll down!), and you can ask here with any questions. Follow-up questions are welcome. :slight_smile: