I am using the plotCoverage tool to analyze some BAM files. I am using the save raw counts option so that I can get a table of per-position coverage. When I run this on single-end read alignments, the output table gives each column a unique name, which I can trace back to the original BAM file. When I run the tool on paired-end alignments, the output columns simply say “forward” or “reverse”, which makes them impossible to trace back. Does anyone with experience using this tool have a suggestion for how I can get around this issue?
Welcome, @ahentges
Your BAMs are in a collection folder, correct?
You can adjust how those BAMs are labeled, and tools that consume the collection will inherit those labels. You can do this either by relabeling the fastq reads that were used to produce the BAMs (for automatic sample naming), or by relabeling the BAMs themselves afterward (I think that will work with this tool, but I haven’t tried it!).
The labels for the datasets inside of a collection folder are termed element identifiers in Galaxy, and are commonly given a name like sample_forward and sample_reverse. But you can use whatever naming you prefer. By default, two ends of a pair will just be forward and reverse, and the collection folder they are inside of will have the sample name.
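To make the naming problem concrete, here is a small Python sketch of a list of paired-end samples. It is only a simplified model for illustration, not Galaxy's internal representation; the sample names and file names are made up.

```python
from typing import Dict, List

# Hypothetical model: outer key = sample name (the collection folder),
# inner key = element identifier of each end of the pair.
nested_collection: Dict[str, Dict[str, str]] = {
    "sampleA": {"forward": "sampleA_R1.fastq", "reverse": "sampleA_R2.fastq"},
    "sampleB": {"forward": "sampleB_R1.fastq", "reverse": "sampleB_R2.fastq"},
}


def inner_identifiers(collection: Dict[str, Dict[str, str]]) -> List[str]:
    """Collect only the innermost labels, ignoring the sample name."""
    return [inner for pair in collection.values() for inner in pair]


labels = inner_identifiers(nested_collection)
print(labels)                          # ['forward', 'reverse', 'forward', 'reverse']
print(len(set(labels)) == len(labels))  # False: the labels are not unique
```

If a tool reports only the innermost label, every sample ends up with the same two column names, which matches the symptom described in the question.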
You can adjust those labels, and there are a few ways to do this.
- Extract element identifiers – pulls out the current labels
- Relabel identifiers – full direct control over the labels. You provide a mapping file. The first column of that mapping is usually the result from the extract tool above. The second column is the custom label to use in its place.
- Flatten collection – has an option to create compound labels from existing labels! You can then recreate the nested collection structure with the new labels, or process the flattened collection and merge the tool results afterward (if the target tool can be used that way).
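If it helps, the mapping file for Relabel identifiers can be drafted outside of Galaxy and uploaded. The sketch below assumes a two-column, tab-separated layout (old label, new label) as described above, and it assumes an underscore as the separator for compound labels; the function names and the example identifiers are hypothetical.

```python
from typing import List


def compound_label(outer: str, inner: str, sep: str = "_") -> str:
    """Join a sample name and an inner label, e.g. sampleA + forward."""
    return f"{outer}{sep}{inner}"


def build_mapping(old_ids: List[str], new_ids: List[str]) -> str:
    """Return tab-separated lines: old identifier, then new identifier."""
    if len(old_ids) != len(new_ids):
        raise ValueError("old and new identifier lists differ in length")
    if len(set(new_ids)) != len(new_ids):
        raise ValueError("new identifiers must be unique")
    return "".join(f"{old}\t{new}\n" for old, new in zip(old_ids, new_ids))


# Hypothetical identifiers, as they might come from Extract element identifiers.
old = ["sampleA.bam", "sampleB.bam"]
new = [compound_label("sampleA", "treated"), compound_label("sampleB", "control")]

with open("relabel_mapping.tsv", "w") as handle:
    handle.write(build_mapping(old, new))
```

The uniqueness check is there because duplicate labels are exactly what makes the output columns impossible to trace back.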
Tutorials
- abstracted example → Hands-on: Using dataset collections / Using Galaxy and Managing your Data (#relabel-identifiers)
- all multi-sample methods → Using Galaxy and Managing your Data / Tutorial List
If you need more help, please create a very small test history containing just the data needed to produce two or three paired-end BAMs (the smallest you have), and then run the tool on those. You can try relabeling the fastqs or the BAMs – I’m not sure which is needed yet. If you cannot get that to work, you can share the whole history back and we’ll try to help with the logic. Once that is worked out, you can put all of this into a workflow. “Copies” of data in different collection shapes do not consume extra quota space, so you can manipulate them as needed to organize data in ways that different tools can understand without exploding your data usage.
How to share that smaller test history is explained in the banner of this forum, and also here → How to get faster help with your question
Thank you for the recommendations, they helped a lot.