Welcome @XinZhouUmea
Adjusting the identifiers inside a BAM is tricky, and there isn't an automated way to do this that I know of. The risk of having slightly mismatched genome assemblies is also a consideration.
As you know, your data currently has Ensembl identifiers for a rat assembly (for example, Ensembl uses 1 where UCSC uses chr1), and CustomProDB is expecting a UCSC/NCBI identifier format. We don't know if the assembly versions are exactly the same (in particular, the haplotype/alt contigs can differ across base assemblies!). If you don't want to create your own RData files for the genome you are currently using, then remapping the data against one of the natively indexed genomes is probably needed.
We have a guide that uses human as an example, but the same applies to any genome. In short, there can be multiple assemblies, and using the same one throughout all steps is really important for coordinate-based data. → Reference genomes at public Galaxy servers: GRCh38/hg38 example
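Before converting anything, it can help to check that the two assemblies at least agree on sequence names and lengths. Below is a minimal Python sketch, assuming you have a FASTA index (.fai, as produced by samtools faidx) for each assembly and a dictionary mapping source names to target names; the function names and the toy lengths in the example are hypothetical. Matching lengths are necessary but not sufficient: two assemblies can share lengths and still differ in sequence, so treat a clean result as a sanity check, not proof.

```python
def read_fai(lines):
    """Parse FASTA index (.fai) lines into {sequence_name: length}."""
    lengths = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split("\t")
        # Column 1 is the sequence name, column 2 its length in bases.
        lengths[fields[0]] = int(fields[1])
    return lengths


def compare_assemblies(source, target, id_map):
    """List naming/length disagreements between two assemblies.

    source, target: {name: length}; id_map: {source_name: target_name}.
    Only lengths are compared, so an empty result does NOT prove the
    sequences are identical.
    """
    problems = []
    for name, length in source.items():
        new_name = id_map.get(name)
        if new_name is None:
            problems.append(f"{name}: no mapping")
        elif new_name not in target:
            problems.append(f"{name} -> {new_name}: missing in target")
        elif target[new_name] != length:
            problems.append(
                f"{name} -> {new_name}: length {length} != {target[new_name]}"
            )
    # Extra target sequences (e.g. alt/haplotype contigs) with no source match.
    mapped = {id_map.get(name) for name in source}
    for name in target:
        if name not in mapped:
            problems.append(f"{name}: only in target")
    return problems


# Hypothetical toy indexes (made-up lengths, not real rat data).
ensembl_fai = ["1\t1000\t3\t60\t61", "2\t2000\t1022\t60\t61", "MT\t300\t3100\t60\t61"]
ucsc_fai = ["chr1\t1000\t6\t60\t61", "chr2\t2000\t1030\t60\t61", "chrM\t310\t3110\t60\t61"]
id_map = {"1": "chr1", "2": "chr2", "MT": "chrM"}

problems = compare_assemblies(read_fai(ensembl_fai), read_fai(ucsc_fai), id_map)
print(problems)  # ['MT -> chrM: length 300 != 310']
```

Any line in the output is a reason to stop and remap against a natively indexed genome instead of renaming.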
All that said, if you can confirm that the two assemblies are identical except for the identifiers, and want to try to modify the data anyway (not recommended!), the process would go something like this:
- Replace column by values which are defined in a convert file – see the bottom of the tool form for a link to a public repository where IDs are mapped across assemblies.
- Run BAM-to-SAM (convert BAM to SAM) to separate the data hit lines from the header.
- Run Replace column to convert the identifiers in the data hit lines (in tabular format) to those of the target assembly.
- Run a small mapping against the target assembly you are converting to (to generate a header).
- Put it all together with a tool like Samtools reheader (copy SAM/BAM header between datasets).
- Convert back to BAM with SAM-to-BAM (convert SAM to BAM).
- More details: → FAQ: Mismatched Chromosome identifiers and how to avoid them
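To see exactly which fields the renaming touches, the Replace column and reheader steps can be sketched in a few lines of Python. This is a minimal sketch, assuming plain-text SAM lines and a dictionary built from a two-column conversion file (old name, new name); `rename_sam_line` and the example reads are hypothetical and not part of any Galaxy tool. It rewrites `SN:` values on `@SQ` header lines and the RNAME (column 3) and RNEXT (column 7) fields of alignment lines, leaving `*` and `=` untouched. Other places that can hold reference names, such as the `SA:Z` optional tag, are not handled.

```python
def rename_sam_line(line, id_map):
    """Rename reference names in one SAM line using id_map {old: new}.

    @SQ header lines get their SN: value renamed; other header lines are
    returned unchanged. Alignment lines get RNAME (column 3) and RNEXT
    (column 7) renamed, except the placeholders "*" and "=".
    Raises KeyError for a name with no mapping instead of guessing.
    Optional tags (e.g. SA:Z) are NOT rewritten in this sketch.
    """
    line = line.rstrip("\n")
    fields = line.split("\t")
    if line.startswith("@"):
        if fields[0] == "@SQ":
            for i, field in enumerate(fields):
                if field.startswith("SN:"):
                    fields[i] = "SN:" + id_map[field[3:]]
        return "\t".join(fields)
    # Alignment line: RNAME is index 2 and RNEXT is index 6 (0-based).
    for col in (2, 6):
        if fields[col] not in ("*", "="):
            fields[col] = id_map[fields[col]]
    return "\t".join(fields)


# Hypothetical example: Ensembl-style names converted to UCSC-style names.
id_map = {"1": "chr1", "2": "chr2"}
sam = [
    "@HD\tVN:1.6",
    "@SQ\tSN:1\tLN:1000",
    "read1\t0\t1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read2\t99\t1\t200\t60\t10M\t=\t300\t110\tACGTACGTAC\tIIIIIIIIII",
    "read3\t1\t1\t400\t60\t4M\t2\t50\t0\tACGT\tIIII",
]
renamed = [rename_sam_line(line, id_map) for line in sam]
for line in renamed:
    print(line)
```

Raising `KeyError` on an unmapped name is deliberate: a read placed on a contig that has no counterpart in the target assembly is exactly the mixed-assembly problem described above, and stopping is safer than guessing.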
Info:
- The genome choices for rat in CustomProDB.
- Where to source the reference genome and annotation for rn6.
- How to get the files into Galaxy.
You can use a Custom Genome with most tools! We have an example in a few tutorials, including this one at this step:
- Hands-on: NGS data logistics / Introduction to Galaxy Analyses (#upload-reference-genome)
So that is a lot of information! Hopefully it is enough to help you with your choice, but please let us know if you have questions.
