Hi, I’m trying to use CustomProDB to create a protein FASTA file from my transcriptomic variant data from rat (Rn6).
My BAM file uses chromosome identifiers like 1, 2, 3 …MT instead of chr1, chr2…chrM, and I think that is the problem.
My questions are:
- Should I change the identifiers in my BAM file manually?
- Or is there a convenient way to fix it?
Thanks very much for your help!
Best
Xin
Welcome @XinZhouUmea
Adjusting the identifiers inside a BAM file is tricky, and there isn’t an automated way to do this that I know of. The risk of slightly mixing up genome assemblies is also a consideration.
As you know, your data currently has Ensembl identifiers for a rat assembly, while CustomProDB expects the UCSC/NCBI identifier format. We don’t know if the assembly versions are exactly the same (in particular, the haplotype/alt contigs can differ across base assemblies!). If you don’t want to create your own RData files for the genome you are currently using, then remapping the data against one of the natively indexed genomes is probably needed.
We have a guide that uses human as an example, but the same applies to any genome. In short, there can be multiple assemblies, and using the same one throughout all steps is really important for coordinate-based data. → Reference genomes at public Galaxy servers: GRCh38/hg38 example
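Before converting anything, one quick way to test whether two assemblies differ only in their identifiers is to compare sequence names and lengths. Below is a minimal Python sketch, assuming you have tab-separated name/length tables for both assemblies (for example, the first two columns of a samtools `.fai` index) and a name mapping loaded from a convert file. The function names `parse_chrom_sizes` and `compare_assemblies` are hypothetical. Matching lengths is necessary but not sufficient: two assemblies can agree on every length and still differ in sequence, so treat an empty result as a first sanity check only.

```python
def parse_chrom_sizes(text):
    """Parse 'name<TAB>length...' lines (e.g. a .fai index) into {name: length}."""
    sizes = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        sizes[fields[0]] = int(fields[1])
    return sizes


def compare_assemblies(source_sizes, target_sizes, mapping):
    """List differences after renaming source sequences with `mapping`.

    An empty list means every sequence has a same-length counterpart;
    it does NOT prove that the sequences themselves are identical.
    """
    problems = []
    renamed = {}
    for name, length in source_sizes.items():
        if name not in mapping:
            problems.append(f"no mapping for {name}")
            continue
        renamed[mapping[name]] = length
    for name, length in renamed.items():
        if name not in target_sizes:
            problems.append(f"{name} missing from target")
        elif target_sizes[name] != length:
            problems.append(f"{name}: length {length} vs {target_sizes[name]}")
    for name in target_sizes:
        if name not in renamed:
            problems.append(f"{name} only in target")
    return problems
```

For instance, with toy tables `1 100` / `MT 50` on the source side, `chr1 100` / `chrM 51` on the target side, and a mapping of `1 → chr1`, `MT → chrM`, the function reports a single length mismatch for chrM.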
All that said, if you can confirm that the two assemblies are identical except for the identifiers, and want to try to modify the data anyway (not recommended!), the process would go something like this:
- Replace column by values which are defined in a convert file: see the bottom of the tool form for a link to a public repository where IDs are mapped across assemblies.
- Run BAM-to-SAM (convert BAM to SAM) to isolate the data hit lines from the header.
- Run Replace column to change the identifiers in the data hit lines (in tabular format) to those of the target assembly.
- Run a small mapping against the target assembly you are converting to (to generate a header).
- Put it all together with a tool like Samtools reheader (copy SAM/BAM header between datasets).
- Convert back to BAM with SAM-to-BAM (convert SAM to BAM).
- More details → FAQ: Mismatched Chromosome identifiers and how to avoid them
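The Replace column step above amounts to renaming the reference-name fields of a SAM file. Here is a minimal Python sketch working on SAM text, assuming a hypothetical `ENSEMBL_TO_UCSC` dictionary. Only a few entries are shown; a real rn6 mapping would come from a convert file and would also need to cover the unplaced scaffolds. The sketch rewrites the `SN:` tag of `@SQ` header lines and the RNAME (column 3) and RNEXT (column 7) fields of alignment lines, leaving `*` and `=` untouched. Reference names inside optional tags such as `SA:Z` are not handled.

```python
# Illustrative, incomplete mapping: Ensembl-style rat names -> UCSC-style names.
# A real rn6 mapping should come from a convert file and cover every sequence.
ENSEMBL_TO_UCSC = {"1": "chr1", "2": "chr2", "3": "chr3", "MT": "chrM"}


def rename_sam_line(line, mapping):
    """Rename reference names in one SAM line; unknown names are left as-is."""
    if not line or (line.startswith("@") and not line.startswith("@SQ")):
        return line  # blank lines and non-@SQ header lines are unchanged
    fields = line.split("\t")
    if line.startswith("@SQ"):
        # @SQ header lines store the sequence name in the SN: tag.
        for i, field in enumerate(fields):
            if field.startswith("SN:"):
                name = field[3:]
                fields[i] = "SN:" + mapping.get(name, name)
        return "\t".join(fields)
    # Alignment lines: RNAME is column 3 and RNEXT is column 7 (0-based 2 and 6).
    # "*" (unavailable) and "=" (same as RNAME) must not be renamed.
    for col in (2, 6):
        if fields[col] not in ("*", "="):
            fields[col] = mapping.get(fields[col], fields[col])
    return "\t".join(fields)


def rename_sam(text, mapping):
    """Apply rename_sam_line to every line of a SAM file given as text."""
    return "\n".join(rename_sam_line(line, mapping) for line in text.splitlines())
```

Because `mapping.get` passes unknown names through unchanged, a partial mapping would silently produce a file with mixed naming; a stricter version could raise an error instead. Also note that the renamed `@SQ` lines keep the lengths from the original assembly, which is only safe if the two assemblies really are identical apart from their names.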
Info:
- The genome choices for rat in CustomProDB.
- Where to source the reference genome and annotation for rn6.
- How to get the files into Galaxy.
You can use a Custom Genome with most tools! We have an example in a few tutorials, including this one at this step:
So that is a lot of information! Hopefully enough to help you with your choice, but please let us know if you have questions. 
Hi Jennaj,
Thank you for your comprehensive information! It was really informative.
So, I decided to generate a custom RData file for my analysis using the rat rn6 genome. It passed the first round! However, the analysis eventually failed with the following error message:
Error in exon$tx_name: $ operator is invalid for atomic vectors
You can check the job here: 4838ba20a6d867656592932aa8015b8e
I finally solved it! I had to use an R script and the customProDB library to create the custom annotations. Thanks again for your help!
Great! Glad this is working for you now! 