Is there a way to edit all headers in a BAM file?

elpren · February 12, 2025, 7:14pm

Hi,

I have three sets of RNA-Seq data that I have treated exactly the same, I have aligned to my genome (HISAT2), merged (Merge BAM files) and used to annotated my genome (BRAKER3). These gave good mapping rates (all >90%).

I am having trouble using featureCounts on my BAM files. I noticed that one of the BAM files worked perfectly with featureCounts, while the other two gave 0 assigned reads. I then noticed that the sequence name in the header of the other two BAM files did not match the sequence name in the GTF annotation file. Assuming this is the source of my problem, is there a way to fix this? There are many sequence names in each BAM file, e.g.:
ABCD01000000001.1
ABCD01000000002.1
ABCD01000000003.1
etc.

The only part of the sequence name that differs are the letters, i.e. the mismatched headers look as follows:
ABCE01000000001.1
ABCE01000000002.1
ABCE01000000003.1
etc.

Thank you in advance!

jennaj · February 13, 2025, 2:12am

Welcome, @elpren

While yes, you could edit a BAM file directly with text manipulation tools if you convert it to plain text first (SAM format), that is usually a really bad idea!

Those lines in your BAM file are the sequence identifiers from the reference genome you mapped against. Those map to the fasta sequences in your reference genome – and those labels for specific nucleotide strings matter!

Then the GTF reference annotation is where other features have been placed along those same nucleotide strings: genes, transcripts, other features.

Featurecounts is counting up hits against coordinates on those sequences where the features in the annotation are. Mixing this up means that all your counts will be wrong (if they even process, this tends to fail!).

In short, you have a Reference data mismatch problem. You may have mapped against the wrong genome for a few runs. You can double check what you used for parameters per job on the Job Details view (using the i-icon) and make changes when you run the job again (using the rerun-icon).

This FAQ has more details but I’m guessing a rerun with a different target database choice for your case is enough. Use the genome that is a match with the GTF for all steps.

FAQ: Mismatched Chromosome identifiers and how to avoid them

If you need more help with confirming the problem, you are welcome to share back your history for more exact feedback. If you are working through a tutorial, please link that back too for context. See → How to get faster help with your question

Let’s start there, and let us know if you solve this!

elpren · February 13, 2025, 9:31am

Hi @jennaj,

Thanks so much for your help! I checked the input genome for the two mismatched RNA-Seq, and it is indeed a slightly different input to the correctly header-ed RNA-Seq (unfortunately, it’s one I’ve since deleted and purged so I can’t check exactly where I went wrong)! I’ve rerun the alignments and the sequence names now match.

Just wondering, do I now have to rerun BRAKER3? I assume some of the names in the annotation will be incorrect?

Thanks again!

jennaj · February 14, 2025, 12:21am

Hi @elpren

If you were using those mapping results with any tool, yes, you should rerun using the corrected data. The chromosome names and the sequences and the mapping coordinates will all be different with a different mapping target genome.

Very glad you have this working now!

Topic		Replies	Views
hisat2 and featurecounts usegalaxy.org support gtn-tutorial , workflow , galaxy-local , mapping , transcriptomics , featurecounts	23	2089	October 28, 2024
Segmentation fault in featurecounts troubleshooting , featurecounts , bowtie2	1	35	January 17, 2025
htseq count issue using UCSC-sourced indexed bacterial genomes and NCBI reference annotation (mismatched chrom identifiers) custom-genome	3	821	May 17, 2019
Unable to use featurecounts tool mapping	2	1223	January 9, 2021
Error with FeatureCounts usegalaxy.org support transcriptomics	5	43	April 14, 2025

Is there a way to edit all headers in a BAM file?

Related topics