Error with FeatureCounts

matt_m · March 25, 2025, 8:10pm

Hello. I’m running FeatureCounts with BAM files I sorted using SamTools Sort (along with creating a .bai index for each BAM file). I’m using an uploaded genome.gtf file which matches the genome I used for mapping with Bowtie2. But when I run FeatureCounts, I get the error “an error occurred with this dataset: format tabular” and I need help figuring it out.

I used ChatGPT to try and figure it out and it seems like my BAM files are correct, the gtf file is formatted correctly, and over 98% of the genes were mapped. Otherwise I haven’t been able to troubleshoot it and could use some help. Ideally I would like to not post my history publicly, if it’s possible to do so privately but whatever gets this issue sorted.

Thanks,
Matt

igor · March 28, 2025, 4:28am

Hi @matt_m,
If you created the BAM files in Galaxy, sorting is not required: by default, BAM files in Galaxy are coordinate-sorted. Check the command box in the Info section: you’ll see samtools sort in the command/pipeline.

As for the error: it is hard to say what is going on. Can you share the history here? If you have many files, just copy one sample into a new history. To share the history, click on History options (three horizontal bars icon) in the top right corner of the history panel > Share or Publish > in the middle window, Make this history accessible, and paste the URL into your reply. Make sure that the annotation file is present in the shared history.

It sounds like a data format issue.

Kind regards,
Igor

matt_m · April 8, 2025, 2:11pm

Hi Igor,

Thanks for reaching out with your assistance. It’s good to know that I don’t need to sort the BAM files, that’ll save me some time in the future. I’m relatively new to this analysis process so any guidance/recommendations are much appreciated!

As for my history, below you’ll find the link. Please feel free to go through it and let me know what you think about the formatting or if any other possible issues may have come up.

Thanks,
Matt Mayor

jennaj · April 8, 2025, 5:40pm

Hi @matt_m

Thanks for sharing your history!

How to review work is explained in a few topics, including this one.

For your shared history, let’s look at featurecounts dataset 72.

Clicking into the job details pages (i-info icon) to review the error message shows this content.

This part of the message is reporting about the content of the inputs.

ERROR: Paired-end reads were detected in single-end read library

Then scrolling up a bit on that same view to review the original parameters, it looks like you need to toggle a parameter when you rerun. Paired end will better describe the content to be counted up per feature.
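
If you want to confirm this outside of the tool form, here is a minimal Python sketch of the idea. It assumes you have a few plain-text SAM records to look at (for example, a small BAM converted to SAM); the function name `count_paired` and the example records are made up for illustration. It only checks the 0x1 bit of the FLAG column, which the SAM specification defines as "read is paired".

```python
# Minimal sketch: decide whether SAM records are paired-end from the FLAG column.
# Assumes plain-text SAM lines; count_paired is a hypothetical helper, not a Galaxy function.

PAIRED_FLAG = 0x1  # SAM FLAG bit: template has multiple segments (read is paired)


def count_paired(sam_lines):
    """Return (paired, total) over alignment records, skipping @ header lines."""
    paired = 0
    total = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or header line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 of a SAM record is the FLAG
        total += 1
        if flag & PAIRED_FLAG:
            paired += 1
    return paired, total


# Made-up example records: two mates of one pair plus one unpaired read.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t99\tchr1\t100\t42\t50M\t=\t300\t250\t*\t*",
    "read1\t147\tchr1\t300\t42\t50M\t=\t100\t-250\t*\t*",
    "read2\t0\tchr1\t500\t42\t50M\t*\t0\t0\t*\t*",
]

paired, total = count_paired(example)
print(f"{paired} of {total} records are flagged as paired")
```

If most records come back as paired, the library should be described as paired end on the tool form.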

Then, to do things like confirming the reference data, this is a good guide to follow.

FAQ: Extended Help for Differential Expression Analysis Tools

Inspecting your sequence identifiers, it looks like your BAM and GTF are from the same genome assembly. The parameters on the Featurecounts form are set up for a GTF file.
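
As a rough illustration of that identifier check, the sketch below compares the sequence names in a SAM/BAM header (the `SN:` field of `@SQ` lines) with the first column of a GTF. The helper names and the tiny example inputs are hypothetical, and this is only one way to do the comparison, assuming both files are available as plain text lines.

```python
# Minimal sketch: compare sequence names between a SAM/BAM header and a GTF.
# bam_header_names and gtf_names are hypothetical helpers for illustration only.


def bam_header_names(sam_header_lines):
    """Collect sequence names from the SN: field of @SQ header lines."""
    names = set()
    for line in sam_header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_names(gtf_lines):
    """Collect sequence names from column 1 of a GTF, skipping # lines."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


# Made-up example inputs.
header = ["@HD\tVN:1.6", "@SQ\tSN:chr1\tLN:1000", "@SQ\tSN:chr2\tLN:2000"]
gtf = [
    "#!genome-build example",
    'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "g1";',
]

shared = bam_header_names(header) & gtf_names(gtf)
only_in_gtf = gtf_names(gtf) - bam_header_names(header)
print("shared:", sorted(shared))
print("only in GTF:", sorted(only_in_gtf))
```

Names that appear only in the GTF (for example `1` versus `chr1`) would point to a mismatch between the annotation and the genome used for mapping.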

The only thing I notice is that you might need to remove the extra # header lines from the GTF. This is what was causing the problem with the gffread run dataset 74. You can run your files through that tool but you might not need it.
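
For reference, removing those header lines amounts to dropping every line that starts with `#`. A minimal Python sketch, with a hypothetical helper name and made-up example lines:

```python
# Minimal sketch: drop "#" header/comment lines from GTF content.
# strip_gtf_comments is a hypothetical helper, not part of any Galaxy tool.


def strip_gtf_comments(lines):
    """Return only the lines that do not start with '#'."""
    return [line for line in lines if not line.startswith("#")]


# Made-up example content.
gtf_lines = [
    "#!genome-build example",
    "#!annotation-source example",
    'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "g1";',
    'chr1\tsrc\texon\t1\t50\t.\t+\t.\tgene_id "g1";',
]

cleaned = strip_gtf_comments(gtf_lines)
print(len(cleaned), "feature lines kept")
```

The feature lines themselves are left untouched, so the cleaned file should still match the BAM.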

Tutorial examples can be found here.

Transcriptomics / Tutorial List

More about preparing reference data can be found under these tags.

custom-genome reference-genome reference-annotation reference-transcriptome

Hope this helps!

igor · April 13, 2025, 8:16am

Hi @matt_m,

I am sorry for the late reply. You submitted the featureCounts job in single-end mode, while the BAM files contain paired-end data. Select “paired end and count them as a single fragment” during the job setup. It is in the “Does the input have read pairs” section, close to the end of the job setup page.

You can unshare the history.

Kind regards,
Igor

matt_m · April 14, 2025, 2:06pm

Hi Igor,

Oh goodness, what a silly mistake haha. Thanks so much for your help, I will make sure to try that.

Thanks,
Matt Mayor