Hello. I’m running featureCounts with BAM files I sorted using Samtools sort (along with creating a .bai index for each BAM file). I’m using an uploaded genome.gtf file which matches the genome I used for mapping with Bowtie2. But when I run featureCounts, I get the error “an error occurred with this dataset: format tabular” and I need help figuring it out.
I used ChatGPT to try to figure it out, and it seems like my BAM files are correct, the GTF file is formatted correctly, and over 98% of the genes were mapped. Otherwise I haven’t been able to troubleshoot it and could use some help. Ideally I would prefer not to post my history publicly if it can be shared privately, but I’ll do whatever gets this issue sorted.
Hi @matt_m,
If you created BAM files in Galaxy, sorting is not required: by default, BAM files in Galaxy are coordinate-sorted. Check the command box in the Info section: you’ll see samtools sort in the command/pipeline.
As for the error: it is hard to say what is going on. Can you share the history here? If you have many files, just copy one sample into a new history. To share the history, click on History options (three horizontal bars icon) in the top right corner of the history panel > Share or Publish > in the middle window, Make this history accessible, and paste the URL into your reply. Make sure that the annotation file is present in the shared history.
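If you want to confirm the sort order yourself, the BAM header records it in the SO field of the @HD line. Here is a minimal sketch, assuming you have the header as text (for example, what `samtools view -H` prints); the function name and the example header are made up for illustration.

```python
def sam_sort_order(header_text):
    """Return the SO value from the @HD header line, or None if it is absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs after the record type.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


# Hypothetical header text, for illustration only.
example_header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n"
print(sam_sort_order(example_header))
```

A value of `coordinate` means the file is already coordinate-sorted, so an extra sorting step would be redundant.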
Thank you for reaching out with your assistance. It’s good to know that I don’t need to sort BAM files; that’ll save me some time in the future. I’m relatively new to this analysis process, so any guidance or recommendations are much appreciated!
As for my history, below you’ll find the link. Please feel free to go through it and let me know what you think about the formatting or any other possible issues that may have come up.
This part of the message is reporting about the content of the inputs.
ERROR: Paired-end reads were detected in single-end read library
Scrolling up a bit in that same view to review the original parameters, it looks like you need to change one parameter when you rerun: the paired-end setting will better describe the content to be counted up per feature.
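For background, each alignment record carries a FLAG value in its second column, and bit 0x1 of that value marks a read as part of a pair. The sketch below counts such records in SAM-formatted text lines; it is only an illustration of how paired-end content can be recognized, not how featureCounts itself does the check, and the function name and example records are invented.

```python
def count_paired(sam_lines):
    """Return (paired, total) counts of alignment records in SAM text lines."""
    paired = 0
    total = 0
    for line in sam_lines:
        # Skip blank lines and header lines (header lines start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second column
        total += 1
        if flag & 0x1:  # bit 0x1: template has multiple segments (paired)
            paired += 1
    return paired, total


# Hypothetical records, for illustration only.
example_records = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t99\tchr1\t100\t42\t50M\t=\t300\t250\t*\t*",
    "read1\t147\tchr1\t300\t42\t50M\t=\t100\t-250\t*\t*",
    "read2\t0\tchr1\t500\t42\t50M\t*\t0\t0\t*\t*",
]
print(count_paired(example_records))
```

If most records have the paired bit set, the job should be run with the paired-end option.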
Inspecting your sequence identifiers, it looks like your BAM and GTF are from the same genome assembly. The parameters on the featureCounts form are set up for a GTF file.
The only thing I notice is that you might need to remove the extra # header lines from the GTF. This is what was causing the problem with the gffread run (dataset 74). You can run your files through that tool, but you might not need it.
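If you do decide to remove those header lines, the operation itself is simple: drop every line that starts with #. A minimal sketch, assuming the GTF has been read as a list of text lines; the function name and the example content are hypothetical.

```python
def strip_gtf_comments(lines):
    """Return the GTF lines with '#' comment/header lines removed."""
    return [line for line in lines if not line.startswith("#")]


# Hypothetical GTF content, for illustration only.
example_gtf = [
    "#!genome-build example\n",
    "#!genome-version example\n",
    'chr1\tsource\tgene\t100\t900\t.\t+\t.\tgene_id "g1";\n',
    'chr1\tsource\texon\t100\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
]
cleaned = strip_gtf_comments(example_gtf)
print(len(cleaned))
```

Only the feature lines remain, and their order and content are unchanged.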
I am sorry for the late reply. You submitted the featureCounts job in single-end mode, while the BAM files contain paired-end data. Select “paired end and count them as a single fragment” during the job setup. It is in the “Does the input have read pairs” section, close to the end of the job setup page.