@gallardoalba Thanks very much for your help! I took a look at the GTF file, and the strand column contains both forward and reverse entries. But the reads in the BAM file always seem to be on the forward strand; I'm not sure whether this would cause the error.
Hi @Xiaoqiang_Ma,
I would try the tool "Filter data on any column using simple expressions" with the following condition, in order to remove potential errors from the GTF file: c7=='+' or c7=='-'
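For anyone working outside Galaxy, the same condition can be sketched in a few lines of Python. This is only an illustration of what the filter does, not the tool's own code; the function name is made up for the example, and it assumes a tab-separated GTF where column 7 holds the strand.

```python
def keep_stranded_features(lines):
    """Keep only records whose 7th tab-separated column is '+' or '-'.

    Hypothetical helper for illustration; mirrors the condition
    c7=='+' or c7=='-' from the Filter tool.
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Column 7 (index 6) is the strand field in GTF.
        # Lines with fewer than 7 columns (e.g. headers) are dropped too.
        if len(fields) >= 7 and fields[6] in ("+", "-"):
            kept.append(line)
    return kept
```

Note that this also discards features with an unspecified strand (a `.` in column 7), which is the intended effect of the condition above.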
As additional advice to that provided by @gallardoalba, it appears that your reference GTF contains header lines. Header lines are out of specification for strict GTF format (even though many data providers include them). When present, those lines can cause odd errors with many tools – and usually not mapping tools (if incorporated) – but instead with downstream tools. It is strongly recommended to remove any GTF headers to avoid technical errors (whether a tool is run in Galaxy, or not).
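As a rough sketch of what "removing the headers" means in practice: header and comment lines in GTF/GFF-style files start with `#`, so dropping every line that begins with that character (plus any blank lines) is usually enough. The function name below is hypothetical, and the example assumes the file has already been read into a list of lines.

```python
def strip_gtf_headers(lines):
    """Return only the feature lines, dropping '#' header/comment lines and blanks.

    Hypothetical helper for illustration; assumes headers start with '#'.
    """
    return [ln for ln in lines if ln.strip() and not ln.startswith("#")]
```

Inside Galaxy, a text manipulation tool that removes lines matching `^#` should achieve the same result.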
Please try removing the header lines, rerun, and see if that resolves the error as a first-pass solution. If it doesn’t, then investigate the GTF content more closely. The genome appears to have just one sequence/chromosome, with the name “Chromosome”. If that does not match the sequence/chromosome label in your BAM datasets, because the genome mapped against uses a different label, that can also cause conflicts, but it can be addressed.
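A quick way to check for the mismatch described above is to compare the names in column 1 of the GTF against the sequence names in the BAM header. The sketch below is only an illustration: the function names are made up, and it assumes you have already copied the BAM sequence names (the `SN:` values on the `@SQ` header lines, as printed by `samtools view -H`) into a Python list.

```python
def gtf_sequence_names(lines):
    """Collect the distinct sequence names from column 1 of a GTF."""
    names = set()
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def unmatched_names(gtf_names, bam_names):
    """Return GTF sequence names that do not appear in the BAM header."""
    return sorted(set(gtf_names) - set(bam_names))
```

For example, if the GTF uses “Chromosome” and the BAM header used an accession-style name instead (say `NC_000913.3`, given here purely as a hypothetical case), `unmatched_names` would return `["Chromosome"]`, and one of the two files would need its labels changed to match.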
The summary in this post is concise. Don’t worry about the specific tool/context of the original post – the format help for GTF data applies across analysis methods/tools:
Many things could be going wrong content-wise, but those are the top issues that tend to produce errors like yours. My guess is that you are hitting the “GTF formatting” issue first, then could possibly hit a “chromosome mismatch problem”. Verify/fix both as needed – full details are in the FAQs above.
Note: The GTF is also a hybrid format (GFF3 transformed into a GTF), but that will probably not be a problem with this particular tool.
Yes, I am using a simple E. coli K-12 genome GTF, which only has one chromosome. Please take a look at the content of the BAM file I used. Do you think it has some wrongly labeled info that can’t be matched with the content of the GTF file?
Ok, thanks for trying that. Your inputs are correct now.
The problem is likely with the tool itself. I didn’t notice this before but it was sourced from the test toolshed which is a sandbox/testing tool repository. And the tool wrapper plus the dependencies it uses haven’t been updated since 2015. My guess is that the tool is still hosted on the AU server due to legacy reasons – and it may work in some special cases/older workflows – but I’m not too surprised it isn’t working now.
Try using htseq-count or featureCounts instead. Those tools are current, work, and produce individual count files, which is the input that DESeq2 requires. You could combine those counts into a matrix, or use the individual count files, with edgeR or limma. The matrix format is described in the help section at the bottom of those latter tools’ forms. Tools in the group Text Manipulation (example: Multi-Join) could be used to merge the individual count files together into a matrix, but that isn’t required. If you are not sure how to do that or have problems, skip that step and input the individual count files instead.
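If you do want a matrix and prefer to build it outside Galaxy, here is a minimal sketch of the merge. It assumes each count file is tab-separated with a feature ID in column 1 and an integer count in column 2; the function names and the `gene_id` column label are made up for the example. Summary rows that htseq-count writes with a leading `__` (such as `__no_feature`) are skipped, as is any header row whose second column is not a number.

```python
import csv


def read_counts(path):
    """Read one two-column count file into a {feature_id: count} dict."""
    counts = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip short rows, htseq-count summary rows, and header rows.
            if len(row) < 2 or row[0].startswith("__") or not row[1].isdigit():
                continue
            counts[row[0]] = int(row[1])
    return counts


def build_matrix(sample_to_path):
    """Merge per-sample count files into a list of rows (header first).

    sample_to_path maps a sample label to its count file path.
    Features missing from a sample are filled with 0.
    """
    per_sample = {s: read_counts(p) for s, p in sample_to_path.items()}
    samples = list(per_sample)
    genes = sorted(set().union(*(c.keys() for c in per_sample.values())))
    header = ["gene_id"] + samples
    rows = [[g] + [per_sample[s].get(g, 0) for s in samples] for g in genes]
    return [header] + rows
```

The result can be written out with `csv.writer(..., delimiter="\t")`. Check the help section on the edgeR or limma tool form for the exact matrix layout those tools expect before uploading it.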
Examples of RNA-seq DE analysis are covered in the GTN tutorials under the topic “Transcriptomics”. The three tutorials in the group “End-to-End Analysis” are the best place to start for an overview of current methods.
So – the bad news was that this tool won’t work (again, sorry for not noticing where it was sourced originally!) – but the GOOD news is that the changes/checks you made with your inputs would have been needed anyway when using the updated tools/methods.
@jennaj Thanks so much for your advice! I tried htseq-count, which worked, but featureCounts did not. I will use htseq-count for the data processing. Again, thanks for your kind help! Have a good weekend ahead.