Kept seeing "Illegal strand" when using SAM/BAM to count matrix


I am using the SAM/BAM to count matrix tool on Galaxy Australia to get read counts per gene, but I keep seeing the following error, which leads to no results.

File “/mnt/tools-indices/shed_tools/”, line 82 raise ValueError, “Illegal strand” ^ SyntaxError: invalid synt

I have tried both Yes and No for “Reads are stranded,” but the same SyntaxError was generated.

May I please get your advice on this? Thank you!


Hi @Xiaoqiang_Ma,

According to the HTSeq source code, it seems that there is an error in the strand field of your GTF file. Could you verify it?



@gallardoalba Thanks very much for your help! I took a look at the GTF file, and its strand column contains both forward and reverse features. But the reads in the BAM file always seem to be on the forward strand; I am not sure whether this would cause the error.


Hi @Xiaoqiang_Ma,
I would try the “Filter data on any column using simple expressions” tool with the following condition to remove potentially malformed lines from the GTF file: c7=='+' or c7=='-'
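
For anyone who wants to run the same check outside Galaxy, here is a minimal Python sketch of that filter. It assumes a tab-separated GTF with the strand in column 7. The function name `keep_valid_strand` is just an illustration, not part of any Galaxy tool or of HTSeq.

```python
def keep_valid_strand(lines, allowed=("+", "-")):
    """Yield only GTF lines whose 7th column (strand) is in `allowed`.

    Lines with fewer than 7 tab-separated fields (for example header or
    comment lines) are dropped as well, because they have no strand column.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 7 and fields[6] in allowed:
            yield line


# Hypothetical example records, made up for illustration only.
example = [
    "##gff-version 3\n",
    'Chromosome\tsrc\tgene\t1\t10\t.\t+\t.\tgene_id "a";\n',
    'Chromosome\tsrc\tgene\t20\t30\t.\t?\t.\tgene_id "b";\n',
    'Chromosome\tsrc\tgene\t40\t50\t.\t-\t.\tgene_id "c";\n',
]
kept = list(keep_valid_strand(example))
```

The default for `allowed` mirrors the condition above. Widen it if your annotation uses `.` for unstranded features and you want to keep those lines.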


Hi @Xiaoqiang_Ma

As additional advice to that provided by @gallardoalba, it appears that your reference GTF contains header lines. Header lines are out of specification for strict GTF format (even though many data providers include them). When present, those lines can cause odd errors with many tools – and usually not mapping tools (if incorporated) – but instead with downstream tools. It is strongly recommended to remove any GTF headers to avoid technical errors (whether a tool is run in Galaxy, or not).
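
As a rough sketch of what "removing the header lines" means, assuming the headers are the usual comment lines starting with `#` (other kinds of header lines, if any, would need their own rule), something like this would do it. The helper name `strip_gtf_headers` is made up for this example.

```python
def strip_gtf_headers(lines):
    """Return the GTF lines without blank lines and '#' comment/header lines."""
    return [ln for ln in lines if ln.strip() and not ln.startswith("#")]


# Hypothetical input, for illustration only.
raw = [
    "#!genome-build example\n",
    "##gff-version 3\n",
    "\n",
    'Chromosome\tsrc\tgene\t1\t10\t.\t+\t.\tgene_id "a";\n',
]
cleaned = strip_gtf_headers(raw)
```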

Please try removing the header lines, rerun, and see if that resolves the error as a first-pass solution. If it doesn’t, then do investigate the GTF content closer. The genome appears to have just one sequence/chromosome with the name “Chromosome”. If that is not a match for the sequence/chromosome label in your BAM datasets, due to the genome mapped against having a different sequence/chromosome label, that can also cause conflicts but can be addressed.
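
To check for a chromosome label mismatch by hand, one option is to compare the first column of the GTF with the `SN:` names on the `@SQ` lines of the BAM header (the header text can be obtained with `samtools view -H`, for example). The sketch below works on plain text lines only; both function names are hypothetical, and the sequence names in the example are invented.

```python
def gtf_seqnames(gtf_lines):
    """Collect the sequence names from column 1 of a GTF, skipping headers."""
    return {
        ln.split("\t", 1)[0]
        for ln in gtf_lines
        if ln.strip() and not ln.startswith("#")
    }


def sam_header_seqnames(header_lines):
    """Collect the SN: names from the @SQ lines of a SAM/BAM header."""
    names = set()
    for ln in header_lines:
        if ln.startswith("@SQ"):
            for field in ln.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


# Invented example: the GTF says "Chromosome", the BAM says "chr1".
gtf = ['Chromosome\tsrc\tgene\t1\t10\t.\t+\t.\tgene_id "a";\n']
header = ["@HD\tVN:1.6\tSO:coordinate\n", "@SQ\tSN:chr1\tLN:1000\n"]
only_in_gtf = gtf_seqnames(gtf) - sam_header_seqnames(header)
```

If `only_in_gtf` is not empty, the annotation refers to sequences the BAM does not know about, which is the mismatch described above.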

“How to” is covered here:

This is also covered in much prior Q&A:

Many things could be going wrong content-wise, but those are the top issues that tend to produce errors like yours. My guess is that you are hitting the “GTF formatting” issue first, then could possibly hit a “chromosome mismatch problem”. Verify/fix both as needed – full details are in the FAQs above.

Note: The GTF is also a hybrid format (GFF3 transformed into a GTF), but that will probably not be a problem with this particular tool.



@gallardoalba I have tried as you suggested but got another error. Thanks!

@jennaj Thanks so much for your information. I have removed the header lines from the GTF file, but I still get the same error.

Yes, I am using a simple E. coli K-12 genome GTF, which has only one chromosome. Please take a look at the content of the BAM file I used. Do you think it has some wrongly labeled information that can’t be matched with the content of the GTF file?



Ok, thanks for trying that. Your inputs are correct now.

The problem is likely with the tool itself. I didn’t notice this before but it was sourced from the test toolshed which is a sandbox/testing tool repository. And the tool wrapper plus the dependencies it uses haven’t been updated since 2015. My guess is that the tool is still hosted on the AU server due to legacy reasons – and it may work in some special cases/older workflows – but I’m not too surprised it isn’t working now.
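
The traceback in the first post may itself point in this direction. The line it quotes, `raise ValueError, "Illegal strand"`, is written in Python 2 style, and a Python 3 interpreter rejects that form with a SyntaxError before any input data is read. This is only a reading of the quoted error, not something confirmed from the server. A small sketch to see the difference (the helper name is made up):

```python
import ast

# Python 2 style raise statement, as quoted in the traceback.
py2_style = 'raise ValueError, "Illegal strand"'
# The equivalent statement in Python 3 syntax.
py3_style = 'raise ValueError("Illegal strand")'


def parses_under_this_python(source):
    """Return True if `source` is valid syntax for the running interpreter."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


old_ok = parses_under_this_python(py2_style)
new_ok = parses_under_this_python(py3_style)
```

Run under Python 3, `old_ok` is False and `new_ok` is True, which would explain why changing the "Reads are stranded" setting or the GTF content made no difference.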

Try using htseq-count or featureCounts instead. Those tools are current, work, and produce individual count files, which would be required by DESeq2. You could combine those counts into a matrix, or use the individual count files, with edgeR or limma. The format of a matrix is described in the help section at the bottom of those latter tools’ forms. Tools in the group Text Manipulation (example: Multi-Join) could be used to merge the individual count files into a matrix, but that isn’t required. If you are not sure how to do that or have problems, skip that step and input the individual count files instead.
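
If you do want a matrix and prefer to build it outside Galaxy, here is a minimal sketch of the merge. It assumes each count file has two tab-separated columns (feature ID, then count) with no header row, and it skips summary rows whose ID starts with `__`. The function name and the sample data are invented, so adjust it to the actual layout of your count files.

```python
def build_count_matrix(samples):
    """Merge per-sample counts into one table.

    `samples` maps a sample name to an iterable of 'feature<TAB>count' lines.
    Returns a list of rows; the first row is the header.
    """
    names = list(samples)
    counts = {}
    for idx, name in enumerate(names):
        for line in samples[name]:
            line = line.strip()
            # Skip blank lines and summary rows such as "__no_feature".
            if not line or line.startswith("__"):
                continue
            feature, value = line.split("\t")[:2]
            # Features missing from a sample keep a count of 0.
            counts.setdefault(feature, [0] * len(names))[idx] = int(value)
    header = ["gene_id"] + names
    rows = [[feature] + values for feature, values in sorted(counts.items())]
    return [header] + rows


# Invented example data.
samples = {
    "s1": ["geneA\t5\n", "geneB\t0\n", "__no_feature\t3\n"],
    "s2": ["geneA\t7\n", "geneB\t2\n"],
}
matrix = build_count_matrix(samples)
```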

Examples of RNA-seq DE analysis are covered in the GTN tutorials under the topic “Transcriptomics”. The three tutorials in the group “End-to-End Analysis” are the best place to start for an overview of current methods.

So – the bad news was that this tool won’t work (again, sorry for not noticing where it was sourced originally!) – but the GOOD news is that the changes/checks you made with your inputs would have been needed anyway when using the updated tools/methods.

Please give that a try!

@jennaj Thanks so much for your advice! I tried htseq-count, which worked, but featureCounts did not. I will use htseq-count for the data processing. Again, thanks for your kind help! Have a good weekend ahead. :grinning: :+1: