The strangest thing happened. I aligned a sample of raw RNA-seq data with HISAT2 (default settings). Then I used the function [Filter SAM or BAM, output SAM or BAM files on FLAG MAPQ RG LN or by region] to cut out a 20 kb region that I am interested in, since I don’t have a lot of storage space. So far so good. But when I opened the result in IGV, there was more than 20 kb, way more. I could see the entire chromosome. I checked two more chromosomes and they were empty.
This happened with only one sample but I have 9 from the same batch that are processing right now.
I have followed this protocol before and nothing like this happened. The only time I could see anything outside my region was when the ends of an exon extended past it, which made sense.
Do you have any opinions about it?
BTW I checked and the region that I selected was the right one.
Isn’t this tool producing a second output with your query converted to JSON format? This looks like a great opportunity to use that dataset and share its contents here.
The region should be written as a contiguous term, with no space after the colon (:), like this:
chrX:1-1,000
or this (the inclusion of commas , is optional):
chrX:1-1000
With the extra space included, the filter is malformed, and the tool effectively applies a filter based only on the chromosome name (the first “word” in the term). That would explain why you see the entire chromosome while the other chromosomes are empty.
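If you want to sanity-check region strings before pasting them into the tool form, here is a minimal Python sketch. It is only an illustration, not how the Galaxy tool or samtools parses regions: `normalize_region` and `REGION_RE` are hypothetical names, and the sketch assumes chromosome names contain no colons or spaces.

```python
import re

# Hypothetical pattern (an assumption, not the tool's parser):
# "chrom", "chrom:start" or "chrom:start-end", with no whitespace anywhere.
REGION_RE = re.compile(r"^([^\s:]+)(?::(\d+)(?:-(\d+))?)?$")


def normalize_region(text):
    """Return a contiguous region string such as 'chrX:1-1000'.

    Removes whitespace and thousands separators (commas) from the
    coordinate part, then checks the result against REGION_RE.
    """
    chrom, sep, coords = text.strip().partition(":")
    chrom = chrom.strip()
    if sep:
        # Drop spaces and commas from the coordinates only.
        coords = re.sub(r"[\s,]", "", coords)
        region = f"{chrom}:{coords}"
    else:
        region = chrom
    if not REGION_RE.match(region):
        raise ValueError(f"Malformed region: {text!r}")
    return region


if __name__ == "__main__":
    # Splitting on whitespace shows why the extra space matters:
    # the coordinates end up in a separate "word".
    print("chrX: 1-1000".split())
    print(normalize_region("chrX: 1-1,000"))
```

Running the cleaned-up string through the filter, rather than the one with the space, should restrict the output to the intended coordinates.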
I think we could (maybe) make the usage better. Not just for this tool, but any filter tool that has end-user free text (or a text file “list”) for region filters. If interested, I made a catch-all ticket here https://github.com/galaxyproject/usegalaxy-playbook/issues/227. The change could touch many tools, so please consider it a nice-to-have-wish-list-item for now.
Upvote/comment on the ticket if you agree this would help. Same for others reading! We don’t know definitively how often this is a usage issue end-users encounter, with filter tools in general or this tool specifically. Feedback from Galaxy users helps to inform the choices we make about changes/upgrades.