Assistance with Filtlong Tool for Quality Filtering of FastQ Data

Hello forum members,

I’m encountering an issue while attempting to filter my FastQ data based on a mean quality score greater than or equal to 20 using the Filtlong tool. Despite applying the tool, the FastQC report shows no apparent changes, and upon closer inspection, it seems that no reads were filtered at all.

To provide a clearer picture, I’ve attached two screenshots—one with the original data and another after applying the Filtlong tool. I’ve chosen Filtlong for filtering as my output data files are in .fastq or .fastq.gz format, and unfortunately, they are not in fastqsanger or similar formats compatible with other filtering tools.

Could someone please help me understand what might be going wrong or if there’s a step I’m overlooking in the process? Any insights or guidance on using Filtlong effectively in this context would be greatly appreciated.

Thank you in advance for your assistance!
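
One way to sanity-check whether any reads should have been removed is to compute the per-read mean quality directly from the FastQ file. The sketch below is only an illustration, not Filtlong's own method: it assumes four-line FastQ records and a Phred+33 (Sanger) offset, and the function names are made up for this example. It reports both the mean Phred score and the equivalent mean per-base accuracy in percent. Filtering tools do not all use the same scale, so it is worth checking in the Filtlong documentation which scale its mean quality threshold expects.

```python
def parse_fastq(lines):
    """Yield (header, sequence, quality) tuples from an iterable of FASTQ lines.

    Assumes plain four-line records (no wrapped sequences).
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between or after records
        seq = next(it).rstrip("\n")
        next(it)  # the '+' separator line
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def mean_phred(qual, offset=33):
    """Arithmetic mean of the Phred scores in a quality string (Phred+33 assumed)."""
    scores = [ord(c) - offset for c in qual]
    return sum(scores) / len(scores)


def mean_accuracy_percent(qual, offset=33):
    """Mean per-base probability of a correct call, as a percentage."""
    error_probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return 100 * (1 - sum(error_probs) / len(error_probs))


def count_passing(records, min_mean_phred=20):
    """Return (passed, total) for reads whose mean Phred score meets the cutoff."""
    passed = total = 0
    for _, _, qual in records:
        if not qual:
            continue  # ignore empty reads
        total += 1
        if mean_phred(qual) >= min_mean_phred:
            passed += 1
    return passed, total


if __name__ == "__main__":
    demo = [
        "@read1\n", "ACGT\n", "+\n", "5555\n",  # every base Q20
        "@read2\n", "ACGT\n", "+\n", "++++\n",  # every base Q10
    ]
    print(count_passing(parse_fastq(demo)))
```

If the count of passing reads equals the total, then no reads fall below a mean Phred score of 20 by this definition. Note that a mean Phred score of 20 corresponds to roughly 99% accuracy on the percentage scale, so the same number means very different things depending on which scale a tool uses.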


Hi @ge96dah

What format are your data in? What does FastQC report at the top of the report for the quality score scaling?

You could also try using the auto-detect datatype function (see “Detecting the datatype (file format)”). Maybe the quality score scaling is already Fastq Sanger. The Upload tool would also detect the datatype (try using all defaults during Upload, especially for read data). If Galaxy guesses wrong, that is usually an important clue about some format/content problem.

I’m also wondering how the data was input to the tool. “Dragging and dropping” is never recommended. The tool expects input with the fastqsanger datatype.

Most fastq data uses that same quality score scaling now, even long reads.

Related Q&A: “fastq unavailable -- Tool does not recognize inputs? How to check why” (#2 by jennaj)

Hello @jennaj,

The input is Nanopore sequencing data. This is usually a .fastq.gz file but I am able to unzip it with Python before uploading the data to Galaxy. The FastQC report says that it has Sanger / Illumina 1.9 scaling.

Update:
I tried the auto-detect again and Galaxy still detects the datatype as fastq.gz. Should I assign a new datatype manually?
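
As a cross-check on what FastQC reports, the quality encoding can also be estimated from the range of characters on the quality lines. The following is a rough heuristic sketch, assuming four-line FastQ records; the function names and the file name in the usage note are hypothetical. It relies on the fact that characters below ASCII 59 only occur in Phred+33 (Sanger / Illumina 1.8+) encoded files.

```python
def quality_char_range(lines, max_records=10000):
    """Return (lowest, highest) ASCII codes seen on the quality lines.

    Assumes four-line FASTQ records; only the first max_records are scanned.
    Returns None if no quality characters were found.
    """
    lo = hi = None
    for i, line in enumerate(lines):
        if i // 4 >= max_records:
            break
        if i % 4 != 3:
            continue  # only every fourth line holds quality characters
        qual = line.rstrip("\n")
        if not qual:
            continue
        codes = [ord(c) for c in qual]
        lo = min(codes) if lo is None else min(lo, min(codes))
        hi = max(codes) if hi is None else max(hi, max(codes))
    if lo is None:
        return None
    return lo, hi


def guess_offset(lowest_code):
    """Heuristic guess of the quality offset from the lowest character code.

    Codes below 59 are only valid in Phred+33. Anything else is ambiguous
    from this check alone, so None is returned in that case.
    """
    if lowest_code < 59:
        return 33
    return None


if __name__ == "__main__":
    demo = ["@read1\n", "ACGT\n", "+\n", "!5I+\n"]
    lo, hi = quality_char_range(demo)
    print(lo, hi, guess_offset(lo))
```

For a compressed file, the lines can be read without unzipping first, for example with `gzip.open("reads.fastq.gz", "rt")` from the Python standard library, passing the opened file handle as `lines`. A result of 33 is consistent with the "Sanger / Illumina 1.9" encoding that FastQC reports, which corresponds to the fastqsanger datatype in Galaxy.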

Hi @ge96dah

That is fastq sanger. The redetect for datatype should work fine.

Am I missing something? Did you try that already?

Update:

Maybe the “Quality Control” tutorial will help? It covers nanopore reads. If nothing else, you could try comparing the methods now that the datatype issue seems to be resolved.