Workflow Canvas: Why is the output for SamToFastq "txt" instead of "Fastq"? Confirmed bug: Use "Samtools fastx" instead

Hi all,

I am trying to create a workflow where the result of one program feeds into the next. I have SamToFastq selected for Step 1, and I was hoping to feed its Fastq output into Trim Galore!, but I cannot because the workflow canvas shows the SamToFastq output as .txt instead of Fastq (see attached photo). What did I do wrong? Thanks.

Hi @mlim, this seems to be a problem with the SamToFastq Galaxy tool. If you go to Configure Outputs, you can change the datatype to fastqsanger.
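For anyone who prefers to patch an exported workflow file rather than click through the canvas, here is a minimal sketch of the same change as a post-job action. It assumes the exported .ga JSON stores these actions under each step's `post_job_actions`, keyed by action type plus output name. The step id `0` and the output name `output_fastq` are hypothetical placeholders, so check your own export for the real values.

```python
import copy


def set_output_datatype(workflow, step_id, output_name, newtype):
    """Return a copy of an exported workflow dict with a Change Datatype
    post-job action attached to one output of one step."""
    patched = copy.deepcopy(workflow)  # leave the caller's dict untouched
    step = patched["steps"][str(step_id)]  # step ids are string keys in JSON
    actions = step.setdefault("post_job_actions", {})
    actions["ChangeDatatypeAction" + output_name] = {
        "action_type": "ChangeDatatypeAction",
        "output_name": output_name,
        "action_arguments": {"newtype": newtype},
    }
    return patched


# Hypothetical, heavily trimmed workflow; a real export has many more keys.
workflow = {"steps": {"0": {"name": "SamToFastq", "post_job_actions": {}}}}
patched = set_output_datatype(workflow, 0, "output_fastq", "fastqsanger")
```

With a real file you would `json.load` the export, patch it, `json.dump` it back out, and re-import the workflow into Galaxy.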

That said, I would recommend using the Samtools fastx tool, which lets you choose how to handle read1, read2 and singletons in your alignment file.

@mvdbeek: This was an older bug that I thought was fixed about 1 ½ years ago. It did get migrated from tools-devteam to tools-iuc since then. https://github.com/galaxyproject/tools-devteam/issues/414

@mlim: What version of the SamToFastq tool are you using in your workflow?

I’ll also rerun a test with the most current version: SamToFastq extract reads and qualities from SAM/BAM dataset and convert to fastq (Galaxy Version 2.18.2.1). Perhaps there was a regression in functionality. The tests will go back into the original test history: https://usegalaxy.org/u/jen/h/test-datatype-sam-to-fastq. If these fail (can reproduce the txt output), I’ll post back with a new issue ticket. This tool is included in a GTN tutorial so it is important for it to work correctly.

Thanks for reporting the problem!

Update:

The most current versions of the tool are now failing. New ticket for that: https://github.com/galaxyproject/tools-iuc/issues/2747

The output naming was corrected and tested successfully before. However, that tool version and the newer versions now have a different problem. So, keep using the version you have now and apply the post-job action as @mvdbeek suggests. That is the best way to use the tool until the newer versions are fixed (the new failure is separate from the txt-not-fastqsanger issue, which was already fixed).

Update 2:

This tool is now failing, for all versions, at usegalaxy.org and usegalaxy.eu.

New workaround for End-users

The alternative tool mentioned above (Samtools fastx) may be more useful anyway. It has expanded input/output options, including but not limited to: 1) input bam, sam or cram data, coordinate sorted or not, 2) output fasta or fastqsanger (or the appropriate fastq sub-type), 3) output a compressed or uncompressed version of the data, 4) output paired-end reads in different ways (R1 only, R2 only, both R1+R2 in two distinct datasets, or R1+R2 interleaved in a single dataset), and 5) output has the appropriate datatype, assigned directly by the tool. This means there is no need to use Configure Outputs to re-assign the datatype when the tool is used in a workflow, and no need to re-assign/re-detect the datatype when it is used directly in a History.
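To make the R1/R2 handling a bit more concrete, here is a rough Python sketch of how SAM records can be routed into separate fastq outputs using the FLAG bits from the SAM specification (0x40 first in pair, 0x80 second in pair, 0x10 reverse strand, 0x100 secondary, 0x800 supplementary). It is only an illustration and not how Samtools fastx is implemented. The function name, the "other" bucket, and the choice to skip secondary and supplementary alignments are assumptions made for this example.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_to_fastq(sam_lines):
    """Split SAM records into R1, R2 and 'other' fastq records by FLAG."""
    outputs = {"R1": [], "R2": [], "other": []}
    for line in sam_lines:
        if line.startswith("@"):  # header line, carries no read
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        if flag & 0x900:  # skip secondary (0x100) and supplementary (0x800)
            continue
        if flag & 0x10:  # stored reverse-complemented, so restore the read
            seq = seq.translate(COMPLEMENT)[::-1]
            qual = qual[::-1]
        is_r1, is_r2 = bool(flag & 0x40), bool(flag & 0x80)
        if is_r1 and not is_r2:
            key = "R1"
        elif is_r2 and not is_r1:
            key = "R2"
        else:  # neither or both bits set, e.g. single-end data
            key = "other"
        outputs[key].append(f"@{name}\n{seq}\n+\n{qual}\n")
    return outputs


# Made-up records: one proper pair plus one unpaired read.
example = [
    "@HD\tVN:1.6",
    "frag1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
    "frag1\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tAACC\tFFF#",
    "solo\t0\tchr1\t300\t60\t4M\t*\t0\t0\tGGGG\tIIII",
]
reads = sam_to_fastq(example)
```

In practice the Galaxy tool does all of this for you and also assigns the datatype; the sketch is only meant to show why the paired-end output choices exist.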

Allowing Galaxy to assign the datatype will ensure that it is correct and matches the actual content of the user-specified output type. This avoids introducing unintentional mismatched “datatype” metadata problems that can lead to downstream tool errors.
