Trouble using seqtk_trimfq

My workflow is supposed to quality-trim Sanger sequences (a collection), based on the tutorial Hands-on: Clean and manage Sanger sequences from raw files to aligned consensus (Sequence analysis). However, every way I try to convert FASTQ to FASTA from the seqtk_trimfq output, it fails. If I re-run the individual Sanger sequences instead, it succeeds, so I don’t know why this tool is not working in the workflow.

Is there any reason for the failure? seqtk_trimfq itself works, but the workflow fails at the FASTQ to FASTA step. I tried with and without the FASTQ Groomer and both fail, so I think it has something to do with the seqtk_trimfq output.
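For reference, the conversion that step should perform is simple enough to sketch outside Galaxy (a minimal illustration, not the actual Galaxy tool; it assumes well-formed 4-line FASTQ records):

```python
# Minimal FASTQ -> FASTA conversion sketch. Real converter tools validate
# records more carefully; this assumes clean 4-line FASTQ input.
def fastq_to_fasta(fastq_text: str) -> str:
    lines = fastq_text.splitlines()
    fasta = []
    for i in range(0, len(lines), 4):     # each FASTQ record spans 4 lines
        fasta.append(">" + lines[i][1:])  # '@id' header becomes '>id'
        fasta.append(lines[i + 1])        # keep the sequence line
    return "\n".join(fasta) + "\n"

print(fastq_to_fasta("@seq1\nACGT\n+\n!!!!\n"), end="")
```

So the operation itself is trivial; the problem has to be somewhere in how the workflow handles the collection.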

Welcome, @dede_sa

What is the error message from the tool? Find this on the Job Details view (using the ⓘ icon). You can screenshot any messages on the dataset itself; all three logs (these sections expand) would be helpful, but just one is probably enough.

I’m also not sure why the workflow includes the FASTQ Groomer step at all. It does not modify the quality score scaling when the input option states that the data is already in fastqsanger format (Illumina 1.8+ / Sanger, Phred+33). As configured in that workflow, the step just makes an exact duplicate of your data. The tool has other uses, but it is not being used that way in your current workflow.

So, my next question is: what does the ab1 to FASTQ converter output look like? I would be interested in the assigned datatype and a peek at the @/+ headers plus a few of the sequence and quality score lines.

Why it works when run individually versus as a collection is odd, but maybe with those two data points we’ll get closer to solving it. My first guess would be a tool version difference between the workflow and the tool panel. My second guess would be some problem with the element identifiers on the collection, or with the structure of that collection. You could run a tool like Extract element identifiers and post back a few lines, and also explain or screenshot what this collection’s structure looks like.

And finally, you could try disconnecting all of the noodles between tools in your workflow, then reconnecting them in the order of tool execution. Collections have a few structures, and that “metadata” is passed through: paired collections (lists of pairs) are different from simple collections (lists). The workflow editor will alert you if the collection structures are a mismatch; you can then add steps to reorganize the structure and build it up again, all as needed (these do not duplicate data, since a collection folder is just a reference to data), with tools in Collection Operations. See Hands-on: Using dataset collections (Using Galaxy and Managing your Data).

Let’s start there! Thanks! :slight_smile:

Thanks for the reply.

As for the error message, it is "AttributeError: 'NoneType' object has no attribute 'serialize'".
The full log is below:

Job Execution and Failure Information
Command Line
cat '/data/dnb10/galaxy_db/files/f/b/a/dataset_fbafdc58-ecb9-4525-a8dc-a52f3a6e38fd.dat' | fastq_to_fasta -n -v > '/data/jwd05e/main/078/947/78947242/outputs/dataset_91665f17-bf94-428f-8466-4cb166847ab7.dat'
stderr

stdout

Job Information
'NoneType' object has no attribute 'serialize'
Job Traceback
Traceback (most recent call last):
  File "/opt/galaxy/server/lib/galaxy/jobs/runners/__init__.py", line 304, in prepare_job
    job_wrapper.runner_command_line = self.build_command_line(
                                      ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/galaxy/server/lib/galaxy/jobs/runners/__init__.py", line 343, in build_command_line
    return build_command(
           ^^^^^^^^^^^^^^
  File "/opt/galaxy/server/lib/galaxy/jobs/command_factory.py", line 167, in build_command
    __handle_metadata(commands_builder, job_wrapper, runner, remote_command_params)
  File "/opt/galaxy/server/lib/galaxy/jobs/command_factory.py", line 273, in __handle_metadata
    job_wrapper.setup_external_metadata(
  File "/opt/galaxy/server/lib/galaxy/jobs/__init__.py", line 2345, in setup_external_metadata
    command = self.external_output_metadata.setup_external_metadata(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/galaxy/server/lib/galaxy/metadata/__init__.py", line 295, in setup_external_metadata
    command = super().setup_external_metadata(datasets_dict, out_collections, sa_session, **kwd)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/galaxy/server/lib/galaxy/metadata/__init__.py", line 214, in setup_external_metadata
    with DirectoryModelExportStore(
  File "/opt/galaxy/server/lib/galaxy/model/store/__init__.py", line 2485, in __exit__
    self._finalize()
  File "/opt/galaxy/server/lib/galaxy/model/store/__init__.py", line 2368, in _finalize
    collections_attrs_out.write(to_json(self.included_collections.values()))
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/galaxy/server/lib/galaxy/model/store/__init__.py", line 2349, in to_json
    return json_encoder.encode([a.serialize(self.security, self.serialization_options) for a in attributes])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/galaxy/server/lib/galaxy/model/store/__init__.py", line 2349, in <listcomp>
    return json_encoder.encode([a.serialize(self.security, self.serialization_options) for a in attributes])
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/galaxy/server/lib/galaxy/model/__init__.py", line 472, in serialize
    return self._serialize(id_encoder, serialization_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/galaxy/server/lib/galaxy/model/__init__.py", line 7243, in _serialize
    collection=self.collection.serialize(id_encoder, serialization_options),
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/galaxy/server/lib/galaxy/model/__init__.py", line 472, in serialize
    return self._serialize(id_encoder, serialization_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/galaxy/server/lib/galaxy/model/__init__.py", line 6952, in _serialize
    elements=[e.serialize(id_encoder, serialization_options) for e in self.elements],
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/galaxy/server/lib/galaxy/model/__init__.py", line 6952, in <listcomp>
    elements=[e.serialize(id_encoder, serialization_options) for e in self.elements],
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/galaxy/server/lib/galaxy/model/__init__.py", line 472, in serialize
    return self._serialize(id_encoder, serialization_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/galaxy/server/lib/galaxy/model/__init__.py", line 7636, in _serialize
    rval["child_collection"] = element_obj.serialize(id_encoder, serialization_options)
                               ^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'serialize'

I first tried the workflow without the FASTQ Groomer and ended up with the same error; I added it because I figured it might solve the problem, but it didn’t.

I will paste the output for one of the sequences here.

ab1 to FASTQ converter

@baixa_qualidade
TWKKWTTTTTCTTTTCATGTTTGGGTTTTTGTGCTTTTTTTTATCTTTTAATGCGTTTAGCTTTGGTTGACTTATGCTTTTATATTCTGGATATTATTAATTATGTACTTTACTGCTGTTAATGATTTTTTTTTTGAKGCCTTTWAWAAGGGGGTTACWAAYCTTTTATTCTTTAGTTTCTTTATTGAATGKGTTACCYCGATWAWAATTACTTTTGATAWATTTTTCTTTAAATTCKGTAGYCCCTTTTTTATTCGACTTCTTKGKTGAYTTAAYCYCCTATCTCTACKGYAKGTATTYYACGTTATAWWTTTCTTAWTTGSKGGSTCTCWWWTGCTCATATTWAAKWAYGTTTGATACTTTATTCTATTTCTTTTCAAYCTTATWWAGTGYCAWGCGAKTTTATATAWTCTACKGTARCRCSYAAWWGWATATGACKATTTCACTCTTTTAAATATAGGKASYTATAWYAMTTATTTGTTTTGKCMCKAKT
+baixa_qualidade
&""""&&&(&&&(&&(&(&)-961VVOW8+@72[VVVG1+01/+?111,//=4YWGG9>C,1++*2/+2A:9T-+0,011+6+0++(,/(+'++,0+12.1[510161,6+,0-+26+11R[[<@;300".10:R=":"6--4//..-&"0:"71;1,'(+&)(()(-()(.+&+,1)(".0++/6":52)"9"2-1--7/)(("+,-+62:=8...))((+"-)(",(+/.333/+1).)++/++)((".",2"2-)'">"473,3)+,"+"''."")+.-)((""">L01...",(3"",(")),-"""+,1/-"+.""0",);0/())&.)),+-&(=-?,?+(((,"30))2""),+")",((("+.-,),"(-02",)")"(""+"""),)((2"-07,(,40,/-)(-+)**(&"&""%$))""("$,%(.''"*"&"("$

seqtk_trimfq output

@baixa_qualidade
GGTTTTTGTGCTTTTTTTTATCTTTTAATGCGTTTAGCTT
+
961VVOW8+@72[VVVG1+01/+?111,//=4YWGG9>C

This was a poor-quality sequence, which is why it shortened so much; the other sequences are high quality.
The output of seqtk_trimfq is green, so the tool seems to be working.
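For reference, fastqsanger quality characters encode Phred scores as the ASCII code minus 33, so decoding a few characters confirms the trimmed region really was low quality (a quick sketch, not a Galaxy tool):

```python
# Decode fastqsanger (Phred+33) quality characters into numeric scores.
def phred_scores(quality_line: str) -> list[int]:
    return [ord(char) - 33 for char in quality_line]

print(phred_scores("&&&(&&"))   # -> [5, 5, 5, 7, 5, 5]  (very low quality)
print(phred_scores("961VVOW"))  # -> [24, 21, 16, 53, 53, 46, 54]
```

A Q5 base has roughly a one-in-three chance of being wrong, so trimfq discarding those stretches is expected.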

Running "Extract element identifiers" on the seqtk_trimfq output returns the four sequences I was using to test the workflow:

baixa_qualidade.ab1
boa_qualidade.ab1
boa_qualidade2.ab1
media_qualidade.ab1

I am also having trouble adjusting the workflow. My objective was to filter the ab1 sequences, then map/align the reverse primer to them so it is easier to remove the primer sequences. I noticed the original workflow didn’t have a reverse-complement step, so I added one, but now I can’t concatenate the primer with the sequences in the workflow: it does not allow me to connect the steps, although it was working fine before.
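To be clear about the step I added: by reverse complement I mean the standard operation, roughly the following sketch (in the workflow I use a Galaxy tool for this; note this toy version only handles A/C/G/T, while my reads also contain IUPAC ambiguity codes like W, K, and Y):

```python
# Reverse complement of a plain A/C/G/T sequence (toy sketch; IUPAC
# ambiguity codes such as W, K, Y are not handled here).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

print(reverse_complement("ACGTT"))  # -> AACGT
```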

Hi @dede_sa

Let’s go through these in this order if that’s ok. :slight_smile:

First, this output looks good in some ways but not in others. Your sequence identifier (the @ line) doesn’t have a number.

Please increase your testing data to at least four sequences, then check the @ lines. If all of the sequences have the same identifier, that can definitely lead to problems with the converter tool and any downstream tools.
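If you want to check that quickly outside Galaxy, something like this works (a sketch assuming standard 4-line FASTQ records; the file name and contents here are just a made-up example):

```python
# Count how often each FASTQ identifier appears; duplicate identifiers can
# confuse downstream tools. Assumes standard 4-line FASTQ records.
from collections import Counter

def identifier_counts(path: str) -> Counter:
    counts = Counter()
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 0:           # every 4th line is the @header
                counts[line.rstrip("\n")] += 1
    return counts

# toy file where both records share one identifier
with open("reads.fastq", "w") as handle:
    handle.write("@read1\nACGT\n+\n!!!!\n@read1\nTTTT\n+\n!!!!\n")

print(identifier_counts("reads.fastq"))  # -> Counter({'@read1': 2})
```

Any count above 1 means the identifiers are not unique.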

Then for this one,

When you add a new tool to a workflow, you’ll need to disconnect anything downstream of that tool, then reconnect the noodles again. This is what I meant by my earlier comment about the workflow metadata.

Your workflow has that pop-up comment in the editor because of a metadata mismatch. Adjusting the noodles is what fixes that in most cases.

Please give that a try, and if you need more help, please create a new history containing just the inputs and outputs of the workflow, plus the workflow itself. Having those paired together will let us debug more specifically against your exact data.

Thanks! :slight_smile:

I ran the workflow twice yesterday, once using the same 4 sequences as before (Test_same_Seqs in the history) and once using 4 other sequences, whose file names I did not change (Test_raw_files). Both times the same error happened. The raw file names had numbers and letters, so I don’t know what is causing this failure; it seems something is wrong with FASTQ to FASTA using collections. Tell me if you need access to the histories I am talking about.

Hi @dede_sa

I ran a quick test and was able to convert a collection of fastq files using three different tools that perform this manipulation.

Test started at ORG here

Then I imported that over to the EU server and all three also worked there.

Maybe that helps? You might need to use a different converter or perhaps the example formats show what is expected? If you need more help, you are still welcome to share your history with the small example + the workflow that created it. The links can be posted here in a reply. :slight_smile:

Thanks so much @jennaj, now it worked. I changed the FASTQ to FASTA tool, using the "regular" one. The FASTQ to FASTA.FASTX tool was not working, not sure why.


Great that you have this working now, @dede_sa!

The FASTX-Toolkit version is much older and was originally written for early short fastq reads. It might not work well with all of the newer sequence formats, for various reasons. Nice to have this resolved! :slight_smile: