Converting fastq to fasta for analysis with ABRicate or AMRFinderPlus

Hello, I have been using various conversion tools to convert my fastq files to fasta in order to analyze my shotgun metagenomic sequences with AMRFinderPlus or ABRicate. I have tried a few different tools with different settings, and the conversion completes. However, when the converted files are used as input for AMRFinderPlus or ABRicate, the jobs complete but there are no results in the table output. Is anyone familiar with a workflow or the tools/settings I should use to successfully complete this process? Thanks.

On a recent attempt with ABRicate, this was the error message: “Using nucl database resfinder: 3077 sequences - 2024-Dec-15
Processing: FASTQ_to_FASTA_on_data_6
Warning: [blastn] Query_30861 VH00638:104:A.. : Could not calculate ungapped Karlin-Altschul parameters due to an invalid query sequence or its translation. Please verify the query sequence(s) and/or filtering options
Warning: [blastn] Query_1046184 VH00638:104.. "

Hi @bgood07

The message from the tool indicates that there is something about the input query that the tool cannot understand or process.

The tool doesn’t explicitly state this in the Help, but the input query should be a nucleotide sequence or record. Protein sequences would likely create an error like yours, but a few other things could too.

What it does

Given a FASTA contig file or a genbank file, ABRicate will perform a mass screening of contigs and identify the presence of antibiotic resistance genes. The user can choose which database to search from a list of available AMR databases.


For this part of your error, it seems you are inputting a fastq sequence record (or records).

Then, since the tool is expecting fasta formatted sequences… (the menu under “accepted formats” is where to check)

there is an implicit conversion process to convert fastq to fasta.

My guess is that something went wrong here.

What happens if you try to do the conversion directly, with a separate tool, as a data preparation step? Even if it fails, you might get more information about what is going wrong! (possibly a log message about a truncated file or other technical issue that needs to be corrected)

This is a good tool choice for most fastq formats

  • FASTQ to FASTA converter (link at UseGalaxy.eu)

  • :warning: Warning! There is another tool from FASTX-toolkit that has a similar tool name! I would suggest avoiding that other version since it won’t work on some of the newer formats of fastq data!
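To make it concrete what the direct conversion should produce, here is a minimal Python sketch. It is not the Galaxy tool's code, just an illustration that assumes plain 4-line (unwrapped) FASTQ records; the function name `fastq_to_fasta` is hypothetical. It keeps the header and sequence, drops the `+` line and the quality string, and raises an error on a truncated record.

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Write header + sequence of each 4-line FASTQ record as FASTA.

    Assumption: records are exactly 4 lines (no wrapped sequences).
    Returns the number of records converted.
    """
    count = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = fastq_handle.readline().rstrip("\r\n")
        plus = fastq_handle.readline().rstrip("\r\n")
        qual = fastq_handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near read {count + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"Truncated or inconsistent record: {header}")
        # Only the read name and the bases go into the FASTA output.
        fasta_handle.write(f">{header[1:]}\n{seq}\n")
        count += 1
    return count


# Tiny in-memory demo with a made-up read.
demo = "@read1\nGGAAACCGAAGTGGA\n+\n" + "I" * 15 + "\n"
out = io.StringIO()
converted = fastq_to_fasta(io.StringIO(demo), out)
print(converted, "record(s) converted")
print(out.getvalue(), end="")
```

If the output still shows `+` lines or quality strings, the conversion did not actually happen, whatever the dataset name says.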

Then, if you get a format error message during the direct conversion, you can explore this more with either of these tools since both report more of the lower level technical details in the logs (whether they “fail” or not). A search in the tool panel with the tool names will find these.

  • FASTQ info validates single or paired fastq files
  • Falco An alternative, more performant implementation of FastQC for high throughput sequence quality control
  • Or, review sequence protocol specific Quality Control resources at the :graduation_cap: GTN → GTN Materials Search (query=quality)

And finally, once you have a valid fastq file in your history, and have converted it to a fasta format, try to run the ABRicate tool again.

For example usage, please scroll down on the tool form into the Help section. You’ll find tutorials that this tool happens to be included in. Not all tools are included in a tutorial, but this one is!

Then, most tutorials have a workflow! Click through on the tutorial page to find it.

Hope this helps! If you get stuck at any step, you are welcome to share your history back here for more specific feedback. See → How to get faster help with your question :slight_smile:

Thank you for the information. I will check it out and try the conversion directly to see what happens. I went back and looked, and it doesn’t seem like the fastq quality score characters are removed from the fasta files. So there may be an issue with the tool parameters.

One more thing regarding the input: the files are titled fastq_to_fasta, and the dataset details describe them as being in fasta format. The file being processed is an output from the fastq_to_fasta tool. Again, it seems like the fastq quality score characters are not being removed. So this could just be a problem in the conversion settings. Also, these are nucleotide sequences.

Thanks for the feedback @bgood07

The tool wrapper likely has some faulty logic that makes it think it can handle fastq and convert it to fasta, but it seems like it actually can’t, and that led to your error (quality scores passed as nucleotide reads). I’ll follow up on this tomorrow and open a ticket if I can reproduce. Big picture, there are methods to trigger an automatic “correct” conversion externally to the tool but still at runtime in Galaxy. More later about this, and please don’t wait, since updates to tool wrappers take some time to make and then flow down to the servers.

Which then means, yes, converting the input to the expected (and declared) fasta format as a pre-processing data preparation step is likely needed. I’ll be curious about how this works out for you!
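A quick way to confirm whether quality strings leaked into a file labeled fasta is to scan it for non-nucleotide characters on the sequence lines. The sketch below is only an illustration under my own assumptions: the function name `find_bad_sequence_lines` is hypothetical, and the allowed alphabet is the IUPAC nucleotide codes plus a gap character. A quality string that happens to contain only those letters would slip through, so treat an empty result as "probably fine" rather than proof.

```python
import io

# IUPAC nucleotide codes plus "-" for gaps (assumed alphabet for this check).
VALID_BASES = set("ACGTUNRYKMSWBDHVacgtunrykmswbdhv-")


def find_bad_sequence_lines(handle, max_report=5):
    """Return (line_number, text) for non-header lines that contain
    characters outside the nucleotide alphabet."""
    bad = []
    for line_number, line in enumerate(handle, start=1):
        text = line.rstrip("\r\n")
        if not text or text.startswith(">"):
            continue  # skip blank lines and FASTA headers
        if not set(text) <= VALID_BASES:
            bad.append((line_number, text))
            if len(bad) >= max_report:
                break  # a few examples are enough to diagnose
    return bad


# A clean FASTA record versus a FASTQ record mislabeled as FASTA.
clean = ">1\nGGAAACCGAAGTGGA\n"
leaky = "@read1\nGGAAACCGAAGTGGA\n+\n" + "I" * 15 + "\n"
print(find_bad_sequence_lines(io.StringIO(clean)))
print(find_bad_sequence_lines(io.StringIO(leaky)))
```

For a real dataset, download it and pass an opened file instead of the in-memory strings, e.g. `with open("reads.fasta") as fh: print(find_bad_sequence_lines(fh))`, where the file name is a placeholder.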

Just an update. It looks like the conversion worked this time. The files were formatted correctly. They appeared like:

>1
GGAAACCGAAGTGGA

Also, I ran one through ABRicate and this was the output message:

“Using nucl database resfinder: 3077 sequences - 2024-Dec-15
Processing: FASTQ_to_FASTA_on_data_6__FASTA
Found 0 genes in FASTQ_to_FASTA_on_data_6__FASTA
Tip: have a suggestion for abricate? Tell me at GitHub · Where software is built
Done.”

Thanks for your assistance.
