RNA/DNA converter on fasta.gz format

Hello,
I have a very large fasta.gz file, uploaded from an FTP link, that contains RNA sequences. I would like to convert it to DNA, and the natural choice for that is the RNA/DNA converter tool.

However, that tool returns a 0-byte result, which I suspect is an input format issue.
I have uncompressed the fasta.gz to fasta, but RNA/DNA converter then returns the following error: […]tool_script.sh: line 25: -d: command not found

Please, does anyone know a workaround for this situation?


Hello @eduardofox2

There was a problem with all tools from the Fastx-toolkit. Those issues were just now corrected and testing is in progress … but you could also just try a rerun for quicker results :slight_smile:

If there are any lingering issues uncovered on our side with those tools, we’ll post an update back here.

These tools should convert compressed fasta to uncompressed at runtime, but that is also something I’ll double check. Since you already have the data uncompressed, rerun using those inputs to avoid more delays.

Thanks for reporting the problem!

Hello jennaj, thanks for the quick reply !

However, it still does not seem to work as expected. Running RNA/DNA converter on the compressed (original) fasta converts only the first sequence. Running it on the uncompressed fasta returns errors: first that it requires single-line formatting, which I eventually corrected with “fasta width formatter”, and then that there is some invalid sequence within the corrected fasta file(s).

Eventually I used a sed substitution, which is not ideal but seems to have done the trick.
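
For anyone landing here with the same problem, the kind of substitution described above can be sketched in Python as follows. This is not the sed command that was actually used (it was not posted); it is a minimal sketch that assumes a standard FASTA layout where header lines start with ">" and every other line is sequence. The function names and file names are hypothetical.

```python
import gzip

# Translation table: RNA uracil -> DNA thymine, preserving case.
_U_TO_T = str.maketrans("Uu", "Tt")


def rna_to_dna_line(line: str) -> str:
    """Return header lines unchanged; replace U/u with T/t on sequence lines."""
    if line.startswith(">"):
        return line
    return line.translate(_U_TO_T)


def convert_fasta(in_path: str, out_path: str) -> None:
    """Stream a FASTA file line by line, reading/writing gzip when the path ends in .gz."""
    open_in = gzip.open if in_path.endswith(".gz") else open
    open_out = gzip.open if out_path.endswith(".gz") else open
    with open_in(in_path, "rt") as src, open_out(out_path, "wt") as dst:
        for line in src:
            dst.write(rna_to_dna_line(line))
```

Calling `convert_fasta("input.fasta.gz", "output.fasta")` (hypothetical file names) would read the compressed file directly and write uncompressed DNA FASTA. Because it streams one line at a time, memory use stays small even for very large files, and headers containing the letter U are left untouched.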

Please double check whether RNA/DNA converter works correctly on large compressed datasets.

Thanks!


Hi @eduardofox2

Double checked and you are correct: FASTX-toolkit tools will not work with compressed data. Input uncompressed data and use unwrapped (single-line) formatting. These are older tool wrappers, originally designed for an older short read input type. They can still be used with updated formats; it will just take some adjustments on your end.
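
To illustrate what "unwrapped" means here, the following is a rough Python sketch that joins each record's sequence onto a single line. It is not how the Galaxy "fasta width formatter" tool is implemented; the function name `unwrap_fasta` is made up for this example, and it assumes headers start with ">" and blank lines can be dropped.

```python
from typing import Iterable, Iterator


def unwrap_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with each record's sequence joined onto one line."""
    seq_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record's sequence, if any.
            if seq_parts:
                yield "".join(seq_parts) + "\n"
                seq_parts = []
            yield line + "\n"
        elif line:
            seq_parts.append(line)
    # Flush the sequence of the last record.
    if seq_parts:
        yield "".join(seq_parts) + "\n"
```

The function accepts any iterable of lines, so an open file handle can be passed in directly and the result written out with `writelines`.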

The invalid sequences probably contain IUPAC ambiguity characters (for example R, Y, K, or M), not just AUCGN/ATCGN. Change those ambiguity bases to N as needed.
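
One way to do that masking outside Galaxy is sketched below in Python. It assumes the offending characters are the ten standard IUPAC nucleotide ambiguity codes (R, Y, S, W, K, M, B, D, H, V); anything else in the file, such as gap characters, is left alone and may still need separate handling. The name `mask_ambiguous` is hypothetical.

```python
# The ten IUPAC nucleotide ambiguity codes, excluding N itself.
AMBIGUITY_CODES = "RYSWKMBDHV"

# Map each code to N, preserving upper/lower case.
_MASK_TABLE = str.maketrans(
    AMBIGUITY_CODES + AMBIGUITY_CODES.lower(),
    "N" * len(AMBIGUITY_CODES) + "n" * len(AMBIGUITY_CODES),
)


def mask_ambiguous(line: str) -> str:
    """Replace IUPAC ambiguity codes with N on sequence lines; keep headers as-is."""
    if line.startswith(">"):
        return line
    return line.translate(_MASK_TABLE)
```

Applied line by line to an uncompressed FASTA file, this leaves only A, C, G, T/U, and N in the sequence lines, assuming no other unexpected characters are present.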

Another alternative tool is Manipulate FASTQ reads on various attributes (Galaxy Version 1.1.5).

Thanks!
