sixpack translation problem

why did sixpack translate only a small portion of the >5000 sequences I uploaded?

Welcome, @BenMulder

It is difficult to know for certain, but it seems possible that the server where you are working couldn’t handle the larger processing job if you are inputting over 5k nucleotide sequences into a six-frame translation run.

As a potential workaround, you could split the larger file into smaller files, process each of those, then merge the results back together. Whether this is appropriate depends on the tool, but the one you are using is probably OK for this.

The process usually goes something like this: split, run, then merge. It runs in a batch, and if you put all of the tools into a simple workflow, it runs almost like a single (custom) tool.
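To make the split step concrete, here is a minimal Python sketch of breaking a multi-FASTA text into single-sequence records (this is just an illustration, not what the Galaxy split tool actually runs; the `example` data is made up):

```python
# Minimal sketch: parse a multi-FASTA text into (header, sequence) records,
# which could then each be written to their own single-sequence file.

def split_fasta(text):
    """Return a list of (header, sequence) records from FASTA-formatted text."""
    records = []
    header, seq_lines = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq_lines)))
            header, seq_lines = line[1:].strip(), []
        elif line.strip():
            seq_lines.append(line.strip())
    if header is not None:
        records.append((header, "".join(seq_lines)))
    return records

# Hypothetical two-sequence input, just for the demo.
example = ">seq1\nATGAAA\n>seq2\nATGCCC\n"
records = split_fasta(example)
for name, seq in records:
    print(name, seq)
```

Each record would then go into its own file (one element of the collection), so every downstream job sees exactly one sequence.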

If this helps, please let us know. If you still need more help, please share more details. From what you describe, the server where you are working and the exact tool name/version are likely the most important details.

Let’s start there, thanks! 🙂

Hi Jennaj,
the split-run-concatenate approach you suggested worked only to a certain extent, as Sixpack translated just the first sequence of each dataset.
I did contemplate splitting my dataset into 5000 single-sequence files, but that is a lot.
Is there anything else I can do?
Any help is appreciated.

Hi @BenMulder

You are doing the split in a batch, then running the collection through in a batch, then merging in a batch, yes? Then the 5000 count doesn’t matter.

That is just three clicks – one per step above – no matter how much you are splitting up. It also distributes the work across cluster nodes, so it will process just as fast as, if not faster than, the merged file since the jobs can run in parallel.

I’m not sure whether the original EMBOSS command-line tool can process more than one sequence at a time, but rereading the tool form as a reminder to myself, I think this tool really does process just one sequence at a time. So what you are explaining and doing now is the best way to get your data through.
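For anyone curious what "six-frame translation per sequence" means in practice, here is a small Python sketch using the standard genetic code. This is only an illustration of the concept, not the EMBOSS sixpack implementation, and the example DNA string is invented:

```python
# Build the standard genetic code table from the conventional TCAG ordering.
bases = "TCAG"
amino = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
codon_table = {
    b1 + b2 + b3: amino[16 * i + 4 * j + k]
    for i, b1 in enumerate(bases)
    for j, b2 in enumerate(bases)
    for k, b3 in enumerate(bases)
}

def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def six_frame_translate(dna):
    """Translate one DNA sequence in all six reading frames (F1-F3, R1-R3)."""
    frames = {}
    for strand, seq in (("F", dna), ("R", reverse_complement(dna))):
        for offset in range(3):
            protein = "".join(
                codon_table[seq[i:i + 3]]
                for i in range(offset, len(seq) - 2, 3)
            )
            frames[strand + str(offset + 1)] = protein
    return frames

print(six_frame_translate("ATGAAATGA"))
```

Note the function takes a single sequence, which mirrors the one-sequence-at-a-time behavior discussed above; a collection of single-sequence files is the natural way to batch it.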

Maybe I am misunderstanding? So far I think that splitting up the query is the way to get this processed. Let us know how that works!

Later on, if any of the sub-jobs happen to fail, you can rerun just those (after maybe reviewing the input sequence to make sure that wasn’t the issue). When you rerun, there will be an extra option on the form to put the new result back into the original collection. There may be a few of these across 5k jobs since some can fail by chance.

I was able to obtain the 3-frame translation I was looking for by:

1. Uploading the FASTA file.
2. Using SPLITFASTA to split the FASTA file into a collection of 4530 single-sequence FASTA files.
3. Using SIXPACK to translate the collection of 4530 single-sequence FASTA files.
4. Trying Concatenate datasets tail-to-head (did not work).
5. Concatenating the results locally in the terminal.
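The final local concatenation step could be sketched in Python like this (the file names and directory here are made up for the demo, not the actual tool output names):

```python
# Minimal sketch of the local merge step: append the contents of many
# small per-sequence output files into one file, in sorted name order.
import os
import tempfile

def concatenate(paths, dest):
    """Write the contents of each file in `paths` into `dest`, in order."""
    with open(dest, "w") as out:
        for path in paths:
            with open(path) as part:
                out.write(part.read())

# Demonstrate with two throwaway files in a temporary directory.
tmp = tempfile.mkdtemp()
for name, text in [("part_001.fa", ">a\nMK*\n"), ("part_002.fa", ">b\nSFH\n")]:
    with open(os.path.join(tmp, name), "w") as f:
        f.write(text)

parts = sorted(os.path.join(tmp, n) for n in os.listdir(tmp))
merged = os.path.join(tmp, "merged.fa")
concatenate(parts, merged)
print(open(merged).read())
```

Sorting the paths first keeps the merged output in a stable order, which matters if you want the results aligned with the original input order.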
Thank you very much for your help.

Thanks for letting us know, @BenMulder

An alternative tool is Collapse Collection (also in the tool panel).

But super glad this worked out!