Problem with Output files with pRESTO via Galaxy

Hello,

I preprocessed sequences with pRESTO on usegalaxy.org, using a workflow that I had previously run from the terminal. The Galaxy output files contain far fewer sequences than the terminal run produced, even though the workflow is the same (same parameters and commands).
Why is this happening? The pRESTO tool available in Galaxy is the same as the Immcantation tool, isn’t it?
Does pRESTO in Galaxy have a file size limit, or a limit on the number of sequences it can preprocess?

Thank you in advance!

Hi @silviapc

Yes, the underlying tool is the same. Scroll down on the tool form to review the conda package version associated with the dependencies … do you notice any differences? Different tool versions usually involve algorithm changes that may explain any differing results.

Do you want to share your job and data for a review? If yes, search this forum with “sharing your history” for the how-to.

Hi, they are different versions of pRESTO, but I’m surprised that running my workflow both ways gives such different numbers of sequences in the output files.

pRESTO (Galaxy):

  • Unique seqs (Collapse_unique file): 1756
  • Conscount >=2 (atleast_2 file): 99

pRESTO (Immcantation):

  • Unique seqs (Collapse_unique file): 192299
  • Conscount >=2 (atleast_2 file): 5721

Yes, if that’s possible. Which user email should I share my Galaxy history with for the review?

Thank you in advance!

pRESTO on Galaxy is version 0.6.2, and the latest version is 0.7.1.
You can see some of the changes here: https://presto.readthedocs.io/en/stable/news.html

The very first step (FilterSeq) will already produce a different number of reads. Off the top of my head, this is the step that causes the biggest difference, but I could be wrong: I have only used pRESTO from the command line a few times.

So I would first compare the FilterSeq output between the two versions. After that, you could upload the FilterSeq 0.7.1 output to Galaxy and continue from there, to check whether other version changes also affect the output.
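To make that comparison concrete, here is a minimal Python sketch for counting reads in two FASTQ files. It assumes standard four-line FASTQ records (no wrapped sequences, no blank lines), and the function names are hypothetical helpers, not part of pRESTO or Galaxy.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming exactly 4 lines per record."""
    path = Path(path)
    # Transparently handle gzip-compressed files (assumed to end in .gz).
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def compare_counts(path_a, path_b):
    """Return (count_a, count_b, count_b - count_a) for two FASTQ files."""
    count_a = count_fastq_reads(path_a)
    count_b = count_fastq_reads(path_b)
    return count_a, count_b, count_b - count_a
```

You would call `compare_counts` on the FilterSeq output downloaded from Galaxy and the one produced on the command line; the file names depend on your own runs.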

Hi,

Thanks for your advice.
I just compared both FilterSeq output files:

  • pRESTO (Immcantation):
    R1: 276719
    R2: 283652

  • pRESTO (Galaxy):
    R1: 299996
    R2: 299961

I’m now running my workflow using the FilterSeq output files from pRESTO (Immcantation).

I just compared the output files from the other steps to check where the big differences are between the two versions, and I found that the big drop is in the AssemblePairs sequential step.

  • pRESTO (Immcantation):
    230048

  • pRESTO (Galaxy):
    37639
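As a rough illustration of how the drop can be located, the sketch below computes the Galaxy-to-terminal ratio for each step using only the counts posted in this thread. The 0.5 threshold and the function names are arbitrary choices for illustration.

```python
# Sequence counts reported in this thread, in workflow order.
COUNTS = {
    "FilterSeq R1": {"immcantation": 276719, "galaxy": 299996},
    "FilterSeq R2": {"immcantation": 283652, "galaxy": 299961},
    "AssemblePairs sequential": {"immcantation": 230048, "galaxy": 37639},
}


def galaxy_ratios(counts):
    """Ratio of Galaxy count to Immcantation count for each step."""
    return {step: c["galaxy"] / c["immcantation"] for step, c in counts.items()}


def first_big_drop(counts, threshold=0.5):
    """Return the first step whose ratio falls below the threshold, else None.

    The threshold is a hypothetical cut-off, not a pRESTO setting.
    """
    for step, ratio in galaxy_ratios(counts).items():
        if ratio < threshold:
            return step
    return None


if __name__ == "__main__":
    for step, ratio in galaxy_ratios(COUNTS).items():
        print(f"{step}: {ratio:.3f}")
    print("First big drop:", first_big_drop(COUNTS))
```

With these numbers the FilterSeq ratios stay close to 1, while AssemblePairs sequential keeps only about 16% as many sequences in Galaxy as on the command line.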

I think the problem may lie in a parameter I used.

In pRESTO (Immcantation) I followed this workflow: https://presto.readthedocs.io/en/stable/workflows/VanderHeiden2017_Workflow.html because my sequences have a similar structure.

In the AssemblePairs sequential step, I did:
-1 → R2 file (head sequence)
-2 → R1 file (tail sequence)
--rc → tail

For the --rc parameter in Galaxy, the options are “Neither”, “Read 1 only”, “Read 2 only”, and “R1 and R2”. I used “Read 1 only” because I understood R1 to be the tail sequence. Was that wrong?

I have rerun the AssemblePairs sequential step changing the parameters:
-1 with R1
-2 with R2
--rc “Neither”
but after a whole day it is still not done. Could someone help me see what’s wrong with this step, please?
Thank you in advance!

Do you mean the job is still executing (yellow dataset)? That means it is processing. Try to let jobs finish: if you delete them, you won’t get any output or even the full logs from the tool (useful if the tool detects a data problem). This tool suite also has extra logs and custom tools to parse them.

You could double check the strand/orientation of your reads. Or, just run the tool through all the combinations to see which is the best fit for your data.
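If you go the "try everything" route, a small sketch like this can list the combinations so none is missed. The option labels are the ones quoted from the Galaxy tool form earlier in this thread; the function name is a hypothetical helper.

```python
from itertools import product

# The two ways of assigning the read files to -1 (head) and -2 (tail).
READ_ASSIGNMENTS = (("R1", "R2"), ("R2", "R1"))

# --rc choices as quoted from the Galaxy tool form in this thread.
RC_OPTIONS = ("Neither", "Read 1 only", "Read 2 only", "R1 and R2")


def assemblepairs_combinations():
    """List every (-1, -2, --rc) combination to try for AssemblePairs."""
    return [
        {"-1": head, "-2": tail, "--rc": rc}
        for (head, tail), rc in product(READ_ASSIGNMENTS, RC_OPTIONS)
    ]


if __name__ == "__main__":
    for combo in assemblepairs_combinations():
        print(combo)
```

That gives eight runs in total; comparing the number of assembled sequences from each should show which orientation fits the data.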

pRESTO generally takes a long time to run.
I think your inputs and parameters are not correct.
See the pRESTO manual:
[image: screenshot from the pRESTO manual]

If you want to follow this workflow I think you need to do:
-1 with R2
-2 with R1
--rc “Read 2 Only”

The setting --rc “Read 2 Only” means --rc tail.
See the tool wrapper: https://github.com/galaxyproject/tools-iuc/blob/main/tools/presto/presto_assemblepairs.xml
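To spell out the translation between the Galaxy form and the command line, here is a hedged sketch. Only the “Read 2 only” to `tail` mapping is confirmed in this thread; the other three entries are my assumption based on the `--rc` choices of the command-line tool, so check them against the wrapper XML. The resulting argument list is deliberately incomplete: other arguments the sequential subcommand needs (such as the reference database) are omitted.

```python
# Galaxy form label -> value passed to --rc on the command line.
GALAXY_RC_TO_CLI = {
    "Read 2 only": "tail",  # stated in this thread
    "Read 1 only": "head",  # assumption
    "R1 and R2": "both",    # assumption
    "Neither": "none",      # assumption
}


def assemblepairs_args(head_file, tail_file, galaxy_rc):
    """Build a partial AssemblePairs argument list from Galaxy-style choices.

    Hypothetical helper: it covers only -1, -2 and --rc.
    """
    return [
        "AssemblePairs.py", "sequential",
        "-1", head_file,
        "-2", tail_file,
        "--rc", GALAXY_RC_TO_CLI[galaxy_rc],
    ]


if __name__ == "__main__":
    # File names are placeholders for the R2 (head) and R1 (tail) inputs.
    print(" ".join(assemblepairs_args("R2.fastq", "R1.fastq", "Read 2 only")))
```

For the workflow discussed here, that means R2 goes to -1, R1 goes to -2, and the Galaxy choice “Read 2 only” corresponds to `--rc tail`.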

Hi, those were the parameters I used at the beginning, following the workflow. I was getting errors, and/or the steps did not all load completely. That’s why I asked this question: I didn’t know whether I was using a parameter incorrectly.
After trying all the possible combinations for the AssemblePairs step, the only ones that gave me a similar number of final sequences were --rc “Read 2 only” and --rc “Neither” (both with -1 R2 and -2 R1). So I will use -1 R2, -2 R1, and --rc “Read 2 only”.
Most likely, I made a mistake in one of the parameters in the steps before AssemblePairs. That’s why I asked the admins to delete this question.
Thanks for your help!!

Let’s leave this posted as a troubleshooting guide for others, please.

I’ve marked your reply as the “answer” and closed this out.

Thanks @gbbio for posting your details too.
