I preprocessed sequences with pRESTO via usegalaxy.org, using a workflow that I had previously run through the terminal, and the output files contain far fewer sequences than when I run pRESTO via the terminal, even though the workflow is the same (same parameters and commands).
Why is this happening? I mean, the pRESTO tool that can be used via Galaxy is the same as the Immcantation tool, isn’t it?
Perhaps… does pRESTO via Galaxy have a file size limit, or a limit on the number of sequences it can pre-process?
Yes, the underlying tool is the same. Scroll down on the tool form to review the conda package version associated with the dependencies … do you notice any differences? Different tool versions usually involve algorithm changes that may explain any differing results.
Do you want to share your job and data for a review? If yes, search this forum with “sharing your history” for the how-to.
Hi, they are different versions of pRESTO, but I’m surprised that doing my workflow in both ways gives me such different results in terms of the number of sequences in the output files.
pRESTO (Galaxy):
Unique seqs (Collapse_unique file): 1756
Conscount >=2 (atleast_2 file): 99
pRESTO (Immcantation):
Unique seqs (Collapse_unique file): 192299
Conscount >=2 (atleast_2 file): 5721
Yes, if that's possible… Which user email should I share my Galaxy history with for the review?
The very first step (FilterSeq) will already produce a difference in read counts, and off the top of my head this is the step that causes the biggest difference. I could be wrong, though: I have only used pRESTO from the command line a few times.
So I would first compare the FilterSeq output between the two versions, and then maybe upload the FilterSeq 0.7.1 output to Galaxy and continue from there, to check whether other version differences also change the output.
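For the comparison itself, a minimal Python sketch for counting reads in a FASTQ file is shown here. It assumes standard four-line FASTQ records (no wrapped sequence lines) and an optional `.gz` extension; the function name `count_fastq_records` is made up for this example and is not part of pRESTO or Galaxy.

```python
import gzip
from pathlib import Path


def count_fastq_records(path):
    """Count records in a FASTQ file, assuming exactly four lines per record."""
    path = Path(path)
    # Open gzip-compressed files transparently; otherwise read as plain text.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        # A remainder usually means a truncated file or wrapped sequence lines.
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

Running it on the FilterSeq output downloaded from Galaxy and on the one produced in the terminal gives two numbers that can be compared directly.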
Thanks for your advice.
I just compared both FilterSeq output files:
pRESTO (Immcantation):
R1: 276719
R2: 283652
pRESTO (Galaxy):
R1: 299996
R2: 299961
And I’m running my workflow using the FilterSeq output files from pRESTO(Immcantation).
I just compared the output files from the other steps to check where the big differences are between the two versions, and I found that the big drop is in the AssemblePairs sequential step.
pRESTO (Immcantation):
230048
pRESTO (Galaxy):
37639
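To put these numbers in proportion, here is a small Python sketch that expresses the assembled count as a fraction of the pairs that could exist after FilterSeq. It assumes each run was assembled from its own FilterSeq output (which may not hold for the Galaxy run) and uses min(R1, R2) as an upper bound on the number of pairs; the helper `assembly_fraction` is hypothetical.

```python
# Read counts copied from the posts in this thread.
filterseq = {
    "Immcantation": {"R1": 276719, "R2": 283652},
    "Galaxy": {"R1": 299996, "R2": 299961},
}
assembled = {"Immcantation": 230048, "Galaxy": 37639}


def assembly_fraction(run):
    """Assembled pairs divided by the maximum possible number of pairs.

    Assumption: the run was assembled from its own FilterSeq output.
    """
    # No more than min(R1, R2) read pairs can exist after filtering.
    max_pairs = min(filterseq[run].values())
    return assembled[run] / max_pairs


for run in assembled:
    print(f"{run}: {assembly_fraction(run):.1%} of possible pairs assembled")
```

Under these assumptions the Immcantation run assembles roughly 83% of the possible pairs and the Galaxy run roughly 12.5%, which points at AssemblePairs rather than FilterSeq as the step to investigate.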
I think the problem may lie in a parameter I used.
In the AssemblePairs sequential step, I used:
-1 → R2 file (head sequence)
-2 → R1 file (tail sequence)
--rc → tail
For the --rc parameter in Galaxy, the options are “Neither”, “Read 1 only”, “Read 2 only”, and “R1 and R2”. I used the “Read 1 only” option because I understand that Read 1 is the tail sequence… Did I get this wrong?
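To illustrate why the --rc choice matters so much, here is a toy Python sketch. The amplicon and reads are invented, and `overlap_length` is a hypothetical exact-match helper, not pRESTO's alignment or scoring. It only shows that a tail read reported from the opposite strand overlaps the head read once it has been reverse-complemented, and not before.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_length(head, tail, min_len=6):
    """Longest suffix of head that exactly matches a prefix of tail (0 if shorter than min_len)."""
    for n in range(min(len(head), len(tail)), min_len - 1, -1):
        if head.endswith(tail[:n]):
            return n
    return 0


# Made-up amplicon; the head read comes from the forward strand and the
# tail read is reported from the reverse strand, as in paired-end data.
amplicon = "ATGGCCTTAGCAGGTCCATTGACC"
head = amplicon[:16]
tail_as_sequenced = reverse_complement(amplicon[8:])

print(overlap_length(head, tail_as_sequenced))                      # 0: no overlap as sequenced
print(overlap_length(head, reverse_complement(tail_as_sequenced)))  # 8: overlap after reverse complement
```

If the wrong read is reverse-complemented, most pairs will fail to overlap in the same way, which would be consistent with a large drop in assembled sequences.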
I have rerun the AssemblePairs sequential step with different parameters: -1 with R1, -2 with R2, and --rc “Neither”.
After a whole day it is still not done… Could someone help me see what’s wrong with this step, please?
Thank you in advance!
Do you mean the job is still executing (yellow dataset)? That means it is processing. Try to let jobs finish – if you delete then you won’t get any output or even the full logs from the tool (should there be some data problem the tool detects). This tool suite also has extra logs + custom tools to parse those.
You could double check the strand/orientation of your reads. Or, just run the tool through all the combinations to see which is the best fit for your data.
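As a bookkeeping aid for trying every combination, here is a short Python sketch that lists them. The --rc labels are the ones quoted from the Galaxy tool form in this thread; the two read orders are simply both ways of assigning R1 and R2 to -1 and -2. It only prints the settings to try and does not run any tool.

```python
from itertools import product

# (file given to -1, file given to -2)
read_orders = [("R1", "R2"), ("R2", "R1")]
# --rc option labels as quoted from the Galaxy tool form in this thread.
rc_options = ["Neither", "Read 1 only", "Read 2 only", "R1 and R2"]

combinations = list(product(read_orders, rc_options))
for (head_file, tail_file), rc in combinations:
    print(f"-1 {head_file}  -2 {tail_file}  --rc {rc}")
```

That gives eight runs in total; recording the assembled sequence count next to each line makes it easy to see which setting matches the terminal result.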
Hi, those were the parameters I used at the beginning, following the workflow. And I was getting errors and/or all the steps didn’t load completely… That’s why I asked this question, because I didn’t know if I was using some parameter wrong, etc.
After trying all the possible combinations for the AssemblePairs step, the combinations with --rc Read 2 only and --rc Neither (with -1 R2 and -2 R1) were the only ones that gave me a similar number of final sequences. So, I will be using -1 R2; -2 R1; --rc Read 2 only.
Most likely, I made a mistake in one of the parameters in the steps before AssemblePairs. That’s why I asked the admins to delete this question.
Thanks for your help!!