Problem with Output files with pRESTO via Galaxy

silviapc · December 19, 2023, 12:28pm

Hello,

I preprocessed sequences with pRESTO on usegalaxy.org using a workflow that I previously ran in the terminal, and the Galaxy output files contain far fewer sequences than the ones from the terminal run, even though I used the same workflow (same parameters and commands).
Why is this happening? I mean, the pRESTO tool that can be used via Galaxy is the same as the Immcantation tool, isn’t it?
Or does pRESTO via Galaxy have a limit on file size or on the number of sequences it can pre-process?

Thank you in advance!

jennaj · December 22, 2023, 2:24am

Hi @silviapc

Yes, the underlying tool is the same. Scroll down on the tool form to review the conda package version associated with the dependencies … do you notice any differences? Different tool versions usually involve algorithm changes that may explain any differing results.

Do you want to share your job and data for a review? If yes, search this forum with “sharing your history” for the how-to.

silviapc · January 4, 2024, 11:25am

Hi, they are different versions of pRESTO, but I’m surprised that doing my workflow in both ways gives me such different results in terms of the number of sequences in the output files.

pRESTO (Galaxy):

Unique seqs (Collapse_unique file): 1756
Conscount >=2 (atleast_2 file): 99

pRESTO (Immcantation):

Unique seqs (Collapse_unique file): 192299
Conscount >=2 (atleast_2 file): 5721

Yes, if it’s possible. Which user email should I share my Galaxy history with for the review?

Thank you in advance!

gbbio · January 5, 2024, 7:38am

pRESTO on Galaxy is version 0.6.2, and the latest version is 0.7.1.
You can see some changes here: https://presto.readthedocs.io/en/stable/news.html

The very first step (FilterSeq) will already cause a difference in reads, and off the top of my head this is the step that causes the biggest difference, but I could be wrong: I have only used pRESTO from the command line a few times.

So I would first compare the FilterSeq output between the two versions. After that, you could upload the FilterSeq 0.7.1 output to Galaxy and continue from there to check whether version differences in later steps also change the output.
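A minimal sketch of that comparison, assuming the FilterSeq outputs are uncompressed FASTQ files with exactly four lines per record. The function names and the file names in the usage comment are hypothetical, not part of pRESTO or Galaxy.

```python
from typing import Iterable


def count_fastq_records(lines: Iterable[str]) -> int:
    """Count FASTQ records, assuming exactly four lines per record."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} lines is not a multiple of 4")
    return n_lines // 4


def count_fastq_file(path: str) -> int:
    """Open an uncompressed FASTQ file and count its records."""
    with open(path, "r") as handle:
        return count_fastq_records(handle)


# Hypothetical usage: compare the same step from both runs.
# print(count_fastq_file("galaxy_R1_quality-pass.fastq"))
# print(count_fastq_file("terminal_R1_quality-pass.fastq"))
```

Running the same count on every intermediate file from both runs should show at which step the numbers start to diverge.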

silviapc · January 5, 2024, 11:41am

Hi,

Thanks for your advice.
I just compared both FilterSeq output files:

pRESTO (Immcantation):
R1: 276719
R2: 283652
pRESTO (Galaxy):
R1: 299996
R2: 299961

And I’m running my workflow using the FilterSeq output files from pRESTO(Immcantation).

I just compared the output files from the other steps to check where the big differences are between the two versions, and I found that the big drop is in the AssemblePairs sequential step.

pRESTO (Immcantation):
230048
pRESTO (Galaxy):
37639
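To put the counts above on the same scale, here is a rough sketch that expresses the AssemblePairs output as a fraction of the FilterSeq R1 output for each run. The `retention` helper is hypothetical, and the ratio is only a rough indicator, because any steps that sit between FilterSeq and AssemblePairs also remove reads.

```python
def retention(before: int, after: int) -> float:
    """Fraction of sequences left after a step (or a series of steps)."""
    if before <= 0:
        raise ValueError("before must be a positive count")
    return after / before


# Counts copied from this thread.
counts = {
    "pRESTO (Immcantation)": {"filterseq_r1": 276719, "assemblepairs": 230048},
    "pRESTO (Galaxy)": {"filterseq_r1": 299996, "assemblepairs": 37639},
}

for run, c in counts.items():
    frac = retention(c["filterseq_r1"], c["assemblepairs"])
    print(f"{run}: {frac:.1%} of FilterSeq R1 reads remain after AssemblePairs")
```

With these numbers, roughly 83% of the reads survive in the Immcantation run against roughly 13% in the Galaxy run, which points at AssemblePairs (or its inputs) rather than at FilterSeq.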

I think the problem may lie in a parameter I used.

In pRESTO (Immcantation) I followed this workflow: https://presto.readthedocs.io/en/stable/workflows/VanderHeiden2017_Workflow.html because my sequences have a similar structure.

In the AssemblePairs sequential step, I did:
-1 → R2 file (head sequence)
-2 → R1 file (tail sequence)
--rc → tail

For the --rc parameter in Galaxy, the options are “Neither”, “Read 1 only”, “Read 2 only”, and “R1 and R2”. I used the “Read 1 only” option because I understood it to be the tail sequence. Was that wrong?

silviapc · January 9, 2024, 10:54am

I have rerun the AssemblePairs sequential step changing the parameters:
-1 with R1
-2 with R2
--rc “Neither”
but after a whole day it is still not done. Could someone please help me see what’s wrong with this step?
Thank you in advance!

jennaj · January 9, 2024, 6:33pm

Do you mean the job is still executing (yellow dataset)? That means it is processing. Try to let jobs finish. If you delete them, you won’t get any output or even the full logs from the tool (in case the tool detects a data problem). This tool suite also has extra logs plus custom tools to parse them.

You could double check the strand/orientation of your reads. Or, just run the tool through all the combinations to see which is the best fit for your data.
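A small sketch of what "all the combinations" means here, assuming the only things varied are which read goes to -1 and -2 and the four --rc choices listed on the Galaxy tool form. The function name is hypothetical; it only enumerates the settings and does not run anything.

```python
from itertools import product

# (file given to -1, file given to -2)
READ_ORDERS = [("R1", "R2"), ("R2", "R1")]
# --rc choices as they are labelled on the Galaxy tool form (per this thread)
RC_OPTIONS = ["Neither", "Read 1 only", "Read 2 only", "R1 and R2"]


def assemblepairs_combinations() -> list[dict[str, str]]:
    """List every read order / --rc pairing to try."""
    return [
        {"-1": first, "-2": second, "--rc": rc}
        for (first, second), rc in product(READ_ORDERS, RC_OPTIONS)
    ]


for i, combo in enumerate(assemblepairs_combinations(), start=1):
    print(f"{i}: -1 {combo['-1']}, -2 {combo['-2']}, --rc {combo['--rc']}")
```

That gives eight runs in total. Recording the number of assembled sequences for each one makes it easy to see which setting fits the data.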

gbbio · January 22, 2024, 9:49am

pRESTO generally takes a long time to run.
I think your inputs and parameters are not correct.
See the presto manual:

If you want to follow this workflow I think you need to do:
-1 with R2
-2 with R1
--rc “Read 2 Only”

The setting --rc “Read 2 Only” means --rc tail
See: https://github.com/galaxyproject/tools-iuc/blob/main/tools/presto/presto_assemblepairs.xml
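As a rough illustration of how the Galaxy labels line up with the command line, the sketch below maps each label to an --rc value and builds the matching argument list. Only the "Read 2 only" to `tail` pairing is stated in this thread. The other three pairings are assumptions that should be checked against the linked XML. The helper name and file names are hypothetical, and any other arguments the step needs are left out.

```python
# Galaxy label (lower-cased) -> command-line --rc value
GALAXY_RC_TO_CLI = {
    "read 2 only": "tail",  # stated in this thread
    "read 1 only": "head",  # assumption, verify in the tool XML
    "r1 and r2": "both",    # assumption, verify in the tool XML
    "neither": "none",      # assumption, verify in the tool XML
}


def assemblepairs_args(head_file: str, tail_file: str, galaxy_rc: str) -> list[str]:
    """Build a partial AssemblePairs argument list (other options omitted)."""
    rc_value = GALAXY_RC_TO_CLI[galaxy_rc.strip().lower()]
    return [
        "AssemblePairs.py", "sequential",
        "-1", head_file,  # head sequence
        "-2", tail_file,  # tail sequence
        "--rc", rc_value,
    ]


# Hypothetical file names; R2 is the head and R1 the tail, as in the workflow above.
print(" ".join(assemblepairs_args("R2.fastq", "R1.fastq", "Read 2 Only")))
```

Under this mapping, choosing "Read 1 only" in Galaxy would reverse complement the head read instead of the tail, which is a different setting from the --rc tail used in the terminal run.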

silviapc · January 22, 2024, 11:10am

Hi, those were the parameters I used at the beginning, following the workflow. But I was getting errors, and/or not all of the steps loaded completely. That’s why I asked this question: I didn’t know whether I was using some parameter incorrectly.
After I tried all the possible combinations for the AssemblePairs step, the ones with --rc Read 2 only and --rc Neither (with -1 R2 and -2 R1) were the only ones that gave me a similar number of final sequences. So, I will be using -1 R2; -2 R1; --rc Read 2 only.
Most likely, I made a mistake in one of the parameters in the steps before AssemblePairs. That’s why I asked the admins to delete this question.
Thanks for your help!!

jennaj · January 22, 2024, 9:50pm

Let’s leave this posted as a troubleshooting guide for others please

I’ve marked your reply as the “answer” and closed this out.

Thanks @gbbio for posting your details too
