Local Installation Performance

Apologies if this is too chatty. It is not an actual problem, just an anomaly, and in my experience anomalies often signal a hidden problem.
Basically, my local Galaxy installation (which runs within my university firewall) processes data far faster than the equivalent pipeline run with native applications on a faster standalone machine. It worries me that I can't explain why, though I doubt there is a single specific answer.
I am running 20Gb FASTQ files through a workflow that concatenates across lanes, then runs HISAT2 > samtools sort/view > StringTie.
On my Galaxy server (Ubuntu on an Intel i3 4-core iMac with 24 GB RAM) this takes about 12 hours.
On my much newer iMac (Big Sur, Intel i9 8-core/16 virtual cores, 48 GB RAM) it takes days. As far as I can see, the command-line options are the same, except that the server version adds FASTQ Groomer and FastQC.
Any thoughts? The step where the two differ most seems to be the samtools sort bundled within Galaxy's HISAT2 > BAM step.
Kindest Regards
Richard

You can see the runtime of every tool in the workflow by clicking on the [i] icon; that is where I'd start to find out what the difference is. FASTQ Groomer is also a slow tool afaik, and it is unclear why one invocation would run it but not the other, which indicates the workflows are not the same.
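A minimal Python sketch for lining up those per-step runtimes, assuming you copy them by hand (in seconds) from each job's [i] page and from your own timings of the native run; the function and step names are hypothetical. Steps that appear in only one run, like FASTQ Groomer here, mean you are not comparing like with like.

```python
# Hypothetical helper: runtimes (in seconds) are entered by hand, e.g. copied from
# each Galaxy job's [i] page and from your own timings of the native run.


def compare_runtimes(galaxy, native):
    """Compare per-step runtimes of two runs of supposedly the same workflow.

    Returns (shared, only_galaxy, only_native):
      shared      -- (step, galaxy_s, native_s, slowdown) tuples, largest slowdown first
      only_galaxy -- steps that ran only in Galaxy (a sign the workflows differ)
      only_native -- steps that ran only in the native pipeline
    """
    shared = []
    for step in sorted(galaxy.keys() & native.keys()):
        g, n = galaxy[step], native[step]
        # slowdown > 1 means the native run took longer than Galaxy for this step
        slowdown = n / g if g > 0 else float("inf")
        shared.append((step, g, n, slowdown))
    shared.sort(key=lambda row: row[3], reverse=True)
    only_galaxy = sorted(galaxy.keys() - native.keys())
    only_native = sorted(native.keys() - galaxy.keys())
    return shared, only_galaxy, only_native
```

The step at the top of the shared list is the one to dig into first.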


Thanks Marten. Yes, I have copied the commands and options from Galaxy to my Mac exactly (except Groomer). The only thing I can't see is how many threads it uses, because the option is "-@ ${GALAXY_SLOTS:-1}" (which I haven't checked yet). Everything else is the same. I am wondering whether the Mac binaries for samtools are just not as efficient as those in the Galaxy Linux deployment? That said, I have tried two different versions, from Bioconda and Homebrew. So if there are no hidden Galaxy optimisation steps, and Groomer does not make the subsequent samtools sort run faster, it looks like this is an issue for me to resolve with samtools support?
…ooh, I did spot another subtle difference and will look at this. Galaxy runs samtools sort with the default number of threads and pipes its output to samtools view with -@ threads, whereas my Mac used -@ for both. I'll change this and try again! Thanks
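For reference, in bash (and any POSIX shell) "${GALAXY_SLOTS:-1}" expands to the value of GALAXY_SLOTS if it is set and non-empty, and to 1 otherwise, so if the Galaxy job runner does not set GALAXY_SLOTS, that step is told to use one thread. The Python sketch below mirrors that expansion and builds the sort-then-view pipe both ways (threads on view only, as described for Galaxy, or on both stages, as in the Mac run) so the two variants can be timed side by side. The helper names, file names and the extra -O bam / -b / -o options are assumptions for illustration; this is not the actual Galaxy HISAT2 wrapper command.

```python
import os
import shlex


def galaxy_slots(env=None, default=1):
    """Mimic the shell expansion ${GALAXY_SLOTS:-1}: use the value if set and non-empty, else the default."""
    env = os.environ if env is None else env
    value = env.get("GALAXY_SLOTS", "")
    return int(value) if value != "" else default


def sort_view_pipeline(in_bam, out_bam, slots, thread_sort=False):
    """Return a 'samtools sort | samtools view' pipeline as a shell string (hypothetical helper).

    thread_sort=False: sort keeps samtools' default threading, only view gets -@ (Galaxy-style, as described).
    thread_sort=True:  both stages get -@ (as in the native Mac run).
    """
    sort_cmd = ["samtools", "sort"]
    if thread_sort:
        sort_cmd += ["-@", str(slots)]
    # No -o here, so samtools sort writes the sorted BAM to stdout for the pipe.
    sort_cmd += ["-O", "bam", in_bam]
    # '-' tells samtools view to read from stdin; -b writes BAM to out_bam.
    view_cmd = ["samtools", "view", "-@", str(slots), "-b", "-o", out_bam, "-"]
    return f"{shlex.join(sort_cmd)} | {shlex.join(view_cmd)}"
```

For example, sort_view_pipeline("in.bam", "out.bam", galaxy_slots()) gives the Galaxy-style variant; pass thread_sort=True for the Mac-style one.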

I guess I misunderstood; I thought you were comparing workflow invocations between two Galaxy installations.

In this case, though, it seems we are actually debugging the software configuration on your iMac that mimics the Galaxy workflow's dependency stack, which is potentially much more complex. I'd start by validating the results: are they the same?
If they're not, I'd probably install a Galaxy instance on your iMac and reproduce the workflow invocation properly. That will be much easier than building the same dependency stack by hand, because in Galaxy each step is executed in an isolated environment, i.e. different steps of the workflow can use different versions of the same dependency.
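For the "are they the same?" check, byte-for-byte checksums of the final files will likely differ even when the results agree, because headers record things like program versions and command lines. A minimal Python sketch, assuming you compare text outputs such as the StringTie GTF (or SAM produced with samtools view): it skips lines starting with '#' or '@' (an assumption that fits GTF comment lines and SAM header lines) and builds an order-independent digest of the remaining lines in constant memory. The helper names are hypothetical.

```python
import hashlib


def content_digest(path, skip_prefixes=("#", "@")):
    """Order-independent digest of a text file's data lines (hypothetical helper).

    Lines starting with any of skip_prefixes are ignored, so differences in recorded
    versions or command lines don't count. Each remaining line is hashed with SHA-256
    and the hashes are summed modulo 2**256, which ignores line order but still
    counts duplicate lines. Returns (number_of_data_lines, digest).
    """
    prefixes = tuple(p.encode() for p in skip_prefixes)
    total = 0
    count = 0
    with open(path, "rb") as handle:
        for raw in handle:
            line = raw.rstrip(b"\r\n")
            if line.startswith(prefixes):
                continue
            total = (total + int.from_bytes(hashlib.sha256(line).digest(), "big")) % (1 << 256)
            count += 1
    return count, total


def same_results(path_a, path_b):
    """True if both files contain the same data lines, ignoring headers and line order."""
    return content_digest(path_a) == content_digest(path_b)
```

Real numerical differences (for example slightly different coverage values between builds) will also show up as a mismatch, so treat a mismatch as a prompt to diff the files rather than proof of a bug.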
Controlling the amount of resources with GALAXY_SLOTS can also have a significant effect on computation speed.
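To see how much the slot count actually matters on a given machine, you could run the same step with a few GALAXY_SLOTS values and compare wall-clock times. A minimal Python sketch, assuming the command reads its thread count from GALAXY_SLOTS (for example a shell string containing ${GALAXY_SLOTS:-1}, run with shell=True so /bin/sh expands it); the helper name is hypothetical, and a single run per setting is noisy, so repeat runs before drawing conclusions.

```python
import os
import subprocess
import time


def time_with_slots(cmd, slot_counts, shell=False):
    """Run the same command once per slot count with GALAXY_SLOTS exported; return {slots: seconds}.

    Pass a shell string with shell=True (so ${GALAXY_SLOTS:-1} is expanded by /bin/sh),
    or an argument list with shell=False. Raises CalledProcessError if a run fails.
    """
    timings = {}
    for slots in slot_counts:
        # Copy the current environment and override GALAXY_SLOTS for this run only.
        env = dict(os.environ, GALAXY_SLOTS=str(slots))
        start = time.perf_counter()
        subprocess.run(cmd, shell=shell, env=env, check=True)
        timings[slots] = time.perf_counter() - start
    return timings
```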


Crumbs, hadn't thought of that… I can install Galaxy on the iMac and surely work this out! Thanks (for some reason I had just assumed it was Linux-only).
Thanks again
R.