Installed local Galaxy 24.0 on a GCP Ubuntu 24.04 instance. Galaxy submits jobs through Pulsar to separate GCP instance(s) that run the command-line tools. NCBI makeblastdb successfully generated a nucleotide blastdb in the Galaxy history (I see all 12 blastdb.n* files), but when I run the NCBI BLAST+ blastn tool against this blastdb from the Galaxy history, it fails to stage/copy the 12 blastdb.n* files generated by the makeblastdb tool. It stages only a single file. Here is the command Galaxy ran:
gspier96@worker-6npw:/opt/pulsar/files/staging/278$ cat tool_script.sh
blastn -query '/opt/pulsar/files/staging/278/inputs/dataset_38b958dc-fefe-406f-ab17-7999e653898a.dat' -db '/opt/pulsar/files/staging/278/inputs/dataset_6bc3f0f1-2c89-49f5-be1f-84de99a03580_files/blastdb' …
gspier96@worker-6npw:/opt/pulsar/files/staging/278$ ls -l /opt/pulsar/files/staging/278/inputs/dataset_6bc3f0f1-2c89-49f5-be1f-84de99a03580_files/blastdb.*
-rw-rw-r--+ 1 pulsar pulsar 162 May 23 01:18 /opt/pulsar/files/staging/278/inputs/dataset_6bc3f0f1-2c89-49f5-be1f-84de99a03580_files/blastdb.ndb  <<<< only 1 of the 12 blastdb.n* files was staged, and its contents are gibberish
How can Pulsar be configured to stage multiple files, or is there an alternative way to resolve this issue? NCBI blastn against a shared blastdb (not in the Galaxy history) works fine.
Thank you! - Gene Spier, gspier96@gmail.com
First off, I am wondering if you ran this wrapper to generate the index: toolshed.g2.bx.psu.edu/repos/devteam/ncbi_blast_plus/ncbi_makeblastdb/2.16.0+galaxy0
Are all the files nested into a single dataset in the history with the blastdbn datatype? (This datatype is defined to be a multi-file, composite format.)
Or do you have several datasets created by a custom wrapper instead? This is the part that confuses me and makes me think you might be running something else. If the tool is custom, you could define a new multi-file datatype, as sketched below.
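For what it's worth, composite datatypes in Galaxy are defined as Python classes; here is a minimal sketch modeled on how the built-in BLAST database datatypes are structured. The class name, extension, and member file names are all hypothetical, so treat this as an outline rather than a drop-in definition.

```python
# my_datatypes.py -- minimal sketch of a composite (multi-file) datatype,
# modeled on Galaxy's built-in BLAST database datatypes. All names here
# (MyNucDb, "mydbn", member file names) are hypothetical.
from galaxy.datatypes.data import Data


class MyNucDb(Data):
    """One history dataset backed by several files on disk."""

    file_ext = "mydbn"
    composite_type = "basic"  # all member files are required

    def __init__(self, **kwd):
        super().__init__(**kwd)
        # Each call registers one member file of the composite dataset;
        # Galaxy keeps them together under a dataset_<uuid>_files/ directory.
        self.add_composite_file("mydb.ndb", is_binary=True)
        self.add_composite_file("mydb.nin", is_binary=True)
        self.add_composite_file("mydb.nsq", is_binary=True)
```

The class then needs to be registered in datatypes_conf.xml so the new extension maps to it.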
I have an older example here if you want to grab the same test data and see how it works with the command lines we are using on this server. I ran all of the history-based custom index wrappers for the BLAST+ tools in it. My files were tiny, but maybe it helps anyway.
To follow up more, would you like to clarify how this is being run? You could also include your job configuration or Pulsar configuration snippets (redact anything private).
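For reference, a Pulsar destination in Galaxy 24.0's YAML job configuration looks roughly like the sketch below; the environment name and URL are placeholders rather than values from this thread, and default_file_action is the setting that decides how inputs are staged to the remote node.

```yaml
# job_conf.yml -- sketch of a Pulsar destination; the environment name and
# URL below are placeholders, not values from this thread.
runners:
  pulsar:
    load: galaxy.jobs.runners.pulsar:PulsarRESTJobRunner

execution:
  default: pulsar_gcp
  environments:
    pulsar_gcp:
      runner: pulsar
      url: http://pulsar-worker.example:8913/
      default_file_action: remote_transfer  # how inputs get staged ("transfer", "copy", ...)
      dependency_resolution: remote
```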
Indeed, I generated the blastdb in the Galaxy history using the NCBI makeblastdb wrapper from NCBI BLAST+ v2.16.
The NCBI blastn Galaxy tool works fine on the testing instance I installed on my local Ubuntu machine (no Pulsar), which I use to test tool XMLs.
In both Galaxy-with-Pulsar (running on GCP) and local Galaxy, I see the 12 blastdb.n* files generated by the makeblastdb wrapper. But because on GCP the Galaxy server and the "workers" that run the tools do not mount the same disk, Pulsar, as I understand it, was supposed to copy these 12 blastdb.n* files for staging (I tried both the copy and remote transfer file actions), but it did not. Other input files, e.g., the FASTA input for blastn, are fine; they are indeed copied for staging.
Galaxy blastn works fine on both GCP and local against the shared blastdb in $BLASTDB; it fails only with a blastdb in the Galaxy history, and only on GCP.
Maybe I need to set up a shared disk for the Galaxy and worker servers, so that no file copying is needed.
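If I go the shared-disk route, my understanding from the Pulsar docs is that staging can be switched off for paths mounted on both sides, either with default_file_action: none or with a file actions config along these lines (a sketch; the paths are hypothetical):

```yaml
# file_actions.yaml -- sketch; referenced from the job environment via the
# file_action_config parameter. The path below is hypothetical and assumes
# the directory is mounted on both the Galaxy server and the Pulsar workers.
paths:
  - path: /shared/galaxy/datasets
    action: none   # "none" = path is visible remotely, do not stage/copy
```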
To double-check, I went back through my notes for the complex artifacts in the history (more than just a single dataset or a dataset/index pair, like a file.bam and its file.bam.bai). These used to require a shared object store to get the data from; a copy was not enough, since the extra member files were ignored. That applied to blastdb and any of the sqlite types of data (example: snpeffdb).
However, I'm not sure whether this is still required or what the other options are. Let's ask the admin group for some clarity! You're invited to discuss this on Matrix.