- VCF worked fine for a very small file (header + one data line). I didn’t test a larger file. Maybe there is a data size limit and that is what the UCSC user was attempting?
I don’t think that Galaxy uses that. Yes, you can upload small VCF files, but not big ones. We are not Galaxy, we cannot store big files for users for too long. We’ll add something like this, but don’t have the feature yet.
- BAM failed on each attempt, even with a small file. It seems the function is looking for a myfile.bam.bai index file to upload along with myfile.bam. If I added both URLs to the same data loading submission, that also failed. It seems that the function might be looking for the associated index in the same location as the BAM URL, but that is just a guess.
You cannot upload BAM files into UCSC at all. You can only paste a URL to a BAM file into the custom track box, and yes, it expects the .bai next to it, but you can use the setting bigDataIndex= to point to the .bai file if it’s not stored next to the BAM.
A line like this should work:
track name=test type=bam bigDataUrl=URL-to-BAM bigDataIndex=URL-to-BAI
This is documented here, but it should probably be highlighted more widely. I have now added notes to various doc pages. Thanks for bringing this up.
https://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#bigDataIndex
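For anyone scripting this, here is a minimal sketch of how such a custom track line could be assembled, using the bigDataIndex setting named in the doc link above. The function name `bam_track_line` and the example.org URLs are made up for illustration; this is not code from Galaxy or UCSC.

```python
from typing import Optional


def bam_track_line(bam_url: str, bai_url: Optional[str] = None, name: str = "test") -> str:
    """Build a one-line UCSC custom track definition for a remote BAM file.

    Hypothetical helper for illustration. The name is assumed to contain
    no spaces (a name with spaces would need quoting).
    """
    parts = ["track", f"name={name}", "type=bam", f"bigDataUrl={bam_url}"]
    # Only add bigDataIndex when the .bai is NOT stored next to the .bam;
    # otherwise UCSC looks for <bam_url>.bai on its own.
    if bai_url is not None:
        parts.append(f"bigDataIndex={bai_url}")
    return " ".join(parts)


if __name__ == "__main__":
    # Placeholder URLs, not real data.
    print(bam_track_line(
        "https://example.org/data/myfile.bam",
        "https://example.org/index/myfile.bam.bai",
    ))
```

The resulting line is what would be pasted into the custom track box, or sent along by whatever builds the link-out.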
Add in a user preference (per account) that the user can set, to control the target UCSC server for both link-outs to UCSC and getting data from the Table Browser. I like this idea the best but it would take some work to implement.
This seems technically the easiest route to me, as a Galaxy-naive programmer. It has the added advantage that local server admins can point to the EU server, e.g. if data protection is an issue, and that users could point to their “own” on-site UCSC mirror when they work with Galaxy.
(It’s a lot easier now than in the past to set up a UCSC mirror site. We have a download+click VM image now and a docker container. Soon you will be able to run your own mirror with a single “docker run” command.)
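To make the preference idea a bit more concrete, here is a rough sketch of how a per-user setting could be resolved to a UCSC base URL and turned into a link-out. The names `UCSC_SERVERS`, `resolve_ucsc_base` and `tracks_url` are hypothetical and not part of Galaxy; the three hostnames are the public UCSC sites, and `hgt.customText` is the hgTracks parameter for loading a custom track from a URL.

```python
from typing import Optional
from urllib.parse import urlencode

# Public UCSC sites, keyed by a short label (the labels are an assumption).
UCSC_SERVERS = {
    "us": "https://genome.ucsc.edu",
    "eu": "https://genome-euro.ucsc.edu",
    "asia": "https://genome-asia.ucsc.edu",
}


def resolve_ucsc_base(user_pref: Optional[str], admin_default: str = "us") -> str:
    """Pick the UCSC base URL: user preference first, then the admin default."""
    if user_pref:
        # A full URL is treated as the user's own mirror.
        if user_pref.startswith(("http://", "https://")):
            return user_pref.rstrip("/")
        if user_pref in UCSC_SERVERS:
            return UCSC_SERVERS[user_pref]
    # No preference (or an unknown label): fall back to the admin's choice.
    return UCSC_SERVERS[admin_default]


def tracks_url(base: str, db: str, custom_text_url: str) -> str:
    """Build an hgTracks link that loads a custom track from a URL."""
    query = urlencode({"db": db, "hgt.customText": custom_text_url})
    return f"{base}/cgi-bin/hgTracks?{query}"


if __name__ == "__main__":
    base = resolve_ucsc_base("eu")
    print(tracks_url(base, "hg38", "https://example.org/track.txt"))
```

The same resolved base would also serve for Table Browser requests, so both directions follow one setting.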
- Make a change on the UCSC side to auto-redirect to the preferred UCSC server.
We cannot redirect when the UCSC server is down.
The “open in UCSC” link-outs involve a handshake between the servers. The links will only show up in Galaxy if the datatype and database/dbkey for that data are appropriate for UCSC. Sharing state is not a factor for this type of transfer. Instead, what matters is: is the data in a Galaxy account that is currently logged in, and is the target UCSC server available? If yes, the data is sent.
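As a rough illustration of the two checks described above, the logic could look something like this. This is only a sketch, not Galaxy’s actual implementation: the `Dataset` class, the function names, and the example datatype/dbkey sets are all assumptions.

```python
from dataclasses import dataclass

# Illustrative values only; the real lists live in Galaxy's configuration.
UCSC_DATATYPES = {"bed", "bam", "vcf"}
UCSC_DBKEYS = {"hg19", "hg38", "mm10"}


@dataclass
class Dataset:
    datatype: str
    dbkey: str


def show_ucsc_link(ds: Dataset) -> bool:
    """The link is shown only if datatype and dbkey are appropriate for UCSC."""
    return ds.datatype in UCSC_DATATYPES and ds.dbkey in UCSC_DBKEYS


def can_send(ds: Dataset, user_logged_in: bool, server_available: bool) -> bool:
    """Data is sent only from a logged-in account to a reachable UCSC server.

    History sharing state is deliberately not consulted here.
    """
    return show_ucsc_link(ds) and user_logged_in and server_available
```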
Please note that we now have thousands more genomes available than before, and we can now handle a lot of GCA_ and GCF_ genomes. We have many more genomes than are shown on the tree on hgGateway: https://hgdownload.soe.ucsc.edu/hubs/. This should probably be a different ticket: support NCBI Assemblies for UCSC link-outs. We have a text file with these accessions (see the URL above) and Galaxy could pull the list once per night. But yes, different ticket.
Transferring data by URL is a bit different. Those URLs are valid from anywhere. If you know the link, the data can be transferred and read at the destination. The reason the URL data transfer might need to have the history sharing state set is the administrative “GDPR-mode” some servers apply. The default in GDPR mode is that any data retrieval by URL is restricted/private unless specifically granted by setting the history (or account) permissions.
This makes sense. I didn’t know about GDPR mode. Thanks!
Users can specify the data sharing state (by URL) as an account level user preference – or per history. I wasn’t sure where that user was working, so I suggested setting the history to shared in case permissions were the problem on the Galaxy side. Sharing state might have been an issue for the user’s VCF (or it was too large?) but it wasn’t for my test file. Their BAM would have failed since the single URL sent for the data doesn’t quite match what UCSC needs to read. Solving the data/URL send/read for the BAM file type is another thing that could be tuned up, but maybe not as the primary solution since the file sizes will be a limit. Addressing the link-out function seems more important, but all this can be discussed.
I don’t fully understand, but it sounds as if BAM loading onto UCSC is fundamentally broken, and it shouldn’t be. If this is really true (I have trouble believing it, I’m relatively sure that I displayed a BAM file from Galaxy on UCSC years ago…), then this sounds like another ticket, to add the bigDataIndex=xxx to the custom track line.