We are trying to upload BAM (hg19) and fastq.gz files with FTP, but the upload stops after about 70G of a file has been uploaded. Our files are about 95-140G each.
How can we get these files uploaded to useGalaxy?
Thanks
Hello – The maximum dataset size for Upload is usually 50 GB but needs to be a bit smaller for BAM datasets (around 25-35 GB). Datasets produced by tools can be larger.
This is true when working at the public Galaxy servers like https://usegalaxy.org. You would run into quota problems (max 250 GB total per account) and likely tool problems (exceed resources) working with such large data.
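To make the arithmetic concrete, here is a minimal sketch for checking a set of files against these numbers before starting an FTP transfer. The thresholds are just the approximate figures quoted above, not values read from the server, and the function names, the decimal-GB convention, and the choice of the conservative 25 GB end of the BAM range are all assumptions made for illustration.

```python
from pathlib import Path

GB = 10**9  # assumption: decimal gigabytes

# Approximate figures quoted in this thread, not values queried from Galaxy.
UPLOAD_LIMIT_GB = 50    # usual per-dataset upload ceiling
BAM_LIMIT_GB = 25       # conservative end of the 25-35 GB range for BAM
ACCOUNT_QUOTA_GB = 250  # total quota per account


def upload_limit_gb(filename):
    """Return the approximate per-file limit that applies to this file name."""
    return BAM_LIMIT_GB if filename.lower().endswith(".bam") else UPLOAD_LIMIT_GB


def check_sizes(sizes_gb):
    """Take {file name: size in GB}; return (oversized names, total GB, fits quota)."""
    oversized = [name for name, size in sizes_gb.items()
                 if size > upload_limit_gb(name)]
    total = sum(sizes_gb.values())
    return oversized, total, total <= ACCOUNT_QUOTA_GB


def sizes_from_disk(paths):
    """Build the {file name: size in GB} mapping from files on local disk."""
    return {str(p): Path(p).stat().st_size / GB for p in paths}
```

For example, `check_sizes(sizes_from_disk(["sample.bam", "reads.fastq.gz"]))` (hypothetical file names) would list any file above its per-file ceiling and report whether the combined size stays under the account quota.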
If you are running your own Galaxy, administrators can load larger data into data libraries using methods other than FTP. Please see: https://docs.galaxyproject.org/en/master/admin/useful_scripts.html?highlight=data%20libraries
Thanks Jennifer,
That answers our question.
Our problem was that we had not experienced this limitation until after the first of this year. We were aware of the 250G limit, but previously we had been able to upload files larger than 50G without a problem. For example, our last uploaded file in December was about 105.7G in size and is still in our history (see attachment).
We think we could do our work within the 250G limit, but cannot do so with the current upload file size limit.
Cheers,
Dave Cissell
Ok, that larger BAM load is certainly interesting! Especially if it is intact and useable. But that is not the norm.
Few tools would be expected to work with data this large on public Galaxy servers, and successfully uploading such large datasets to Galaxy Main (https://usegalaxy.org) shouldn’t be expected going forward. It would be better to move to your own Galaxy and allocate the resources needed.
ps: Next time you can skip loading the .bam.bai index. Galaxy recreates these upon loading. To access/download a .bam.bai index that is already in Galaxy, expand the dataset and click on the disc icon.