Uploading large file problem

Hi, I am trying to upload a single large file of 38.4 GB, but it is very slow (10% in 3 hours) and after a time it drops out. Please could you let me know how to achieve this upload? I tried to look for Galaxy’s FTP server but couldn’t find it in user preferences. By the way, I decided to upload just one of the two 38.4 GB files (paired-end reads from WGS) to check that it would work. I would be grateful for some advice. Thank you

Hi @richard64

General instructions for Upload are in this guide → Getting Data into Galaxy

FTP is not available at every public Galaxy server, since the Upload tool itself supports a resume function. Data loading is processed through the API, and works about the same way as FTP.

For batch loading of data, you can access the API directly on the command line. See the guide above for the details. The speed will be the same but this is convenient when there are many files, or when data loading, workflow execution, and result downloads are all run in batches.
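As a rough illustration of scripted batch loading, here is a minimal sketch that assumes the BioBlend Python library (`GalaxyInstance` and its `tools.upload_file` call) rather than the command-line route the guide describes. The function name `upload_batch`, the example URL, the API key placeholder, and the default `fastqsanger.gz` datatype are all assumptions for illustration only.

```python
from pathlib import Path


def upload_batch(gi, history_id, paths, file_type="fastqsanger.gz"):
    """Upload each local file into one history and return the job info per file.

    `gi` is assumed to be a bioblend.galaxy.GalaxyInstance, created for example with:
        from bioblend.galaxy import GalaxyInstance
        gi = GalaxyInstance(url="https://galaxy.example.org", key="YOUR_API_KEY")
    The URL and key above are placeholders, not real values.
    """
    results = {}
    for path in sorted(Path(p) for p in paths):
        # Fail early on a wrong path instead of partway through a long batch.
        if not path.is_file():
            raise FileNotFoundError(path)
        # One upload job per file; the file is still sent over your own connection.
        results[path.name] = gi.tools.upload_file(
            str(path), history_id, file_type=file_type
        )
    return results
```

This does not make the transfer any faster; it only saves clicking through the Upload tool once per file.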

What you can try:

  1. Move to a faster internet connection. The rate of data loading is based on your connection bandwidth and the server bandwidth. The large public servers have a lot of capacity, so the limit is usually with the “from” connection.

  2. Load the data to your private cloud data storage location first, then transfer the data to Galaxy.

    This means getting the data up to servers just once, and you might want to export your results that way too. Server-to-server transfers are much faster since they don’t involve home internet connections that are optimized for “download” speeds, not “upload” speeds.

    The latest release supports even more cloud providers, so look under User → Preferences for the options available at the server you are working at. Please ask if you need help with this.

  3. Start the transfer in the background, and let it process.

    You’ll need to keep your computer on throughout. You will be able to resume a broken connection for a short window, but you will need to respond quickly when it drops.


For your files, yes, these might take some time. You could check your connection speed with a speedtest tool, then consider moving to a faster connection. Not running other data transfers at the same time might help too (avoiding streaming, etc).

And, if this is something you plan to repeat, putting the data in Dropbox or one of the other options available to you is another way around this. Any public URL will work, so a local server that can host the data can work too. Working with large data from a personal computer over a home connection will always be the slowest option for getting data up to any server then back down again – doing this just once has strong advantages, even if less convenient.

Finally, if the data is from some public data source, transferring data directly from that source into Galaxy is the “best” way. Please ask if you need help with how.

If you want to share the URL of the server you are using, I can run an independent test to double-check that nothing else is going on. Also, please confirm: these are fastq.gz files, yes?

Let’s start there! :slight_smile:

Hi Jennifer

Thank you for your response.

First, yes, I can confirm that the Illumina data sets are fastq.gz files. I have them on an external drive, not on my laptop.

I have three whole genome sequence samples. Each sample is 2 x 38 GB in size (they are paired-end reads).

My plan was to upload each 2 x 38 GB pair together, but I first tried 1 x 38 GB, which is when I found the speed problem.

Also, as you suggested, I have run a speed test and it says: Download 35.6 Mbps and Upload 8.83 Mbps. As you mentioned, uploading is much slower.

Re: a faster internet connection. I am not sure how I would achieve that, but I am curious to know what upload speed would carry out this task in a reasonably short time?

Re: Uploading to the cloud. Would this be secure? Couldn’t someone steal the data? Also, if I did that, could you let me know how to transfer from the cloud to the Galaxy platform? And do you happen to know which cloud-based data storage would be easiest and best to use?

BTW: I do have a lead that I can use to connect my EE modem straight to my laptop. Do you think that would help?

My regards

Richard Melzack

Hi @richard64

I want to mention this first – we are having sporadic Upload issues at UseGalaxy.org, so if that is where you are working, please wait to do this until after this issue is resolved. See → Uploading to usegalaxy.org - #8 by jennaj

I’ll try to address your questions :slight_smile:

  1. Loading from an external drive through to your computer then up to a website.

    Try this with smaller files first as a test. Sometimes people have slower data transfer rates from external drives up to any website, not just Galaxy. You can compare with something like Google Drive. The transfer to Galaxy will be about the same as any other website.

  2. Data security at public Galaxy servers

    Remember that public Galaxy servers are not appropriate for certain classes of data. Protected human clinical data is one example, but it could be processed in certain Galaxy deployments like this one → AnVIL - Galaxy Community Hub.

  3. Data security for other cloud providers.

    Dropbox is one example, and AWS S3 buckets are another. Whether to keep your data in the cloud is beyond the advice I can give here; I can only say that transfers between two cloud environments will always be “faster” than data coming from a laptop’s external drive.

  4. Upload speeds.

    It sounds like you are working from a consumer internet connection. ~ 8 Mbps will be pretty slow to transfer ~ 40 GB of data (12 hours per file!). And that isn’t counting the speed from the external drive, or any throttling your internet provider might apply once they “see” the large transfers. I think you’ll need to find a faster connection to move files that large anywhere, but you could contact your internet provider and ask.
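To put numbers on the upload speeds above, here is a small back-of-the-envelope sketch. It assumes decimal units (1 GB = 8000 megabits) and ignores protocol overhead, external-drive speed, and throttling, so real transfers will take longer than these estimates. The function names are made up for illustration.

```python
def transfer_hours(size_gb, speed_mbps):
    """Hours needed to send size_gb gigabytes at speed_mbps megabits per second."""
    megabits = size_gb * 8 * 1000  # bytes -> bits, giga -> mega
    return megabits / speed_mbps / 3600


def required_mbps(size_gb, hours):
    """Sustained upload speed (Mbps) needed to send size_gb within the given hours."""
    return size_gb * 8 * 1000 / (hours * 3600)


def effective_mbps(size_gb, fraction_done, hours_elapsed):
    """Average speed actually achieved, given the fraction uploaded so far."""
    return required_mbps(size_gb * fraction_done, hours_elapsed)


# One 38.4 GB file at the measured 8.83 Mbps upload speed.
print(round(transfer_hours(38.4, 8.83), 1))       # 9.7 (hours, best case)
# Speed needed to send the same file in 2 hours.
print(round(required_mbps(38.4, 2), 1))           # 42.7 (Mbps)
# Speed implied by the observed "10% in 3 hours".
print(round(effective_mbps(38.4, 0.10, 3), 1))    # 2.8 (Mbps)
```

The last figure is well below the speed test result, which would be consistent with a bottleneck somewhere other than the raw connection speed, such as the external drive or other traffic on the line.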

Hope this helps! :scientist: