Running large-scale jobs with "Faster Download and Extract Reads in FASTQ"

Hello everyone,
I have set up the tool “Faster Download and Extract Reads in FASTQ” to download a large set (~100) of SRA accessions (SRR-) in a single run. At the time of writing, the job has been running for nearly 4 days. Is this typical for a job of this type, or would it be more time-effective to break it into smaller batches?

Hi @krpukacz

Apologies for the delayed reply.

A job like this taking a few days to process sounds normal. What you describe is a batch of 100 distinct data retrieval queries to SRA. Galaxy and external data providers (SRA, UCSC, etc.) all limit the number of concurrent queries for practical and technical reasons. Allowing the work to complete is the fastest solution.

If any accessions fail, rerun just those. SRA is moving data around in the cloud infrastructure, so a small fraction of accessions is impacted at any particular time, and which ones are impacted will keep changing until the move is done. If needed, go to the NCBI website, search by accession, capture the data URL, and paste that into the Upload tool. Or you can try EBI SRA. Once all of your data is in Galaxy, you can always manipulate your data collections to bring it together.
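
If it helps with the bookkeeping, here is a minimal Python sketch for working out which accessions to rerun. It is not part of Galaxy or the tool: the function name `accessions_to_retry` is made up for illustration, the SRR IDs are placeholders, and it assumes you have copied the accessions you requested and the ones that actually arrived into two plain lists.

```python
def accessions_to_retry(requested, retrieved):
    """Return requested accessions that are missing from retrieved.

    Hypothetical helper for illustration only. Accessions are compared
    case-insensitively, blanks are skipped, duplicates are dropped, and
    the original request order is kept.
    """
    done = {acc.strip().upper() for acc in retrieved if acc.strip()}
    retry = []
    seen = set()
    for acc in requested:
        key = acc.strip().upper()
        if key and key not in done and key not in seen:
            seen.add(key)
            retry.append(key)
    return retry


if __name__ == "__main__":
    # Placeholder accessions, not real runs.
    requested = ["SRR0000001", "SRR0000002", "SRR0000003"]
    retrieved = ["SRR0000001", "SRR0000003"]
    # Print one accession per line so the list can be pasted into a new run.
    print("\n".join(accessions_to_retry(requested, retrieved)))
```

The printed list should be suitable as the accession input for a second, smaller run of the same tool, assuming you supply accessions as a list with one per line.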

Thanks!