I’m trying to assemble publicly available whole-genome SRA files for Staphylococcus aureus bacteria. Upon deposition, NCBI interlaces the two submitted “R1” and “R2” FASTQ files into one. When I download the interlaced file from NCBI and upload it to Galaxy.org, Galaxy recognizes the FASTQ file as “txt” instead of something usable like “fastqsanger.gz”. Any idea how I can modify the downloaded interlaced file so that Galaxy can recognize the proper file type? I don’t have a cloud delivery service so I can’t just download the input files…
Interlaced reads from SRR10591328.fastq.gz (and the extracted file) are detected as txt, while the originally submitted (not interlaced) runs are detected as the correct fastqsanger.gz file type.
I started a test run in this history using the same tool with your accession to see what happens. This is a shared history, so you can click on that link to see how it worked. → https://usegalaxy.org/u/jen-galaxyproject/h/test-srr10591305
I see a “list paired collection” result, not interleaved. But you can change how that data is organized: interleaved, separate files, plus various collection folder shapes.
Please give all of this help a review, and if I am misunderstanding, please explain a bit more. Then, if you need to change how your data is organized, and are not sure how to do that in Galaxy, we can help with that too. We’d need to know how the data is organized now, and what tool you plan to use next.
Let’s start there and please let us know if you get this working!
Why change the “shape” of data? Galaxy hosts tools written by many different authors. Those authors had different data expectations! You can quickly change how your data is organized at any time. Extract those steps into a workflow for easy reuse.
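For anyone who wants to reshape an interleaved file on their own computer before uploading, here is a minimal Python sketch of the "interleaved to separate files" change mentioned above. It is not what Galaxy runs internally. It assumes the file is gzip-compressed and strictly interleaved (each 4-line forward record immediately followed by its 4-line mate); the function name and the file names are hypothetical.

```python
import gzip
from itertools import islice


def deinterleave_fastq(interleaved_path, r1_path, r2_path):
    """Split an interleaved FASTQ .gz file into separate R1 and R2 .gz files.

    Assumption: strict interleaving, i.e. every forward record (4 lines)
    is immediately followed by its reverse mate (4 lines).
    Returns the number of read pairs written.
    """
    pairs = 0
    with gzip.open(interleaved_path, "rt") as src, \
         gzip.open(r1_path, "wt") as r1, \
         gzip.open(r2_path, "wt") as r2:
        while True:
            # Read one pair at a time: 4 lines for R1 plus 4 lines for R2.
            block = list(islice(src, 8))
            if not block:
                break
            if len(block) != 8:
                raise ValueError("incomplete read pair at end of file")
            # Each record's header line should start with '@'.
            if not (block[0].startswith("@") and block[4].startswith("@")):
                raise ValueError("record header does not start with '@'")
            r1.writelines(block[:4])
            r2.writelines(block[4:])
            pairs += 1
    return pairs
```

A call such as `deinterleave_fastq("SRR10591328.fastq.gz", "SRR10591328_1.fastq.gz", "SRR10591328_2.fastq.gz")` (output names are only an illustration) would produce two files that can be uploaded as a forward/reverse pair. If the function raises an error, the download is probably not a strictly interleaved FASTQ, which would be worth checking before troubleshooting the datatype further.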