Hello,
I uploaded pooled CRISPR/Cas9 screen data (fastq.gz) to Galaxy and used the Barcode splitter to demultiplex it. This created separate fastqsanger files. As I need the files in fastq or fastq.gz format, I tried converting them by editing the attributes and changing the datatype. I am not sure if this was the proper way to convert fastqsanger to fastq.gz or fastq. When I download the converted files, the fastq.gz files seem strange, as I cannot open or extract them. There are error messages saying that it is not an archive file.
I would be grateful for advice on how to convert fastqsanger to fastq or fastq.gz files, and on what could be wrong.
Many thanks, Janine
Hi Janine,
there are a couple of different issues here:
- fastqsanger and fastq are the same file format and, outside of Galaxy, there is no difference between them. (Inside Galaxy, changing the datatype from fastqsanger to fastq or vice versa will affect which tools can use the dataset.) For downloading, you cannot do anything wrong there.
- As it says on the Change datatype tab:
  "This will change the datatype of the existing dataset but not modify its contents. Use this if Galaxy has incorrectly guessed the type of your dataset."
  So using this functionality to compress fastq/fastqsanger to fastq.gz is definitely wrong. It only makes Galaxy treat the data as compressed; it does not actually compress it.
  The correct way is via the neighbouring Convert tab. That offers an option to decompress compressed fastq; the other direction (compressing) is not offered, though.
- Is the Barcode splitter really producing fastq when you feed it fastq.gz datasets? I haven't tried this myself, but that would be close to a bug in the tool.
- Which tool did you use to try to open the downloaded file on Windows? Windows does not support the gzip format out of the box, or does it these days?
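Since Galaxy's Convert tab only offers decompression, one option is to download the uncompressed fastqsanger file and gzip it locally. The following is a minimal sketch using only Python's standard library; the function name `gzip_fastq` and the example file names are hypothetical. It streams the file, so large FASTQ files are not loaded into memory at once.

```python
import gzip
import shutil
from pathlib import Path
from typing import Optional


def gzip_fastq(src: str, dest: Optional[str] = None) -> Path:
    """Compress a plain-text FASTQ/fastqsanger file to .gz, keeping the original.

    Hypothetical helper for illustration; not part of Galaxy.
    """
    src_path = Path(src)
    # Default output name: the original file name with ".gz" appended.
    dest_path = Path(dest) if dest else src_path.with_name(src_path.name + ".gz")
    with open(src_path, "rb") as fin, gzip.open(dest_path, "wb") as fout:
        # Copy in chunks rather than reading the whole file into memory.
        shutil.copyfileobj(fin, fout)
    return dest_path
```

For example, `gzip_fastq("sample_bc01.fastqsanger", "sample_bc01.fastq.gz")` would write a real gzip file; because fastqsanger and fastq are the same format, naming the output `.fastq.gz` is fine.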
Thank you very much for your answer and for clarifying some things for me. I reran the Barcode splitter. My input file was a fastq.gz file, which Galaxy recognized as fastqsanger.gz. All the generated output files are in fastqsanger format.
I tried opening the downloaded files on Windows with 7-Zip and WinRAR.
It sounds like you are using "compressed file" tools right now; those won't work, because the downloaded data is not compressed.
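A quick way to confirm whether a downloaded file is actually gzip-compressed, regardless of its extension, is to look at its first two bytes: gzip files start with `0x1f 0x8b`. A file that was only relabelled as fastq.gz in Galaxy will not have them, which is consistent with 7-Zip and WinRAR reporting "not an archive". A minimal sketch (the function name `is_gzip` is hypothetical):

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip(path: str) -> bool:
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC
```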
fastqsanger is an uncompressed plain-text format. Any plain-text editor should be able to read the files. Editors may not recognize the format by default, since the file name doesn't end in .txt; however, plain text is what this data is (at the most basic level). You could make a copy of one of the files, add the .txt extension, then test which tools recognize it, to learn which one you want to use/configure to read all of the others.
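To see that the data really is plain text without opening a huge file in an editor, you can read just the first record. The sketch below assumes the common layout of four lines per FASTQ record (header starting with `@`, sequence, `+` line, quality) with no wrapped sequence lines; `read_fastq` and the file name are hypothetical.

```python
from itertools import islice
from typing import Iterator, Tuple


def read_fastq(path: str) -> Iterator[Tuple[str, ...]]:
    """Yield (header, sequence, plus, quality) tuples from a plain-text FASTQ file.

    Assumes 4-line records; raises ValueError on a truncated or malformed record.
    """
    with open(path, "r") as fh:
        while True:
            # Take the next four lines; strip both Unix and Windows line endings.
            record = [line.rstrip("\r\n") for line in islice(fh, 4)]
            if not record:
                return  # clean end of file
            if len(record) < 4 or not record[0].startswith("@"):
                raise ValueError(f"Malformed FASTQ record: {record!r}")
            yield tuple(record)
```

For example, `next(read_fastq("sample_bc01.fastqsanger"))` returns only the first read, so the rest of the file is never read.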
That said, fastq data is usually very large, and reviewing it directly in a text editor is rarely needed. If the goal is to use command-line utilities/tools to work with the data (outside of Galaxy), use whatever options specify that the input is in fastq format for these data (not fastq.gz).