Issues with data format and running HiSat2

elijah0lux · July 27, 2022, 7:32pm

Hi,
I’m currently in the middle of my uni project and I’m trying to follow an RNA-Seq HiSat pipeline to analyse reads obtained from NCBI Gene Expression Omnibus (GEO). I’ve got the data and tried to put it through HiSat. It didn’t work and came back with error messages, and I realised that I probably should have put it through FastQC/Trimmomatic first.
I tried that; however, it didn’t accept or recognise any fastq files, and I couldn’t understand why until I looked at the data files. It seems that I had “fastq.gzReadsPerGene.out.tab.gz” instead of normal fastq files. From what I’ve been told, the names of the files I got from GEO send mixed messages. The first part, “fastq.gz”, would represent raw NGS reads, which should either be in fastqsanger format or be convertible to fastqsanger.
If FastQC doesn’t run and won’t accept the files then it is unlikely they are in a fastq format. The second part “ReadsPerGene.out.tab.gz” suggests these are aligned sequences with gene counts.

So I’ve now been trying to determine whether these files are in fastq format or not. I tried to convert the files using FastQ Groomer, but it didn’t accept them. FastQC didn’t accept them either, so I’m assuming they’re not in fastq format. I tried running them through HiSat again and just encountered the same errors as before. I’ve looked almost everywhere to see if I can fix this problem, and I’ve emailed again to ask what to do, as I’m relatively new to using this tool. Does anyone know how I can resolve this issue?

This was all done on my uni’s Galaxy server.
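The “ReadsPerGene.out.tab” suffix matches the name the STAR aligner gives its per-gene count table, so one way to test the "these are gene counts" reading is to parse a downloaded file as that kind of table. The following is only a sketch under that assumption: the function name `read_counts` is made up for illustration, and the layout it expects (tab-separated, gene ID in column 1, unstranded count in column 2, summary rows starting with `N_`) is STAR's documented format, not something confirmed for these particular files.

```python
import csv
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def read_counts(path):
    """Split a ReadsPerGene-style table into summary rows and per-gene rows.

    Assumes a tab-separated file with no header, the gene ID in column 1
    and an integer count in column 2 (the layout STAR documents).
    """
    # Trust the file contents, not the extension, to decide on gzip.
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == GZIP_MAGIC
    opener = gzip.open if is_gzip else open

    summary, genes = {}, {}
    with opener(path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            name, count = row[0], int(row[1])
            if name.startswith("N_"):
                summary[name] = count  # e.g. N_unmapped, N_noFeature
            else:
                genes[name] = count
    return summary, genes
```

If the file parses cleanly this way, it holds counts rather than reads, which would explain why FastQC and FastQ Groomer reject it.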

gbbio · July 28, 2022, 8:30am

The filename suggests to me that it is something other than a fastq file. How did you get the files? Would it be possible for you to open the files on your local computer? That is, download them first, unzip (gunzip) them, and open them with a text editor.

elijah0lux · July 28, 2022, 3:25pm

So initially, I downloaded each of the files individually from GEO rather than getting the accession list and putting that into Galaxy. Okay, I will give that a try. What should I do once I open it with the text editor?

elijah0lux · July 29, 2022, 1:28pm

Okay, I’ve tried to open it on my computer. The files aren’t zipped. So I tried using the 7-Zip option and extracting the file. I tried to open it with a text editor, but I get an error saying Windows cannot open this type of file.

gbbio · July 29, 2022, 1:46pm

I don’t know exactly what you did, but something may have gone wrong, because you can basically even open zip files in a text editor if you want. Fastq files are just text files with a certain format, so if you open the file in a text editor and see that it has a fastq format, you know it is a fastq file. Do you have a link to the page you are downloading from, or an accession?
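The same check can be done without a text editor. The sketch below is not a Galaxy tool and not a full validator; the function names `read_head` and `looks_like_fastq` are made up for illustration. It reads the first few lines of a file (decompressing only if the file really starts with the gzip magic bytes) and tests whether they follow the four-line FASTQ record layout: an `@` header, the sequence, a `+` separator, and a quality string as long as the sequence.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def read_head(path, n_lines=8):
    """Return up to n_lines of text from a file, gunzipping if needed."""
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == GZIP_MAGIC
    opener = gzip.open if is_gzip else open

    lines = []
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            lines.append(line.rstrip("\r\n"))
            if len(lines) == n_lines:
                break
    return lines


def looks_like_fastq(lines):
    """Check that each complete group of four lines is a FASTQ record."""
    if len(lines) < 4:
        return False
    complete = len(lines) - len(lines) % 4  # ignore a trailing partial record
    for i in range(0, complete, 4):
        header, seq, sep, qual = lines[i:i + 4]
        if not header.startswith("@") or not sep.startswith("+"):
            return False
        if not seq or len(seq) != len(qual):
            return False
    return True


# Hypothetical usage; replace the filename with a downloaded file:
# print(looks_like_fastq(read_head("my_download.gz")))
```

Checking the magic bytes rather than the extension matters here, since a name ending in `.gz` does not guarantee the file is actually compressed.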

elijah0lux · July 30, 2022, 11:40am

Link to GEO where I got the data: GEO Accession viewer
I was specifically using the RNA-Seq runs only, Accession No: GSE152547

elijah0lux · July 30, 2022, 3:02pm

Maybe I downloaded the files incorrectly, I’m not sure. I’ve cleared my history and downloaded them again, this time using the SRA (Sequence Read Archive) link in GEO. It’s given me my paired-end data (all the data is here, 324 items), single-end data, other data and the fasterq-dump log. So far everything looks fine. But when I try to run it through FastQC, it gives me an error message saying a certain read in the data collection doesn’t exist as it’s been deleted. I’m a little confused, seeing as I just finished downloading the data this morning. I checked the file in question, and both the forward and reverse reads are there, no problem. But when I click on the forward read to view the file, an error message comes up informing me that ‘this file does not exist as it has been purged’.

I haven’t purged or deleted any of this data, so I’m confused as to why I’m facing problems now.

jennaj · September 21, 2022, 10:43pm

This sounds similar to another problem I helped with about a month ago.

Try this:

Create a new history and give it a distinct name
Download the data from SRA into that history
Avoid using tools like “Export datasets to remote files source” while you are still processing any data in that history.

I’m going to close this topic out. If the problem can be reproduced again following the advice above, please ask a new question and include a shared history link for context. The data at that point would all be public anyway, and it would help us when reviewing for potential bugs.

Thanks!
