Hi @M.r.t
Thanks for sharing the histories! Very helpful.
It looks like the reads from the pair didn’t load completely from the SRA at the UseGalaxy.eu server. This then led to the downstream result discrepancy when mapping. Some mapping tools may have trapped this, but I wouldn’t count on that, so always verify the read content yourself. Even when the problem is trapped, a mapping tool’s error messages can be odd and unclear, while a dedicated QA tool will usually report something actionable.
So, I was able to discover this by running some QA on the reads. The jobs failed, then I reviewed the job details view for the job using the i-icon. On those views, I saw these messages.
Then, if I used the bug-icon, the error message could be interpreted by our new Galaxy AI assistant! It interpreted this really well! It is new, so I wouldn’t expect it to be perfect yet for every error, and a lot depends on how the original tool author trapped error cases. You can still ask here about anything unclear. For anything really odd that seems to be a server issue, go ahead and submit the bug report so the server administrators will see it.
Text from the gxy chat bot, quoted directly:
Hello jenj,
The error message you’re encountering is from a tool known as FastQC, which is commonly used for quality checking of sequencing data in FASTQ format. Here’s a simplified breakdown of the error you are seeing:
Error Explanation:
- Failed to process file ERR6068440_2_fastq_gz.gz: This indicates that the tool encountered a problem when trying to process this specific FASTQ file.
- SequenceFormatException: This error suggests an issue with the format of the sequence data within your FASTQ file.
- Midline ‘AATGCTGTCTCTC<TTCTTTCA9:8186)CCTTTGGTCCTGTAGGTCACAACTGTGTTTAT’ didn’t start with ‘+’ at 39603447: In a FASTQ file, each sequence entry consists of four lines:
- A sequence identifier line beginning with ‘@’
- The raw sequence letters
- A separator line beginning with ‘+’
- A quality score line
The error indicates that at byte position 39,603,447 (or somewhere in the file), a line expected to start with a ‘+’ doesn’t comply with this format, which is required as the third line for each sequence entry.
Suggested Solution:
- Check File Integrity:
- Verify if the file got corrupted during download or transfer. You might want to redownload or decompress the file again if it was compressed, then re-upload it to Galaxy.
- Examine the Problematic Line:
- If you are familiar with command-line tools, you can inspect the file using a text editor or tools like less or head to navigate to the specified position and check if there is an unexpected character or misalignment in the file.
- Re-run the Tool:
- Once you’ve verified or corrected any issues with the file, try running the FastQC tool in Galaxy again.
If these steps don’t resolve the issue, the file might require sequence-specific investigations to ensure all entries conform to the FASTQ standard. Should you need further assistance, consulting with your data provider or a colleague familiar with the sequence data format might be beneficial.
I hope this helps guide you towards resolving the issue. If you have more questions, feel free to ask!
What to do
Whenever data is newly loaded up to Galaxy, run some simple checks to make sure the transfer happened correctly. That might just be inspecting the data to confirm the datatype, but it can also involve dedicated content checking tools. Fastq reads are sensitive to the content inside the file, not just the file type, so assessing that content is usually a good idea, along with manipulations like trimming.
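If you ever want a quick check outside of Galaxy before uploading, here is a minimal Python sketch of the same idea. It is not how FastQC works internally, just an illustration of the four-line rule from the error message above. It assumes simple four-line records (no wrapped sequences), and the function names and the file names in the usage comment are hypothetical.

```python
import gzip


def check_fastq(handle):
    """Count FASTQ records in a text handle; raise ValueError at the first bad record."""
    records = 0
    while True:
        header = handle.readline()
        if header == "":
            return records  # clean end of file
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        first = records * 4 + 1  # 1-based line number of this record's header
        if not header.startswith("@"):
            raise ValueError(f"line {first}: header does not start with '@'")
        if not plus.startswith("+"):
            raise ValueError(f"line {first + 2}: separator does not start with '+'")
        if len(seq) != len(qual):
            raise ValueError(f"line {first + 3}: quality and sequence lengths differ")
        records += 1


def count_records(path):
    """Open a plain or gzip-compressed FASTQ file (hypothetical helper) and check it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return check_fastq(handle)


# Hypothetical usage with placeholder file names for a read pair:
#   n1 = count_records("reads_1.fastq.gz")
#   n2 = count_records("reads_2.fastq.gz")
#   print(n1, n2, "counts match" if n1 == n2 else "counts differ")
```

For paired reads, the two files should report the same record count, so a mismatch is a strong hint that one of them did not transfer completely. A gzip file that was cut off mid-transfer will usually fail earlier, with an EOFError from Python's gzip module.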
More about Upload → Getting Data into Galaxy
Then, more about QA/QC is covered in this prior topic. The workflow at the end here would likely be a good fit for you. Getting the reads with the dedicated SRA read fetching will also add some stability versus using free URLs, since that is handled a bit differently by the SRA data servers. That will also avoid potential content loss with multiple data transfer steps. Example: from the cloud somewhere down to your computer, then back up into a cloud resource. Cloud to cloud is one less hop.
I hope this helps! We can follow up more about any questions you have, here or in the private chat.
XRef → Search results for 'uk.ac.babraham.FastQC.Sequence.SequenceFormatException' - Galaxy Community Help