Trim Galore error due to problematic fastq format

Hello,
I have a question about ChIP-Seq data analysis on the Galaxy platform. During the trimming step (using Trim Galore), I got the following error message:

Fatal error: Exit code 1 ()
Path to Cutadapt set as: 'cutadapt' (default)
Cutadapt seems to be working fine (tested command 'cutadapt --version')


AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> input_1.fastq.gz <<)

Found perfect matches for the following adapter sequences:
Adapter type	Count	Sequence	Sequences analysed	Percentage
Illumina	207	AGATCGGAAGAGC	1000000	0.02
smallRNA	3	TGGAATTCTCGG	1000000	0.00
Nextera	2	CTGTCTCTTATA	1000000	0.00
Using Illumina adapter for trimming (count: 207). Second best hit was smallRNA (count: 3)


gzip: stdout: Broken pipe
Writing report to './input_1.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: input_1.fastq.gz
Trimming mode: single-end
Trim Galore version: 0.4.3
Cutadapt version: 1.13
Quality Phred score cutoff: 15
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 3 bp
Minimum required sequence length before a sequence gets removed: 20 bp
Output file(s) will be GZIP compressed

Writing final adapter and quality trimmed output to input_1_trimmed.fq.gz


  >>> Now performing quality (cutoff 15) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file input_1.fastq.gz <<< 
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
40000000 sequences processed
50000000 sequences processed
60000000 sequences processed
70000000 sequences processed
This is cutadapt 1.13 with Python 3.6.1
Command line parameters: -f fastq -e 0.1 -q 15 -O 3 -a AGATCGGAAGAGC input_1.fastq.gz
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
cutadapt: error: Line 1 in FASTQ file is expected to start with '@', but found '\n'


Cutadapt terminated with exit signal: '256'.
Terminating Trim Galore run, please check error message(s) to get an idea what went wrong...


gzip: stdout: Broken pipe

Could you please help me solve it?
Thank you very much!
Sham

Hi!

If you look closely at your error message, you’ll see the following line:

cutadapt: error: Line 1 in FASTQ file is expected to start with '@', but found '\n'

So something is wrong with your FASTQ file: either a blank line has been inserted, or something else is malformed. Hope this helps!

Hi hxr,
Thank you for your reply. Yes, I noticed this, but the FASTQ file I am using starts normally, like this: @SRR6128868.1.1 1 length=76

Hi @shamjdeed – The problematic formatting may be internal, not at the start. The error message is a bit misleading.

Line 1 in FASTQ file

Could be interpreted as:

Line 1 in FASTQ record

This can be due to:

  • Truncated dataset – usually occurs because Upload was not complete, or the file was truncated from an earlier data transfer upstream from Galaxy.

  • Manipulated dataset – most often seen when multiple fastq datasets were concatenated together, and one or more contained a blank line at the end, which ends up somewhere in the middle of the final dataset. This could happen in Galaxy or upstream from Galaxy.
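
Outside Galaxy, the two causes above can be checked with a short script. The following is only a minimal sketch, not how cutadapt or Fastq Groomer validate data. It assumes strict four-line FASTQ records and that compressed files end in `.gz`; the names `open_text` and `find_fastq_problems` are made up for this illustration.

```python
import gzip


def open_text(path):
    """Open a FASTQ file as text; files ending in .gz are decompressed (assumption)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="ascii", errors="replace")
    return open(path, "r", encoding="ascii", errors="replace")


def find_fastq_problems(path, max_problems=10):
    """Return up to max_problems (line_number, message) tuples.

    Assumes four-line records: '@' header, sequence, '+' separator, quality.
    """
    problems = []
    position = 0  # 0 = header, 1 = sequence, 2 = separator, 3 = quality
    seq_len = 0
    line_no = 0
    with open_text(path) as handle:
        for line_no, raw in enumerate(handle, start=1):
            if len(problems) >= max_problems:
                break
            line = raw.rstrip("\r\n")
            if position == 0:
                if not line.startswith("@"):
                    what = "blank line" if line == "" else "unexpected text"
                    problems.append((line_no, what + " where a record header ('@') was expected"))
                    continue  # stay at position 0 and try the next line as a header
                position = 1
            elif position == 1:
                seq_len = len(line)
                position = 2
            elif position == 2:
                if not line.startswith("+"):
                    problems.append((line_no, "separator line does not start with '+'"))
                    position = 0
                    continue
                position = 3
            else:
                if len(line) != seq_len:
                    problems.append((line_no, "quality length differs from sequence length"))
                position = 0
    # Ending anywhere but at a record boundary suggests truncation.
    if position != 0 and len(problems) < max_problems:
        problems.append((line_no, "file ends in the middle of a record (possibly truncated)"))
    return problems
```

Calling `find_fastq_problems("input_1.fastq.gz")` on a local copy of the dataset returns an empty list if the sketch finds nothing wrong. A blank line left over from concatenation shows up with its line number.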

Tools that can help find the problem:

  • Fastq Groomer – this tool has enhanced data checks.

    • Use all default settings.
    • Data that is already in fastqsanger format (and assigned that datatype) will come through unchanged in the new output if there are no problems. Once the QA passes, the new output can be permanently deleted to reduce quota usage and avoid duplicate data.
    • If there are some formatting problems, the first malformed fastq record will be reported in the error message. Be aware that there could be more problems, but this tool can give some clues about where/what an example problem is, so you can find all occurrences like it.
  • Select last lines of a dataset (tail) – this is how to review the ends of files to see if they are truncated or contain empty blank trailing lines.

    • The last 10 lines (the default) should be sufficient.
  • Advanced methods – to examine (any) data in more depth, try using tools in these groups:

    • Text Manipulation
    • Filter and Sort
    • Join, Subtract and Group
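
For a local copy of the file, the "Select last lines" check can be approximated in a few lines of Python. This is a rough sketch rather than the Galaxy tool itself. It assumes four-line records and a `.gz` suffix for compressed files, and `summarize_end` is a hypothetical helper name.

```python
import gzip
from collections import deque


def summarize_end(path, n=10):
    """Report the last n lines plus simple end-of-file sanity checks."""
    opener = gzip.open if path.endswith(".gz") else open
    total = 0
    tail = deque(maxlen=n)  # keeps only the most recent n lines
    with opener(path, "rt", encoding="ascii", errors="replace") as handle:
        for raw in handle:
            total += 1
            tail.append(raw.rstrip("\r\n"))
    # Count blank lines at the very end of the file (within the last n lines).
    trailing_blank = 0
    for line in reversed(tail):
        if line != "":
            break
        trailing_blank += 1
    return {
        "total_lines": total,
        "multiple_of_four": total % 4 == 0,
        "trailing_blank_lines": trailing_blank,
        "tail": list(tail),
    }
```

A line count that is not a multiple of four, or a non-zero `trailing_blank_lines`, points to the truncation or blank-line problems described above.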

Reference: FAQ for Fastq formatting

Hope that helps!

Hi Jennifer,
Thank you very much! It really helped me!

Hello @shamjdeed, could you please tell me how you solved the problem? I’m having the same issue while running the RNAseq workflow and still haven’t figured it out.
Best,

Hi @SciJrb

If you are getting errors related to fastq format, any of the formatting problems listed above could be a factor. The right fix depends on exactly what is wrong.

Go through each item in the original reply to check your data format and try to isolate the problem. Truncated data is the most common reason for problems. The truncation could have happened during an earlier data transfer or during Upload to Galaxy.
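
If the file is gzip-compressed and a local copy is available, one way to test for truncation is to decompress it to the end and see whether Python's `gzip` module complains. This is a sketch under that assumption. `gzip_is_complete` is a hypothetical name, and the check says nothing about whether the FASTQ records inside are well formed.

```python
import gzip
import zlib


def gzip_is_complete(path, chunk_size=1 << 20):
    """Return True if the gzip stream can be read through to its end marker."""
    with gzip.open(path, "rb") as handle:
        try:
            # Read and discard the data in chunks; only errors matter here.
            while handle.read(chunk_size):
                pass
        except (EOFError, gzip.BadGzipFile, zlib.error):
            # EOFError: the stream stopped before the end-of-stream marker (truncated).
            # BadGzipFile / zlib.error: not gzip data, or corrupted data.
            return False
    return True
```

A `False` result for a file that was complete on the sequencing server suggests the transfer or upload should be repeated.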

If you cannot figure out how to fix what is going wrong, post back your tests + results. At a minimum, run the Fastq Groomer and Select last lines tools.

Hello @jennaj, thanks a lot for the quick reply. Here is the error that I get from running Trim Galore on my files. I don’t know whether my files were corrupted when I downloaded them from the sequencing server, or whether something went wrong while uploading them via FTP to Galaxy.
I am now trying to run the Fastq Groomer to check whether there is any issue. I have very little experience manipulating these files and I don’t want to mess around with them…

Thanks a lot again!


Fatal error: Exit code 1 ()
Path to Cutadapt set as: ‘cutadapt’ (default)
Cutadapt seems to be working fine (tested command ‘cutadapt --version’)
Writing report to ‘./input_1.fastq_trimming_report.txt’

SUMMARISING RUN PARAMETERS

Input filename: input_1.fastq
Trimming mode: single-end
Trim Galore version: 0.4.3
Cutadapt version: 1.18
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: ‘AGATCGGAAGAGC’ (Illumina TruSeq, Sanger iPCR; user defined)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length before a sequence gets removed: 20 bp

Writing final adapter and quality trimmed output to input_1_trimmed.fq

Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: ‘AGATCGGAAGAGC’ from file input_1.fastq <<<
This is cutadapt 1.18 with Python 3.6.6
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC input_1.fastq
Processing reads on 1 core in single-end mode …
Traceback (most recent call last):
File “/srv/ecotoxdb/galaxies/galaxy2/galaxy/database/dependencies/_conda/envs/__trim-galore@0.4.3/bin/cutadapt”, line 11, in
load_entry_point(‘cutadapt==1.18’, ‘console_scripts’, ‘cutadapt’)()
File “/srv/ecotoxdb/galaxies/galaxy2/galaxy/database/dependencies/_conda/envs/__trim-galore@0.4.3/lib/python3.6/site-packages/cutadapt/main.py”, line 798, in main
stats = runner.run()
File “/srv/ecotoxdb/galaxies/galaxy2/galaxy/database/dependencies/_conda/envs/__trim-galore@0.4.3/lib/python3.6/site-packages/cutadapt/pipeline.py”, line 188, in run
(n, total1_bp, total2_bp) = self.process_reads()
File “/srv/ecotoxdb/galaxies/galaxy2/galaxy/database/dependencies/_conda/envs/__trim-galore@0.4.3/lib/python3.6/site-packages/cutadapt/pipeline.py”, line 230, in process_reads
for read in self._reader:
File “src/cutadapt/_seqio.pyx”, line 136, in iter
File “/srv/ecotoxdb/galaxies/galaxy2/galaxy/database/dependencies/_conda/envs/__trim-galore@0.4.3/lib/python3.6/codecs.py”, line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: ‘utf-8’ codec can’t decode byte 0x8b in position 1: invalid start byte

Cutadapt terminated with exit signal: ‘256’.
Terminating Trim Galore run, please check error message(s) to get an idea what went wrong…
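
A note on the traceback above: byte 0x8b at position 1 matches the second byte of the gzip magic number (0x1f 0x8b). One possible reading, which is an assumption and not something confirmed in this thread, is that the input is gzip-compressed data being read as plain-text fastq. A minimal sketch for checking a local copy follows; `looks_gzipped` is a hypothetical helper name.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC
```

If this returns `True` for a file named `.fastq`, the datatype or file extension probably does not match the actual content.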