Trim Galore error due to problematic fastq format

Hello,
I have a question about ChIP-Seq data analysis on the Galaxy platform. During the trimming step (using Trim Galore), I got this error message:

Fatal error: Exit code 1 ()
Path to Cutadapt set as: 'cutadapt' (default)
Cutadapt seems to be working fine (tested command 'cutadapt --version')


AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> input_1.fastq.gz <<)

Found perfect matches for the following adapter sequences:
Adapter type	Count	Sequence	Sequences analysed	Percentage
Illumina	207	AGATCGGAAGAGC	1000000	0.02
smallRNA	3	TGGAATTCTCGG	1000000	0.00
Nextera	2	CTGTCTCTTATA	1000000	0.00
Using Illumina adapter for trimming (count: 207). Second best hit was smallRNA (count: 3)


gzip: stdout: Broken pipe
Writing report to './input_1.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: input_1.fastq.gz
Trimming mode: single-end
Trim Galore version: 0.4.3
Cutadapt version: 1.13
Quality Phred score cutoff: 15
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 3 bp
Minimum required sequence length before a sequence gets removed: 20 bp
Output file(s) will be GZIP compressed

Writing final adapter and quality trimmed output to input_1_trimmed.fq.gz


  >>> Now performing quality (cutoff 15) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file input_1.fastq.gz <<< 
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
40000000 sequences processed
50000000 sequences processed
60000000 sequences processed
70000000 sequences processed
This is cutadapt 1.13 with Python 3.6.1
Command line parameters: -f fastq -e 0.1 -q 15 -O 3 -a AGATCGGAAGAGC input_1.fastq.gz
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
cutadapt: error: Line 1 in FASTQ file is expected to start with '@', but found '\n'


Cutadapt terminated with exit signal: '256'.
Terminating Trim Galore run, please check error message(s) to get an idea what went wrong...


gzip: stdout: Broken pipe

Could you please help me solve it?
Thank you very much!
Sham

Hi!

If you look closely at your error message, you'll see the following line:

cutadapt: error: Line 1 in FASTQ file is expected to start with '@', but found '\n'

So something is wrong with your FASTQ file: either a blank line has been inserted, or something else is off. Hope this helps!

Hi hxr,
Thank you for your reply. Yes, I noticed this, but the FASTQ file I am using starts normally, like this: @SRR6128868.1.1 1 length=76

Hi @shamjdeed, the problematic formatting may be somewhere inside the file, not at the start. The error message is a bit misleading.

Line 1 in FASTQ file

Could be interpreted as:

Line 1 in FASTQ record

This can be due to:

  • Truncated dataset – usually occurs because the Upload did not complete, or because the file was already truncated by an earlier data transfer upstream of Galaxy.

  • Manipulated dataset – most often seen when multiple fastq datasets were concatenated, and one or more of them ended with a blank line that is now somewhere in the middle of the final dataset. This could happen in Galaxy or upstream of Galaxy.
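If you have a copy of the file outside Galaxy, both causes can be checked with a few lines of Python. This is only a minimal sketch, not part of any Galaxy tool: the function name `scan_fastq_lines` is made up for illustration, and it assumes standard four-line fastq records. It reports where blank lines are, whether the line count is a multiple of four, and whether the gzip stream ends early.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def scan_fastq_lines(path, max_report=10):
    """Scan a fastq file line by line (hypothetical helper).

    Returns the number of lines read, the line numbers of the first
    few blank lines, and two flags that hint at truncation.
    """
    blank_lines = []
    total = 0
    truncated_gzip = False
    try:
        with open_text(path) as handle:
            for total, line in enumerate(handle, start=1):
                # A blank line inside a fastq file shifts every record after it.
                if not line.strip() and len(blank_lines) < max_report:
                    blank_lines.append(total)
    except EOFError:
        # The gzip module raises EOFError when the compressed stream stops
        # before its end-of-stream marker, e.g. after an incomplete transfer.
        truncated_gzip = True
    return {
        "total_lines": total,
        "blank_lines": blank_lines,
        # Assumes four lines per record, so any remainder is suspicious.
        "incomplete_record": total % 4 != 0,
        "truncated_gzip": truncated_gzip,
    }
```

For example, `scan_fastq_lines("input_1.fastq.gz")` on a concatenated file with a stray blank line would list that line number under `blank_lines`. Only the first `max_report` blank lines are kept, so the result stays small even for a badly damaged file.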

Tools that can help find the problem:

  • Fastq Groomer – this tool has enhanced data checks.

    • Use all default settings.
    • Data that is already in fastqsanger format (and assigned that datatype) will come out unchanged if there are no problems. Once the QA is done and passes, that output can be permanently deleted to reduce quota usage and avoid duplicate data.
    • If there are some formatting problems, the first malformed fastq record will be reported in the error message. Be aware that there could be more problems, but this tool can give some clues about where/what an example problem is, so you can find all occurrences like it.
  • Select last lines of a dataset (tail) – use this to review the end of a file and see whether it is truncated or contains blank trailing lines.

    • The last 10 lines (the default) should be sufficient.
  • Advanced methods – to examine (any) data in more depth, try using tools in these groups:

    • Text Manipulation
    • Filter and Sort
    • Join, Subtract and Group
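For anyone working on a local copy, the idea of "report the first malformed record" can also be sketched in Python. This is not how Fastq Groomer is implemented; it is a simplified stand-in with a made-up function name (`first_bad_record`), and it assumes four-line records with no wrapped sequence lines.

```python
import gzip


def first_bad_record(path):
    """Return (record_number, line_number, reason) for the first malformed
    fastq record, or None if every record passes these basic checks."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        record_no = 0
        while True:
            # Read one candidate record: header, sequence, separator, quality.
            lines = [handle.readline() for _ in range(4)]
            if not lines[0]:
                return None  # clean end of file
            record_no += 1
            first_line = (record_no - 1) * 4 + 1
            header, seq, plus, qual = (l.rstrip("\r\n") for l in lines)
            if not header.startswith("@"):
                found = lines[0][:1]
                return (record_no, first_line,
                        f"header starts with {found!r}, expected '@'")
            if "" in lines[1:]:
                # readline() returns "" only at end of file.
                return (record_no, first_line,
                        "file ends in the middle of this record")
            if not plus.startswith("+"):
                return (record_no, first_line + 2,
                        "separator line does not start with '+'")
            if len(seq) != len(qual):
                return (record_no, first_line + 1,
                        "sequence and quality lengths differ")
```

The returned line number points into the whole file, which is the number to use with line-based tools when looking for the damaged spot. As with Fastq Groomer, only the first problem is reported, so there may be more further down.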

Reference: FAQ for Fastq formatting

Hope that helps!

Hi Jennifer,
Thank you very much! It really helped me!
