Error in paired-end analysis

Dear All
I am getting an unexpected error during paired-end read merging of fastq files. The error shows:
An error occurred with this dataset:
format fastqsanger database.
However, two other samples merged successfully.
Kindly suggest what to do.
Regards

Welcome, @mallamuneer123

There are several fastq “merge” tools. The message that the tool both errored and succeeded at the same time is confusing to me. Please try at least one rerun. If that fails, back up and do some QA on the reads.

If any tool is failing with paired-end fastq inputs, checking the format is a good place to start. Maybe the data is truncated or malformed in some way. That could be from a partial data upload to Galaxy, or some problem introduced before the data was in Galaxy.

A tool that will verify basic formatting and intact paired-end reads is: Fastq info
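If you also want to sanity-check the files outside Galaxy, the same count check can be sketched in a few lines of Python. This is only an illustration of what Fastq info verifies (the file names are placeholders, not your datasets):

```python
# Count records in a FASTQ file and flag truncation. A FASTQ record is
# exactly 4 lines, so a leftover remainder means the file is incomplete.
import gzip

def open_maybe_gzip(path):
    """Open a plain or gzipped text file transparently."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")

def count_fastq_records(path):
    """Return the number of 4-line FASTQ records; raise if truncated."""
    with open_maybe_gzip(path) as handle:
        lines = sum(1 for _ in handle)
    records, leftover = divmod(lines, 4)
    if leftover:
        raise ValueError(f"{path}: {leftover} trailing line(s), file looks truncated")
    return records

# Hypothetical usage: both counts must match for a valid pair.
# r1 = count_fastq_records("sample_R1.fastq.gz")
# r2 = count_fastq_records("sample_R2.fastq.gz")
# assert r1 == r2, f"unpaired: R1 has {r1}, R2 has {r2}"
```

If the two counts differ, or either file raises the truncation error, the pair will fail in merge tools regardless of platform.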

These tutorials can help with understanding fastq data (format/content/variants) along with example QA steps/tools. Also see domain/topic tutorials at that same site for protocol-specific QA steps.

Please run the QA steps in that same history as your error, and if you cannot solve the problem, we can try to help here. FAQ about where to find error logs and how to generate a shared history link for review → Troubleshooting errors

Hi again
After your suggestion, I reran the merge tool and got the same error.
Then I checked the format and paired-end reads using the Fastq info tool.
The results I got are as follows:
For read R1:
Fastq.info fasta = 12,715,268 sequences
Fastq.info qual = 25,430,536 lines
fastq scrap = 0 bytes
fasta scrap = 0 bytes
qual scrap = 0 lines
and For Read R2:
Fastq.info fasta = 12,363,913 sequences
Fastq.info qual = 24,727,826 lines
fastq scrap = 0 bytes
fasta scrap = 0 bytes
qual scrap = 0 lines
So this is the result; what should I do next?
My aim is to evaluate the microbial community structure and function, antibiotic resistance, and virulence determinants.
Please suggest.
Regards

  1. Are these the original “raw” fastq files?
  2. Which tool exactly are you using to merge?
  3. If these are not the original fastq files, which steps did you run before merging?
  4. Does the merging tool end in a red or green color?
  5. If red, did you click the “bug” icon to see the error?

I notice that your fastq files do not have the same number of reads. Paired files normally contain the same number of reads unless something has been done to them.


Hi Again
Yes, these are the original “raw” fastq files.
I used “Paired-End read merger” (PEAR).
The merging tool showed a red color.
Yes, I clicked the “bug” icon, and here is the error:
Fatal error: Exit code 134 ()
1 Problem, number of reads does not match! n1 = 0 n2 = 84196
/mnt/pulsar/files/staging/6295728/command.sh: line 119: 2337290 Aborted (core dumped) pear -f “/mnt/pulsar/files/staging/6295728/inputs/dataset_13793713.dat” -r “/mnt/pulsar/files/staging/6295728/inputs/dataset_13781585.dat” --phred-base 33 --output pear --p-value 0.01 --min-overlap 10 --min-asm-length 50 --min-trim-length 1 --quality-theshold 0 --max-uncalled-base 1.0 --test-method 1 --empirical-freqs -j “${GALAXY_SLOTS:-8}” --score-method 2 --cap 40
Kindly suggest
Thanks in advance

So the problem is clear now.

Problem, number of reads does not match!

This is also confirmed by your previous post:

For read R1:
Fastq.info fasta = 12,715,268 sequences
Fastq.info qual = 25,430,536 lines
fastq scrap = 0 bytes
fasta scrap = 0 bytes
qual scrap = 0 lines
and For Read R2:
Fastq.info fasta = 12,363,913 sequences

The solution would be to provide fastq files as input with the same number of (matching) reads. Where did you get the files from? You could try to re-download and re-upload them.
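If you do re-download, it can help to verify that each local .fastq.gz decompresses cleanly before re-uploading, since a partial transfer produces a truncated gzip stream that fails partway through. A minimal sketch (the path is whatever your local file is called):

```python
# Check gzip integrity by decompressing the whole stream: a truncated
# .gz file raises EOFError (or OSError for a damaged header) when read
# to the end, which is a strong hint of an incomplete download/upload.
import gzip

def gzip_is_intact(path):
    """Return True if the entire gzip stream decompresses without error."""
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(1024 * 1024):  # read in 1 MiB chunks
                pass
        return True
    except (OSError, EOFError):
        return False
```

A False result means the local copy itself is damaged, so re-uploading it to Galaxy would just reproduce the problem.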


Thank you, gbbio.
I have uploaded the files again.
Now the file sizes are:

  1. R1 = 4.1GB and R2 = 4.0 GB
  2. R1 = 4.3 GB and R2 = 4.2 GB
  3. R1 = 3.9 GB and R2 = 3.9 GB
    So the sizes look about the same.
    Moreover, please suggest which tools to use after merging the paired ends for taxonomic and functional analysis, ARGs, and pathogenic determinants.
    Thanks and Regards

Hello again
After I uploaded the data files, I ran the Paired-End read merger and got the same error.
Please suggest
I am stuck badly.
thanks and regards

Also, try this. Reloading the data was just one guess about why the reads are not all paired.

There are a few ways to filter out any forward or reverse reads that are missing a mate. After you are done, the Fastq info tool should report the same number of reads in the forward and reverse files. You won’t be able to do more until that is resolved.

  1. Run the data through a QA tool that sorts the output into intact pairs and singletons. Example: Trimmomatic. See the tool form for how to use it, and our QA tutorials for usage examples.

  2. Use tools that compare the read identifiers between files, and only retain output where both files contain the base read name. You might need to convert fastq → tabular, do tabular manipulations, then convert tabular → fastq at the end. See the tutorial below for the choices, with examples.

  3. Use tools from the Seqtk tool group. These do all sorts of logical data manipulations. Converting to an interleaved format then back to individual forward + reverse format can get rid of stray unpaired singleton reads. If you do not understand what these terms mean, see the first QA tutorial I linked before. You can compare data visually to the examples to learn the concepts.
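To illustrate option 2 above, the identifier-matching logic can be sketched like this. For real data, prefer the Galaxy tools; this toy version assumes the R2 file fits in memory and that identifiers follow the common `name/1`, `name/2` convention:

```python
# Keep only reads whose base name (identifier up to the first space or
# "/" mate suffix) appears in both the forward and reverse files.

def base_name(header):
    """'@M01:12/1 comment' -> 'M01:12' (strip '@', comment, mate suffix)."""
    name = header[1:].split()[0]
    return name.split("/")[0]

def read_fastq(path):
    """Yield (base_name, 4-line record) tuples from a FASTQ file."""
    with open(path) as fh:
        while True:
            rec = [fh.readline() for _ in range(4)]
            if not rec[0]:
                break
            yield base_name(rec[0]), "".join(rec)

def keep_pairs(r1_path, r2_path, out1_path, out2_path):
    """Write only intact pairs to the outputs; return how many were kept."""
    r2 = dict(read_fastq(r2_path))  # assumes R2 fits in memory
    kept = 0
    with open(out1_path, "w") as o1, open(out2_path, "w") as o2:
        for name, rec in read_fastq(r1_path):
            mate = r2.get(name)
            if mate is not None:
                o1.write(rec)
                o2.write(mate)
                kept += 1
    return kept
```

After a step like this, both output files contain the same reads in the same order, which is what downstream mergers expect.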

References:

  • Text manipulation tools: Data Manipulation Olympics
  • QA tools – some basic checks to make sure data has valid format and content are always recommended, even if you don’t plan on doing any trimming.

Hello again
As per your suggestions, I tried to run the Trimmomatic tool, but it again generated an error
showing that a fatal error occurred.
Please have a look at my dataset.
thanks and regards

Hi again
I want to mention that I used Seqtk tools:
I ran seqtk_mergefa and it ran successfully.
Showing the results as:
[stk_mergefa] Unequal sequence length: 150 != 151
[stk_mergefa] Unequal sequence length: 150 != 120
[stk_mergefa] Unequal sequence length: 151 != 119
[stk_mergefa] Unequal sequence length: 151 != 150
[stk_mergefa] Unequal sequence length: 150 != 151

So, I want to know whether I have done it right,
or whether I need to rerun some other tool.
thanks and Regards

Hi @mallamuneer123

Ok, thanks for trying that. This means that these are very likely not raw sequencing reads.

You might want to reconsider using this data if that matters to you. We won’t be able to help with odd problems that may show up later on due to data content modifications done upstream of Galaxy.

That means that if PEAR errors for a different reason than your original error about missing mates (which is what we are specifically addressing here), it could be for the same reason Seqtk complained: the data are not raw reads, as detected (and expected) by the underlying tool. You would get the same kinds of errors running these tools directly on the command line, outside of Galaxy.

If you want to try using this data anyway:
These tools are more tolerant of content issues such as read-length differences in paired datasets. Run the steps in order, or expect it not to work.

  1. FASTQ interlacer on paired end reads (on your two files)
  2. FASTQ de-interlacer on paired end reads (on output of step 1)
  3. Fastq info (on the two outputs of step 2)

Once you have proper pairs, then you can move forward with other tools and make a decision about whether this data is usable or not. Maybe visit the PEAR tool authors resources. The tool form has link outs to documentation, the development repository, and a publication reference at the bottom.
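The interlace → de-interlace round trip above can be sketched roughly as: zip the two files together record by record, then write the mates back out to separate files. Note this toy version only trims trailing singletons (zip stops at the shorter file); the Galaxy tools match by read identifier and also handle mismatches in the middle:

```python
# Rough sketch of the interleave/de-interleave round trip: pair records
# positionally, dropping anything past the end of the shorter file.

def fastq_records(path):
    """Yield one 4-line FASTQ record at a time as a single string."""
    with open(path) as fh:
        while True:
            rec = [fh.readline() for _ in range(4)]
            if not rec[0]:
                break
            yield "".join(rec)

def interleave_and_split(r1_path, r2_path, out1_path, out2_path):
    """Pair records positionally; return the number of pairs written."""
    pairs = zip(fastq_records(r1_path), fastq_records(r2_path))
    n = 0
    with open(out1_path, "w") as o1, open(out2_path, "w") as o2:
        for fwd, rev in pairs:
            o1.write(fwd)
            o2.write(rev)
            n += 1
    return n
```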

Hi again
I ran the FASTQ interlacer tool, but with no success.
Previously, I was advised to upload the data as gzipped fastq files,
and that upload succeeded.

Getting data into Galaxy is just one step.

You also need to provide data in a standardized format that tools understand how to work with. This includes any tool-specific input requirements. Those are generally the same as expected when using tools directly outside of Galaxy on the command-line.

You could try running on the command-line. Maybe the tools will output more error logs that are meaningful to you.

Hope this works out for you!

Thanks jennaj
Thanks for the suggestions, but unfortunately I am not too comfortable with the command line.
That is why I am using the Galaxy platform.
I would therefore request further suggestions.
Thanks and Regards.

OK, let’s try again. I’d like to reach some result, or at least a concrete answer that the data is not usable.

Please generate shared history link(s) and post those back, and I’ll take a direct look.

  1. Leave all of the original data undeleted or we won’t be able to run tests or help more. If you need to reload the data, do that first, then rerun all the other steps again.
  2. Leave the manipulations you have done undeleted. If you already deleted those, recreate the work (reruns are useful anyway).
  3. If the work was done in multiple histories, share back links to multiple histories.
  4. How to generate the links: Sharing your History

Hi again
I think the reads are now merged.
After uploading the gzipped fastq files, I ran the Paired-End Read Merger, and it worked successfully.
Please find the attached link to access the history.
https://usegalaxy.org.au/u/muneer/h/myanalysis
Kindly suggest what to do next.
I am looking for taxonomic and functional analysis, ARG, and pathogenic determinants.

Thanks and Regards

Hi again
Hope you are doing well
Kindly have a look at my history and suggest how to proceed.
Thanks and Regards.