Merging FASTA files

Goncalo_Pinheiro · July 5, 2022, 4:37pm

Hello everyone. This might be a very basic question, but I am analyzing transcriptomic data for the first time.

My samples were sequenced on 4 lanes, so I have 4 FASTA files per sample. How do I merge them to get a complete summary of each sample's sequencing?

Thank you for your help.

jennaj · July 5, 2022, 11:23pm

Hi @Goncalo_Pinheiro

A single sample run on multiple lanes can be combined with the Concatenate tool. This stacks the reads top to bottom.
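For anyone who wants to see what "stacking top to bottom" means outside of Galaxy, here is a minimal Python sketch. It is not the Concatenate tool's code, only an illustration of the same idea. The function name `concatenate` and the file names in the usage note are made up, and it assumes every input file ends with a newline.

```python
import shutil
from pathlib import Path


def concatenate(inputs, output):
    """Append each input file to `output`, in the order given.

    Files are copied byte for byte, so nothing inside a read is changed.
    Assumption: every input ends with a newline, otherwise the last line of
    one file would run into the first line of the next.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(output)
```

Usage would look like `concatenate(["lane1.fastq", "lane2.fastq"], "sample.fastq")`, with the lane files listed in the order you want them stacked.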

For QA or summary information like quality metrics, the protocol varies by the analysis domain.

Tutorials: https://training.galaxyproject.org/
Start here: NGS data logistics

Goncalo_Pinheiro · July 14, 2022, 9:59am

Hello jennaj

First, I need to make a small correction to what I had said. I have each sample separated into 4 lanes, each producing a separate FASTQ (not FASTA) file.

I have tried to use the Concatenate tool, but it seems that it won’t perform the function I want it to perform (add the reads of each FASTQ file into just one). I have converted the FASTQ files into FASTA files and used the Merge FASTA tool, which did (at least I think it did) what I require, but now I do not know how to convert it back to FASTQ, nor how to obtain a QUAL file for the merged FASTA (which seems to be what I am missing).

Does anyone have a clue how I can solve this issue? Is there a tool that merges FASTQ files, and I just missed it?

Thank you for the help

jennaj · July 22, 2022, 9:24pm

The Concatenate tools should definitely work on FASTQ data. Try uploading the data and let Galaxy guess the datatype. It should be fastqsanger or fastqsanger.gz for current NGS sequencing methods. You might want to uncompress the read data once uploaded (pencil icon → Convert).

Concatenate per sample into one file each. Keep samples distinct. If the data is paired-end interleaved, you can leave it interleaved for some tools, and others will expect one file per end. Then if you want to batch process, put the data into a collection.
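If you have many files, it helps to work out which lane files belong together before concatenating. The sketch below groups file names by sample. It assumes Illumina-style names with a lane tag such as `_L001` followed by an underscore (for example `A_S1_L001_R1_001.fastq`); that naming pattern and the function name `group_by_sample` are assumptions for illustration, so adjust the pattern to your own files.

```python
import re
from collections import defaultdict

# Assumed lane tag: "_L" plus three digits, followed by another underscore.
LANE = re.compile(r"_L\d{3}(?=_)")


def group_by_sample(filenames):
    """Map each lane-free name to the sorted list of lane files behind it.

    The R1/R2 part of the name is kept in the key, so forward and reverse
    reads of a paired-end sample end up in separate groups.
    """
    groups = defaultdict(list)
    for name in sorted(filenames):
        key = LANE.sub("", name, count=1)
        groups[key].append(name)
    return dict(groups)
```

Each group is then one concatenation job, and the resulting per-sample files are what you would put into a collection for batch processing.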

Use one of these tools (examples at UseGalaxy.org, but most public Galaxy servers will host the same by default):

This version allows you to choose the order: Galaxy
This version uses the existing order of the datasets in the history: Galaxy

Don’t convert FASTQ to FASTA and then use Merge FASTA. You lose information (the quality scores), and you don’t need the extra functionality. All the sequences should already have distinct identifiers when in FASTQ format.
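To make both points concrete, here is a small Python sketch that reads FASTQ records as groups of four lines and reports any repeated identifiers. The quality string is the fourth line of each record, which is the part a FASTA file cannot hold. The helper names `read_fastq` and `duplicate_ids` are made up for this example, and it assumes plain four-line records with no wrapped sequence lines.

```python
def read_fastq(lines):
    """Yield (identifier, sequence, quality) from an iterable of FASTQ lines.

    Assumes four lines per record: header, sequence, '+', quality.
    A trailing incomplete record is silently ignored.
    """
    it = iter(lines)
    for header, seq, _plus, qual in zip(it, it, it, it):
        # Drop the leading '@' and keep only the first word of the header.
        ident = header.rstrip("\n")[1:].split()[0]
        yield ident, seq.rstrip("\n"), qual.rstrip("\n")


def duplicate_ids(records):
    """Return the set of identifiers that appear more than once."""
    seen, dupes = set(), set()
    for ident, _seq, _qual in records:
        if ident in seen:
            dupes.add(ident)
        seen.add(ident)
    return dupes
```

Running `duplicate_ids(read_fastq(open("sample.fastq")))` on a concatenated file (the file name is a placeholder) should give an empty set if the lanes carried distinct read names.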
