Deleting sequence identifier line

Isa · April 4, 2024, 2:11pm

Hello,

I have a FASTA file with identifier lines between the sequences. It looks like this:

>A00597:359:H53WCDSXC:3:1101:1127:23390/1
CCAGTACCCACTTAGAAAGAAATAAAAAAACAAATCAGACAACAAAGGCTTAATCTCAGCAGATCGTAACAACAAGGCTACTCTACTGCTTACAATACCCCGTTGTACATCTAAGTCGTATACAAATGATT
>A00597:359:H53WCDSXC:3:1101:1127:23390/2
CCAGTACCCACTTAGAAAGAAATAAAAAAACAAATCAGACAACAAAGGCTTAATCTCAGCAGATCGTAACAACAAGGCTACTCTACTGCTTACAATACCCCGTTGTACATCTAAGTCGTATACAAATGATT

I was wondering if it is possible to remove these identifier lines, and if so, how I can do that. I was also wondering why I get two lines with the same sequence. I can see that one is labeled /1 and the other /2, but what is the difference?
Hopefully someone can help me. Thanks in advance!

Kind regards,
Isa

jennaj · April 9, 2024, 6:06pm

Hi @Isa

I’m going to show you how to navigate our tutorials to frame your questions a bit more, then answer…

GTN Tutorials homepage → Galaxy Training!
Start here to learn about reads and basic manipulations → Introduction to Galaxy Analyses
In the Intro listing, see → Hands-on: NGS data logistics / Introduction to Galaxy Analyses

Sequences in fasta format need something on the > title line: minimally an identifier, and optionally a description.
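
To make the "identifier plus optional description" rule concrete, here is a minimal Python sketch (the helper names `parse_fasta` and `_make_record` are made up for this example, not part of Galaxy). It assumes the common convention that the identifier is the first whitespace-delimited token after `>` and that everything after it on the title line is the description.

```python
def parse_fasta(text):
    """Split FASTA text into (identifier, description, sequence) tuples.

    The identifier is the first whitespace-delimited token after '>';
    anything after it on the title line is treated as the description.
    Sequence lines may wrap across several lines and are joined.
    """
    records = []
    header = None
    seq_lines = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new title line closes the previous record, if any.
            if header is not None:
                records.append(_make_record(header, seq_lines))
            header = line[1:]
            seq_lines = []
        else:
            if header is None:
                raise ValueError("sequence data found before any '>' title line")
            seq_lines.append(line)
    if header is not None:
        records.append(_make_record(header, seq_lines))
    return records


def _make_record(header, seq_lines):
    # An empty title line has no identifier, which makes the record invalid.
    parts = header.split(maxsplit=1)
    if not parts:
        raise ValueError("empty '>' title line: an identifier is required")
    identifier = parts[0]
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description, "".join(seq_lines)
```

If you really only need the bare sequences, `[seq for _, _, seq in parse_fasta(text)]` gives them, but the result is no longer FASTA, and downstream tools can no longer tell which sequence came from which read.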

Sequences in fastq format will also need something on the @ lines but the + lines are usually left blank.

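The /1 and /2 suffixes are the naming convention for paired-end data: each pair is the two mates read from opposite ends of the same DNA fragment. You will usually want to keep them intact, since tools use them to match mates.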
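
To illustrate how tools can use the suffix, here is a small Python sketch (the helper names `split_mate` and `pair_mates` are made up for this example) that groups reads by the part of the identifier before /1 or /2. It assumes the mate number is always a trailing /1 or /2; newer Illumina files often put it in the description instead (e.g. ` 1:N:0:...`), which this sketch does not handle.

```python
import re

# Identifier ending in /1 or /2; everything before the last '/' is the shared read name.
MATE_SUFFIX = re.compile(r"^(?P<base>.+)/(?P<mate>[12])$")


def split_mate(identifier):
    """Return (base_name, mate_number) for IDs like 'READ/1' or 'READ/2'."""
    m = MATE_SUFFIX.match(identifier)
    if m is None:
        raise ValueError(f"no /1 or /2 mate suffix on {identifier!r}")
    return m.group("base"), int(m.group("mate"))


def pair_mates(identifiers):
    """Group identifiers into {base_name: {1: id_of_mate1, 2: id_of_mate2}}."""
    pairs = {}
    for ident in identifiers:
        base, mate = split_mate(ident)
        pairs.setdefault(base, {})[mate] = ident
    return pairs
```

With the two titles from the question, both reads land under the same base name `A00597:359:H53WCDSXC:3:1101:1127:23390`, one as mate 1 and one as mate 2. Removing the titles throws away exactly this link.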

Your examples look like NGS reads in fastq format, not fasta, but I don’t see any lines for the quality scores. Maybe you know why that is (upstream manipulations?).
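
One possible explanation is a FASTQ-to-FASTA conversion somewhere upstream. The Python sketch below (the helper name `fastq_to_fasta` is hypothetical) shows what such a conversion keeps and what it drops. It assumes strict four-line FASTQ records (`@title`, sequence, `+`, quality) with no wrapped lines.

```python
def fastq_to_fasta(text):
    """Convert 4-line FASTQ records to FASTA, dropping the quality scores.

    Assumes each record is exactly four non-blank lines: '@title', sequence,
    '+' (optionally repeating the title), and quality (same length as sequence).
    """
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text is not a whole number of 4-line records")
    out = []
    for i in range(0, len(lines), 4):
        title, seq, plus, qual = lines[i:i + 4]
        record_no = i // 4 + 1
        if not title.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record {record_no}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence and quality lengths differ in record {record_no}")
        # The '@' title becomes a '>' title; the '+' and quality lines are discarded.
        out.append(">" + title[1:])
        out.append(seq)
    return "\n".join(out) + "\n" if out else ""
```

The output keeps the read titles (including /1 and /2) and the bases, which matches what your file looks like, but the per-base quality scores cannot be recovered from it.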

Please review that NGS tutorial, then ask more questions here if you get stuck. We would be interested in seeing 1) the original data and learning about 2) where it came from and 3) what you plan to do with the data.

Let’s start there.
