Sequence identification from pool of NGS sequence reads

harriet_lahiff · August 23, 2023, 11:25am

Hello!

We have a pool of approximately 600,000 150bp (+/- a few) reads generated from NGS.

From this pool, we want to count the number of times a certain sequence occurs. Is there a way to do this using Galaxy?

harriet_lahiff · August 23, 2023, 11:25am

I should add that we have approximately 12 reference sequences whose prevalence in the pool we want to determine.

gallardoalba · August 23, 2023, 2:55pm

Hi @harriet_lahiff,
One option is to use SeqKit locate.
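
To illustrate the kind of counting involved, here is a minimal Python sketch. It is not how SeqKit is implemented, only an approximation of the idea: it counts how many reads contain each reference sequence as an exact substring, checking both the reference and its reverse complement. The function names and the example reads and references are made up for illustration.

```python
from collections import Counter

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_reads_with_reference(reads, references):
    """Count the reads that contain each reference sequence.

    reads: iterable of read sequences (strings).
    references: dict mapping a reference name to its sequence.
    A read is counted once per reference if the reference, or its
    reverse complement, occurs in it as an exact substring.
    """
    # Prepare both orientations of every reference once, in upper case.
    prepared = {
        name: (ref.upper(), reverse_complement(ref.upper()))
        for name, ref in references.items()
    }
    counts = Counter({name: 0 for name in references})
    for read in reads:
        read = read.upper()
        for name, (forward, reverse) in prepared.items():
            if forward in read or reverse in read:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy data; real input would be ~600,000 reads and ~12 references.
    example_reads = ["ACGTTTGACCA", "TTTGGTCAAAC", "GGGGGGGG"]
    example_references = {"ref_1": "TTGACC", "ref_2": "CCCC"}
    print(dict(count_reads_with_reference(example_reads, example_references)))
```

Exact matching will miss reads with sequencing errors or mismatches, so a dedicated tool is likely the better choice when some tolerance is needed.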

Regards

harriet_lahiff · September 7, 2023, 10:32am

Hello,

Thanks for your response.

Our data files are CSV, with each sequencing result listed as a row. Do you have any suggestions for how to use this format, given that the tool requires FASTA.GZ files?

Many thanks

jennaj · September 11, 2023, 5:41pm

Hi @harriet_lahiff

The .csv datatype is a “comma-separated values” data file.

Try this:

1. Convert the “comma-separated” file to “tab-separated” (tabular) format (some tools will do this directly at runtime)
2. Convert the tabular data to fastq format
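
Outside Galaxy, the same kind of conversion can be sketched in a few lines of Python. This is only a rough sketch under some assumptions: the CSV is assumed to hold one read per row with the sequence in a known column, and because a plain CSV of sequences presumably carries no quality scores, it writes gzip-compressed FASTA (the FASTA.GZ format mentioned above) rather than fastq. The function name, the `read_N` identifiers, and the parameters are all hypothetical.

```python
import csv
import gzip


def csv_to_fasta_gz(csv_path, fasta_gz_path, seq_column=0, has_header=False):
    """Write one FASTA record per CSV row and return the number written.

    csv_path: input CSV with one sequencing read per row (assumed layout).
    fasta_gz_path: output path for the gzip-compressed FASTA file.
    seq_column: zero-based index of the column holding the sequence.
    has_header: set to True to skip the first row.
    """
    written = 0
    with open(csv_path, newline="") as src, gzip.open(fasta_gz_path, "wt") as dst:
        rows = csv.reader(src)
        if has_header:
            next(rows, None)  # discard the header row
        for row in rows:
            # Skip rows that are too short or have an empty sequence field.
            if len(row) <= seq_column:
                continue
            seq = row[seq_column].strip()
            if not seq:
                continue
            written += 1
            # Reads get sequential placeholder names: read_1, read_2, ...
            dst.write(f">read_{written}\n{seq}\n")
    return written


# Example call with hypothetical file names:
# n = csv_to_fasta_gz("reads.csv", "reads.fasta.gz", seq_column=0, has_header=True)
```

The resulting file can then be uploaded to Galaxy as a fasta.gz dataset.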

Tutorial → NGS data logistics
