Hello!
We have a pool of approximately 600,000 150bp (+/- a few) reads generated from NGS.
From this pool, we want to identify the number of times a certain sequence arises. Is there a way to do this using Galaxy?
I should add that we have approximately 12 reference sequences, and we want to measure how prevalent each one is in the pool.
Hi @harriet_lahiff,
One option is to use SeqKit locate, which reports where a given sequence or pattern occurs in a set of reads.
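To make the idea concrete, here is a minimal Python sketch of what "counting how often a sequence arises" can mean: for each reference, count the reads that contain it as an exact substring on either strand. This is only an illustration, not what SeqKit does internally; the function names, the `ref1` label, and the toy reads are made up for the example, and it assumes exact matches with no mismatches allowed.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_reads_containing(reads, references):
    """Count, per reference, the reads containing it on either strand.

    reads: iterable of read sequences (strings).
    references: dict mapping a reference name to its sequence.
    Matching is exact and case-insensitive (an assumption of this sketch).
    """
    # Precompute upper-cased forward and reverse-complement forms once.
    prepared = {
        name: (ref.upper(), reverse_complement(ref.upper()))
        for name, ref in references.items()
    }
    counts = {name: 0 for name in references}
    for read in reads:
        read = read.upper()
        for name, (forward, reverse) in prepared.items():
            if forward in read or reverse in read:
                counts[name] += 1
    return counts


# Toy data, purely illustrative.
reads = ["ACGTTTGACA", "TTTTTTTTTT", "TGTCAAACGT"]
references = {"ref1": "GTTTGAC"}
print(count_reads_containing(reads, references))
```

With 600,000 reads and about 12 references this kind of loop is small enough to run on a laptop, but in Galaxy a dedicated tool is still the easier route.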
Regards
Hello,
Thanks for your response.
Our data files are CSV, with each sequencing result listed as a row. Do you have any suggestions for how to use this format, given that the tool requires FASTA.GZ files?
Many thanks
The .csv datatype means a “comma separated values” type of data file.
Try this:
Tutorial → NGS data logistics
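If the CSV really is just one sequence per row, converting it to FASTA before uploading is straightforward. Below is a rough Python sketch under some assumptions that may not match your files: the sequence sits in the first column (`seq_column=0`), there is no header row, and the read names (`read_1`, `read_2`, ...) are invented because the CSV has no identifiers. The function name `csv_to_fasta` is hypothetical too.

```python
import csv
import io


def csv_to_fasta(csv_handle, fasta_handle, seq_column=0, has_header=False):
    """Write one FASTA record per non-empty CSV row; return the record count.

    seq_column and has_header are assumptions about the CSV layout;
    adjust them to match the real file.
    """
    reader = csv.reader(csv_handle)
    if has_header:
        next(reader, None)  # skip the header row if there is one
    written = 0
    for row in reader:
        if len(row) <= seq_column:
            continue  # row is too short to hold a sequence
        seq = row[seq_column].strip()
        if not seq:
            continue  # skip blank cells
        written += 1
        # Invented identifier, since the CSV carries no read names.
        fasta_handle.write(f">read_{written}\n{seq}\n")
    return written


# Small in-memory demonstration; with real files, pass handles opened
# with open("reads.csv", newline="") and open("reads.fasta", "w").
demo_csv = io.StringIO("ACGTTTGACA\nTTTTTTTTTT\n")
demo_fasta = io.StringIO()
n_records = csv_to_fasta(demo_csv, demo_fasta)
print(demo_fasta.getvalue())
```

The resulting FASTA file can then be uploaded to Galaxy like any other dataset and used as input for a sequence search tool.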