Help : looking for an alignment tool that tolerates ambiguous bases

totoyoto · July 28, 2021, 7:52am

Hello everyone, I’m trying to find out whether a sequence in my multifasta file is present in E. coli.
My multifasta file looks like this:

16
GTGGAGTNNNNNNNGTCTGTC
17
GTGGAGTNNNNNNNTTTTCAT

etc.
It contains around 5000 sequences.
I have tried mapping tools such as bowtie2 and minimap2, but they always report that 0 sequences align to the E. coli genome, even though I know that sequence “16” is found in E. coli. So I think the “N”s are the problem, but I don’t know how to get around them. Are there parameters that I should change? Or am I using the wrong tool?
Thanks!
Kevin

gbbio · July 28, 2021, 9:41am

Is your example real data taken from the fasta file? If so, this is not the correct fasta format. You are missing the “>” character.

>16
GTGGAGTNNNNNNNGTCTGTC
>17
GTGGAGTNNNNNNNTTTTCAT

totoyoto · July 28, 2021, 9:43am

Thanks for the answer. In my multifasta files I do have the “>” character.

gbbio · July 28, 2021, 10:02am

Sorry, I think I misunderstood your question.

You could maybe tweak bowtie2’s --n-ceil parameter.

--n-ceil
Sets a function governing the maximum number of ambiguous characters (usually
Ns and/or .s) allowed in a read as a function of read length. For instance,
specifying -L,0,0.15 sets the N-ceiling function f to f(x) = 0 + 0.15 * x,
where x is the read length. Reads exceeding this ceiling are filtered out.
Default: L,0,0.15.
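
To see what that ceiling means for reads like yours, here is a small sketch of the arithmetic in Python. The helper name n_ceiling and its arguments are made up for illustration; only the formula f(x) = 0 + 0.15 * x comes from the manual text above.

```python
def n_ceiling(read_length, constant=0.0, coefficient=0.15):
    """Linear N-ceiling f(x) = constant + coefficient * x (hypothetical helper)."""
    return constant + coefficient * read_length


read = "GTGGAGTNNNNNNNGTCTGTC"  # sequence "16" from the first post
n_count = read.count("N")

# Compare the default coefficient with a raised one.
for coefficient in (0.15, 0.4):
    ceiling = n_ceiling(len(read), coefficient=coefficient)
    verdict = "filtered out" if n_count > ceiling else "kept"
    print(f"L,0,{coefficient}: ceiling {ceiling:.2f}, {n_count} Ns -> {verdict}")
```

This only covers the filtering step described in the manual text. A read that passes the ceiling still has to produce an alignment.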

Maybe blast can be an alternative, but I think you would also need to change parameters there because the reads are so short.

totoyoto · July 28, 2021, 10:20am

I tried playing with this parameter and set it to 0.4, but it still doesn’t give me any match.
I also tried to blast my multifasta file with blastn, but I think blastn is unable to handle the ambiguous characters.
Thanks for the help

gbbio · July 28, 2021, 10:51am

Not sure if it is allowed to post this here, but this tool could also be an option, although it is not on Galaxy: the search_oligodb command.
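
If installing something outside Galaxy is not practical, the search itself is simple enough to sketch in a few lines of Python. This is only an illustration. It assumes N should match any one of A, C, G or T and that every other position must match exactly. The function names and the toy genome are made up, and this is not meant to describe how search_oligodb works internally.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA string; N stays N."""
    return seq.translate(COMPLEMENT)[::-1]


def to_pattern(query):
    """Turn a query into a regex in which N matches any single base."""
    body = "".join("[ACGT]" if base == "N" else re.escape(base) for base in query)
    # The lookahead lets finditer report overlapping hits as well.
    return re.compile(f"(?=({body}))")


def find_hits(query, genome):
    """Return (0-based start, strand, matched text) for hits on both strands."""
    query = query.upper()
    genome = genome.upper()
    hits = []
    for strand, oriented in (("+", query), ("-", reverse_complement(query))):
        for match in to_pattern(oriented).finditer(genome):
            hits.append((match.start(), strand, match.group(1)))
    return hits


query = "GTGGAGTNNNNNNNGTCTGTC"  # sequence "16" from the first post
# Toy genome: the query with its Ns filled in, padded with a few bases.
filled = query.replace("N", "A")
toy_genome = "TTAC" + filled + "CCGG"

for start, strand, text in find_hits(query, toy_genome):
    print(start, strand, text)
```

For a real run you would read the genome and the queries from the FASTA files instead of using the toy strings.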

totoyoto · July 28, 2021, 11:09am

Thank you, I’m going to check that!
