Help: looking for an alignment tool that tolerates ambiguous bases

Hello everyone, I’m trying to find out whether a sequence in my multi-FASTA file is present in E. coli.
My multi-FASTA file looks like this:

16
GTGGAGTNNNNNNNGTCTGTC
17
GTGGAGTNNNNNNNTTTTCAT

etc.
It contains around 5000 sequences.
I have tried mapping tools such as bowtie2 and minimap2, but they always report that 0 sequences align to the E. coli genome, even though I know that sequence “16” is found in E. coli. So I think the “N”s are the problem, but I don’t know how to get around them. Are there parameters that I should change? Or am I using the wrong tool?
Thanks!
Kevin

Is your example real data taken from the fasta file? If so, this is not the correct fasta format. You are missing the “>” character.

>16
GTGGAGTNNNNNNNGTCTGTC
>17
GTGGAGTNNNNNNNTTTTCAT

Thanks for the answer. In my multi-FASTA files I do have the “>” character :slight_smile:

Sorry, I think I misunderstood your question. :slightly_smiling_face:

You could maybe tweak bowtie2’s --n-ceil parameter.

--n-ceil
Sets a function governing the maximum number of ambiguous characters (usually
Ns and/or .s) allowed in a read as a function of read length. For instance,
specifying -L,0,0.15 sets the N-ceiling function f to f(x) = 0 + 0.15 * x,
where x is the read length. Reads exceeding this ceiling are filtered out.
Default: L,0,0.15.
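
To see what that function means for the example read, here is a minimal Python sketch of the arithmetic. The helper names are made up for illustration, and it assumes the N count is simply compared against f(x); bowtie2's exact rounding is not modelled.

```python
def n_ceiling(read_length, constant=0.0, coefficient=0.15):
    """Linear function L,<constant>,<coefficient>: f(x) = constant + coefficient * x."""
    return constant + coefficient * read_length


def exceeds_ceiling(read, constant=0.0, coefficient=0.15):
    """True if the read has more Ns than the ceiling allows (assumed comparison)."""
    n_count = read.upper().count("N")
    return n_count > n_ceiling(len(read), constant, coefficient)


read = "GTGGAGTNNNNNNNGTCTGTC"  # sequence 16 from the question

# Default L,0,0.15: is the read filtered out?
print(exceeds_ceiling(read))
# Raised to L,0,0.4: is it still filtered out?
print(exceeds_ceiling(read, 0.0, 0.4))
```

With 7 Ns in 21 bases, the default ceiling of 0.15 * 21 = 3.15 is exceeded, while 0.4 * 21 = 8.4 is not. Note that this only covers the filter: a read that passes it can still fail to align.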

Maybe BLAST could be an alternative, but I think you would also need to change parameters there because the reads are so short.

I tried playing with this parameter and set it to 0.4, but it still doesn’t give me any match.
I also tried to BLAST my multi-FASTA file with blastn, but I think blastn is unable to handle the ambiguous characters.
Thanks for the help
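
For queries this short, one option that avoids an aligner altogether is a direct pattern search that treats every N as a wildcard. Below is a rough Python sketch of that idea, not a replacement for a proper tool: it assumes the genome is already loaded as one plain string, the function names are made up for illustration, and the toy genome is invented test data rather than real E. coli sequence.

```python
import re

# Complement table for plain bases plus N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def to_pattern(seq):
    """Turn a query into a regex where each N matches any of A, C, G, T."""
    body = "".join("[ACGT]" if base == "N" else re.escape(base)
                   for base in seq.upper())
    # The lookahead lets overlapping hits be reported as well.
    return re.compile("(?=(" + body + "))")


def find_hits(query, genome):
    """Return (strand, 0-based start, matched text) for both strands."""
    genome = genome.upper()
    hits = []
    for strand, seq in (("+", query), ("-", reverse_complement(query))):
        for match in to_pattern(seq).finditer(genome):
            hits.append((strand, match.start(), match.group(1)))
    return hits


# Invented toy genome containing one occurrence of the pattern.
genome = "AAAC" + "GTGGAGTACGTACGGTCTGTC" + "TTTT"
hits = find_hits("GTGGAGTNNNNNNNGTCTGTC", genome)
print(hits)
```

Looping this over all 5000 queries against a single bacterial genome should be feasible, though it only finds exact matches outside the N positions.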

I’m not sure whether it is allowed to post this here, but another option is the search_oligodb command, although it is not available on Galaxy.

Thank you, I’m going to check that!