Hello everyone, I’m trying to find out whether the sequences in my multifasta file are present in E. coli.
My multifasta file looks like this:
>16
GTGGAGTNNNNNNNGTCTGTC
>17
GTGGAGTNNNNNNNTTTTCAT
etc.
It contains around 5000 sequences.
I have tried mapping tools such as bowtie2 and minimap2, but they always report that 0 sequences align to the E. coli genome, even though I know that sequence “16” is found in E. coli. So I think the “N”s are the problem, but I don’t know how to get around it. Are there parameters I should change? Or am I using the wrong tool?
Thanks !
Kevin
--n-ceil
Sets a function governing the maximum number of ambiguous characters (usually Ns and/or .s) allowed in a read as a function of read length. For instance,
specifying -L,0,0.15 sets the N-ceiling function f to f(x) = 0 + 0.15 * x,
where x is the read length. Reads exceeding this ceiling are filtered out.
Default: L,0,0.15.
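To make the quoted rule concrete, here is a small sketch that only reproduces the arithmetic of the linear function above for sequence “16”. It does not call bowtie2; the function names are made up for illustration, and the "more Ns than the ceiling means filtered" comparison is my reading of the manual text.

```python
def n_ceiling(read_length, constant=0.0, coefficient=0.15):
    """Linear N-ceiling f(x) = constant + coefficient * x, x = read length."""
    return constant + coefficient * read_length


def count_ambiguous(read):
    """Count characters that are not plain A, C, G or T (e.g. N or '.')."""
    return sum(1 for base in read.upper() if base not in "ACGT")


def passes_n_filter(read, constant=0.0, coefficient=0.15):
    """True if the read does not exceed the ceiling (assumed filter rule)."""
    ceiling = n_ceiling(len(read), constant, coefficient)
    return count_ambiguous(read) <= ceiling


read_16 = "GTGGAGTNNNNNNNGTCTGTC"

for coefficient in (0.15, 0.4):
    print(
        f"coefficient={coefficient}: "
        f"ceiling={n_ceiling(len(read_16), coefficient=coefficient):.2f}, "
        f"ambiguous={count_ambiguous(read_16)}, "
        f"passes={passes_n_filter(read_16, coefficient=coefficient)}"
    )
```

With the default coefficient the read has more Ns than the ceiling allows, while a coefficient of 0.4 lets it through the filter. Getting past this filter only means the read is considered at all; it does not guarantee that the aligner finds a match.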
Maybe BLAST can be an alternative, but I think you would also need to change parameters there because the read is so short.
I tried playing with this parameter and set it to 0.4, but it still doesn’t give me any match.
I also tried to blast my multifasta file with blastn, but I think blastn cannot handle the ambiguous characters.
Thanks for the help
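If the aligners keep rejecting the reads, one possible workaround (not something bowtie2 or blastn does for you) is to treat each N as a wildcard and scan the genome directly. Below is a minimal sketch under a few assumptions: the genome fits in memory as a single string, only exact matches outside the N positions are wanted, and the function names are made up for illustration. The toy genome stands in for the real E. coli sequence, which you would load from your own FASTA file.

```python
import re


def reverse_complement(seq):
    """Reverse complement; N stays N."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def query_to_pattern(query):
    """Turn a query into a regex where every N matches any of A, C, G, T."""
    parts = ("[ACGT]" if base == "N" else re.escape(base) for base in query.upper())
    return "".join(parts)


def find_hits(query, genome):
    """Return (strand, 0-based start) for every match on either strand."""
    genome = genome.upper()
    hits = []
    for strand, seq in (("+", query), ("-", reverse_complement(query))):
        # The lookahead makes overlapping matches visible as well.
        lookahead = re.compile(f"(?=({query_to_pattern(seq)}))")
        hits.extend((strand, match.start()) for match in lookahead.finditer(genome))
    return hits


# Toy genome for illustration only: the flanks of sequence "16" with an
# arbitrary 7-base filler where the Ns are.
toy_genome = "TTTT" + "GTGGAGT" + "ACGTACG" + "GTCTGTC" + "AAAA"
query_16 = "GTGGAGTNNNNNNNGTCTGTC"

print(find_hits(query_16, toy_genome))
print(find_hits(query_16, reverse_complement(toy_genome)))
```

Each query is scanned against both strands, so running this for about 5000 sequences on a full genome will be slow compared with a real aligner, but it gives a direct yes/no answer for sequences like “16”.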