Exploring Mothur hits to a SILVA database

Hi Jenna,
I have faced the same issue mentioned in this post. I tried your suggestion from this post, but the alignment does not run and shows an error. I also uncompressed the reference fasta file, but that didn't work. I am sharing my history; please suggest where I am going wrong.

Thanks,
Shweta


Hi @Shweta.203

Great that you got your reads to assemble! For this step, you are not getting any hits, the same as the other person, so all of that advice applies.

Maybe run a few tests to see what is going on?


  1. Using this index

Downsample to a smaller input file and run that through this tool. I would suggest something really small so it runs quickly. You can use this tool → Sub-sample sequences files

You should also check the box to output the extra logs – you are doing detective work, so those might provide more clues! I would personally output all of the extra outputs.

  2. Using the server index

Run your test sized data against the server index, and turn the extra log outputs on too.

You could also run your entire query dataset against the server index. This might be your solution if it works. Unless you have a compelling reason to use the custom index?

  3. Try at a different Galaxy server

This is a technical cross-check. I can’t tell whether your failure is caused by a server resource limit; it doesn’t look like it, but you could confirm that. The tests above should flush part of that out, but not completely. If this is about server resources, you could try at UseGalaxy.eu.
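If you want to see what the downsampling in the first test amounts to, here is a minimal Python sketch. It is not the Galaxy Sub-sample tool, just an illustration of drawing a small random subset of FASTA records; the helper names `read_fasta` and `subsample`, the fixed seed, and the demo reads are all made up for this example.

```python
import random

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def subsample(records, n, seed=42):
    """Pick up to n records at random, without replacement.

    The fixed seed (an arbitrary choice) makes repeated test runs comparable.
    """
    records = list(records)
    rng = random.Random(seed)
    return rng.sample(records, min(n, len(records)))

# Hypothetical demo reads, standing in for your assembled sequences.
demo = """>read1
ACGTACGTACGT
>read2
ACG
>read3
TTGACCGATTGACCGATTGACCGA
>read4
GGCATCGA
""".splitlines()

picked = subsample(read_fasta(demo), 2)
for header, seq in picked:
    print(f">{header}\n{seq}")
```

Using the same small subset for the custom index and the server index keeps the two tests directly comparable.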


Then for the data part itself:

Super short sequences: I noticed that you have sequences as short as three bases. Your k-mer size is 8, which means a sequence needs at least 8 bases of perfect identity to find a hit, so any sequence shorter than that will be ignored by the tool! You should probably filter your sequences for some minimum length, if only to reduce the input size for better runtime performance. I think 25 bases is the usual filter, but something like 20 might be OK. You could run filtered datasets through the tests above to see what happens. Do you get any meaningful output with these shorter sequences, or just extra noise in the results? By “noise” I mean non-specific hits … these will confuse the taxonomic assignment of the reads you actually care about.
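To make the k-mer point concrete, here is a small Python sketch. It only counts how many 8-mers a read can contribute and applies a minimum-length filter; it is not how Mothur itself is implemented, and the function names and demo reads are made up for illustration.

```python
def kmer_count(seq, k=8):
    """Number of overlapping k-mers in seq; zero when seq is shorter than k."""
    return max(0, len(seq) - k + 1)

def filter_min_length(records, min_len=25):
    """Keep (header, sequence) pairs with at least min_len bases."""
    return [(h, s) for h, s in records if len(s) >= min_len]

# Hypothetical reads of 3, 8, 20 and 30 bases.
records = [
    ("tiny", "ACG"),
    ("edge", "ACGT" * 2),
    ("short", "ACGT" * 5),
    ("ok", "ACGT" * 7 + "AC"),
]

for header, seq in records:
    print(header, len(seq), kmer_count(seq))

kept = filter_min_length(records, min_len=25)
print([h for h, _ in kept])
```

A three-base read has no 8-mers at all, so it cannot seed a hit no matter how the other parameters are set. With the 25-base cutoff only the 30-base read survives; dropping the cutoff to 20 would also keep the 20-base one.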

You could even filter for only the sequences of 8-20 bases to see what you get, then compare to 8-25, and maybe 8-30. This will help you find the minimum meaningful length for specific hits in your own data.
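A rough Python sketch of that comparison, assuming inclusive length bounds (my assumption, adjust as needed). The helper name `in_length_range` and the demo reads are hypothetical; in Galaxy you would do the same thing with a length-filtering tool and run each subset through the alignment separately.

```python
def in_length_range(records, lo, hi):
    """Keep (header, sequence) pairs whose length is within [lo, hi], inclusive."""
    return [(h, s) for h, s in records if lo <= len(s) <= hi]

# Hypothetical reads with lengths 3, 8, 12, 22, 24, 28 and 40.
lengths = [3, 8, 12, 22, 24, 28, 40]
records = [(f"read_len{n}", ("ACGT" * 10)[:n]) for n in lengths]

# The three windows suggested above.
windows = [(8, 20), (8, 25), (8, 30)]
counts = {}
for lo, hi in windows:
    subset = in_length_range(records, lo, hi)
    counts[(lo, hi)] = len(subset)
    print(f"{lo}-{hi} bases: {len(subset)} reads")
```

Each subset would then go through the same alignment test, so you can see at which window the hits start to look specific rather than noisy.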

Alignment parameters: You are using all defaults. Try exploring the flip and other options. What these do is described in the tool Help and on the Mothur wiki. Determining the best parameters for your data is a very common scientific exercise.

Hope this helps and gives some ideas to explore! :slight_smile: