Does using the LotuS2 tool directly offer more options for doing this differently? That might be true, but I’ll need you to point me to the exact details if you want to make a request or to clarify the usage options further. I can’t currently find where pair merging would happen only before, or only after, OTU clustering with LotuS2, but I could be missing something!
As you explained, LotuS2 merges paired reads after clustering by default. In my data, however, many OTUs (including dominant ones) were not merged, probably because the reads were too short after quality filtering.
At present I am working with LotuS2 after installing it on our server and have solved the above problem by adjusting an option (TruncateSequenceLength -1) in the sdm file (LotuS2 — Less OTU scripts for a simpler 16S pipeline).
I have also found that “-mergePreClusterReads” can be passed as an option after the basic LotuS2 command, but it is probably not available in Galaxy at the moment.
Looking at the -mergePreClusterReads flag more closely, the description implies that it makes clustering a bit worse, mostly because the other per-OTU clustered reads help to capture the stray mates better via the rough scaffolding/extensions. The default in Galaxy would be 0 right now. I don’t think this is what you are looking for, but please correct me! We could make a request to add it to the wrapper.
(0) no merging or reads pre OTU/ASV/zOTU seq clustering, BUT read merging after seq clustering (to get better representative sequence). (1) Merge reads prior to seq clustering. WARNING!! This will considerably reduce the number of valid read pairs, as additional quality filters will be applied, algorithm is still in development !! (Default: 0)
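For anyone running the tool directly, here is a rough sketch of how the flag could be added to a call. Only -mergePreClusterReads comes from this thread. The -i, -m, -o and -s flags reflect my understanding of the basic lotus2 invocation and should be checked against the tool's own help output, and all paths are hypothetical placeholders. The snippet only builds and prints the command; it does not run LotuS2.

```python
import shlex


def build_lotus2_command(input_dir, map_file, output_dir,
                         merge_pre_cluster=0, sdm_options=None):
    """Assemble a lotus2 command line as a list of arguments.

    Assumption: -i/-m/-o/-s are the input, mapping file, output and
    sdm options flags; verify them against the LotuS2 help text.
    """
    if merge_pre_cluster not in (0, 1):
        raise ValueError("merge_pre_cluster must be 0 or 1")
    cmd = ["lotus2", "-i", input_dir, "-m", map_file, "-o", output_dir,
           "-mergePreClusterReads", str(merge_pre_cluster)]
    if sdm_options is not None:
        # Optional user-supplied sdm options file
        cmd += ["-s", sdm_options]
    return cmd


# Hypothetical paths, for illustration only
cmd = build_lotus2_command("reads/", "map.txt", "lotus2_out",
                           merge_pre_cluster=1)
print(shlex.join(cmd))
```

The list could be passed to subprocess.run(cmd, check=True) on a machine where lotus2 is installed.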
Then, -TruncateSequenceLength is set in the sdm file that the user can optionally provide at runtime. If no file is provided, the defaults are applied. This would work in the current Galaxy wrapper (and when the tool is used directly, of course).
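As an illustration of editing that file, here is a small sketch that sets one option in the text of an sdm options file. It assumes the file holds one option per line as a name followed by whitespace and a value, which is how I understand the format; please compare against the sdm file shipped with your LotuS2 install. The example content is made up, and the -1 value is simply the one reported above.

```python
def set_sdm_option(text, name, value):
    """Return sdm options text with `name` set to `value`.

    Assumption: each option line looks like "<name><whitespace><value>".
    Lines that do not start with `name` (including comments) are kept.
    """
    lines = []
    found = False
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            lines.append(f"{name}\t{value}")
            found = True
        else:
            lines.append(line)
    if not found:
        # Option was absent, so append it at the end
        lines.append(f"{name}\t{value}")
    return "\n".join(lines) + "\n"


# Hypothetical file content, for illustration only
example = (
    "#hypothetical sdm options\n"
    "minSeqLength\t170\n"
    "TruncateSequenceLength\t200\n"
)
updated = set_sdm_option(example, "TruncateSequenceLength", "-1")
print(updated)
```

The updated text can be written to a new file and supplied as the sdm options file, either in Galaxy or on the command line.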
I’m sure you know this! I’m including it for anyone else who runs across this topic later who may be new to the tool and interested in these or similar options. Having reads that need a bit of nudging to get a good clustering result is real world science!
Glad you have a current solution and let me know your thoughts on the additional parameter.