Hi, I used your Trimmomatic tool to trim my reads and obtained 4 files: fastq R1 paired, fastq R2 paired, fastq R1 unpaired and fastq R2 unpaired. Can I use the Collapse Collection tool to get a single dataset, then use the Shovill Faster Spades Assembly tool to analyze it? Thanks.
Hi @huiping
Just use the two files that are still paired (R1 paired and R2 paired), and leave the two unpaired files out. Most assembly tools, including this one, will complain if there are orphan reads mixed in with the pairs.
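If you ever want to double-check outside of Galaxy that two paired files really are still in sync, here is a minimal Python sketch. It assumes plain 4-line FASTQ records (optionally gzipped) and that mates share the same read ID apart from an optional `/1` or `/2` suffix. The function names `read_ids` and `pairs_in_sync` are made up for this illustration and are not part of Trimmomatic or Shovill.

```python
import gzip
from itertools import zip_longest


def read_ids(path):
    """Yield the read ID from each header line of a 4-line-per-record FASTQ file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 0:
                continue  # only header lines carry the read ID
            if not line.strip():
                continue  # tolerate a stray blank line at the end of the file
            token = line[1:].split()[0]  # drop the leading "@" and any description
            if token.endswith("/1") or token.endswith("/2"):
                token = token[:-2]  # drop the old-style mate suffix
            yield token


def pairs_in_sync(r1_path, r2_path):
    """Return True if both files hold the same read IDs in the same order."""
    for id1, id2 in zip_longest(read_ids(r1_path), read_ids(r2_path)):
        if id1 != id2:  # also catches files of different length (one side is None)
            return False
    return True


# Example call (file names are placeholders):
# print(pairs_in_sync("R1_paired.fastq", "R2_paired.fastq"))
```

The paired outputs from a trimming tool should pass this check. If you mixed the unpaired files in, the read order would no longer match, and that mismatch is what the assembler would complain about.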
One more tip: should your assembly fail, Shovill happens to have really great logs (in my opinion)! You will get these from both successful and (most) failed runs. They include genome size estimates and other statistics that the tool detected in your data. You can then start to customize your values on the form, and try more rounds of assembly to refine the quality of that assembly (or just to get a successful run without running into memory problems, if that comes up). See the tutorials that include SPAdes for more.
If you still need to reduce the data after tuning parameters, search the tool panel to find this tool → Sub-sample sequences files e.g. to reduce coverage.
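To show the idea behind sub-sampling paired data, here is a rough Python sketch. It is not how the Galaxy tool is implemented, just an illustration under these assumptions: uncompressed 4-line FASTQ records, R1 and R2 in the same order, and a genome size you already know or took from the logs. The names `fraction_for_depth` and `subsample_pairs` are hypothetical.

```python
import random
from itertools import islice


def fraction_for_depth(total_bases, genome_size, target_depth):
    """Fraction of read pairs to keep so coverage drops to roughly target_depth."""
    current_depth = total_bases / genome_size
    return min(1.0, target_depth / current_depth)


def subsample_pairs(r1_in, r2_in, r1_out, r2_out, fraction, seed=42):
    """Keep each read pair with probability `fraction`; return the number kept.

    Both mates are kept or dropped together, so the outputs stay paired.
    """
    rng = random.Random(seed)  # fixed seed makes the selection reproducible
    kept = 0
    with open(r1_in) as in1, open(r2_in) as in2, \
            open(r1_out, "w") as out1, open(r2_out, "w") as out2:
        while True:
            rec1 = list(islice(in1, 4))  # one 4-line record from each file
            rec2 = list(islice(in2, 4))
            if len(rec1) < 4 or len(rec2) < 4:
                break  # end of either file (or a truncated record)
            if rng.random() < fraction:
                out1.writelines(rec1)
                out2.writelines(rec2)
                kept += 1
    return kept


# Example: 500 Mb of sequence on a 5 Mb genome is about 100x coverage,
# so keeping half of the pairs gives roughly 50x.
# frac = fraction_for_depth(500_000_000, 5_000_000, 50)
# subsample_pairs("R1_paired.fastq", "R2_paired.fastq", "R1_sub.fastq", "R2_sub.fastq", frac)
```

The important part is that one random decision is made per pair, not per file. Sub-sampling R1 and R2 independently would break the pairing again.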
Using just intact pairs, and filtering down the file sizes, is very common with assembly. The “extra” data does not necessarily help to get a better scientific result, and can even be detrimental in some cases. But you can play around with parameters, compare assembly outcomes, and come to your own conclusions. I’m sure others feel differently about this, given the discussion on some scientific forums, but that’s ok, everyone has good reasons, and you can have yours now, too!
Hope this works out!
Thank you, Jennifer, for your comments.