Trimming with different parameters/tools gives vastly different end results.

Hi @rachel.lee

If I am understanding this correctly, it does seem that Cutadapt with the NEB adaptors and default parameters is either removing the signal from your reads, or is not removing the adaptor with enough specificity, and that is preventing accurate matches in the downstream steps.

Then, the default parameters with Trim Galore! and fastp are doing a reasonable job, or at least producing somewhat similar results. Whether that is the “correct result” or not is a bit of a scientific judgement call.

An example tutorial here is also using Trim Galore! (perhaps this is what you are already following?)

The part I would suggest paying attention to the most in that example is the use of advanced parameters to guide the clipping and filtering.

We have a public workflow here that you could adapt to explore different trimming parameter settings. You can replace the trimming tool and adjust MultiQC to accept any report type. Then run this once per parameter set. Or adjust anything you want in your copy since this could be expanded to include several trimming runs in the same MultiQC report (complex but certainly possible!).
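If it helps to plan the "once per parameter set" runs before building them in Galaxy, here is a minimal sketch that lists the combinations as Cutadapt command lines. The adaptor sequence, file names, and parameter values are placeholders I made up for illustration (swap in the adaptor for your kit). The flags used (`-a`, `-q`, `-m`, `-O`, `-o`) are standard Cutadapt options, but this is not taken from the workflow itself.

```python
from itertools import product

# Placeholder adaptor: replace with the actual adaptor sequence for your kit.
ADAPTER = "AGATCGGAAGAGC"


def cutadapt_commands(fastq, qualities=(0, 20), min_lengths=(0, 20), min_overlaps=(3, 5)):
    """Build one Cutadapt command per combination of trimming parameters.

    All values are illustrative assumptions, not recommended settings.
    """
    commands = []
    for q, m, o in product(qualities, min_lengths, min_overlaps):
        # Encode the parameter set in the output name so each run is traceable.
        out = f"trimmed_q{q}_m{m}_O{o}.fastq.gz"
        commands.append([
            "cutadapt",
            "-a", ADAPTER,   # 3' adaptor to remove
            "-q", str(q),    # quality cutoff for trimming read ends
            "-m", str(m),    # discard reads shorter than this after trimming
            "-O", str(o),    # minimum overlap between read and adaptor
            "-o", out,
            fastq,
        ])
    return commands


# Print the commands so they can be reviewed (or mirrored as Galaxy tool settings).
for cmd in cutadapt_commands("reads.fastq.gz"):
    print(" ".join(cmd))
```

Each printed line corresponds to one workflow run, so the output names double as labels when you compare the MultiQC reports.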

Another idea is to review the data directly. If you have control read data with an expected result profile, how is that being reported? Or, flip that and instead start with a feature you expect to be included and assess how each BAM is supporting it. Drilling down into the alignments in a genome browser and comparing to multiple feature tracks can be super informative. I really like UCSC for this since so many feature tracks are already there, but IGV is a good alternative (load your own feature references).

  • Look at both positive and negative read support.
  • Perhaps include the default BAM and filtered BAM for each sample.
  • Positive coverage profiles (portions of reads that are aligning).
  • Negative coverage profiles (portions of reads that are not aligning, e.g. the 5’/3’ “overhangs”).
  • The GTN has guides for sending data to a genome browser but please ask if you cannot find those or have questions.
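For a quick numeric view of the positive/negative profiles alongside the browser, here is a rough sketch that tallies aligned versus soft-clipped bases from the CIGAR column of SAM-formatted text. It assumes your mapper reports the unaligned "overhangs" as soft clips (`S`), which depends on the tool and its settings. The function names are my own, and this is only one way to summarize it.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clip_profile(cigar):
    """Count aligned and soft-clipped bases for a single CIGAR string.

    clip_start / clip_end are relative to the alignment as reported
    (reference orientation), not necessarily the read's original 5'/3' ends.
    """
    # Hard clips (H) are not present in the read sequence, so ignore them.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    aligned = sum(n for n, op in ops if op in "M=X")
    clip_start = ops[0][0] if ops and ops[0][1] == "S" else 0
    clip_end = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return {"aligned": aligned, "clip_start": clip_start, "clip_end": clip_end}


def summarize(sam_lines):
    """Sum the per-read profiles over SAM records (CIGAR is column 6)."""
    totals = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[5] == "*":
            continue  # unmapped read or malformed record
        totals.update(clip_profile(fields[5]))
    return totals
```

You could feed this the text output of `samtools view` for each BAM (default and filtered) and compare the totals between trimming runs. A large soft-clipped fraction in one run would be a hint that adaptor sequence is still present.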

Please let us know if this helps or not! :slight_smile:
