Digestion of restriction sites?

Hi everyone!

This may seem like an odd question, but we realized that our RAD-seq sequencing was affected by a poor library preparation: two enzymes were used (PstI and MspI), but for some reason a lot of the fragments ended up sticking to each other, creating chimeras (e.g. we had sequences looking like this: PstI…MspI-MspI…MspI). Is there a tool somewhere on Galaxy that could allow us to separate those sequences, or failing that, reject a sequence based on the presence of a restriction site in it?

Thank you! :slight_smile:

Hi everyone :slight_smile:

I think I cracked it, so I’ll put my solution here in case anyone has a similar issue. In the end I chose to reject my chimeric sequences altogether, as I saw no way to “digest” them with any tool. So, in case you want to get rid of them to clean up your data, here’s how I did it:

  1. I converted my data from FASTQ to FASTA with the Fastq.info tool. Other converters are available, but this one produces a quality file alongside the FASTA, so if you want to keep the per-nucleotide quality scores you may want to use this tool or an equivalent.
  2. I then ran the Filter FASTA tool (on the headers and/or the sequences) twice, entering the recognition sequence of my first and then my second digestion enzyme (CTGCAG for PstI and CCGG for MspI). Make sure to check the Output discarded FASTA entries option: the discarded entries are the ones with no trace of the restriction site (whose presence may indicate a chimera), and thus the ones we want to work with. Run the second filtering on the entries discarded by the first, so that you end up with data free of both restriction sites.
  3. I finally converted my data back to FASTQ with the Make.fastq tool. You can either use a real quality file or just let the tool assign a 100% quality indication to each nucleotide, depending on whether you might need the qualities later.

You should then be able to proceed with your analysis as normal! Be aware though that this method leaves you with uncompressed FASTQ files, which has not been a problem for me but may be worth noting.
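For anyone who would rather script this outside Galaxy, the same rejection rule (drop any read that contains a full CTGCAG or CCGG site) can be sketched in plain Python. This is only a rough sketch and not a Galaxy tool: the function names are made up for illustration, and it assumes standard 4-line FASTQ records with unwrapped sequences. It works on the FASTQ directly, so the original quality strings are kept without the round trip through FASTA.

```python
import gzip
from typing import Iterable, Iterator, Sequence, Tuple

# Recognition sites quoted in the post: PstI (CTGCAG) and MspI (CCGG).
# Both are palindromic, so scanning one strand also covers the reverse complement.
SITES = ("CTGCAG", "CCGG")

FastqRecord = Tuple[str, str, str, str]  # header, sequence, separator, quality


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Group lines into 4-line FASTQ records (assumes unwrapped sequences)."""
    stripped = (line.rstrip("\r\n") for line in lines)
    # Zipping one iterator with itself four times yields consecutive 4-tuples.
    return zip(stripped, stripped, stripped, stripped)


def has_site(seq: str, sites: Sequence[str] = SITES) -> bool:
    """Return True if the read contains any full recognition site."""
    seq = seq.upper()
    return any(site in seq for site in sites)


def drop_reads_with_sites(
    records: Iterable[FastqRecord], sites: Sequence[str] = SITES
) -> Iterator[FastqRecord]:
    """Yield only the records whose sequence contains none of the sites."""
    for record in records:
        if not has_site(record[1], sites):
            yield record


def open_text(path: str, mode: str):
    """Open plain or gzip-compressed text, chosen by the file extension."""
    return gzip.open(path, mode + "t") if path.endswith(".gz") else open(path, mode)


def filter_fastq_file(in_path: str, out_path: str, sites: Sequence[str] = SITES) -> int:
    """Write site-free reads to out_path and return how many were kept."""
    kept = 0
    with open_text(in_path, "r") as src, open_text(out_path, "w") as dst:
        for record in drop_reads_with_sites(read_fastq(src), sites):
            dst.write("\n".join(record) + "\n")
            kept += 1
    return kept
```

Something like `filter_fastq_file("reads.fastq", "reads.clean.fastq.gz")` (placeholder file names) would then write the cleaned reads, gzip-compressed because of the `.gz` extension. Just like the Galaxy steps, this throws away every read containing a full site rather than trying to split the chimeras.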

Hope this helps someone one day!