Digestion of Restriction sites?

Joelle99 · March 29, 2021, 6:42pm

Hi everyone!

This may seem like an odd question, but we realized that our RAD-seq sequencing was affected by a poor library preparation: two enzymes were used (PstI and MspI), but for some reason a lot of the fragments ended up sticking to each other (e.g. we had sequences looking like this: PstI…MspI-MspI…MspI), creating chimeras. Is there a tool somewhere on Galaxy that could allow us to separate those sequences, or failing that, reject a sequence based on the presence of a restriction site in it?

Thank you!

Joelle99 · April 1, 2021, 3:04pm

Hi everyone

I think I cracked it, so I’ll put my solution here in case anyone has a similar issue. In the end I chose to reject my chimeric sequences altogether, as I saw no way to “digest” them with any tool. So, in case you want to get rid of them to clean up your data, here’s how I proceeded:

I converted my data from FASTQ to FASTA with the Fastq.info tool. Other converters are available, but this one produces a quality file as well as the FASTA, so if you want to keep the per-nucleotide quality scores you may want to use this tool or an equivalent.
I then used the Filter FASTA tool (on the headers and/or the sequences) twice, entering the recognition sequence of my first and then my second restriction enzyme (CTGCAG for PstI and CCGG for MspI). Make sure to check the Output discarded FASTA entries option: the discarded entries are the ones without any trace of the restriction site (whose presence may indicate a chimera), and thus the ones we want to work with. So, run the second filtering on the entries discarded by the first filtering, to end up with data free of both restriction sites.
Finally, I converted my data back to FASTQ with the Make.fastq tool. You can either supply a real quality file or just let the tool add a 100% quality indication to each nucleotide, depending on whether you might need the qualities later.
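For anyone doing this outside Galaxy, the same rejection step can be sketched in a few lines of plain Python. This is only a minimal illustration, not what the Galaxy tools do internally: it assumes a well-formed FASTQ with exactly four lines per record, and the function names (`read_fastq`, `has_site`, `filter_fastq`) are made up for this example. It works directly on the FASTQ, so the original quality strings are kept. Both recognition sites are palindromic, so checking one strand is enough.

```python
import io

# Recognition sites quoted in the post: PstI (CTGCAG) and MspI (CCGG).
SITES = ("CTGCAG", "CCGG")


def read_fastq(handle):
    """Yield (header, sequence, separator, quality) tuples.

    Assumes exactly four lines per record (no wrapped sequences).
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        sep = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, sep, qual


def has_site(seq, sites=SITES):
    """Return True if the read contains any full recognition site."""
    upper = seq.upper()
    return any(site in upper for site in sites)


def filter_fastq(in_handle, out_handle, sites=SITES):
    """Write only the reads without a recognition site; return (kept, dropped)."""
    kept = dropped = 0
    for header, seq, sep, qual in read_fastq(in_handle):
        if has_site(seq, sites):
            dropped += 1
            continue
        out_handle.write(f"{header}\n{seq}\n{sep}\n{qual}\n")
        kept += 1
    return kept, dropped


if __name__ == "__main__":
    # Tiny made-up input: r2 contains CTGCAG, r3 contains CCGG.
    demo = io.StringIO(
        "@r1\nACGTACGTAA\n+\nIIIIIIIIII\n"
        "@r2\nACCTGCAGTT\n+\nIIIIIIIIII\n"
        "@r3\nTTCCGGAATT\n+\nIIIIIIIIII\n"
    )
    out = io.StringIO()
    print(filter_fastq(demo, out))
    print(out.getvalue(), end="")
```

For real data the two handles could come from `open(...)`, or from `gzip.open(path, "rt")` and `gzip.open(path, "wt")` to read and write compressed files directly.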

You should then be able to proceed with your analysis as normal! Be aware, though, that this method leaves you with uncompressed FASTQ files, which has not been a problem for me but may be worth noting.

Hope this helps someone one day!
