Extract rDNA reads from sequencing reads

Isa · March 29, 2024, 2:21pm

Hello,

I’m working on an internship project involving Galaxy, where I need to focus on a specific chromosome sequence. The data I have consists of whole genome NGS reads. Initially, I attempted to map these reads against the reference genome, but converting the result into FASTA format produces a file larger than 200 GB, which is unmanageable. Is there a method to extract one particular region, specifically the rDNA, from the whole genome dataset without running into this issue?
Or is there another way to extract the rDNA from the sequencing reads?
Thanks in advance!

jennaj · March 29, 2024, 9:57pm

Hi @Isa

What reference genome is the chromosome from? Please share the link to the source you plan on using. This will add context so we can help more.

Some suggestions:

You could consider mapping against the full genome, then filtering the result for the chromosome of interest.
You could also create a custom genome from a single chromosome. Is this what you are having trouble with right now? Do you want to share the history that contains that reference genome and your attempts so far?

Let’s start there, thanks!

Isa · April 2, 2024, 12:12pm

Hi,

Again, thanks for your quick response!
The RefSeq I’m planning on using is chrXII from S. cerevisiae, because that is where the rDNA is located: Saccharomyces cerevisiae S288C - NCBI - NLM (nih.gov)

I already tried the first suggestion: I have mapped the reads against the full genome, but I don’t understand how I can then filter the results for a specific chromosome. I also doubt whether I used the correct tool. So far I have used BWA-MEM2, but is there maybe another tool more suitable if I want to extract the chromosome of interest?
As for the second suggestion, I think I have this: from the literature I have a RefSeq of chromosome XII from S. cerevisiae, but I don’t really know what I could do with it. Below I have shared the history with the reference genome; however, I cannot share my previous attempts because that data is confidential. I hope you can still help me with the data I provided.
Galaxy

If you have any more questions or need more information so that you can give better advice, let me know! Thanks in advance

jennaj · April 9, 2024, 7:28pm

Hi @Isa

To filter a BAM after mapping, type this into the tool panel to find choices: filter bam. Which mapping tool you use will not matter, as long as the output is a BAM file. You will be filtering on the name of the reference sequence the reads map to, so be sure to use the exact name of that target chromosome (find how that was formatted in the BAM header).
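To make "filtering on the name of the reference sequence" concrete, here is a minimal sketch in plain Python. It works on SAM text (the human-readable form of a BAM) rather than on a BAM file directly, and it is not what the Galaxy filter tools run internally. The function names and the example records are made up for illustration, and `chrXII` is only an assumed name: use whatever name appears in the `@SQ` lines of your own header.

```python
def reference_names(sam_lines):
    """Return the reference sequence names listed in the @SQ header lines."""
    names = []
    for line in sam_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def filter_by_reference(sam_lines, target):
    """Keep header lines plus alignments whose RNAME (column 3) equals target."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            kept.append(line)
            continue
        fields = line.split("\t")
        # Exact string match: "chrXII" will not match "XII" or "NC_001144.5".
        if len(fields) > 2 and fields[2] == target:
            kept.append(line)
    return kept


# Illustrative records only, not real data.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chrI\tLN:230218",
    "@SQ\tSN:chrXII\tLN:1078177",
    "read1\t0\tchrI\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "read2\t0\tchrXII\t451000\t60\t5M\t*\t0\t0\tTTGCA\tIIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGG\tIIIII",
]

print(reference_names(example))
for line in filter_by_reference(example, "chrXII"):
    print(line)
```

The match is on the exact string, which is why checking the header first matters: the same chromosome can be named differently depending on where the reference came from.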

To create a custom reference genome, upload a fasta file of the entire genome, or just the chromosome of interest, into a history, and use it as the target with a mapping tool. Galaxy FAQs for Custom Reference Genome

You can get individual chromosome fasta files for Yeast from UCSC here → UCSC Genome Browser Downloads

Or, you can filter the fasta you have now with a tool like Filter FASTA.
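Filtering a genome fasta down to one chromosome amounts to keeping a single record by its identifier. A rough sketch of that idea in plain Python follows. It is not the Filter FASTA tool itself; the function name `extract_record` and the tiny example sequences are hypothetical, and the identifier `chrXII` is an assumption that should be replaced with the exact ID in your own file.

```python
def extract_record(fasta_lines, target_id):
    """Return the lines of the one FASTA record whose ID equals target_id.

    The ID is taken as the first whitespace-delimited word after '>'.
    """
    kept = []
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            keep = bool(parts) and parts[0] == target_id
        if keep:
            kept.append(line)
    return kept


# Illustrative sequences only, not real yeast sequence.
genome = [
    ">chrI some description",
    "ACGTACGT",
    "ACGT",
    ">chrXII another description",
    "TTGCATTG",
    "CA",
    ">chrM",
    "GGGG",
]

for line in extract_record(genome, "chrXII"):
    print(line)
```

The single-chromosome fasta that results can then be used as the custom reference genome when mapping.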

Hope this helps. Everything you are doing is pretty standard, so there are a few ways to do it with different tools and methods, but it is definitely possible and the result will be about the same no matter which you choose.
