I’m working on an internship project in Galaxy in which I need to focus on a specific chromosome sequence. My data are whole-genome NGS reads. I first tried mapping these reads against the reference genome, but converting the resulting file to FASTA format produces a file larger than 200 GB, which is unmanageable. Is there a way to extract just this region, specifically the rDNA, from the whole-genome dataset without running into this problem?
Or is there another way to extract the rDNA from the sequencing reads?
Thanks in advance!
What reference genome is the chromosome from? Please share the link to the source you plan on using. This will add context so we can help more.
Some suggestions:
You could consider mapping against the full genome, then filtering the result for the chromosome of interest.
You could also create a custom genome from a single chromosome. Is this what you are having trouble with right now? Do you want to share the history that contains that reference genome and your attempts so far?
Again, thanks for your quick response!
The RefSeq sequence I’m planning to use is chrXII from S. cerevisiae, because that is where the rDNA is located: Saccharomyces cerevisiae S288C - NCBI - NLM (nih.gov)
I already tried this: I mapped the reads against the full genome, but I don’t understand how I can then filter the results for a specific chromosome. I also doubt whether I used the correct tool. I used BWA-MEM2, but is there another tool that is more suitable if I want to extract the chromosome of interest?
I think I have this: from the literature I have a RefSeq sequence of chromosome XII from S. cerevisiae, but I don’t really know what to do with it. Below I have shared the history with the reference genome; however, I cannot share my previous attempts because the data are confidential. I hope you can still help me with the data I provided. Galaxy
If you have any more questions or need more information so that you can give better advice, let me know! Thanks in advance
To filter a BAM after mapping, type this into the tool panel to find choices: filter bam. Which mapping tool you use will not matter, as long as the output is a BAM file. You will be filtering on the name of the reference sequence the reads map to, so be sure to use the exact name of that target chromosome (find how that was formatted in the BAM header).
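To make the filtering logic concrete, here is a minimal Python sketch of what "filter on the reference name" means. It is an illustration only, not how any particular Galaxy tool is implemented. It assumes the alignments are available as SAM text (the plain-text form of a BAM), and the name `chrXII` and the toy records are placeholders: use the exact `SN:` value from the `@SQ` lines of your own BAM header.

```python
def filter_sam_by_reference(sam_lines, target_name):
    """Yield header lines plus alignments whose RNAME equals target_name."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines (@HD, @SQ, @PG, ...) are passed through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 3 (index 2) of a SAM record is RNAME, the reference name.
        # The comparison is exact, so "chrXII" does not match "chrXIII".
        if len(fields) > 2 and fields[2] == target_name:
            yield line


# Toy SAM content (hypothetical names, lengths, and reads).
example = [
    "@SQ\tSN:chrXI\tLN:1000\n",
    "@SQ\tSN:chrXII\tLN:1000\n",
    "read1\t0\tchrXI\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t0\tchrXII\t450\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
]

kept = list(filter_sam_by_reference(example, "chrXII"))
print("".join(kept), end="")
```

In this toy run both header lines and `read2` are kept, while `read1` is dropped because it maps to a different reference name.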
To create a custom reference genome, upload a FASTA file of the entire genome, or of just the chromosome of interest, into a history, and use it as the target with a mapping tool. Galaxy FAQs for Custom Reference Genome
Or, you can filter the FASTA you have now with a tool like Filter FASTA.
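As a rough picture of what filtering a genome FASTA down to one chromosome involves, here is a small Python sketch. It is only an illustration under stated assumptions, not the implementation of the Filter FASTA tool: the function name, the ID `chrXII`, and the toy sequences are all hypothetical, and the ID is taken to be the first whitespace-delimited token after `>` in the header line.

```python
def extract_fasta_record(fasta_lines, target_id):
    """Return the lines (header + sequence) of the record named target_id."""
    keep = False
    record = []
    for line in fasta_lines:
        if line.startswith(">"):
            # The record ID is the first token after '>'; match it exactly.
            tokens = line[1:].split()
            keep = bool(tokens) and tokens[0] == target_id
        if keep:
            record.append(line)
    return record


# Toy multi-FASTA (hypothetical headers and sequences).
toy_genome = [
    ">chrXI toy record\n", "ACGTACGT\n",
    ">chrXII toy record\n", "GGCCTTAA\n", "TTAA\n",
    ">chrXIII toy record\n", "AAAA\n",
]

chr12 = extract_fasta_record(toy_genome, "chrXII")
print("".join(chr12), end="")
```

The exact match on the ID matters here because `chrXII` is a prefix of `chrXIII`; a substring search would pull in both records.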
Hope this helps. Everything you are doing is pretty standard, so there are a few ways to do it with different tools and methods, but it is definitely possible and the result will be about the same no matter which you choose.