Whole genome comparison

Isa · March 15, 2024, 2:15pm

Hello,

For an internship project I need to compare 8 yeasts and find the variable regions between them. All I received to do this is the NGS data. I am fairly new to Galaxy and have already tried a lot, but I am struggling to find the right tools. Does anyone have suggestions on how to do this whole genome comparison and how to find large variable regions?

jennaj · March 15, 2024, 9:45pm

Welcome, @Isa

The Galaxy Training tutorials here focus on a different species, yet the methods likely apply for what you are doing as well (Yeast researchers can correct me!). That means you can explore and adopt the workflows included.

GTN: Simple search terms are best. Try parts of the analysis description, or review categories directly. This query used “whole” and seemed to work great!

Isa · March 21, 2024, 8:37am

Hi,

Thank you for your reply. I attempted the tutorial, but it heavily emphasizes mRNA and pinpointing particular genes. My work involves DNA sequences, not mRNA, and I’m not targeting specific genes. Instead, my aim is to identify significant variable regions across eight different genomes, so that I can design primers specific to each genome and differentiate them from each other. Could you provide any guidance on how I might accomplish this?

jennaj · March 21, 2024, 7:19pm

Hi @Isa

I think there is a tutorial about using SNPs with just the assembled contigs. Did you check all of the tutorial variations available to see if any are a match for what you want to do and your available data types?

If we do not actually have a tutorial, maybe you can find a publication that does what you want to do, and can use the tutorials as a sort of guide about how to translate methods out in the wild into a Galaxy protocol?

Many tutorials are based on a regular publication… Then community members did the method translation to Galaxy tools and functions, in the end creating a workflow and tutorial using a representative subset of the full sized dataset. Or, for assembly, they chose to use a smaller genome, or a single chromosome, etc.
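
To make the SNP-based idea a bit more concrete: once variant calls exist for a sample against a shared reference, one rough way to flag candidate variable regions is to count SNPs in fixed-size windows along each contig. The sketch below is only an illustration, not a Galaxy tool or a tutorial workflow. The function name `variable_windows`, the window size, and the `min_snps` threshold are all hypothetical choices, and it assumes SNP positions are already available as 1-based integers for a single contig.

```python
from collections import Counter


def variable_windows(snp_positions, contig_length, window=1000, min_snps=5):
    """Return (start, end, snp_count) for fixed windows rich in SNPs.

    snp_positions: 1-based SNP coordinates on one contig (assumed input).
    window and min_snps are illustrative defaults, not recommended values.
    """
    # Count SNPs per non-overlapping window, ignoring out-of-range positions.
    counts = Counter(
        (pos - 1) // window
        for pos in snp_positions
        if 1 <= pos <= contig_length
    )

    hits = []
    for idx in sorted(counts):
        if counts[idx] >= min_snps:
            start = idx * window + 1
            end = min((idx + 1) * window, contig_length)
            hits.append((start, end, counts[idx]))
    return hits


if __name__ == "__main__":
    # Made-up positions purely to show the call shape.
    example = [10, 20, 30, 1500, 1501, 1502, 1503, 1504]
    for start, end, n in variable_windows(example, contig_length=3000):
        print(f"{start}-{end}: {n} SNPs")
```

Windows that come up as SNP-dense in some strains but not others would be candidates to inspect more closely for primer design. Large insertions, deletions, or rearrangements would not show up this way and would need a different comparison.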