Thank you for the help. I also have a separate question. I am trying to process a large study with around 100 WGS samples. My end goal is to incorporate them into AADR, which is a compendium of ancient and modern human DNA samples. The obvious approach would be to extract the base positions from the BAM files that overlap with AADR, but I'm not aware of any tool on Galaxy that does so.
Hi @M.r.t
Great, glad you were able to resolve the prior question!
For your larger project, the resource you referenced appears to be built from curated/trimmed reads mapped to hg19, with the variant calls then put through a genotyping process. Do you have access to the workflow methods? Do the authors expect a certain submission format?
There are standardized methods, file formats, and tools for this type of analysis. Those tools, or analogous ones, are likely available in Galaxy. You could maybe work with the authors to convert their workflow to a “Galaxy” version if that is the goal?
For your question
You are describing the calling of variants. The file format used to store this type of information is a VCF file. Galaxy has over a hundred tools to manipulate this type of data! If you could locate the expected workflow (methods in a publication?), we can guide you a bit better to specific tool choices.
But if you just want to get an overview, our tutorials here do this and can help to get you oriented.
- Variant Analysis / Tutorial List
- Then the first tutorial here is a simple overview: Hands-on: Calling variants in diploid systems (Variant Analysis)
We are hosting a training event all week, and you can continue to work through the exercises indefinitely. It isn’t too late to join! The tutorial authors will be available in a smaller Slack group in all time zones during this core week. And questions here are great too, anytime!
- Galaxy Training Academy 2025
- One of the tracks is on this topic, and the recommended tutorials start with the one I highlighted above (after doing some of the “intro” tutorials, which will help to learn a bit about how to use collections and design a workflow – very powerful, especially with so many samples!). → Variant Analysis
- Even if this is not exactly what you want to do, or the tutorials work with smaller genomes, the tools will be about the same even for human, and the workflow templates will likely be useful. Others process human samples every day on the public servers.
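Coming back to the original question about restricting the BAM data to the positions AADR covers: one possible first step is to turn the target SNP list into a BED file, which many interval-aware tools accept as a region filter. The sketch below is only an illustration under assumptions, not the AADR authors' method. It assumes the SNP list is in the EIGENSTRAT `.snp` layout (SNP ID, chromosome, genetic position, physical position, reference allele, variant allele), and the function name `snp_to_bed` and the example records are made up for the demonstration.

```python
def snp_to_bed(snp_lines):
    """Convert EIGENSTRAT-style .snp lines into BED lines.

    Assumed input columns (whitespace separated):
    SNP ID, chromosome, genetic position, physical position, ref, alt.
    BED uses a 0-based start and an exclusive end, so a 1-based
    position p becomes the interval [p - 1, p).
    """
    bed_lines = []
    for line in snp_lines:
        fields = line.split()
        if len(fields) < 4:
            # Skip blank or malformed lines rather than guessing.
            continue
        snp_id, chrom, _genetic_pos, physical_pos = fields[:4]
        pos = int(physical_pos)
        bed_lines.append(f"{chrom}\t{pos - 1}\t{pos}\t{snp_id}")
    return bed_lines


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    example = [
        "snp_a 1 0.020 1000 G A",
        "snp_b 2 0.150 2500 C T",
    ]
    for bed_line in snp_to_bed(example):
        print(bed_line)
```

Two things to check before using something like this: the chromosome names in the BED file must match the names in your BAM header (for example `1` versus `chr1`), and the coordinates must be on the same reference build as your mapping (hg19 in the case discussed above).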