Comparing SNPs across samples and sample groups

I have VCF outputs from running FreeBayes and SnpEff on individually mapped transcriptomes, but I want to find variants shared between samples (e.g., individual 1 vs. individual 2). Is there a tool that does this across all the genes in the transcriptome? (I do not have specific genes in mind, so I need to go through the whole transcriptome.) Additionally, what can I use to translate the relevant sequences into amino acid sequences?

Hi @VI_Rodriguez
Can you provide more information on your pipeline? Did you map the reads to different transcriptomes, or did you map the RNA-Seq data to a reference genome?

Different Galaxy servers have different sets of tools. Maybe search the tool panel (it has a search/filter box at the top) with “translate” or “orf” or “TransDecoder”.

Hope this helps.
Kind regards,
Igor

I mapped the reads from the different samples to the mm10 reference genome using RNA STAR.

Hi @VI_Rodriguez

Jumping in since it is the weekend now in Australia (where @igor is) :slight_smile:

Thanks for posting the extra details!

If your samples are all mapped to the same reference, you can run FreeBayes in batch mode. The merged VCF output will have one column per input sample (BAM) and will include any variant called in any sample.
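If you want to pull the shared variants out of that merged VCF yourself, here is a minimal Python sketch (outside Galaxy). It assumes a standard multi-sample VCF with a `GT` entry in the FORMAT column; the sample names `ind1`/`ind2`, the example records, and the function names are made up for illustration, and "shared" is taken to mean that every selected sample carries at least one non-reference allele at the site. Adjust that definition if you need identical genotypes.

```python
def has_alt(genotype):
    """True if a GT string such as '0/1' or '1|1' contains a non-reference allele."""
    alleles = genotype.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def shared_variants(vcf_lines, samples=None):
    """Yield (chrom, pos, ref, alt) for records where every selected sample
    carries at least one non-reference allele. `samples` defaults to all samples."""
    columns = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            all_samples = fields[9:]  # sample names follow the FORMAT column
            columns = [all_samples.index(s) for s in (samples or all_samples)]
            continue
        if columns is None:
            raise ValueError("data line found before the #CHROM header line")
        keys = fields[8].split(":")
        if "GT" not in keys:
            continue  # no genotype information for this record
        gt_pos = keys.index("GT")
        calls = fields[9:]
        if all(has_alt(calls[i].split(":")[gt_pos]) for i in columns):
            yield fields[0], int(fields[1]), fields[3], fields[4]


# Hypothetical two-sample VCF, built with explicit tabs for clarity.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
               "FORMAT", "ind1", "ind2"]),
    "\t".join(["chr1", "100", ".", "A", "G", "50", ".", ".", "GT:DP", "0/1:12", "1/1:9"]),
    "\t".join(["chr1", "200", ".", "C", "T", "40", ".", ".", "GT:DP", "0/1:8", "0/0:15"]),
    "\t".join(["chr2", "300", ".", "G", "A", "60", ".", ".", "GT:DP", "1/1:20", "./.:0"]),
    "\t".join(["chr2", "400", ".", "T", "C", "70", ".", ".", "GT:DP", "1/1:20", "0/1:7"]),
])

for chrom, pos, ref, alt in shared_variants(EXAMPLE_VCF.splitlines()):
    print(chrom, pos, ref, alt)
```

On the example records, only chr1:100 and chr2:400 are reported, because the other two sites are reference or uncalled in `ind2`. Keep in mind that with RNA-Seq data a missing call can simply mean the gene was not expressed or covered in that sample.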


For your questions about nucleotide translation to protein, there are several tool options. Search the tool panel with “tran” to find these.
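To show what those translation tools do under the hood, here is a small Python sketch using the standard genetic code. It assumes the input is already an in-frame coding sequence on the coding strand (for genome-mapped RNA-Seq you would first need to extract the CDS for the transcript of interest); the function name and example sequences are made up for illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence, to_stop=False):
    """Translate an in-frame DNA or RNA coding sequence to one-letter amino acids.

    Unknown codons (e.g. containing N) become 'X'; a trailing partial codon is
    ignored. With to_stop=True, translation ends before the first stop codon.
    """
    seq = sequence.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


reference = "ATGGCCAAGTAA"   # hypothetical reference CDS
variant = "ATGGTCAAGTAA"     # same CDS with a C>T change in the second codon

print(translate(reference))  # MAK*
print(translate(variant))    # MVK*
```

In this toy example the single-base change turns GCC (alanine) into GTC (valine), which is the kind of missense effect an annotation tool reports for you across all transcripts.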

But if your overall goal is to learn how variants affect the translated protein, a tool like SnpEff would probably be interesting for you to explore.

Both FreeBayes and SnpEff have tutorials that show how this works, along with common intermediate steps and tools (and why those were used). You can find these linked at the bottom of the tool forms, or you can review the Variants topic at the GTN (Galaxy Training Network) directly.

Hope this helps and @igor can add more. :slight_smile: