How to turn .bam into VCF

rachaeldavid1 · January 14, 2025, 6:31pm

I’m not in this industry at all and can’t seem to get AI to help me here. I’m trying to turn a .bam into a VCF with the FreeBayes tool, but I keep running into an error (it says unknown error) when I run the tool. The genome build is 19/GRCh37, but there are a ton of options for this. I’ve tried quite a few and none have worked. Would really appreciate any help here for a total novice!

jennaj · January 14, 2025, 6:35pm

Welcome, @rachaeldavid1

Glad you found your way over here!

There is a potential problem at UseGalaxy.org right now. I’m talking with our administrative team and can get clarity or a correction.

This is how the error is presenting. Is this also what you are getting?

History panel screenshot for a dataset

More soon, and you can confirm if this is also your error. Thanks!

rachaeldavid1 · January 14, 2025, 6:39pm

Ah yes, this is my exact error! Thanks for this clarification. Again, I don’t work in this field at all and really have no idea what I’m doing. I’m just a complex chronic illness patient whose geneticist is too busy, so I’m trying to do the work myself.

Idk if this context is helpful, but what I’m trying to do is upload the genetic data I received from Invitae testing into Promethease. I only have .bam and .bam.bai files and Promethease only takes VCF, so I was trying to use this tool to convert. If there are any easier ways to go about this, I’ll do whatever!

jennaj · January 14, 2025, 6:48pm

Hi @rachaeldavid1

I’m messaging you directly, so let’s continue there.

The “conversion” is more like generating a meaningful summary than a direct translation between file formats.

Your medical data might not be a good fit for the public servers if you are concerned about privacy. But we can discuss.

rachaeldavid1 · January 14, 2025, 6:52pm

Gotcha-- waiting for that DM. I can’t figure out how to send you one but waiting for yours!

jennaj · January 14, 2025, 10:53pm

Hi @rachaeldavid1

Your BAM file appears to be based on the human genome, but not on the version hosted as hg19 at the public Galaxy servers (hg19 is how the data is currently labeled, and it is the genome that was originally selected with FreeBayes).

This is what the error is reporting: mismatched chromosome identifiers

More about the different human genome assemblies is here.

Reference genomes at public Galaxy servers: GRCh38/hg38 example

This guide has more details about how those kinds of checks are done.

FAQ: Mismatched Chromosome identifiers and how to avoid them
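To make the mismatch concrete, here is a minimal Python sketch of the kind of check involved. It assumes you have the BAM header as plain text (for example, as printed by `samtools view -H`), and it only looks at whether sequence names carry a `chr` prefix. The function names and the example header are hypothetical, and this is not how Galaxy itself performs the check.

```python
def contig_names(sam_header: str) -> list[str]:
    """Return the sequence names listed on the @SQ lines of a SAM/BAM header."""
    names = []
    for line in sam_header.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def guess_naming_style(names: list[str]) -> str:
    """Classify names as 'chr-prefixed' (chr1, as in hg19) or 'plain' (1, as in hg_g1k_v37)."""
    if not names:
        return "unknown"
    with_prefix = sum(1 for name in names if name.startswith("chr"))
    if with_prefix == len(names):
        return "chr-prefixed"
    if with_prefix == 0:
        return "plain"
    return "mixed"


# Hypothetical header text, for illustration only.
example_header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:1\tLN:249250621\n"
    "@SQ\tSN:2\tLN:243199373\n"
    "@SQ\tSN:MT\tLN:16569\n"
)
print(guess_naming_style(contig_names(example_header)))
```

A BAM whose names come out as "plain" will not line up with a reference whose chromosomes are named chr1, chr2, and so on, even when both describe the same GRCh37 assembly.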

Your data might be based on hg_g1k_v37. If yes, you can use FreeBayes against that human genome reference. I would try this first.

Update: this will work if you add default read groups to the BAM file with the tool AddOrReplaceReadGroups. This results in a VCF from FreeBayes, but it is not annotated. If you want to create an annotated VCF, you’ll need to try the next suggestion below instead, starting from the fastq reads.
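As a rough illustration of what "adding read groups" means, the Python sketch below checks a BAM header (as plain text) for `@RG` lines and builds a default one. The function names and the placeholder values (`lib1`, `unit1`, `sample1`) are hypothetical. Note the sketch only inspects the header: the actual tool also tags every read with its read group, which this does not do.

```python
def read_group_ids(sam_header: str) -> list[str]:
    """Return the ID of every @RG (read group) line found in a SAM/BAM header."""
    ids = []
    for line in sam_header.splitlines():
        if not line.startswith("@RG"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("ID:"):
                ids.append(field[3:])
    return ids


def default_read_group_line(sample: str = "sample1") -> str:
    """Build a placeholder @RG header line; all values here are illustrative."""
    fields = {"ID": "1", "LB": "lib1", "PL": "ILLUMINA", "PU": "unit1", "SM": sample}
    return "@RG\t" + "\t".join(f"{key}:{value}" for key, value in fields.items())


# Hypothetical header with no read groups at all.
header_without_rg = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:1\tLN:249250621\n"
if not read_group_ids(header_without_rg):
    print("No read groups found; a default line would look like:")
    print(default_read_group_line())
```

The `SM` (sample) value is the one that matters most here, since variant callers generally use it to decide which reads belong to which sample.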

The other option is to extract the fastq reads out of the BAM you have, then map them against a version of the human genome we host and proceed to downstream steps that way. This is probably the cleanest way to create the file you want (a VCF file) or to obtain rs identifiers, but it might mean that the data can’t be used in other external applications (because they are expecting data based on a different human genome assembly!).

bedtools Convert from BAM to FastQ
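For a sense of what this extraction does per read, here is a simplified Python sketch that turns one SAM-format alignment line into a FASTQ record. It is an assumption-laden illustration, not the bedtools implementation: the function name is hypothetical, and it ignores pairing, secondary alignments, and reads with missing sequence.

```python
# Translation table used to complement DNA bases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_record_to_fastq(sam_line: str) -> str:
    """Convert one tab-separated SAM alignment line into a four-line FASTQ record."""
    fields = sam_line.rstrip("\n").split("\t")
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    # Flag bit 0x10 means the read aligned to the reverse strand, so the stored
    # sequence is reverse-complemented. Undo that to recover the original read.
    if flag & 0x10:
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    return f"@{name}\n{seq}\n+\n{qual}\n"


# Hypothetical reverse-strand record, for illustration only.
example_line = "read1\t16\t1\t100\t60\t4M\t*\t0\t0\tAACG\tIIJK"
print(sam_record_to_fastq(example_line), end="")
```

Because the result no longer carries any alignment information, the reads can then be mapped fresh against whichever reference genome you choose.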

If the goal is just to learn whether the data includes any known SNP rs identifiers, you can do that by following this protocol with a single sample, starting from the reads.

Hands-on: Exome sequencing data analysis for diagnosing a genetic disease (Variant Analysis)
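To show where rs identifiers would appear once a VCF is annotated, here is a small Python sketch that reads VCF text and collects any rs IDs from the third (ID) column. The function name and the example records are hypothetical. In a VCF that has not been annotated, that column typically holds just a `.` placeholder.

```python
def rs_ids(vcf_text: str) -> list[str]:
    """Collect rs identifiers from the ID column of VCF data lines."""
    found = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        id_field = line.split("\t")[2]
        # The ID column may hold several identifiers separated by semicolons.
        for identifier in id_field.split(";"):
            if identifier.startswith("rs"):
                found.append(identifier)
    return found


# Hypothetical VCF records: the first has no ID, the second has a made-up rs ID.
example_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t1000\t.\tA\tG\t50\t.\t.\n"
    "1\t2000\trs123\tC\tT\t60\t.\t.\n"
)
print(rs_ids(example_vcf))
```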

Hope that gives you some options!

rachaeldavid1 · January 15, 2025, 2:10am

So I tried changing to hg_g1k_v37. I keep asking Invitae, but they just keep repeating that 19/GRCh37 is used. I ran AddOrReplaceReadGroups successfully based on hg_g1k_v37, but I’m still getting the same error code as I got originally in the screenshot above. Does that mean I should move on to the other options? Confirming it does not have to be annotated.

jennaj · January 15, 2025, 7:25pm

Hi @rachaeldavid1

I’ve created an example here.

https://usegalaxy.org/u/jen-galaxyproject/h/example-for-httpshelpgalaxyprojectorgthow-to-turn-bam-into-vcf14442

You can examine the data from here, run these tools again with different parameters, and maybe follow the tutorials to see how to refine the calls.

Hope this helps!
