Bcftools convert to vcf skipping all rows

zhenderson1 · February 21, 2026, 1:23am

I’m trying to convert an example 23andMe dataset into a vcf file using bcftools, but it skips all four rows every time, and consulting the manual wasn’t much help. Any idea what I’m doing wrong?

Here’s the dataset:

and here are my settings:

jennaj · February 21, 2026, 2:36am

Hi @zhenderson1

Thanks for sharing the screenshot details! Very helpful.

The issue is a mismatch between the chromosome identifiers (names). Your file uses an identifier with the format 20, while the UCSC reference genome indexed on the server uses the format chr20. Since none of the identifiers in your data match a sequence name in the reference, every row is skipped.

Description

This is a common hurdle for people working with bioinformatics data, in particular when moving data between platforms. I’ll post some details about the difference if you are curious about the “why”.

What to do

The last time I checked closely, 23andMe was using the GATK version of the hg19 reference genome.

We have this indexed in Galaxy! However, it is not available for the bcftools tools. And, you probably would not want to use this version of the genome labeling since this will somewhat restrict what you can do with the results later!

To be clear: the bases of the GRCh37/b37 genome assemblies all use the same coordinate system as the other “hg19” genomes. The difference is the chromosome labeling (identifier names) and possibly which chromosomes are included (other versions may contain more, e.g. additional haplotype and alt sequences, which are usually not included for genotyping studies). All will include chromosome 20, aka chr20.
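If you want to check for this kind of labeling mismatch yourself before running a tool, something like the rough Python sketch below could help. It assumes the 23andMe layout of tab-separated columns (rsid, chromosome, position, genotype) with `#` comment lines, and it reads sequence names from FASTA header lines. The function names and the sample rows are made up for illustration and are not part of bcftools or Galaxy.

```python
def fasta_names(fasta_lines):
    """Collect sequence identifiers: the text after '>' up to the first whitespace."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def data_chromosomes(tsv_lines):
    """Collect the chromosome labels from column 2 of a 23andMe-style file."""
    names = set()
    for line in tsv_lines:
        # Skip comment/header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            names.add(fields[1])
    return names


def missing_from_reference(tsv_lines, fasta_lines):
    """Return data chromosome labels that have no exact match in the reference."""
    return sorted(data_chromosomes(tsv_lines) - fasta_names(fasta_lines))


if __name__ == "__main__":
    # Made-up example rows, not real genotype data.
    data = ["# rsid\tchromosome\tposition\tgenotype\n", "rsEXAMPLE1\t20\t100\tAA\n"]
    ucsc_style = [">chr20\n", "ACGT\n"]
    print(missing_from_reference(data, ucsc_style))
```

Any label the sketch reports is one the reference does not contain under that exact name, which is the situation that leads to skipped rows here.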

Options

Convert to using the UCSC version of the identifier labels.

Since you only have four lines of data, modifying the file directly in a text editor would be straightforward, as long as you are very careful not to change the whitespace (tabs, spaces) in your file. Excel would NOT be recommended (ask me why!). Or, you can use a tool, and even more tools!
Great: TextEdit on a Mac or the equivalent on a PC. Add chr in front of the current 20 identifiers to create chr20 on each line.
Better: Replace Text in a specific column in Galaxy. Since all of the identifiers are the same, your search/find pattern will be simple. This is what I would do with this file.
For data with many more and different rows: Replace column by values which are defined in a convert file can be used. Common convert files are available! See the bottom of the tool form for the public repositories scientists often use, or create your own.
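For anyone doing the same relabeling outside of Galaxy, here is a minimal Python sketch of the idea. It assumes a tab-separated 23andMe-style file with the chromosome in the second column and `#` comment lines. The function name `add_chr_prefix` and the sample rows are hypothetical. Note that a plain prefix is not enough for every label: UCSC names the mitochondrial sequence chrM, so an MT row would need a mapping (a convert file) rather than this shortcut.

```python
def add_chr_prefix(lines, column=1, prefix="chr"):
    """Yield lines with `prefix` added to the chromosome column (0-based index).

    Comment lines, blank lines, tabs, and line endings are passed through unchanged.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        # Separate the trailing newline so it can be restored exactly.
        body = line[:-1] if line.endswith("\n") else line
        ending = line[len(body):]
        fields = body.split("\t")
        # Only touch rows that have the column and are not already prefixed.
        if len(fields) > column and not fields[column].startswith(prefix):
            fields[column] = prefix + fields[column]
        yield "\t".join(fields) + ending


if __name__ == "__main__":
    # Made-up example rows, not real genotype data.
    rows = ["# rsid\tchromosome\tposition\tgenotype\n", "rsEXAMPLE1\t20\t100\tAA\n"]
    for out in add_chr_prefix(rows):
        print(out, end="")
```

To apply it to a real file, you could pass an open file handle as `lines` and write each yielded line to a new file, leaving the original untouched.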

Continue to use the hg_g1k_v37 GATK reference genome version.

A version of the fasta of this genome can be found here.
http://datacache.galaxyproject.org/indexes/hg_g1k_v37/seq/hg_g1k_v37.fa
Copy and paste the link into the Upload tool. Use all default settings when loading the data. Allow the file to completely load (this may take a while!).
On the bcftools tool form, choose the option to use a genome from the History instead of a server index. Other tools may also require this choice. You may also need to assign the database key.

You have some choices! Please give these a try and let us know what worked for you!

zhenderson1 · February 27, 2026, 5:51pm

Thanks for the help! I decided to convert to UCSC using Replace column, and it worked like a charm!

jennaj · February 27, 2026, 6:16pm

Great! Glad that worked, and thanks for letting us know!