I’m trying to convert an example 23andMe dataset into a vcf file using bcftools, but it skips all four rows every time, and consulting the manual wasn’t much help. Any idea what I’m doing wrong?
Here’s the dataset:
and here are my settings:
Hi @zhenderson1
Thanks for sharing the screenshot details! Very helpful.
The issue has to do with a mismatch between the chromosome identifiers (names). Your file uses an identifier with the format 20 while the UCSC reference genome indexed on the server uses the format chr20.
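A quick way to confirm this kind of mismatch before running a tool is to compare the chromosome names in your data against the names in the reference. The snippet below is only a minimal sketch: the function name and both example lists are made up for illustration and are not taken from bcftools or Galaxy.

```python
def unmatched_chromosomes(data_names, reference_names):
    """Return the chromosome names found in the data but not in the reference."""
    reference = set(reference_names)
    return sorted({name for name in data_names if name not in reference})


# Hypothetical names: the data uses "20", the reference uses "chr20".
data_names = ["20", "20", "20", "20"]
reference_names = ["chr1", "chr20", "chrX"]

print(unmatched_chromosomes(data_names, reference_names))
```

Any name that comes back from this check is one the tool will not find in the reference, which is consistent with every row being skipped.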
This is a common hurdle for people working with bioinformatics data, in particular when moving data between platforms. I’ll post some details about the difference if you are curious about the “why”.
The last time I checked closely, 23andMe was using the GATK version of the hg19 reference genome.
We have this indexed in Galaxy! However, it is not available in the bcftools tools. You probably would not want to use this version of the chromosome labeling anyway, since it would somewhat restrict what you can do with the results later.
To be clear: the GRCh37/b37 genome assemblies use the same base coordinate system as the other “hg19” genomes. The difference is the chromosome labeling (identifier names) and possibly which chromosomes are included (other versions may contain extra sequences, e.g. haplotype and alt versions, which are usually not included for genotyping studies). All will include chromosome 20, aka chr20.
Options
Add chr to the current 20 identifiers to create chr20 for each line.
You have some choices! Please give these a try and let us know what worked for you!
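Outside of Galaxy, the same relabeling can be done with a few lines of Python. This is a rough sketch, not what any Galaxy tool does internally: it assumes a tab-delimited 23andMe-style file with the chromosome in the second column and comment lines starting with #, and the function name and sample rows are invented for illustration. It also maps MT to chrM, which is the mitochondrial name in UCSC hg19.

```python
def add_chr_prefix(lines):
    """Yield 23andMe-style lines with UCSC-style chromosome names (20 -> chr20)."""
    for line in lines:
        # Pass comment/header lines and blank lines through unchanged.
        if line.startswith("#") or not line.strip():
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            # Not a data row in the assumed layout; leave it alone.
            yield line
            continue
        chrom = fields[1]  # assumed column order: rsid, chromosome, position, genotype
        if not chrom.startswith("chr"):
            fields[1] = "chrM" if chrom == "MT" else "chr" + chrom
        yield "\t".join(fields) + "\n"


# Hypothetical rows, for illustration only.
example = [
    "# rsid\tchromosome\tposition\tgenotype\n",
    "rs0000001\t20\t100\tAG\n",
    "rs0000002\tMT\t200\tC\n",
]
for converted in add_chr_prefix(example):
    print(converted, end="")
```

To convert a real file, you could pass an open file handle to the function and write each yielded line to a new file, then run the conversion to VCF on the relabeled copy.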
Thanks for the help! I decided to convert to UCSC using Replace column, and it worked like a charm!
Great! Glad that worked, and thanks for letting us know!