Variant calling from VCF files

Adrian · October 11, 2023, 10:45am

Hello,

I have 3 gzipped VCF files (.vcf.gz) for a proband, mother and father, obtained from a sequencing lab, which also provided the matching .vcf.gz.tbi index files.

In the Galaxy variant analysis tutorial, the FreeBayes step can take the 3 .bam files together and produce a single multisample .vcf file, on which the annotation is then done. I am trying to get the equivalent by merging my 3 files with bcftools merge, but when I call bcftools norm with the same parameters as in the tutorial, I get this error:

[E::faidx_adjust_position] The sequence "1" was not found
faidx_fetch_seq failed at 1:69270

Do you know how I can solve this?

jennaj · October 11, 2023, 9:02pm

Hi @Adrian

There is probably a chromosome identifier mismatch. That is, the genome assigned as the database is not an exact match for the reference genome that was used for the variant calling, so the name "1" in your VCF is not found in the reference.

This FAQ explains how to confirm: Mismatched Chromosome identifiers and how to avoid them
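
For readers who want to check this outside of Galaxy, here is a minimal Python sketch of the comparison. It is only an illustration under stated assumptions: the function names are made up for this example, and it simply collects the chromosome names used in a VCF (header contig lines plus the first column of the records) and reports any that have no matching title line in the reference fasta.

```python
import gzip
import re


def open_text(path):
    """Open a plain or gzip/bgzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def vcf_contigs(lines):
    """Collect chromosome names from ##contig header lines and data records."""
    names = set()
    for line in lines:
        if line.startswith("##contig="):
            match = re.search(r"[<,]ID=([^,>]+)", line)
            if match:
                names.add(match.group(1))
        elif line.strip() and not line.startswith("#"):
            # First tab-separated column of a record is CHROM.
            names.add(line.split("\t", 1)[0])
    return names


def fasta_contigs(lines):
    """Collect sequence names from fasta title lines (text up to first space)."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def missing_from_reference(vcf_lines, fasta_lines):
    """Return VCF chromosome names that do not exist in the fasta, sorted."""
    return sorted(vcf_contigs(vcf_lines) - fasta_contigs(fasta_lines))
```

With real files this could be called as `missing_from_reference(open_text("proband.vcf.gz"), open_text("reference.fa"))`, where both file names are placeholders. A non-empty result (for example names like 1 in the VCF when the fasta uses chr1) points to the kind of mismatch the FAQ describes.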

Adrian · October 16, 2023, 11:07am

Thank you! I have now confirmed that the error came from the reference genome. Apparently the reference genome used was b37, and I would like to do the annotation against hg19. Do you know if there is any way to convert the coordinates?

jennaj · October 16, 2023, 7:38pm

Hello @Adrian

FreeBayes might already have your genome available.

I ran a quick alignment against that genome index to generate some headers. These list the chromosome identifiers, which you can compare to your VCF headers. History share link: Galaxy

I think that will work for you given what you have shared already.
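
The header comparison mentioned above can also be scripted. The following is a rough Python sketch, not the method used in the shared history: it assumes you have the header lines of an alignment in SAM text form (the @SQ lines) and the header lines of a VCF (the ##contig lines), and the function names are invented for this example. It reports contigs that appear only in the VCF and contigs whose declared lengths disagree.

```python
import re


def sam_sequences(header_lines):
    """Map sequence name -> length from SAM header @SQ lines."""
    seqs = {}
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        # Fields after the record type look like TAG:VALUE, separated by tabs.
        fields = dict(
            field.split(":", 1)
            for field in line.rstrip("\n").split("\t")[1:]
            if ":" in field
        )
        if "SN" in fields and "LN" in fields:
            seqs[fields["SN"]] = int(fields["LN"])
    return seqs


def vcf_header_contigs(header_lines):
    """Map contig ID -> length (None if not declared) from ##contig lines."""
    contigs = {}
    for line in header_lines:
        if not line.startswith("##contig="):
            continue
        id_match = re.search(r"[<,]ID=([^,>]+)", line)
        len_match = re.search(r"[<,]length=(\d+)", line)
        if id_match:
            length = int(len_match.group(1)) if len_match else None
            contigs[id_match.group(1)] = length
    return contigs


def compare_headers(sam_lines, vcf_lines):
    """Return (names only in the VCF, shared names with different lengths)."""
    sam = sam_sequences(sam_lines)
    vcf = vcf_header_contigs(vcf_lines)
    only_in_vcf = sorted(set(vcf) - set(sam))
    length_conflicts = sorted(
        name
        for name in set(vcf) & set(sam)
        if vcf[name] is not None and vcf[name] != sam[name]
    )
    return only_in_vcf, length_conflicts
```

If both lists come back empty, the identifiers (and any declared lengths) in the VCF are consistent with that genome index. Matching names with differing lengths suggest a different build or a different version of that sequence.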

For others reading, if your original genome is not indexed at some server, this is what to try:

Converting data between genomes (coordinate liftover) is probably not a good idea for variant calling protocols. Remapping the data against a supported genome is recommended instead.

If you just need a database key, you can upload the reference genome fasta that matches your data and use it as a custom genome. Promoting it to a custom build will create a database metadata key that you can assign to datasets, and that will avoid conflicts. Keep in mind that any other data you incorporate (annotation, etc.) also needs to be based on that exact reference genome.

FAQs

Custom Reference Genomes – first, get the fasta format cleaned up as needed
Custom Build (database) – create and assign your custom database to datasets
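
As an illustration of what fasta cleanup can involve, here is a small Python sketch. It is an assumption about typical cleanup steps, not a copy of the FAQ's procedure: it shortens each title line to the identifier before the first space, drops blank lines, and rewraps the sequence to a fixed width (80 here, chosen arbitrarily). The function name is made up for this example.

```python
def clean_fasta(lines, width=80):
    """Yield cleaned fasta lines (without trailing newlines)."""
    buffer = []

    def flush():
        # Emit the buffered sequence rewrapped to the requested width.
        sequence = "".join(buffer)
        buffer.clear()
        for start in range(0, len(sequence), width):
            yield sequence[start:start + width]

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # drop blank lines
        if line.startswith(">"):
            yield from flush()
            parts = line[1:].split()
            if not parts:
                raise ValueError("fasta title line has no identifier")
            yield ">" + parts[0]  # keep only the identifier
        else:
            buffer.append(line)
    yield from flush()
```

For example, a title line such as ">chr1 Homo sapiens chromosome 1" becomes ">chr1". The identifiers that remain are the ones that need to match the chromosome names in your VCF and annotation data.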
