Add Sample to LoFreq VCF file

Hi, I’ve made a vcf file with the “Call variants with LoFreq” tool. Does Galaxy have a tool that can add the Format/Sample columns?

I don’t know of such a tool, but if you can explain your use case, we might be able to suggest a workaround.

I wanted to run LoFreq on a trio of bam files and then merge them to find variants in one sample and not the others. That’s hard without the sample column.

I see. Things you could try:

  1. Use the VCF-VCFintersect tool to subtract from VCF 1 the variants found in VCF 2, then use the same tool again to subtract from the result the variants found in VCF 3.
  2. Use the tool vcfanno to annotate VCF 1 with INFO fields from VCF 2 and VCF 3. You can take any INFO element that lofreq has used for all variants and store them under new names in the annotated VCF 1 output. Then in the next step, use e.g. SnpSift Filter to remove those variants from the annotated VCF 1 that have the new INFO elements (i.e. were part of VCF 2 and/or VCF 3).
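To make the logic of the first approach concrete, here is a minimal plain-Python sketch of the subtraction step. It is not how the Galaxy tools are implemented. The function names are made up for illustration, and treating two variants as "the same" only when CHROM, POS, REF and ALT all match exactly is an assumption; the actual tools may match records differently.

```python
def variant_keys(vcf_lines):
    """Collect (CHROM, POS, REF, ALT) keys from the records of a VCF."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


def subtract_variants(target_lines, *other_vcfs):
    """Yield the target's header lines plus the records absent from all other VCFs."""
    seen_elsewhere = set()
    for other in other_vcfs:
        seen_elsewhere |= variant_keys(other)
    for line in target_lines:
        if line.startswith("#"):
            yield line
            continue
        if not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # Assumption: exact match on all four fields means "same variant".
        if (chrom, int(pos), ref, alt) not in seen_elsewhere:
            yield line


# Tiny made-up example data (sample-less VCFs with 8 columns).
header = [
    "##fileformat=VCFv4.0\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
]
vcf1 = header + [
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=30;AF=0.5\n",
    "chr1\t200\t.\tC\tT\t60\tPASS\tDP=40;AF=0.1\n",
    "chr1\t300\t.\tG\tA\t70\tPASS\tDP=35;AF=0.2\n",
]
vcf2 = header + ["chr1\t100\t.\tA\tG\t55\tPASS\tDP=28;AF=0.4\n"]
vcf3 = header + ["chr1\t300\t.\tG\tC\t45\tPASS\tDP=31;AF=0.3\n"]

unique_to_vcf1 = list(subtract_variants(vcf1, vcf2, vcf3))
print("".join(unique_to_vcf1), end="")
```

In this toy data the variant at position 300 stays in the result because VCF 3 has a different ALT allele at that site, which shows why the choice of matching key matters.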

The second approach has the advantage that it scales better to a larger number of files.

In general, however, note that if your question is something like "which variants in sample 1 are absent from sample 2 and/or 3, with an error probability of only so much?", then LoFreq is simply the wrong variant caller. FreeBayes can do joint variant calling on a population, will output information about all samples in a single VCF, will be more sensitive in detecting shared variants, and will emit per-sample genotype likelihoods as part of the sample columns.

Hi @numbergirl86

The samples included in VCF output are usually derived from the read group labels inside the source input BAM data. You can add read groups when mapping. You can also add or adjust read groups in BAMs after mapping but before calling variants.

Some tools to consider:

  • AddOrReplaceReadGroups
  • Or, see the read group section on your mapping tool’s form: BWA, BWA-mem, Bowtie2

GTN example of assigning read groups during mapping. This is the usual protocol, since adding the information at this step can help to avoid sample mixups! The workflow has an example of this.

Finally, the IWC hosts several variant workflows! Maybe one of these will work for you? Or help when developing your own pipeline?

Technically, you can also modify the VCF file directly (it is just a text file). However, this can be tricky without existing sample labels, so I agree with @wm75 and wouldn’t recommend this method.

  • bcftools reheader – easier when changing labels, not adding
  • various text manipulation tools
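For completeness, this is roughly what the direct text edit would involve. It is only a sketch under assumptions, not a recommended route: the function name is made up, and it assumes each record carries an `AF=` entry in its INFO column, which is then copied into a new per-sample `AF` field. It does not create genotypes, so downstream tools that expect `GT` may still reject the result.

```python
def add_sample_column(vcf_lines, sample_name):
    """Return VCF lines with a FORMAT column and one sample column appended."""
    out = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("#CHROM"):
            # Declare the new FORMAT field right before the column header line.
            out.append(
                '##FORMAT=<ID=AF,Number=1,Type=Float,'
                'Description="Allele frequency copied from INFO/AF">'
            )
            out.append(line + "\tFORMAT\t" + sample_name)
        elif line.startswith("#") or not line:
            out.append(line)  # other header lines and blanks pass through
        else:
            fields = line.split("\t")
            # Parse key=value INFO entries; flag-only entries are ignored.
            info = dict(
                item.split("=", 1) for item in fields[7].split(";") if "=" in item
            )
            # Assumption: INFO has an AF entry; use "." (missing) otherwise.
            out.append(line + "\tAF\t" + info.get("AF", "."))
    return out


# Made-up example input: a sample-less VCF with 8 columns.
example = [
    "##fileformat=VCFv4.0",
    '##INFO=<ID=AF,Number=1,Type=Float,Description="Allele Frequency">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=30;AF=0.5",
]
with_sample = add_sample_column(example, "sample1")
print("\n".join(with_sample))
```

Even with a sample column added this way, the caveat above still applies: a variant that is missing from one file is not the same as a confident reference call in that sample.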

Please let us know if this helps or not. Follow-up questions are welcome! :slight_smile:

Thanks, I’ll try those suggestions.
