Extract fields from sample-specific/genotype columns of a VCF with SnpSift Extract Fields

Greetings,

We are trying to use SnpSift Extract Fields to convert a VCF file to a tabular file. However, despite multiple attempts, both on this account and on another, we have never managed to extract the information in the last column of our file, named “I2G_129”. This column contains genotype information matching the preceding “FORMAT” column, but the output currently always shows the value 0 instead of the values in the VCF file.

We really don’t understand what’s causing the issue and whether it’s a problem on our side or caused by Galaxy, and would deeply appreciate your kind help on this matter.

Best regards,
Jorge


Hi @Jorge

If you can share a small example with 1) the header, 2) a few data lines, and 3) your query, we may be able to help you correct it here, or report a problem so that we can fix it.

See the banner at this site for how to share your work for feedback, or review here directly:

The usage examples in the tool’s help section could be useful for you @Jorge. Since they are a bit complicated, here’s some additional explanation:
Any columns following the FORMAT column are called “GENOTYPE” columns in VCF jargon. SnpSift lets you reference these columns by their index, so, for example, GEN[0] refers to the first GENOTYPE column. To refer to specific keys listed in the FORMAT column (which serves as a legend for all GENOTYPE columns), you’d use dot (.) notation, for example, GEN[0].GT to refer to the genotype field of the first GENOTYPE column.
More syntax examples are provided in the second usage example of the tool help.
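To make the indexing concrete, here is a minimal Python sketch of how an expression like GEN[0].GT or GEN[*].GT maps onto one VCF data line. It is only an illustration of the lookup logic, not SnpSift's code; the helper name `extract_gen_field`, the example line, and its second sample column are hypothetical, and it assumes a well-formed tab-separated line with FORMAT in the ninth column.

```python
# Illustrative sketch only: NOT SnpSift's implementation. It mimics how a
# field expression such as GEN[0].GT or GEN[*].GT maps onto a VCF data line.
import re

# Hypothetical tab-separated data line: 8 fixed columns, FORMAT, then two
# GENOTYPE (sample) columns. The first stands in for "I2G_129"; the second
# is invented so that GEN[*] has more than one column to select.
LINE = "chr1\t12345\t.\tA\tG\t50\tPASS\tDP=52\tGT:AD:DP\t0/1:14,16:30\t1/1:0,22:22"

# Matches GEN[<index>].<KEY> or GEN[*].<KEY>
FIELD_RE = re.compile(r"GEN\[(\d+|\*)\]\.(\w+)")


def extract_gen_field(line: str, expr: str) -> list[str]:
    """Return the value(s) of `expr` (GEN[i].KEY or GEN[*].KEY) for one line."""
    match = FIELD_RE.fullmatch(expr)
    if match is None:
        raise ValueError(f"unsupported expression: {expr}")
    index, key = match.groups()
    cols = line.rstrip("\n").split("\t")
    format_keys = cols[8].split(":")  # FORMAT column acts as the legend
    genotype_cols = cols[9:]          # every column after FORMAT
    # GEN[*] selects all GENOTYPE columns; GEN[i] selects the i-th (0-based)
    selected = genotype_cols if index == "*" else [genotype_cols[int(index)]]
    pos = format_keys.index(key)      # raises ValueError if KEY is not in FORMAT
    values = []
    for sample in selected:
        parts = sample.split(":")
        # Trailing sub-fields may be omitted in VCF; report them as missing
        values.append(parts[pos] if pos < len(parts) else ".")
    return values


print(extract_gen_field(LINE, "GEN[0].GT"))  # ['0/1']
print(extract_gen_field(LINE, "GEN[*].GT"))  # ['0/1', '1/1']
```

The key point the sketch shows is that the index selects a column after FORMAT, while the name after the dot is looked up by its position in the FORMAT legend, so a one-sample file like the one in the question is addressed as GEN[0].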


Thank you so much for your valuable comment. Using GEN[*] worked perfectly for our intended purpose.
I might have overlooked the usage examples in the tool’s help section, but I tend to agree that they are a bit complicated and I only understood them after reading your comment.
Thanks again for your help!
