Please see the choices in the BFCtools
tool suite.
These tools will call variants (pileup or VCF), fill in reference bases where they are not represented in your data (a few different ways), and generate new consensus sequences given the 1) original reference sequence the variants were called against and the 2) variation output VCF. Plus many other manipulations. Please give these a try and see if it produces the output you each want – these are flexible tools with many options.
You will probably need to make use of a custom reference genome/transcriptome/exome fasta dataset. These tools do not have built-in indexes like mapping tools. If you are not sure where to find the fasta version of a pre-indexed reference genome you mapped against, please write back and we can help. No matter where you get it, it must be an exact match (genome build/source/version) for what you originally mapped against – plus the fasta should be in a very simple format – meaning, no “>” identifier line description content. The tool NormalizeFasta
can be used in most cases to standardize the format of fasta datasets.
The “consensus sequence” that used to be generated by older versions of Mpileup were encoded and probably not what you are both wanting as a final result (is NOT a fasta “consensus sequence” result based on the variation in your data – what you might think of as a type of “assembly” result).
Also, using coordinates of regions in a pileup result (or VCF result, or gtf/bed/interval result) to Extract
sequences from the genomic sequence will only result in fasta sequence based on that original reference genomic sequence again. It won’t including any base-level variation your read data may have had.
There isn’t a Galaxy Training Network tutorial that covers using these tools in detail, but looking at other workflows variant calling tutorials would probably help. Plus, you might want to compare tools/methods and compare. If interested, please see:
Thanks!