Using DANTE and Protein Domains Filter, I noticed that the coordinates of the same domain sometimes do not correspond exactly between the gff and the fasta files. What is the reason for this?
Correct, these are not always expected to be an exact match. For this specific example, my guess is that the stop codon is being omitted from the fasta, but you could investigate that by examining the region in a genomic browser.
Quote from the Protein Domains Filter tool form:
- Filtered GFF3 file
- Translated protein sequences of the filtered domains regions of original DNA sequence in fasta format
Translated sequences are taken from the best alignment (Best_Hit attribute) within a domain region, however this alignment does not necessarily have to cover the whole region reported as a domain in gff file
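One rough way to check whether this explains a given record is to compare the length of the region in the gff with the length of the translated sequence. The sketch below is only an illustration: the function names and the example record are made up, it relies only on the standard GFF3 convention that start and end (columns 4 and 5) are 1-based and inclusive, and it assumes the translation has no frameshifts, so each residue accounts for three nucleotides.

```python
def region_length_nt(gff_line):
    """Length in nucleotides of one GFF3 feature line.

    GFF3 start/end (columns 4 and 5) are 1-based and inclusive.
    """
    cols = gff_line.rstrip("\n").split("\t")
    start, end = int(cols[3]), int(cols[4])
    return end - start + 1


def covered_fraction(gff_line, protein_seq):
    """Approximate fraction of the gff region covered by the translated sequence.

    Assumes three nucleotides per residue (no frameshifts), which is a
    simplification and not something the tool guarantees.
    """
    residues = "".join(protein_seq.split())  # drop line breaks and spaces
    return 3 * len(residues) / region_length_nt(gff_line)


# Hypothetical record, not real DANTE output: a 300 nt region
# paired with a 95-residue translated sequence.
example_gff = "chr1\tdante\tprotein_domain\t1001\t1300\t.\t+\t.\tName=RT"
example_protein = "M" * 95

print(region_length_nt(example_gff))
print(covered_fraction(example_gff, example_protein))
```

A fraction below 1 would be consistent with the best alignment covering only part of the reported domain region. A difference of exactly one codon would instead fit the stop codon guess above.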
To learn more about how this tool suite functions, including Galaxy usage, please see:
- Repeat Explorer documentation: http://repeatexplorer.org/
- Galaxy Repository: https://toolshed.g2.bx.psu.edu/view/petr-novak/dante/65a6fb89495d (includes links to the development and associated end-user resources)