I wonder where I can find the version, the latest update time, and the source of the built-in genomes. For instance, is the built-in TAIR10 genome the same as the Ensembl TAIR10 (05-Mar-2021) or the older version of TAIR10 (2019-07-11) deposited on the TAIR FTP? Many thanks.
Did you get any response?
I am also interested in this question.
I spent some time looking for this info but did not find anything…
Hi @Wenzhe_Yin and @Akos
Genomes that are hosted at UseGalaxy.org are available at this resource: http://datacache.galaxyproject.org/
Indexes that were created manually often have README files. The others are documented in other ways, since the database/dbkey is usually enough to identify the source and the genome assembly/build.
For data that has the Arabidopsis_thaliana_TAIR10 database/dbkey, these are the records:
Index of /indexes/ → Index of /indexes/Arabidopsis_thaliana_TAIR10/ → Index of /indexes/Arabidopsis_thaliana_TAIR10/download/ →
The following files contain the fasta-formatted complete sequences of the 5 Arabidopsis chromosomes:
These files provide details of the genome assembly updates:
Please note that assembly changes in TAIR8 only consisted of substitutions while TAIR9 assembly changes also included insertions and deletions. Therefore, coordinates of most genes changed from TAIR8 to TAIR9.
In TAIR10, no assembly updates were made.
ftp://ftp.arabidopsis.org/home/tair/Sequences/whole_chromosomes/TAIR10_chr1.fas through TAIR10_chr5.fas, TAIR10_chrC.fas, and TAIR10_chrM.fas
Change chromosome headers to be like chr1, chrC, etc.
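For anyone who wants to reproduce that header change on their own copy of the FASTA files, here is a minimal sketch in Python. It is not the script that was used to build the index. It assumes the original headers start with a name such as `Chr1` or `ChrC`, optionally followed by a description, and the function names `normalize_header` and `rename_fasta_headers` are made up for illustration. Check the first line of each downloaded file before relying on it.

```python
import re


def normalize_header(line: str) -> str:
    """Rewrite a FASTA header so the name looks like chr1, chrC, chrM.

    Assumption: the sequence name is the first whitespace-delimited token
    after '>' and may already carry a chr/Chr prefix.
    """
    if not line.startswith(">"):
        return line  # sequence lines pass through unchanged
    parts = line[1:].split()
    if not parts:
        return line  # empty header, leave it alone
    # Drop any existing chr prefix (case-insensitive), then add lowercase "chr".
    name = re.sub(r"^chr", "", parts[0], flags=re.IGNORECASE)
    return ">chr" + name


def rename_fasta_headers(lines):
    """Yield the lines of a FASTA file with normalized headers, newlines stripped."""
    for line in lines:
        yield normalize_header(line.rstrip("\n"))
```

To use it, open a downloaded file, pass the file object to `rename_fasta_headers`, and write each yielded line plus a newline to a new file. Names that are not chromosome-style (for example a header that is just `mitochondria`) would simply get `chr` prepended, so those need a manual mapping instead.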
Hope that helps. And, if you want to use a different or more current assembly than what is natively indexed, a custom genome → build can be used. Give the custom data a distinct database/dbkey; otherwise, expect problems with tools due to conflicts with the built-in indexes.