Hi @Wenzhe_Yin and @Akos
Genomes that are hosted at UseGalaxy.org are available at this resource: http://datacache.galaxyproject.org/
The indexes that were created manually often have README files. The others are documented in other ways, since the database/dbkey alone is usually enough to identify the source and the genome assembly/build.
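If you want to check a dbkey's download directory for README files from a script instead of a browser, here is a minimal Python sketch. It assumes the directory layout seen in the TAIR10 listing (`/indexes/<dbkey>/download/`) holds for other dbkeys, and that the listing is a plain HTML index page with `<a href>` links. The function names are hypothetical helpers, not part of Galaxy.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

BASE_URL = "http://datacache.galaxyproject.org/indexes"


def download_dir_url(dbkey: str) -> str:
    # Assumed layout, taken from the TAIR10 example: /indexes/<dbkey>/download/
    return f"{BASE_URL}/{dbkey}/download/"


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.hrefs.append(value)


def readme_links(listing_html: str) -> list:
    # Keep only links whose target name contains "README".
    parser = _LinkCollector()
    parser.feed(listing_html)
    return [href for href in parser.hrefs if "README" in href]


def fetch_readme_names(dbkey: str) -> list:
    # Network call: downloads the directory listing and extracts README links.
    with urlopen(download_dir_url(dbkey)) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return readme_links(html)
```

For example, `fetch_readme_names("Arabidopsis_thaliana_TAIR10")` should return the two README file names listed below, provided the server layout matches the assumption above.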
For data that has the Arabidopsis_thaliana_TAIR10 database/dbkey, these are the records:
Index of /indexes/Arabidopsis_thaliana_TAIR10/download/:
- 20110613_README_Arabidopsis_thaliana_TAIR10 13-Jun-2011 19:28 811
The following files contain the fasta-formatted complete sequences of the 5 Arabidopsis chromosomes: TAIR10_chr1.fas, TAIR10_chr2.fas, TAIR10_chr3.fas, TAIR10_chr4.fas, TAIR10_chr5.fas
Chloroplast chromosome: TAIR10_ChrC.fas
Mitochondria chromosome: TAIR10_ChrM.fas
These files provide details of the genome assembly updates: TAIR8_Assembly_updates.xls, TAIR9_Assembly_updates.xls
Please note that assembly changes in TAIR8 only consisted of substitutions, while TAIR9 assembly changes also included insertions and deletions. Therefore, the coordinates of most genes changed from TAIR8 to TAIR9. In TAIR10, no assembly updates were made.
- 20110614_README_Arabidopsis_thaliana_TAIR10 14-Jun-2011 16:23 197
ftp://ftp.arabidopsis.org/home/tair/Sequences/whole_chromosomes/TAIR10_chr1.fas through TAIR10_chr5.fas, TAIR10_chrC.fas, and TAIR10_chrM.fas
Change chromosome headers to be like chr1, chrC, etc.
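The header change described in that README can be scripted. This is a minimal Python sketch, assuming each `TAIR10_chr*.fas` file holds a single chromosome record (as the first README describes) and that the new name can be derived from the file name itself, so the original header text does not matter. The function names are hypothetical, not part of Galaxy or TAIR tooling.

```python
import re
from pathlib import Path


def chrom_name_from_filename(filename: str) -> str:
    """Derive a name like 'chr1', 'chrC' or 'chrM' from a TAIR10 file name."""
    # Case-insensitive so both 'TAIR10_chr1.fas' and 'TAIR10_ChrC.fas' match.
    m = re.match(r"TAIR10_chr(\w+)\.fas$", Path(filename).name, flags=re.IGNORECASE)
    if m is None:
        raise ValueError(f"unexpected file name: {filename}")
    return "chr" + m.group(1)


def rename_headers(lines, new_name):
    """Replace every FASTA header line with '>new_name'; sequence lines pass through."""
    for line in lines:
        if line.startswith(">"):
            yield f">{new_name}\n"
        else:
            yield line


def convert_file(src, dst):
    """Write a copy of src to dst with the header renamed; returns the new name."""
    name = chrom_name_from_filename(src)
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(rename_headers(fin, name))
    return name
```

For example, `convert_file("TAIR10_ChrC.fas", "chrC.fa")` writes a copy whose header line is `>chrC`. Deriving the name from the file name avoids depending on the exact header text in the TAIR downloads, which this sketch does not assume.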
Hope that helps. If you want to use a different or more current assembly than the one natively indexed, you can use a custom genome build. Give the custom data a distinct database/dbkey, or expect problems with tools due to conflicts with the built-in indexes.