Reference annotation for TAIR10

Dear Jen,
I read several old and new posts about the .gtf file not matching the .fa file for Arabidopsis thaliana (TAIR10). Some users found a solution; some did not.
Could you please upload the correct matching files for TAIR10 to galaxy.eu, or give us a link to download them, so we can all avoid the same problem?
Many thanks,
Dan

Welcome, @Dan_Reed

The older posts you started from reference a data source that is 10+ years old, and that source isn’t needed with the current differential expression tools. That means Ensembl is a good resource.

The current Ensembl version is hosted here → Arabidopsis_thaliana - Ensembl Genomes 60

You can load the fasta and GFF3 data into Galaxy by capturing the links and pasting those into the Upload tool. Use all default settings so that the data is uncompressed with the correct datatypes.

For use with certain tools, you may need to convert a GFF3 file format to a GTF file format. You can use the tool gffread to do that.
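If you ever run that conversion outside of Galaxy, a minimal sketch looks like the following. It assumes gffread is installed locally and on your PATH, and the file names are hypothetical placeholders. Inside Galaxy, the gffread tool form does the same job without any scripting.

```python
import subprocess


def gffread_command(gff3_path, gtf_path):
    """Build the gffread call: -T requests GTF output, -o names the output file."""
    return ["gffread", gff3_path, "-T", "-o", gtf_path]


def convert_gff3_to_gtf(gff3_path, gtf_path):
    """Run gffread (assumed to be on PATH); raises CalledProcessError on failure."""
    subprocess.run(gffread_command(gff3_path, gtf_path), check=True)


if __name__ == "__main__":
    # Hypothetical file names: show the command that would be executed.
    print(" ".join(gffread_command("annotation.gff3", "annotation.gtf")))
```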

If you want to do those steps and share back the history, I can help confirm that the data are correct. We can leave that history shared here, or I can grab a copy and share it from my account.

Reference annotation data can change over time, and that is one of the reasons we do not host it as part of a native index. Reference genomes are the most stable, although how the sequences are labeled can vary by data provider. Once the baseline assembly is created and has a version, it will not change. Annotation attached to that baseline assembly version will always work with the baseline assembly (if the labels are the same!). There might later be additions to the assembly itself, and newer versions of the annotation might capture those new assembly contigs, but that is more complicated usage.
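To make the "if the labels are the same" condition concrete, here is a small sketch that compares the sequence names in a FASTA file with the names in the first column of a GTF/GFF3 file. The function names and the example labels are my own illustration, not part of any Galaxy tool, and it assumes plain uncompressed text with tab-separated annotation lines.

```python
def fasta_names(lines):
    """Sequence identifiers: text after '>' up to the first whitespace."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def annotation_names(lines):
    """Values of column 1 in GTF/GFF3 lines, skipping comments and blank lines."""
    names = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # a GFF3 file may embed sequences after this marker
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0].strip())
    return names


def compare_names(fasta_lines, annotation_lines):
    """Return (labels only in the annotation, labels only in the genome)."""
    genome = fasta_names(fasta_lines)
    annot = annotation_names(annotation_lines)
    return sorted(annot - genome), sorted(genome - annot)


if __name__ == "__main__":
    # Illustrative labels only: one provider writes "1", another writes "Chr1".
    fasta = [">1 dna:chromosome", "ACGT", ">Mt", "ACGT"]
    gtf = [
        "#!genome-build TAIR10",
        'Chr1\tsrc\tgene\t1\t4\t.\t+\t.\tgene_id "a";',
        'Mt\tsrc\tgene\t1\t4\t.\t+\t.\tgene_id "b";',
    ]
    missing, unused = compare_names(fasta, gtf)
    print("In annotation but not in genome:", missing)
    print("In genome but not in annotation:", unused)
```

With real files you would pass open file handles, for example `compare_names(open("genome.fa"), open("annotation.gtf"))`, where both file names are placeholders. Any label in the first list means the annotation will not line up with the genome as-is.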

If those new additions to the assembly are important to you, this is why I suggest uploading both the assembly and the annotation for a particular release and using the Custom Reference Genome function. More about the custom functions can be found under the tags custom-genome and custom-build. (Sorting by “most recent topics” will find the most recent troubleshooting.)

If you are working at a UseGalaxy server, you might be able to just upload the reference annotation. To comment on compatibility, or on what changes might be needed, I would need to know which server you are working on, so I can review what is indexed there for the built-in genome, and the exact annotation file you plan to use with it. Knowing the tools you plan to use would also be helpful: some do not work well with GFF3, or need special adjustments on the tool form. (That is just how the original author wrote the underlying tool, and hosting it in Galaxy can’t change that part.)

This guide has more technical details and can help whether you plan on performing DE analysis or something else. → FAQ: Extended Help for Differential Expression Analysis Tools

This is all complicated, I know. But I hope this helps, and we can follow up on any questions you may have! :slight_smile: