Non-available Reference genomes in Galaxy (for plants)

aaak · July 28, 2020, 9:23am

To analyze plant RNA-seq data, which database should be used to download the GTF and FASTA files: Ensembl or NCBI? I keep getting errors or empty gene tables. I am working on Glycine max (soybean). Can anyone help me with that?

nekrut · July 28, 2020, 1:15pm

You would want to use the database containing the most complete genome. We are not domain experts in plant genomics. However, if you tell us which genome you would like to use, we can add it to Galaxy.

aaak · July 29, 2020, 11:58am

Thank you very much for your help, Anton. I solved my problem the same day after reading some articles in the Galaxy learning hub and support hub. It seems that the gene IDs of the SRA experiment I use in my research are not those of Ensembl Plants but those of NCBI. I currently work specifically on soybean genomics. However, wheat, soybean, barley, tomato, cotton, rice, maize (corn), and sugar beet are the most widely produced crops in the world, so they are the main targets of genetic improvement in breeding programs. If your offer remains open, I can suggest more crops to add to Galaxy in the future. As I said, though, these are the most widely cultivated and most studied crops.

aaak · August 5, 2020, 1:57pm

Hi again. As I said, I have completed my analysis, but there is a problem I cannot sort out. The gene IDs in my result file are RefSeq IDs; however, I need Phytozome IDs. (I used the NCBI files.)

The files I used from NCBI are as follows:

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/004/515/GCF_000004515.5_Glycine_max_v2.1/GCF_000004515.5_Glycine_max_v2.1_genomic.fna.gz [HISAT2 tool]

https://ftp.ncbi.nlm.nih.gov/genomes/refseq/plant/Glycine_max/latest_assembly_versions/GCF_000004515.5_Glycine_max_v2.1/GCF_000004515.5_Glycine_max_v2.1_genomic.gtf.gz [featureCounts tool]

However, when I run the same analysis with the Ensembl Plants files, which contain Phytozome IDs, I get an empty result file. Please check the attached files. Can you tell me where I am making a mistake? Why do I get an empty featureCounts result?

Note: I ran featureCounts with each GTF file separately and got nothing in each attempt.

Any help?
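
A common cause of an empty count table is that the sequence names in the FASTA used for mapping do not match the names in column 1 of the GTF, so no read overlaps any annotated feature. The thread does not confirm that this is what happened here, so treat the following as a minimal diagnostic sketch only. The function names and the demo names (`chr1`, `1`) are hypothetical and are not taken from the soybean files.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fasta_names(lines):
    """Return the set of sequence names: first token after '>' on header lines."""
    names = set()
    for line in lines:
        if line.startswith(">") and len(line.strip()) > 1:
            names.add(line[1:].split()[0])
    return names


def gtf_names(lines):
    """Return the set of sequence names found in column 1 of a GTF."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def compare_names(fasta_lines, gtf_lines):
    """Report which sequence names are shared and which appear in only one file."""
    fa = fasta_names(fasta_lines)
    gt = gtf_names(gtf_lines)
    return {"shared": fa & gt, "fasta_only": fa - gt, "gtf_only": gt - fa}


# Hypothetical in-memory demo: the FASTA uses 'chr1' while the GTF uses '1'.
demo_fasta = [">chr1 hypothetical description\n", "ACGT\n", ">scaffold_7\n", "GGCC\n"]
demo_gtf = [
    "#!comment line\n",
    '1\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "g1";\n',
    'scaffold_7\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "g2";\n',
]
report = compare_names(demo_fasta, demo_gtf)
print("shared:", sorted(report["shared"]))
print("FASTA only:", sorted(report["fasta_only"]))
print("GTF only:", sorted(report["gtf_only"]))
```

For real files, the same functions can be fed from `open_text("genome.fna.gz")` and `open_text("annotation.gtf.gz")` (placeholder paths). If `shared` is empty or very small, the FASTA and GTF most likely come from different sources or naming schemes.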

aaak · August 6, 2020, 6:20am

I solved the problem by using the NCBI assembly files (FASTA and GTF) at this link:

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/004/515/GCA_000004515.4_Glycine_max_v2.1/

I still have no idea why the Ensembl files do not work.
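
One thing that can be checked in an annotation file that yields no counts is its content. By default, featureCounts counts reads over `exon` features and groups them by the `gene_id` attribute, so a file with no `exon` rows, or one that is actually GFF3 rather than GTF, can give an empty table. Whether that applies to the Ensembl Plants file in this thread is not known. The sketch below only summarizes a GTF, and the function name and demo lines are hypothetical.

```python
from collections import Counter


def summarize_gtf(lines):
    """Tally feature types (column 3) and count exon rows lacking a gene_id attribute."""
    feature_types = Counter()
    exons_without_gene_id = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed 9-column annotation row
        feature_types[fields[2]] += 1
        # GTF attributes look like: gene_id "X"; GFF3 uses ID=...;Parent=... instead
        if fields[2] == "exon" and 'gene_id "' not in fields[8]:
            exons_without_gene_id += 1
    return feature_types, exons_without_gene_id


# Hypothetical demo: one GTF-style exon row and one GFF3-style exon row.
demo_lines = [
    "#!comment\n",
    '1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "g1";\n',
    '1\tsrc\texon\t1\t50\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    "1\tsrc\texon\t60\t100\t.\t+\t.\tID=exon2;Parent=t1\n",
]
types, missing = summarize_gtf(demo_lines)
print(dict(types))
print("exon rows without gene_id:", missing)
```

If the tally shows no `exon` rows, or many exon rows without `gene_id`, the feature type and gene identifier settings of the counting tool would need to be adjusted to match the file, or the file converted to GTF first.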
