To analyze plant RNA-seq data, which database should I use to download the GTF and FASTA files: Ensembl or NCBI? I keep getting errors or empty gene tables. I am working on Glycine max (soybean). Can anyone help me with that?
You would want to use the database containing the most complete genome. We are not domain experts in plant genomics. However, if you provide information on which genome you would like to use, we can add it to Galaxy.
Thank you very much for your help, Anton. I solved my problem the same day after reading some articles in the learning hub and support hub in Galaxy. It seems that the gene IDs of the SRA experiment I use in my research are not those of Ensembl Plants but those of NCBI. I currently work specifically on soybean genomics. However, wheat, soybean, barley, tomato, cotton, rice, maize (corn), and sugar beet are the most widely produced crops in the world, so they are the main targets for genetic improvement in breeding programs. If your offer remains open, I can suggest more crops to add to Galaxy in the future. As I said, though, these are the crops that are most widely cultivated and most heavily researched.
Hi again. As I said, I have completed my analysis, but there is a problem I cannot sort out. The gene IDs in my result file are RefSeq IDs; however, I need Phytozome IDs. (I used NCBI files.)
The files I used from NCBI are as follows:
However, when I run the same analysis with the Ensembl Plants files, which contain Phytozome IDs, I get an empty result file. Please check the attached files. Can you tell me where I am making a mistake? Why do I get an empty featureCounts result?
Note: I ran the analysis with each GTF file separately in featureCounts and got nothing in each attempt.
Any help?
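On the RefSeq-to-Phytozome question, one possible approach is to translate the IDs after counting rather than switching annotation. The following is only a minimal sketch: it assumes you already have a two-column, tab-separated table pairing each RefSeq gene ID with its Phytozome ID (where to obtain such a table is not covered in this thread), and that the count table has the gene ID in its first column. The function names and the IDs in the demo are hypothetical placeholders.

```python
import csv
import io


def load_mapping(handle):
    """Read a two-column TSV (source ID, target ID) into a dict."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2 and not row[0].startswith("#"):
            mapping[row[0]] = row[1]
    return mapping


def translate_counts(counts_handle, mapping):
    """Yield count-table rows with the first column translated where possible.

    IDs missing from the mapping (and the header row) are passed through
    unchanged; comment lines starting with '#' are skipped.
    """
    for row in csv.reader(counts_handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        yield [mapping.get(row[0], row[0])] + row[1:]


# Demo with made-up IDs; replace the in-memory text with your own files.
demo_map = load_mapping(io.StringIO("LOC_EXAMPLE_1\tGlyma.EXAMPLE_1\n"))
demo_counts = io.StringIO("Geneid\tsample\nLOC_EXAMPLE_1\t10\nLOC_EXAMPLE_2\t0\n")
for row in translate_counts(demo_counts, demo_map):
    print("\t".join(row))
```

Leaving unmapped IDs unchanged makes it easy to see afterwards which genes had no counterpart in the table.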
I solved the problem by using the NCBI assembly files (FASTA and GTF) in the link
I still have no idea why the Ensembl files do not work.
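One thing that may be worth checking, although nothing in this thread confirms it was the cause: featureCounts matches reads to features by sequence (chromosome) name, and by default counts rows whose feature type is `exon`, grouped by the `gene_id` attribute. If the GTF uses different sequence names from the FASTA the reads were mapped to, the counts can come out empty. Below is a rough sketch for comparing the two files; the function names are hypothetical, and the file names in the usage comment are placeholders.

```python
from collections import Counter


def fasta_seqnames(lines):
    """Return the set of sequence names taken from FASTA header lines."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])  # name is the first word after '>'
    return names


def gtf_summary(lines):
    """Count sequence names (column 1) and feature types (column 3) in a GTF."""
    seqnames, features = Counter(), Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed GTF row
        seqnames[cols[0]] += 1
        features[cols[2]] += 1
    return seqnames, features


def report(fasta_lines, gtf_lines):
    """Summarize how the sequence names of the two files overlap."""
    fa = fasta_seqnames(fasta_lines)
    seqnames, features = gtf_summary(gtf_lines)
    gtf_names = set(seqnames)
    return {
        "shared": sorted(fa & gtf_names),
        "gtf_only": sorted(gtf_names - fa),
        "fasta_only": sorted(fa - gtf_names),
        "exon_rows": features.get("exon", 0),
    }


# Usage with placeholder file names:
# with open("genome.fa") as fa, open("annotation.gtf") as gtf:
#     print(report(fa, gtf))
```

If `shared` is empty, or `exon_rows` is zero, the annotation and the reference genome probably do not belong together, and mapping and counting with FASTA and GTF files from the same source (as in the NCBI solution above) would be the simplest fix.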