I am trying to annotate my DESeq2 results. However, whenever I run the annotation tool with a downloaded hg38 GTF file, columns 8 to 13 come out blank. Could this be because I used the built-in genome for all my previous analysis steps? I do not see a built-in option for this annotation tool. Is there a way to fix this? Alternatively, is there a way to download the built-in genome annotation used previously (for example, in featureCounts)? Many thanks!
Hi @mfaleevs
I think I saw you ask this in another topic, but I can’t find it again. That’s OK.
How to solve most reference annotation problems:
- Make sure to use the same exact reference annotation for all steps in the same analysis.
- That could be a built-in annotation (only supported with a few tools, and for a few reference genomes) or an annotation supplied by you from the history.
- GTF-formatted annotation tends to work best. UCSC provides this for hg38. Be sure to get it from their Downloads area, not from the Table Browser. I’ll add a tag that links to prior Q&A explaining the “why”, or you can simply try it this way instead.
- See the last section in this FAQ for exactly where to source the data and how to get it into Galaxy → Working with GFF GFT GTF2 GFF3 reference annotation
- Once the file is in your history, toggle the tool form to use “annotation from the history” and select the dataset.
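When the annotation columns come back blank, one plausible cause is that the gene IDs in the DESeq2 results do not match the `gene_id` values in the GTF. That is exactly what happens when different annotations are mixed across steps. The Python sketch below is a local sanity check that rests on a few assumptions: the DESeq2 output is a tab-separated table with gene IDs in the first column, and the GTF stores IDs in a `gene_id "..."` attribute. The function names are hypothetical helpers, not part of Galaxy or of any annotation tool.

```python
import re

# Assumption: IDs are stored as gene_id "VALUE" in GTF column 9 (attributes).
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def gtf_gene_ids(gtf_lines):
    """Return the set of distinct gene_id values in GTF-formatted lines."""
    ids = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a 9-column GTF record
        match = GENE_ID_RE.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def id_match_rate(result_lines, gtf_ids, has_header=False):
    """Fraction of result rows whose first-column gene ID appears in gtf_ids.

    Assumes a tab-separated DESeq2 result table with gene IDs in column 1.
    """
    rows = [line for line in result_lines if line.strip()]
    if has_header:
        rows = rows[1:]  # drop the column-name row
    result_ids = [row.split("\t")[0] for row in rows]
    if not result_ids:
        return 0.0
    found = sum(1 for gene_id in result_ids if gene_id in gtf_ids)
    return found / len(result_ids)


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    gtf = [
        "\t".join(["chr1", "src", "exon", "100", "200", ".", "+", ".",
                   'gene_id "GENE_A"; transcript_id "GENE_A.1";']),
    ]
    results = ["GENE_A\t10.5\t1.2", "12345\t3.1\t-0.4"]
    print(id_match_rate(results, gtf_gene_ids(gtf)))  # prints 0.5
```

With downloaded copies of both datasets, you could pass open file handles directly (for example, `gtf_gene_ids(open("annotation.gtf"))`). A match rate near zero suggests the GTF is not the annotation that was used for counting.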
Hi @jennaj , many thanks for your response! I am now trying to follow the instructions on how to download the genome, but it tells me that “The uploaded file contains invalid HTML content”. This is the file that I am using (Index of /goldenPath/hg38/bigZips/genes). Any advice would be greatly appreciated!
Hi @mfaleevs
Inside that directory are several files, so you’ll need to copy the URL of one of the .gtf.gz files and paste it into the Upload tool. Leave all options at their defaults and submit. The result will be in your history: an uncompressed reference annotation dataset, with the gtf datatype assigned, that is a “match” for the UCSC hg38 reference genome natively indexed on the UseGalaxy.* servers.
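The “invalid HTML content” message is consistent with uploading the directory page itself, which is HTML, instead of one file inside it. As a hedged illustration, this Python sketch extracts the individual .gtf.gz links from a directory listing so that a single file URL can be pasted into Upload. The helper name and the regex approach are illustrative choices, and the exact host URL for the directory is an assumption.

```python
import re
from urllib.parse import urljoin

# Assumption: the listing links files with plain href="..." attributes.
GTF_GZ_HREF_RE = re.compile(r'href="([^"]+\.gtf\.gz)"', re.IGNORECASE)


def gtf_gz_urls(listing_html, base_url):
    """Return absolute URLs, in page order without duplicates, for linked .gtf.gz files."""
    urls = []
    for href in GTF_GZ_HREF_RE.findall(listing_html):
        url = urljoin(base_url, href)  # resolve relative links against the directory
        if url not in urls:
            urls.append(url)
    return urls


if __name__ == "__main__":
    # Assumed location of the directory named in this thread.
    base = "https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/"
    # Tiny stand-in for the listing page; the real page content will differ.
    sample = ('<a href="hg38.ncbiRefSeq.gtf.gz">hg38.ncbiRefSeq.gtf.gz</a>'
              '<a href="md5sum.txt">md5sum.txt</a>')
    for url in gtf_gz_urls(sample, base):
        print(url)
```

To work from the live page instead, you could fetch the listing with `urllib.request.urlopen(base).read().decode()`. Each URL returned points to a single file, which is what the Upload tool needs. Pasting the directory URL itself hands Galaxy an HTML page.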
If you are not sure what each annotation represents, explore them in the main Browser area of the UCSC site, where each has a full description. ncbiRefSeq and ensGene are common choices; search online for “ensembl versus refseq” for more opinions about their content and differences.
You could also try a couple of the available choices and see which produces the most useful annotation given your specific data and analysis goals (meaning, try each one individually, not mixed together).
Hope that helps!