Annotation of DESeq2 results not giving full results

I am trying to annotate my DESeq2 results. However, whenever I run the annotation tool with a downloaded GTF file for hg38, columns 8:13 come out blank. I am wondering whether this is because I used the built-in genome for all my previous analysis steps. However, I do not see such an option for this annotation tool. Is there a way to fix this error? Is there a way to download the built-in genome used previously (such as in featureCounts)? Many thanks!

Hi @mfaleevs

I think I saw you ask this in another topic but I can’t find it again. That’s Ok.

How to solve most reference annotation problems:

  1. Make sure to use the same exact reference annotation for all steps in the same analysis.
  2. That could be a built-in annotation (only supported with a few tools, and for a few reference genomes) or an annotation supplied by you from the history.
  3. GTF formatted annotation tends to work best. UCSC has this for hg38. Be sure to get it from their Downloads area, and not from the Table Browser. I’ll add a tag that links to prior Q&A with the “why”, or you can just try this way instead :slight_smile:
  4. See the last section in this FAQ for exactly where to source the data and how to get it into Galaxy → Working with GFF GFT GTF2 GFF3 reference annotation
  5. Once the file is in your history, toggle the tool form to use “annotation from the history” and select the dataset.
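
If you want to check outside of Galaxy whether two steps really used the same annotation, here is a minimal sketch in Python. It assumes the blank columns come from gene identifiers in the DESeq2 results that do not appear as `gene_id` values in the GTF, which is a likely cause but not the only possible one. The function names and the example lines (including the coordinates) are made up for illustration; they are not taken from any Galaxy tool.

```python
import re

# GTF column 9 holds attributes such as: gene_id "DDX11L1"; transcript_id "NR_046018.2";
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def gtf_gene_ids(gtf_lines):
    """Collect the gene_id values found in the attribute column of GTF lines."""
    ids = set()
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete 9-column GTF record
        match = GENE_ID_RE.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def unmatched_ids(result_ids, gtf_lines):
    """Return the result identifiers that have no matching gene_id in the GTF."""
    known = gtf_gene_ids(gtf_lines)
    return [gid for gid in result_ids if gid not in known]


# Illustrative data only: two GTF records and three identifiers from a results table.
example_gtf = [
    'chr1\tncbiRefSeq\texon\t11874\t12227\t.\t+\t.\tgene_id "DDX11L1"; transcript_id "NR_046018.2";\n',
    'chr1\tncbiRefSeq\texon\t14362\t14829\t.\t-\t.\tgene_id "WASH7P"; transcript_id "NR_024540.1";\n',
]
example_results = ["DDX11L1", "WASH7P", "ENSG00000223972"]

missing = unmatched_ids(example_results, example_gtf)
print(missing)
```

In this toy case the Ensembl-style identifier is the one left unmatched, which is the kind of mismatch you would expect when counting was done against one annotation source and the annotation step uses another.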

Hi @jennaj , many thanks for your response! I am now trying to follow the instructions on how to download the genome, but it tells me that “The uploaded file contains invalid HTML content”. This is the file that I am using (Index of /goldenPath/hg38/bigZips/genes). Any advice would be greatly appreciated!

Hi @mfaleevs

Inside that directory are several files, so you’ll need to copy the URL of one of the gtf.gz files and paste it into the Upload tool. Leave all options at default and submit. The result will be in your history: an uncompressed reference annotation dataset, with the datatype gtf assigned, that is a “match” for the UCSC hg38 reference genome natively indexed at the UseGalaxy.* servers.
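
The “invalid HTML content” message is what you would expect when the pasted URL points at the directory listing page rather than at a single file. As a rough illustration of the difference, here is a small Python sketch that guesses what a download is from its first bytes. The function name `classify_download` and the sample inputs are hypothetical, and this is not how Galaxy's Upload tool performs its own check.

```python
import gzip


def classify_download(head: bytes) -> str:
    """Guess the content type from the first bytes of a downloaded file."""
    # Every gzip stream starts with the two magic bytes 0x1f 0x8b.
    if head[:2] == b"\x1f\x8b":
        return "gzip"
    # A directory index or error page is served as HTML.
    text = head.lstrip().lower()
    if text.startswith(b"<!doctype html") or text.startswith(b"<html"):
        return "html"
    # A GTF record has 9 tab-separated columns, so 8 tabs on a line.
    first_line = head.split(b"\n", 1)[0]
    if first_line.startswith(b"#") or first_line.count(b"\t") == 8:
        return "gtf-like"
    return "unknown"


# Illustrative inputs only.
gtf_record = b'chr1\tncbiRefSeq\texon\t11874\t12227\t.\t+\t.\tgene_id "DDX11L1";\n'
examples = {
    "single gtf.gz file": gzip.compress(gtf_record),
    "directory listing page": b"<!DOCTYPE HTML>\n<html><head><title>Index of ...</title>",
    "uncompressed GTF": gtf_record,
}

for label, head in examples.items():
    print(label, "->", classify_download(head))
```

Only the first case is what the Upload tool should be given here: the URL of one specific gtf.gz file.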

If you are not sure what each annotation represents, explore the main Browser area at the UCSC site, where each has a full description. ncbiRefSeq or ensGene are common choices. Search online with “ensembl versus refseq” for more opinions about the content/differences.

You could also just try using a couple of different choices available and see which produces the most useful annotation given your specific data and analysis goals (meaning, try each individually, not mixed together). :slight_smile:

Hope that helps!