RNA-STAR and hg38 GTF reference annotation

Hello,

You need to supply a reference annotation GTF dataset from the history at runtime.

The GTF should be based on the UCSC “hg38” genome build. Some choices:

  • For Gencode, copy the link to the GTF and paste it into the Upload tool. Hg38 data is here: https://www.gencodegenes.org/. After it is loaded, remove the headers (lines that start with a “#”) with the Select tool, using the regular expression ^# with the option “NOT Matching”. Once the formatting is fixed, change the datatype to gtf under Edit Attributes (pencil icon). The data will be given the datatype gff by default, which works fine with some tools but not with others. Avoid the gff3 version of this particular data (it contains duplicated IDs, and several RNA-seq tools do not work with annotation in that format anyway).
  • For iGenomes, the archive corresponding to the target genome/build needs to be locally downloaded, the tar archive unpacked, and then just the genes.gtf data uploaded to Galaxy (browse the local file, or use FTP). Find all available genome/builds here: https://support.illumina.com/sequencing/sequencing_software/igenome.html
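
If you would rather clean the Gencode GTF on your own machine before uploading, the header-removal step can be sketched locally in Python. This is only a rough equivalent of the Select tool with “NOT Matching” and ^#, not what Galaxy runs internally. The function names and the file names in the usage note are made up for illustration.

```python
import re

# Same pattern as used with the Select tool: lines beginning with "#"
HEADER = re.compile(r"^#")


def strip_gtf_headers(lines):
    """Yield only the lines that do NOT start with '#' (the GTF header/comment lines)."""
    for line in lines:
        if not HEADER.match(line):
            yield line


def clean_gtf(src_path, dst_path):
    """Copy src_path to dst_path, dropping header lines (hypothetical helper)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(strip_gtf_headers(src))
```

For example, clean_gtf("gencode.annotation.gtf", "gencode.noheader.gtf") would write a header-free copy (both file names are placeholders; the input must already be uncompressed). You would still need to set the datatype to gtf in Galaxy after uploading the result.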