Reference genome not loading in RNA Star

I’m attempting to run eCLIP RNA seq analysis on some data. Everything has been working successfully, and I’m following this tutorial, with the only difference being that theirs is paired-end and mine is single-end.

I believe all the file types in my history are correct (history shown here). But when I get to the RNA Star tool (step 4), the reference genome “Homo sapiens (hg38+GRCh38.87)” doesn’t load [see photo]. When I bypassed this step by uploading a custom genome (from UCSC), there were problems downstream in the workflow, so I would prefer to use a reference genome like the tutorial.

I’ve also tried different versions of RNA Star, with the same outcome. The reference genome loads when the drop-down says “use without builtin gene model,” but disappears when I select “use with builtin gene model,” which is the option I want, following the tutorial.

Hi @Emlyn

I checked the ORG server. It provides an RNA_STAR index for the hg38 assembly, but without gene models. That leaves you two options with the available hg38 index: you can run RNA_STAR with the built-in hg38 index either with or without a gene annotation file.

When reads are mapped to a custom reference genome, the output files usually are not assigned a database (dbkey), and some tools require one. If you click on any output from the RNA_STAR job in your history, you’ll see: database ?. I suspect this is the issue.
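If you want to check this for many outputs at once, here is a minimal sketch. It assumes you already have each dataset's metadata as a plain dictionary and that the database is stored under a `genome_build` key; that key name and the example records are assumptions for illustration, not something confirmed in this thread.

```python
def has_unassigned_dbkey(dataset: dict) -> bool:
    """Return True when the dataset's database (dbkey) is missing or '?'."""
    # Assumption: the dbkey lives under "genome_build" in the metadata dict.
    dbkey = dataset.get("genome_build")
    return dbkey is None or str(dbkey).strip() in ("", "?")


def datasets_missing_dbkey(datasets):
    """List the names of datasets that have no database assigned."""
    return [d.get("name", "<unnamed>") for d in datasets if has_unassigned_dbkey(d)]


# Hypothetical records, standing in for the datasets in a history.
example = [
    {"name": "RNA STAR on data 1: mapped.bam", "genome_build": "?"},
    {"name": "reads.fastqsanger", "genome_build": "hg38"},
]

for name in datasets_missing_dbkey(example):
    print(f"No database assigned: {name}")
```

Any dataset this flags is one where a downstream tool that requires a database (dbkey) may refuse the input.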

Kind regards,
Igor

I agree with @igor, and from what I can see, none of the other UseGalaxy servers host the reference genome with the special tutorial reference annotation built in the way the UseGalaxy.no server does.

I can see that you got this working, but for anyone else running into this same issue:

The annotation changes more frequently than the assembly, so people like to supply their own. Any source of human reference annotation that is also based on the GRCh38/hg38 assembly will work, especially for a tutorial. Later on, when working on real data, you can learn about the different sources, choose the best content for your project, and make minor changes to help with technical compatibility (format tuning, not content).
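As one illustration of a technical compatibility check, the sketch below compares the sequence names in the first column of a GTF against the names used by the genome. Treating mismatched chromosome names (for example `1` versus `chr1`) as the thing to look for is my assumption about what might need tuning; the function names and sample lines are hypothetical.

```python
def gtf_seqnames(lines):
    """Collect the sequence names from column 1 of GTF lines."""
    names = set()
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def unmatched_seqnames(gtf_lines, genome_names):
    """Return GTF sequence names that do not occur in the genome."""
    return gtf_seqnames(gtf_lines) - set(genome_names)


# Hypothetical sample data: one matching and one mismatching record.
sample_gtf = [
    "#description: example",
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1";',
    '1\tsrc\texon\t5\t50\t.\t-\t.\tgene_id "g2";',
]
sample_genome = {"chr1", "chr2"}

print(sorted(unmatched_seqnames(sample_gtf, sample_genome)))
```

An empty result means every annotation record refers to a sequence the genome actually contains; anything listed would need renaming, not a change of content.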

We have some guides:

  1. FAQ: Extended Help for Differential Expression Analysis Tools

    Scroll down to the section about reference data.

  2. Reference genomes at public Galaxy servers: GRCh38/hg38 example

    When using the built-in native genome for hg38 at UseGalaxy.org, the NCBI RefSeq annotation hosted by UCSC would be a good choice. Simple format, curated content, based on the correct base assembly.

  • FAQ: Working with GFF GFT GTF2 GFF3 reference annotation (linked from the guide above)
    UCSC annotation
    • Find annotation under their Downloads area. The path will be similar to: https://hgdownload.soe.ucsc.edu/goldenPath/<database>/bigZips/genes/
    • Copy the URL from UCSC and paste it into the Upload tool, allowing Galaxy to detect the datatype.
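To make the path pattern above concrete, here is a small sketch that fills in the `<database>` placeholder. The directory template comes from the path above; the file naming (`<database>.<track>.gtf.gz`) and the `ncbiRefSeq` track name are assumptions, so confirm the exact file name in the UCSC directory listing before pasting the URL into the Upload tool.

```python
UCSC_GENES_TEMPLATE = "https://hgdownload.soe.ucsc.edu/goldenPath/{database}/bigZips/genes/"


def ucsc_genes_dir(database: str) -> str:
    """Build the UCSC Downloads directory URL for a given database."""
    return UCSC_GENES_TEMPLATE.format(database=database)


def ucsc_annotation_url(database: str, track: str = "ncbiRefSeq") -> str:
    """Build a candidate GTF URL inside that directory.

    Assumption: files are named <database>.<track>.gtf.gz.
    """
    return f"{ucsc_genes_dir(database)}{database}.{track}.gtf.gz"


print(ucsc_genes_dir("hg38"))
print(ucsc_annotation_url("hg38"))
```

The function only builds strings and does not contact UCSC, so a URL it returns is a starting point to verify, not a guarantee that the file exists.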

Getting a custom genome and annotation paired up is also possible (and it looks like you were able to do this!), but I think just getting the annotation the first time is a bit easier. 🙂