Another STAR error -- Resolved

While processing some rat RNA-seq data, I received the following error:

slurmstepd: error: couldn’t chdir to `/srv/pulsar/main/venv/lib/python3.6/site-packages’: No such file or directory: going to /tmp instead

EXITING because of FATAL ERROR: could not open genome file /cvmfs/data.galaxyproject.org/managed/rnastar/2.7.4a/rn6/rn6/dataset_f3c6453e-d0d8-4981-acbc-d8ed812ad69e_files//genomeParameters.txt
SOLUTION: check that the path to genome files, specified in --genomeDir is correct and the files are present, and have user read permsissions

Sep 11 20:53:44 … FATAL ERROR, exiting
[E::hts_open_format] Failed to open file “/jetstream/scratch0/main/jobs/30550923/outputs/dataset_45038405.dat” : No such file or directory
[E::hts_open_format] Failed to open file “/jetstream/scratch0/main/jobs/30550923/outputs/dataset_45038405.dat” : No such file or directory
[E::hts_open_format] Failed to open file “/jetstream/scratch0/main/jobs/30550923/outputs/dataset_45038406.dat” : No such file or directory

Is this a problem with the GTF file, and if so, can someone point me towards a correctly formatted file?

The “slurmstepd: error: couldn’t chdir” line at the top of the error can be ignored. It is just a log message produced by the remote cluster.

The “could not open genome file” line describes a tool/server/resource path problem, specifically the double slash (//) before genomeParameters.txt.

We’d like to take a closer look. Would you please send in a bug report from the red error dataset? Leave the inputs/outputs undeleted, include a link to this topic in the comments, and post back here once sent in so we know when to look for it.

The indexes for older versions of RNA-Star were recently reorganized, and new indexes were created for the latest version available at usegalaxy.org. Other tests passed, so we are curious about what exact inputs/settings/parameters led to this type of error.

It doesn’t appear to be a problem with a reference annotation (gtf) input, but we can check that at the same time and provide feedback if there is some secondary issue. If the tool itself ran correctly, problems with gtf content would create a different type of error, or possibly just odd yet putatively successful results.

There is much prior Q&A about troubleshooting issues with reference annotation at this forum. Search with a keyword like “gtf” to find those. Help is also in this FAQ: https://galaxyproject.org/support/diff-expression/

Thanks!

cc @nate @dave

Thanks for sending the bug report in – reviewing.

Update 1: Other tests are still running to address the path problem, but the GTF is also a problem. The chromosome identifiers do not match UCSC’s rn6 genome chromosomes. The FAQ I linked before will help to get that corrected. [Retracted, see the correction that follows: UCSC has a version (linked in FAQ), as does Gencode for GRCm38/rn6 here: https://www.gencodegenes.org/mouse/. If you get the Gencode version, you’ll need to remove the header lines. Many topics cover it, this is a good one (refers to human but the same instructions apply to mouse):]

Wrong! Update 9/15/20: For rn6, UCSC does not have a GTF in the right format (the reason is explained here: https://galaxyproject.org/support/diff-expression/) and Gencode does not have one at all (only human and mouse are supported).
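For anyone who wants to confirm this kind of identifier mismatch before running a mapper, here is a minimal Python sketch. It assumes the GTF is tab-separated with the chromosome name in the first column, and that you already have the set of chromosome names used by the reference genome (for example, taken from the fasta title lines). The function names and the tiny example data are made up for illustration and are not part of any Galaxy tool.

```python
def gtf_seqnames(gtf_lines):
    """Collect the distinct values of column 1 (chromosome/sequence name)."""
    names = set()
    for line in gtf_lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def unmatched_seqnames(gtf_lines, genome_names):
    """Return GTF sequence names that are absent from the genome, sorted."""
    return sorted(gtf_seqnames(gtf_lines) - set(genome_names))


# Hypothetical example data: Ensembl-style names in the GTF,
# UCSC-style names in the genome.
example_gtf = [
    "#!genome-build Rnor_6.0\n",
    '1\tensembl\tgene\t100\t200\t.\t+\t.\tgene_id "G1";\n',
    'MT\tensembl\tgene\t5\t50\t.\t-\t.\tgene_id "G2";\n',
]
genome_names = {"chr1", "chr2", "chrM"}

missing = unmatched_seqnames(example_gtf, genome_names)
print(missing)  # ['1', 'MT']
```

An empty list means every chromosome name in the GTF also exists in the genome. Any names listed are the ones that would need converting.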

But iGenomes does. Scroll down in the topic below for instructions about how to get the data into Galaxy from that data provider. Or, you may be able to convert the identifiers in the current GTF with the tool “Replace column by values which are defined in a convert file” (Galaxy Version 0.2). One source for “convert files” is linked on that tool form; scroll down into the help. You probably want the file Rnor_6.0_ensembl2UCSC.txt, and the “raw” URL for that data here should be good to paste into the Upload tool without any manipulation of format/datatype: https://raw.githubusercontent.com/dpryan79/ChromosomeMappings/b2862c4897cbe43f731e2cd6a2fdd2588b4e49b0/Rnor_6.0_ensembl2UCSC.txt. You can test either method out and use the resulting GTF with HISAT2 to see if it is complete/correct.
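If you would rather see what the identifier conversion amounts to outside of Galaxy, the idea can be sketched in a few lines of Python. This is only an illustration, not the Galaxy tool's implementation. It assumes the convert file has two tab-separated columns (Ensembl name, then UCSC name, with the second column sometimes empty), and it makes two choices of its own: header/comment lines are dropped, and GTF lines whose chromosome has no UCSC name are dropped too. The function names and example rows are made up, including the "scaffold_x" contig.

```python
import io


def load_mapping(lines):
    """Read a two-column, tab-separated mapping: source name, target name."""
    mapping = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Keep only rows that have both a source and a non-empty target.
        if len(fields) >= 2 and fields[0] and fields[1]:
            mapping[fields[0]] = fields[1]
    return mapping


def convert_gtf(gtf_lines, mapping):
    """Yield GTF lines with column 1 renamed via the mapping."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # assumption: header/comment lines are not wanted
        fields = line.rstrip("\n").split("\t")
        new_name = mapping.get(fields[0])
        if new_name is None:
            continue  # assumption: drop lines with no mapped chromosome name
        fields[0] = new_name
        yield "\t".join(fields) + "\n"


# Hypothetical example data standing in for the real files.
mapping_text = "1\tchr1\nMT\tchrM\nscaffold_x\t\n"
gtf_text = (
    "#!genome-build Rnor_6.0\n"
    '1\tensembl\tgene\t100\t200\t.\t+\t.\tgene_id "G1";\n'
    'MT\tensembl\tgene\t5\t50\t.\t-\t.\tgene_id "G2";\n'
    'scaffold_x\tensembl\tgene\t1\t9\t.\t+\t.\tgene_id "G3";\n'
)

mapping = load_mapping(io.StringIO(mapping_text))
converted = list(convert_gtf(io.StringIO(gtf_text), mapping))
print("".join(converted), end="")
```

With real data you would open the downloaded convert file and your GTF in place of the io.StringIO objects. Whether to drop unmapped lines or keep them unchanged is a judgment call, so compare the line counts before and after.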

Update 2: Execution issues with RNA-Star are confirmed. An alternative tool is HISAT2. Issue ticket (will close out once fixed): https://github.com/galaxyproject/usegalaxy-playbook/issues/306

My data is not of the greatest quality, but I wanted to create a workflow for other data I have coming soon.

Thanks for working on this. Just to clarify, I’m working with the rat genome, not mouse.

The test cases included in the ticket were chosen for specific purposes related to the technical nature of the presenting problems (there is more than one factor involved).

So – another way of stating the goal is to get the most commonly used model genomes indexed, functional at the target clusters, and in the correct format for use with the latest version of the RNA-Star wrapper. Rat rn6 will be part of the priority genome set – I’ll add a note to the ticket to make that clearer.

Update 9/15/20 – Sorry, I missed what you were referring to re mouse/rat. I updated the above help with better options: 1) the iGenomes source for a GTF based on rn6 UCSC identifiers and 2) how to possibly convert your existing GTF from Ensembl to UCSC chromosome identifiers. You’ll need to try those and see whether they work (they should). Maybe do both and compare; you may simply prefer the annotation content from one versus the other. :nerd_face:

Update:

  • rn6 is now indexed for RNA-Star
  • additional genomes are also indexed
  • as new genomes are completed, these will become available in the tool form

Thank you for working on this. Is there a list of currently supported genomes?

As genomes are indexed for this tool, each will appear in the reference genome target pull-down menu on the tool form. The prior linked ticket includes the priority genomes but others will be added – we are working on consolidating/merging genomes, indexes, and annotation across usegalaxy.* servers over the next several months.

If a genome of interest is not included for any tool as a built-in index, the best route is usually to use a custom reference genome (fasta supplied from the history). That can be promoted to a custom build to create a “database” unique to your account that can be assigned to datasets. (Some tools interpret the “database” metadata).

A custom genome + custom build works for nearly all tools. It is a particularly good choice for smaller genomes (bacterial, viral, etc). We will never index every reference genome for all tools. Or, not in the near term :woman_technologist:

The fasta formatting matters! FAQ with details: https://galaxyproject.org/learn/custom-genomes/

Best