Failed to activate conda environment! - Mapping step

Hi everyone,

I am new to Galaxy and I am facing a conda environment problem lately on the usegalaxy.org Main server.
I am trying to map my data on the human genome using the RNA STAR tool but whenever I execute my jobs on the usegalaxy.org server, I have this error:

Failed to activate conda environment! Error was:
/jetstream/scratch0/main/jobs/41688358/command.sh: line 110: /cvmfs/main.galaxyproject.org/deps/_conda/bin/activate: No such file or directory

and all of my files are empty.

I tried to do the same mapping step with HISAT2 and TOPHAT tools but I still face the same error.

However, when I followed the Galaxy RNA-seq tutorial (Hands-on: Reference-based RNA-Seq data analysis / Transcriptomics), I did not have that problem.

Can someone tell me whether I am doing something wrong, or whether this is a problem with the server?

Thank you!

It is possible that the server has a problem; perhaps the directory really does not exist.

Hi @MPG

If the tutorial data works, but your data does not, with the same tool – that indicates a problem with your data.

Questions:

  1. Have you tried a rerun yet? This error usually occurs for a very small fraction of jobs and is not reproducible (small cluster hiccup). But I’ve also seen problematic inputs be the root reason when reproducible.
  2. If a rerun also fails…
    • Are you running the most current version of the mapping tool(s)? Which? Please post back the tool names and versions for each, or better, try using the latest version(s) first, then post back which were used/failed.
    • Does a tool like FastQC execute properly against the fastq input datasets?
    • How are the read data organized? Single end or paired? Individual datasets, multiple datasets, or a dataset collection?
    • Are you using a built-in index for the mapping (available in the drop-down menu on the tool form) or a custom reference genome (fasta from the history)?

Let’s start there. Do the reruns and read QA first, since they might uncover the problem. If not, we’ll ask you to share the history with those test runs in it, review the details, and try to help more.

Hi @jennaj,

Thank you for your answer.

1. Have you tried a rerun yet? This error usually occurs for a very small fraction of jobs and is not reproducible (small cluster hiccup). But I’ve also seen problematic inputs be the root reason when reproducible.

Yes, I tried to rerun the mapping but I still have errors.

2. If a rerun also fails…

Are you running the most current version of the mapping tool(s)? Which? Please post back the tool names and versions for each, or better, try using the latest version(s) first, then post back which were used/failed.

I am using the tools available on the usegalaxy.org server:

RNA STAR Gapped-read mapper for RNA-seq data (Galaxy Version 2.7.8a+galaxy0)
HISAT2 A fast and sensitive alignment program (Galaxy Version 2.2.1+galaxy0)
TopHat Gapped-read mapper for RNA-seq data (Galaxy Version 2.1.1)

This is the error I get for each tool:

Does a tool like FastQC execute properly against the fastq input datasets?

Yes, I started with a FastQC quality control step, trimmed my adapters with the Trim Galore! tool (output format: fastqsanger), then reran FastQC/MultiQC to check that my files were ready for mapping. Every step executed properly.

How are the read data organized? Single end or paired? Individual datasets, multiple datasets, or a dataset collection?

I have paired-end reads and I select multiple datasets.

Are you using a built-in index for the mapping (available in the drop-down menu on the tool form) or a custom reference genome (fasta from the history)?

Yes, I am using a built-in reference genome (the human genome, hg38) without a built-in gene model. For the gene model, I specify a .gtf file (GRCh38) downloaded from the Ensembl database. Then I specify the length of the genomic sequence around annotated junctions (ReadLength-1) and execute my job.

I just want to point out that the tutorial I am following uses a different genome (dm6), and I had no problem with those files.

So I do not know whether I am using the wrong file formats or unsuitable tools, but I hope someone can enlighten me on this matter.

Thank you.

Hi @MPG

Check these two items. The first may or may not be OK (worth checking), and the second is likely a content problem (mismatched chromosome names). Both might need to be adjusted.

Make sure that the pairs are entered in the same order on the tool form, and that none of the ends of any pairs are omitted. You can click into the “i” Job Details page to see what was originally submitted to check (even for failed jobs). If you have many paired end datasets, using dataset collections would help: Search Tutorials
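If you keep a list of the dataset names you selected for each end, a quick order check can be sketched in Python. This is only a minimal sketch: the `<sample>_R1.fastq` / `<sample>_R2.fastq` naming pattern and the `check_pairs` helper are assumptions made for illustration, not something Galaxy enforces, so adjust the pattern to however your datasets are actually named.

```python
import re

# Hypothetical naming convention: <sample>_R1.fastq and <sample>_R2.fastq
# (also accepts .fq and an optional .gz suffix).
MATE_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])\.(?:fastq|fq)(?:\.gz)?$")


def check_pairs(forward, reverse):
    """Compare two selection lists position by position and describe any problems."""
    problems = []
    if len(forward) != len(reverse):
        problems.append(
            f"{len(forward)} forward dataset(s) but {len(reverse)} reverse dataset(s)"
        )
    for pos, (fwd, rev) in enumerate(zip(forward, reverse), start=1):
        f, r = MATE_PATTERN.match(fwd), MATE_PATTERN.match(rev)
        if f is None or r is None:
            problems.append(f"position {pos}: unrecognised name ({fwd!r}, {rev!r})")
        elif f["sample"] != r["sample"]:
            problems.append(f"position {pos}: {fwd!r} is paired with {rev!r}")
        elif (f["mate"], r["mate"]) != ("1", "2"):
            problems.append(f"position {pos}: mates are not in R1/R2 order")
    return problems


# Made-up names: the reverse list is in a different order than the forward list.
forward = ["sampleA_R1.fastq", "sampleB_R1.fastq"]
reverse = ["sampleB_R2.fastq", "sampleA_R2.fastq"]
for problem in check_pairs(forward, reverse):
    print(problem)
```

An empty result means both lists have the same length and line up pair by pair. A dataset collection avoids this kind of bookkeeping entirely, which is why it is the better option when there are many pairs.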

The reference genome was sourced from UCSC when we indexed it on the server.

All other inputs must use that same genome/build, including the same chromosome naming formats, or tools will either ignore the mismatched input or trigger a failure.

The reference annotation (gene model) was sourced directly by you… from Ensembl (?). They use a different chromosome name format than UCSC. Where to get the annotation with UCSC chromosome names is in this prior Q&A:

Nothing else stands out. So, please try rerunning with the UCSC GTF instead. If any jobs still fail, please send in a bug report from one of the red error datasets, and include a link to this topic in the comments so we can link the two for context. Also, please leave all datasets undeleted (inputs and outputs, successful or failed) for review.
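To see the naming mismatch for yourself, you can compare the first column of the GTF with the chromosome names the reference uses. The Python below is a minimal sketch under stated assumptions: the GTF lines are toy records, and `seqnames` and `ensembl_to_ucsc` are hypothetical helpers written for this example. The prefix rule only covers the primary human chromosomes (Ensembl `1`..`22`, `X`, `Y`, `MT` versus UCSC `chr1`..`chr22`, `chrX`, `chrY`, `chrM`); scaffolds and patches need a real lookup table, which is one reason downloading the UCSC-named GTF is the safer fix.

```python
# Ensembl-style names for the primary human chromosomes (MT is handled separately).
ENSEMBL_PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y"}

# UCSC-style names for the same chromosomes.
UCSC_PRIMARY = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY", "chrM"}


def seqnames(gtf_lines):
    """Return the distinct column-1 (seqname) values, skipping comments and blank lines."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def ensembl_to_ucsc(name):
    """Map an Ensembl-style primary chromosome name to its UCSC-style name.

    Returns None for anything else (scaffolds, patches), which a simple
    prefix rule cannot translate reliably.
    """
    if name == "MT":
        return "chrM"
    if name in ENSEMBL_PRIMARY:
        return "chr" + name
    return None


# Toy GTF lines for illustration only (not real annotation records).
toy_gtf = [
    "#!toy header\n",
    '1\ttoy\tgene\t100\t200\t.\t+\t.\tgene_id "g1";\n',
    'X\ttoy\tgene\t300\t400\t.\t-\t.\tgene_id "g2";\n',
    'MT\ttoy\tgene\t500\t600\t.\t+\t.\tgene_id "g3";\n',
]

found = seqnames(toy_gtf)
print("shared with reference:", sorted(found & UCSC_PRIMARY))
print("after renaming:", sorted(ensembl_to_ucsc(n) for n in found))
```

With Ensembl-style input, no names are shared with the UCSC-style reference, so the mapper has no annotation it can match to the genome. To run the same check on a real file, pass an open file handle to `seqnames` in place of the toy list.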

Thanks!