Need help with single-cell RNA-seq analysis workflows

Hello,

Thank you so much for this wonderful community. I am the admin for a local Galaxy instance, and I have been struggling to run my single-cell workflows. I built two workflows. The primary one is based on the tutorial “Hands-on: Pre-processing of 10X Single-Cell RNA Datasets” (Single Cell). The first tool in that workflow to cause a problem was DropletUtils, which failed with this error:

Error in .local(m, ...) : unused argument (alist(... = )) Calls: do_default_drops -> do.call -> do.call -> <Anonymous> -> <Anonymous> Execution halted

I then decided to remove this tool and focus on getting a workflow that runs. The next tool that caused a problem was “sceasy”, which I used to convert the RNAStarsolo output to AnnData format, the input format for the second workflow. That workflow is based on the tutorial “Hands-on: Filter, plot and explore single-cell RNA-seq data (Scanpy)” (Single Cell). I built it with ScanPy, which needs AnnData format, mainly because the Seurat workflows do not have an option to filter on the percentage of mitochondrial counts. The sceasy tool in the first workflow fails with this error:
Loading required package: reticulate Error in py_module_import(module, convert = convert) : ModuleNotFoundError: No module named 'loompy' Run `reticulate::py_last_error()` for details. Calls: <Anonymous> -> py_module_import Execution halted

I removed that tool and used “Import AnnData and Loom” instead. The job didn’t report an error message; the dataset just showed “An error occurred with this dataset format h5ad database ?”. Finally, I decided to change the datatype of the RNAStarsolo output to loom (under “change datatype”) and use that. I don’t know if that’s the correct solution. If someone could shed light on this, I would be very appreciative.

Now, regarding the second workflow, I wanted to use “Plot with scanpy” and “Scanpy PlotEmbed” to obtain plots for the workflow. None of the scanpy visualizations worked. “Plot with scanpy” failed with this error -
import scanpy as sc ModuleNotFoundError: No module named 'scanpy'
This is weird because it’s able to successfully run other functions like “ScanPy ScaleData”, “ScanPy RunPCA”, etc.
On the other hand, “Scanpy PlotEmbed” fails with this error -

Usage: scanpy-cli plot embed [OPTIONS] <input_obj> <output_fig> Try 'scanpy-cli plot embed --help' for help. Error: Invalid value for '--legend-loc': invalid choice: right. (choose from right margin, on data)

However, in the options, I have selected right margin. I don’t know how to solve this issue.

I decided to swap out “Plot with ScanPy” with “Plot with Seurat,” and it failed with this error -

line 10: seurat-plot.R: command not found

I am not able to solve these errors even after several swaps and parameter switches. I need to demonstrate this workshop for the organization and if anyone could give me any suggestions, I would highly appreciate it.

Thank you,
Priyanka

Hi @Priyanka_Bhandary

Feedback below

This appears to be a mismatch between the data input and a parameter setting. Maybe the defaults are not appropriate? There is a lot of variation for these file formats, so some investigation is probably needed on your part to match these up.

This looks like a tool dependency problem. Try using the “managed dependencies” option when re-installing the tool.

Hard to guess – but one of those first two reasons seems like a good place to start. Meaning, install with managed dependencies, then confirm the parameters fit your data.

If the format was changed to match what this tool is expecting versus the loom parameter settings, and you have an output you can use with downstream tools, then great! To be clear: changing the datatype is usually not correct, but converting the datatype (producing a new file in your history) is pretty common. Just keep in mind that tools only know about the data and parameters that you provide, and can do all sorts of odd things when those are not accurate (they won’t always fail, and may just produce odd results). So, double check your output.

This seems like another dependency problem (probably a Python version conflict). Not every function of a tool necessarily needs the same dependency, which is why some functions can run while others fail.

Did you install the most current version of the tool? Is it the same as the version hosted at UseGalaxy servers? If not, try using the same version and see if that helps. If you can reproduce the problem at one of those servers, I would be interested in reviewing since it sounds like a tool form bug. Start up a new history, with some very simple test data (maybe tutorial data), trigger the error, then come back and share the history here and I can help to get this reported to the tool developers for a fix.
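One possible explanation for the `--legend-loc` error (an assumption, not something confirmed in this thread) is that the value `right margin` reaches the command line unquoted, so it is split at the space and `scanpy-cli` only receives `right`. A minimal sketch of that splitting behaviour, using Python's standard `shlex` module to mimic shell tokenisation; the file names `in.h5ad` and `out.png` are placeholders:

```python
import shlex

# Unquoted: shell-style tokenisation splits the value at the space.
unquoted = shlex.split("scanpy-cli plot embed --legend-loc right margin in.h5ad out.png")

# Quoted: the value stays a single argument.
quoted = shlex.split("scanpy-cli plot embed --legend-loc 'right margin' in.h5ad out.png")


def option_value(tokens, flag):
    """Return the single token that immediately follows `flag`."""
    return tokens[tokens.index(flag) + 1]


print(option_value(unquoted, "--legend-loc"))  # right
print(option_value(quoted, "--legend-loc"))    # right margin
```

If the command line recorded for the failed job shows the unquoted form, that would support reporting it as a tool form bug.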

This is another dependency problem (python or maybe R versioning problem).

Overall, I think your problems are with the administration of the server (how jobs run) and how the tools were installed to start with. You will want to be running the jobs in “self-contained” container environments, installing the tools with managed dependencies, and making sure that jobs are using the Galaxy paths to all tools and dependencies (not system paths to local tools).
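To see what a job's environment actually exposes, a small check like the one below can be run with the same Python interpreter the failing tool uses. This is only a diagnostic sketch: `check_environment` is a hypothetical helper, and the module and executable names are simply the ones from the error messages earlier in this thread.

```python
import importlib.util
import shutil
import sys


def check_environment(modules, executables):
    """Report which Python modules and executables this interpreter can see."""
    report = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[f"module:{name}"] = importlib.util.find_spec(name) is not None
    for exe in executables:
        # shutil.which searches the directories listed in PATH.
        report[f"exe:{exe}"] = shutil.which(exe) is not None
    return report


print("interpreter:", sys.executable)
result = check_environment(["scanpy", "loompy"], ["scanpy-cli", "seurat-plot.R"])
for key, found in sorted(result.items()):
    print(f"{key}: {'found' if found else 'MISSING'}")
```

If the interpreter path points at a system Python rather than the environment installed for the tool, that is consistent with jobs picking up system paths instead of the Galaxy-managed ones.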

How to do all of the set up is here → Private Galaxy Servers. And, if you are in a rush, you could consider using the Docker version listed in that guide since all of this is mostly set up already – you’d just need to install any tools that happen to be missing. The interface is older but you can certainly run workflows in there. It was intended to be for uses like you are describing: demonstrations, workshops, etc.

Let’s start there. Thanks!

Thank you so much for the detailed reply @jennaj. That is really helpful. We finally resolved all issues except one major one. In my primary workflow, I used RNAStar solo and then the Import AnnData and Loom tool to obtain the AnnData output. I want to use the Plot with ScanPy tool to plot the quality control metrics described under the filtering part of this tutorial: “Hands-on: Filter, plot and explore single-cell RNA-seq data with Scanpy” (Single Cell). The tutorial uses its own dataset and takes the x and y coordinates from there. However, the object created by my workflow doesn’t have the variables (log1p_total_counts, pct_counts_mito) that are needed to create these plots. How do I calculate these for my own data? This is not mentioned anywhere, and I need these plots for user-generated data. Thank you so much for your prompt and detailed replies.

Thank you,
Priyanka Bhandary

The tutorial was plotting the data that was available in the example AnnData object file. Your own objects will contain whatever variables you specifically added or annotated them with. And the names/labels for variables are not fixed – you can name those whatever you want to.

So – this becomes a really big question!

  • Do you need to do more annotation? Is that information possible to generate based on the current data? Why or why not? Is there a tool that can add in what you need? Is there a tool that can generate new metrics?
  • Think about what you want to later compare in your plots, then add those metrics in so you can refer to them when generating downstream plots.

A good resource is the manual for Scanpy. Any manipulation described there you should be able to do in Galaxy. It is too complicated to list out here but if you scroll down to the bottom of the tool form, you’ll see links to the different sections (or you can just start over at the top after clicking into those).

Once you have some functions that you want to apply, try searching in the tool panel with that function name – both at a public Galaxy server and your own server – you might find more tools that you want to install.
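As an illustration of what the variables named earlier in this thread (log1p_total_counts, pct_counts_mito) represent, here is a plain-Python sketch on a toy matrix. The gene names, the counts, and the rule that mitochondrial genes start with "MT-" are all assumptions made for the example. In Scanpy itself, to the best of my knowledge, `scanpy.pp.calculate_qc_metrics` called with `qc_vars=["mito"]` adds columns with these names to `adata.obs`, provided `adata.var` already has a boolean `mito` column, so that is a function name worth searching for in the tool panel.

```python
import math

# Toy counts: rows are cells, columns are genes (values are illustrative only).
genes = ["MT-CO1", "MT-ND1", "ACTB", "GAPDH"]
counts = [
    [5, 5, 60, 30],    # cell 1
    [40, 10, 30, 20],  # cell 2
]

# Assumption: mitochondrial genes are identified by an "MT-" name prefix.
is_mito = [g.upper().startswith("MT-") for g in genes]


def qc_metrics(row):
    """Compute per-cell QC metrics from one row of raw counts."""
    total = sum(row)
    mito = sum(c for c, m in zip(row, is_mito) if m)
    return {
        "total_counts": total,
        "log1p_total_counts": math.log1p(total),
        "n_genes_by_counts": sum(1 for c in row if c > 0),
        "pct_counts_mito": 100.0 * mito / total if total else 0.0,
    }


per_cell = [qc_metrics(row) for row in counts]
for i, metrics in enumerate(per_cell, start=1):
    print(i, metrics)
```

The metrics have to be computed on raw counts, before any normalisation or scaling, otherwise the percentages are not meaningful.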

If you have trouble mapping over a function to Galaxy you can ask a followup question and we can try to help. Please include the references again for context: link to the Scanpy docs, full name of the tool in Galaxy (if you found a candidate), and explain what is going wrong.

Hope this helps!

Thank you so much for the detailed instructions, @jennaj. This helped a lot. This tutorial helped me create the steps as well - Hands-on: Clustering 3K PBMCs with Scanpy / Clustering 3K PBMCs with Scanpy / Single Cell.
I have two more questions for you.
First, I want to convert the expression matrix output by RNAStar Solo into TSV format so that I can load it into a third-party R Shiny application. Is there a tool that can do this?
Second, is there a way to merge samples in the workflow? Right now, everything is done per sample, from the QC metrics plots to clustering to the tSNE/UMAP plots. Any feedback on how to merge the samples would be very helpful. I use ScanPy functions for downstream processing and RNAStar solo for demultiplexing and quantification. Thank you!

Thank you,
Priyanka
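
On the TSV question above, this thread does not name a Galaxy tool, but the conversion itself is simple enough to sketch. The example assumes the matrix is in the usual Matrix Market coordinate format (one header line, optional `%` comment lines, a size line, then 1-based `row col value` triplets) with genes as rows and barcodes as columns. `mtx_to_tsv` is a hypothetical helper, and the gene and barcode names below are made up.

```python
import csv
import io


def mtx_to_tsv(mtx_file, feature_names, barcode_names, out_file):
    """Write a dense genes-by-cells TSV from Matrix Market coordinate text."""
    # Skip the header and any comment lines, which all start with "%".
    lines = (ln for ln in mtx_file if not ln.startswith("%"))
    n_rows, n_cols, _ = (int(x) for x in next(lines).split())
    if n_rows != len(feature_names) or n_cols != len(barcode_names):
        raise ValueError("matrix dimensions do not match the name lists")
    # Entries absent from the sparse file are zeros.
    dense = [["0"] * n_cols for _ in range(n_rows)]
    for ln in lines:
        if not ln.strip():
            continue
        row, col, value = ln.split()[:3]
        dense[int(row) - 1][int(col) - 1] = value  # indices are 1-based
    writer = csv.writer(out_file, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene"] + barcode_names)
    for name, values in zip(feature_names, dense):
        writer.writerow([name] + values)


# Tiny in-memory example: 3 genes, 2 cells, 3 non-zero entries.
mtx = io.StringIO(
    "%%MatrixMarket matrix coordinate integer general\n"
    "%\n"
    "3 2 3\n"
    "1 1 4\n"
    "3 1 1\n"
    "2 2 7\n"
)
out = io.StringIO()
mtx_to_tsv(mtx, ["GeneA", "GeneB", "GeneC"], ["AAAC", "AAAG"], out)
print(out.getvalue())
```

A dense table grows as genes times cells, so this is only practical after filtering out empty barcodes.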