Scanpy Find Marker Genes groupby formatting

I am looking to find differentially expressed genes between the disease severity sample groupings within each cell type cluster (scRNA-seq with Scanpy), rather than across all clusters. Entering two separate strings in the groupby field yields an error, so how can this specific subgrouping be specified through a file?

Hi @Kashish_Kumar

Have you tried the file option for groupby instead? I think the field directly on the form requires a single term, or it may require a dash between the values. I am not sure, but the tutorials here will probably help.

https://training.galaxyproject.org/training-material/search2?query=scanpy

Thank you for your response! Yes, but I am unable to find information on the format/parameters I should pass through the groupby file.

Hmm, the file is probably one key/value per line.

The direct input seems to be comma separated.

This specific tutorial covers many ways to slice up and label data, and then to reference those labels in calculations and plots: Clustering 3K PBMCs with Scanpy

I received KeyErrors both when separating the terms with a comma and when using the line-separated file. The tutorial does not describe a method for what I am aiming for. I will try to separate the data based on groups, but I don’t think this will provide cell-type-specific marker DE genes.

Hi @Kashish_Kumar

The tutorials are just examples of converting methods from the tool developer, or from some publication, into the Galaxy version of the tools. You can do this directly, too.

Meaning, the underlying tools are the same; Galaxy just puts a GUI on top of them. Most functions are usually available, and if one is not for some reason, the help section will usually state why.

This sounds similar to what you are trying to do: Visualizing marker genes — Scanpy documentation
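
In case it helps, here is a minimal sketch of one way to get a single groupby term that still encodes both cell type and severity. In the Scanpy Python API, `sc.tl.rank_genes_groups` takes `groupby` as the name of one column in `adata.obs`, while `groups` and `reference` are values from that same column, so two column names in groupby would not be found. Whether the Galaxy wrapper behaves identically is an assumption on my part. The column names `celltype` and `severity`, the combined name `celltype_severity`, and the toy values below are all hypothetical stand-ins for your own annotations.

```python
import pandas as pd

# Hypothetical per-cell annotations standing in for adata.obs
obs = pd.DataFrame(
    {
        "celltype": ["T cell", "T cell", "B cell", "B cell"],
        "severity": ["mild", "severe", "mild", "severe"],
    },
    index=["cell1", "cell2", "cell3", "cell4"],
)

# Build one combined label per cell so that a single column name
# can be given as the groupby term.
obs["celltype_severity"] = (
    obs["celltype"] + "_" + obs["severity"]
).astype("category")

# To compare severities inside one cell type, both the tested group and
# the reference are then values of that one combined column.
groups = ["T cell_severe"]
reference = "T cell_mild"

print(obs["celltype_severity"].cat.categories.tolist())
```

With a column like this stored in the AnnData object, groupby would be `celltype_severity`, and groups/reference would pick the pair to compare within a cell type.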

Thank you, I have changed some of the “Advanced Settings” inputs so that the command being run is the following. This might produce what I require, since I need DE gene expression tables rather than plots.

scanpy-find-markers --save diffexp.tsv --n-genes '500' --groupby 'CoVID-19 severity' --method 't-test_overestim_var' --use-raw --groups 'celltype' --reference 'rest' --filter-params 'min_in_group_fraction:0.25,max_out_group_fraction:0.5,min_fold_change:2.0' --input-format 'anndata' input.h5 --show-obj stdout --output-format anndata output.h5
