Trying to turn a single-cell matrix into AnnData

I’m trying to turn a barcode list, a gene list and a .mtx file into an AnnData file with the Scanpy Read10x tool, but it keeps failing and I don’t understand the error message it throws.

Traceback (most recent call last):
  File "/usr/local/tools/_conda/envs/__scanpy-scripts@1.1.3/bin/scanpy-read-10x", line 10, in <module>
    sys.exit(READ_CMD())
  File "/usr/local/tools/_conda/envs/__scanpy-scripts@1.1.3/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/tools/_conda/envs/__scanpy-scripts@1.1.3/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/tools/_conda/envs/__scanpy-scripts@1.1.3/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/tools/_conda/envs/__scanpy-scripts@1.1.3/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/tools/_conda/envs/__scanpy-scripts@1.1.3/lib/python3.9/site-packages/scanpy_scripts/cmd_utils.py", line 48, in cmd
    adata = func(**kwargs)
  File "/usr/local/tools/_conda/envs/__scanpy-scripts@1.1.3/lib/python3.9/site-packages/scanpy_scripts/lib/_read.py", line 24, in read_10x
    adata = sc.read_10x_mtx(input_10x_mtx, var_names=var_names)
  File "/usr/local/tools/_conda/envs/__scanpy-scripts@1.1.3/lib/python3.9/site-packages/scanpy/readwrite.py", line 481, in read_10x_mtx
    adata = read(
  File "/usr/local/tools/_conda/envs/__scanpy-scripts@1.1.3/lib/python3.9/site-packages/scanpy/readwrite.py", line 524, in _read_legacy_10x_mtx
    adata.var_names = genes[0].values
  File "/usr/local/tools/_conda/envs/__scanpy-scripts@1.1.3/lib/python3.9/site-packages/anndata/_core/anndata.py", line 891, in var_names
    names = self._prep_dim_index(names, "var")
  File "/usr/local/tools/_conda/envs/__scanpy-scripts@1.1.3/lib/python3.9/site-packages/anndata/_core/anndata.py", line 806, in _prep_dim_index
    raise ValueError(
ValueError: Length of passed value for var_names is 25302, but this AnnData has shape: (118845, 25301)

If anyone could give me a hand with this I’d be so grateful!

Quick guess: the two numbers in the error differ by exactly one (25302 gene names were passed, but the matrix has 25301 genes). Could there be a stray header line in the gene list? Or an extra or empty line somewhere?
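If you want to check this outside Galaxy, here is a minimal sketch (not what the Galaxy tool itself runs). It reads the size line of the Matrix Market file and compares it with the number of lines in the gene and barcode lists. It assumes the usual 10x layout, where matrix rows are genes and columns are barcodes; the function names and file names are made up for illustration.

```python
def mtx_dimensions(mtx_path):
    """Return (n_rows, n_cols, n_entries) from a Matrix Market size line."""
    with open(mtx_path) as handle:
        for line in handle:
            # The banner and comment lines start with '%'; skip them and blanks.
            if line.startswith("%") or not line.strip():
                continue
            rows, cols, entries = (int(value) for value in line.split()[:3])
            return rows, cols, entries
    raise ValueError(f"no size line found in {mtx_path}")


def count_lines(path):
    """Count every line in a text file, including blank ones."""
    with open(path) as handle:
        return sum(1 for _ in handle)


def check_10x_inputs(mtx_path, genes_path, barcodes_path):
    """Return (lines in file, size expected by the matrix) for each list.

    Assumes rows of the matrix are genes and columns are barcodes.
    """
    n_genes, n_cells, _ = mtx_dimensions(mtx_path)
    return {
        "genes": (count_lines(genes_path), n_genes),
        "barcodes": (count_lines(barcodes_path), n_cells),
    }


# Example call (hypothetical file names):
# for name, (found, expected) in check_10x_inputs(
#         "matrix.mtx", "genes.tsv", "barcodes.tsv").items():
#     print(name, found, expected, "OK" if found == expected else "MISMATCH")
```

A list that is one line longer than the matrix expects would point to a header line or a trailing blank line in that file.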

Tutorials: https://training.galaxyproject.org/training-material/topics/single-cell/

If you need more help solving this, please post back a share link to your history. Leave all inputs and outputs undeleted, or this will be difficult to review (see “Troubleshooting errors”). If you are already following a tutorial, please also share its link and note which step you are on, so that more community members will be able to help.


Update: I found your history link at the top, sorry, I missed it the first time. The problem does appear to be an extra line at the top of dataset 14 (the gene list). Remove that line and try a rerun, please.

To remove the line, see the cheatsheet in the Data Manipulation Olympics tutorial. The tool “Remove the beginning of a file” is the simplest choice.
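If you prefer to fix the file on your own machine before uploading, the same operation can be sketched in a few lines of Python. This is only an illustration of dropping leading lines from a text file, not the code behind the Galaxy tool; the function name and file names are hypothetical.

```python
def drop_first_lines(src_path, dst_path, n=1):
    """Copy src_path to dst_path, skipping the first n lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for index, line in enumerate(src):
            if index >= n:
                dst.write(line)


# Example call (hypothetical file names):
# drop_first_lines("genes_with_header.tsv", "genes.tsv", n=1)
```

Writing to a new file rather than overwriting keeps the original available in case the first line turns out to be a real gene entry.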

Examples of data like yours are in this tutorial: Clustering 3K PBMCs with Scanpy