Error of inspect anndata

Hi,

I am trying to do single-cell data analysis, but when I import data with "Import Anndata and loom from different format" and then inspect “the full data matrix” with "Inspect AnnData object", I always get an error. I hope you can help me solve this problem. You can view and import my History by visiting the following URL: Galaxy

Hi @webyoung

Was the error generated in this same history? The error in dataset 5 is recorded as using a “dataset 5” for the input.

The rerun is here. It uses dataset 4 as an input, the only valid input in this history. https://usegalaxy.eu/u/jenj/h/imported-wfq-httpshelpgalaxyprojectorgterror-of-inspect-anndata8783-10-4

The rerun did fail, and the error message includes this. It means that the data is actually too large to parse or that there is a format problem.

MemoryError: Unable to allocate array with shape (6794880, 31053) and data type float32
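For a sense of scale, the size of the array named in that message can be estimated from its shape and data type. This is a rough back-of-the-envelope sketch, not output from the tool; the only inputs are the shape from the error and the 4 bytes that a float32 value occupies.

```python
# Estimate the memory needed for the dense array in the MemoryError above.
rows, cols = 6_794_880, 31_053   # shape reported in the error message
bytes_per_value = 4              # a float32 element occupies 4 bytes

total_bytes = rows * cols * bytes_per_value
total_gib = total_bytes / 2**30

print(f"{rows * cols:,} values, about {total_gib:,.0f} GiB as dense float32")
```

That works out to roughly 786 GiB for a single fully expanded array.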

The parameter for the mtx version might be the problem. I started a rerun of the h5ad creation (with an option modified to match the version header in dataset 3), followed by the downstream tool that failed for you. That might reveal more about what is going on.

Write back when you are done with the history I shared back and I’ll purge that copy.

Actually, I copied the steps I ran and pasted them into this history, but it is the same problem that I ran into.

Ok, that is what it looked like. Next time you share a history for help, please leave all of the inputs and outputs used for the original job in the same history. That might mean you need to copy the inputs into a new history – but one rerun is always recommended anyway. You could also clone the entire original history, purge anything not related to the question, then share.

Sometimes the technical details matter, especially when the error message is so odd that some help/second opinion is wanted. Leaving everything intact is how others can best help here.

The test jobs are still running in my copy of the history. I’ll leave it shared for now, unless you have already resolved the problem? I think that the mtx version was set incorrectly. It is a bit tricky: there is a file version and a software version noted in the original header, while the tool form just states “version”. That relates to the software version as far as I know, and is primarily what the tests are for. The tests also include generating some basic statistics, not just the full tabular output that you originally wanted. If the statistics can be generated but the full output cannot be, that means the file is too large to process all at once. But that isn’t confirmed yet.
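If it helps to check the header yourself, here is a minimal sketch for reading the top of a Matrix Market (mtx) file. The example header below is hypothetical and written only for illustration: the `metadata_json` comment line, its keys, and the placeholder version string are assumptions about what such a header may look like, not values copied from the history.

```python
import io
import json

def read_mtx_header(handle):
    """Return (banner, comment_lines, (rows, cols, entries)) from a Matrix Market stream."""
    banner = handle.readline().strip()
    comments = []
    for line in handle:
        line = line.strip()
        if line.startswith("%"):
            comments.append(line)          # comment lines may carry metadata
        elif line:
            rows, cols, entries = (int(x) for x in line.split())
            return banner, comments, (rows, cols, entries)
    raise ValueError("no size line found")

# Hypothetical header for illustration only; not taken from the real dataset.
example = io.StringIO(
    "%%MatrixMarket matrix coordinate integer general\n"
    '%metadata_json: {"software_version": "x.y.z", "format_version": 2}\n'
    "3 4 5\n"
)

banner, comments, shape = read_mtx_header(example)
meta = json.loads(comments[0].split(":", 1)[1])  # assumes a JSON comment line
print(banner)
print(meta["software_version"], meta["format_version"])
print(shape)
```

With a real file, pass an open file handle instead of the `io.StringIO` object.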

OK, next time I will share a history for help just as you suggested. I noticed that the jobs you ran also ended in errors. I am looking forward to a good way to solve this problem.

Hi @webyoung

The tests are done now.

  1. Creating an object with all of the raw data is too large to process. This might be something that can be adjusted. Send in a bug report from your error (a failed run in the same history with the exact inputs actually used) and the EU admins can investigate options. In short, the tool is being sent to a cluster node that isn’t large enough to process the fully expanded data array.

  2. Pre-filtering the raw data first does create data that works with this tool, and others, and is what you will likely want to do anyway for practical analysis reasons (quality). I filtered using defaults – but you can tune that.

  3. The original history has an example of that now and the sc-RNA tutorials here have many more details.
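The pre-filtering idea in point 2 can be sketched in a few lines. This is only a toy illustration in plain Python, not what the Galaxy filtering tool runs internally; the tiny count table, the barcode and gene names, and the `min_genes` threshold are all made up for the example.

```python
from collections import defaultdict

# Toy sparse counts as (barcode, gene, count) triplets. Made-up data.
triplets = [
    ("cell_A", "gene1", 5), ("cell_A", "gene2", 1), ("cell_A", "gene3", 2),
    ("cell_B", "gene1", 1),                      # nearly empty barcode
    ("cell_C", "gene2", 4), ("cell_C", "gene3", 7),
]

def filter_barcodes(triplets, min_genes):
    """Keep only barcodes with at least `min_genes` detected genes."""
    genes_per_barcode = defaultdict(set)
    for barcode, gene, count in triplets:
        if count > 0:
            genes_per_barcode[barcode].add(gene)
    keep = {b for b, genes in genes_per_barcode.items() if len(genes) >= min_genes}
    return [t for t in triplets if t[0] in keep]

filtered = filter_barcodes(triplets, min_genes=2)  # hypothetical threshold
kept = sorted({t[0] for t in filtered})
print(kept)  # ['cell_A', 'cell_C']
```

Dropping low-content barcodes this way shrinks the number of rows before any full matrix is ever expanded.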

Thank you very much for helping me. So I have not done anything wrong; the reason is just that Inspect AnnData cannot handle very large data, and I need to wait until somebody improves the tool. Is my understanding correct?

The public server where you are working cannot handle the job with the resources currently allocated to this tool. And, any practical utility of the full raw matrix in tabular format could be debated. The need to create or use fully expanded “all versus all” massive files is exactly what the sparse matrix formats are intended to solve.
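To illustrate the difference, here is a rough size comparison between a dense array and a compressed sparse row (CSR) layout for the shape from the error message earlier in this thread. The fraction of non-zero entries is a purely hypothetical assumption chosen for illustration; the real density of the dataset was not measured here.

```python
def dense_bytes(rows, cols, value_size=4):
    """Bytes needed to store every cell, zero or not."""
    return rows * cols * value_size

def csr_bytes(rows, nnz, value_size=4, index_size=4):
    """Approximate bytes for a CSR layout.

    CSR keeps one value and one column index per non-zero entry,
    plus one row pointer per row and one extra at the end.
    """
    return nnz * (value_size + index_size) + (rows + 1) * index_size

rows, cols = 6_794_880, 31_053      # shape from the error message
assumed_density = 0.001             # hypothetical: 0.1% of cells non-zero
nnz = int(rows * cols * assumed_density)

dense = dense_bytes(rows, cols)
sparse = csr_bytes(rows, nnz)
print(f"dense:  {dense / 2**30:,.1f} GiB")
print(f"sparse: {sparse / 2**30:,.1f} GiB (under the assumed density)")
```

The sparse layout only pays for the entries that carry information, which is why expanding it back to a full table is the expensive step.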

The EU admins may or may not allocate more resources, because even that may not be enough or practical. What you can do is ask via a bug report. While waiting for feedback, you’ll need to quality filter the raw data anyway to work with it further. That gets rid of a LOT of non-informative bulk information from the data. The smaller informative datasets process fine. Examples are in the test history I shared, and the GTN tutorials plus other public resources have more examples/methods.

Thank you very much for helping me. I am going to do as you suggested.

