Fatal error in STARsolo

Hello,
I got the following error when I tried to map my scRNA-seq data to a concatenated genome with STARsolo:

Fatal LIMIT error: the number of junctions to be inserted on the fly =5416911 is larger than the limitSjdbInsertNsj=1000000
SOLUTION: re-run with at least --limitSjdbInsertNsj 5416911

Jan 26 05:12:14 … FATAL ERROR, exiting

How can I increase limitSjdbInsertNsj? Or is there any other way to get rid of this limit error?

What kind of data are you working with? What is your genome?

I am very surprised to see so many junction reads in a single dataset. Is it possible for you to split the data and map the portions separately?

Otherwise, we would need to change the STARsolo wrapper to expose these limit parameters.

The data I want to analyze is single-cell RNA-seq of lung bronchoalveolar cells.
Both my cDNA and barcode reads are in fastqsanger.gz format (105.7 GB and 34.9 GB).
How can I split them into smaller, non-overlapping files and analyze them separately?

If you are certain that your cDNA and barcode files contain the same number of reads in the same order, you could use the Split File to Collection tool separately on your cDNA and barcode reads.

This would give you two collections of 10 items, which you will need to “zip” together into one paired-end collection of 10 items. (You can use the Zip Collection tool for this).

You can then map this collection to get 10 count matrices from STARsolo, which you will have to concatenate later using the Import AnnData tool.
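If you would rather do the splitting outside Galaxy, the same idea can be sketched in Python. This is only a rough illustration, not what the Split File to Collection tool does internally: the function name `split_fastq_gz`, the output naming scheme, and the assumption of plain 4-line FASTQ records are all mine. It streams the gzipped input and writes gzipped chunks, so nothing is ever stored uncompressed on disk.

```python
import gzip


def split_fastq_gz(path, out_prefix, records_per_chunk):
    """Stream a gzipped FASTQ file into consecutive gzipped chunks.

    Hypothetical helper: assumes every record is exactly 4 lines.
    Returns a list of (chunk_path, record_count) tuples in file order.
    """
    chunks = []
    dst = None
    chunk_path = None
    n_lines = 0  # lines written to the currently open chunk
    lines_per_chunk = 4 * records_per_chunk

    with gzip.open(path, "rb") as src:
        for line in src:
            if dst is None:
                # Start a new chunk; the naming scheme is an arbitrary choice.
                chunk_path = f"{out_prefix}_{len(chunks):03d}.fastq.gz"
                dst = gzip.open(chunk_path, "wb")
                n_lines = 0
            dst.write(line)
            n_lines += 1
            if n_lines == lines_per_chunk:
                dst.close()
                dst = None
                chunks.append((chunk_path, records_per_chunk))

    if dst is not None:
        # Last chunk is shorter than the others; it must still hold whole records.
        dst.close()
        if n_lines % 4 != 0:
            raise ValueError(f"{path}: last FASTQ record is incomplete")
        chunks.append((chunk_path, n_lines // 4))

    return chunks
```

Calling it once on the cDNA file and once on the barcode file with the same `records_per_chunk` should give chunk lists of equal length with matching record counts. If the counts differ, the two files do not contain the same number of reads and should not be paired chunk by chunk.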


Thanks for your immediate response.
As I said, my data is about 106 GB, and when I use the splitter tools the gz file is automatically uncompressed:
Job output file grew too large (greater than 125.0 GB), please try different inputs or parameters.
Is there any way to split my file without decompressing it? Or to replace the file with just one segment of it? Or is there a website that could split my data?