Barcode Splitter fails consistently - 12K barcodes

Hi,

I’m using Barcode Splitter (v1.0.1) on usegalaxy.org but it fails every time with “Error: bad input file, expecting line with sequences”, “>10000 output datasets”, and exit code 255. The setup includes a barcode file with 12K unique barcodes in identifier[TAB]ACGT format alongside a FASTA input file containing ~9K sequence lines, which is an accepted format according to the tool. Even my 50-barcode test file fails with the identical “bad input file” error, which is particularly confusing since the barcode format matches the tool requirements exactly.

Can you help identify why even small tests fail or suggest alternatives for handling large barcode libraries?

Thank you!
Yael

Hi @Yael_Gershon

I agree, it seems like there is something about the file format that the tool isn’t understanding well!

I just recreated some examples in this shared history, if you would like to compare.

Then, in general, you can check the FASTA file to see if it has any format issues. The tool NormalizeFasta could also be used (please note the extra toggle to split the title line – doing this might be needed!)
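If you want to look for FASTA format issues outside of Galaxy first, here is a minimal sketch of the kind of checks I mean. The function name `check_fasta` and the allowed alphabet (A/C/G/T/N) are my own assumptions for illustration; this is not what NormalizeFasta or Barcode Splitter run internally.

```python
# Illustrative FASTA sanity check (hypothetical helper, not part of any Galaxy tool).
ALLOWED = set("ACGTNacgtn")  # assumption: plain nucleotide sequences only


def check_fasta(lines):
    """Return a list of (line_number, problem) tuples for FASTA text given as lines."""
    problems = []
    have_header = False   # has any '>' title line been seen yet?
    header_line = None    # line number of a title line still waiting for sequence
    for number, raw in enumerate(lines, start=1):
        if "\r" in raw:
            problems.append((number, "carriage return in line ending"))
        line = raw.rstrip("\r\n")
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        if line.startswith(">"):
            if header_line is not None:
                problems.append((header_line, "title line with no sequence"))
            have_header = True
            header_line = number
            continue
        if not have_header:
            problems.append((number, "sequence before any '>' title line"))
        header_line = None
        bad = set(line) - ALLOWED
        if bad:
            problems.append((number, "unexpected characters: " + "".join(sorted(bad))))
    if header_line is not None:
        problems.append((header_line, "title line with no sequence"))
    return problems
```

To run it on a downloaded file, open the file with `newline=""` so that line endings are not translated, and pass the file handle to `check_fasta`. An empty list means none of these particular checks found a problem, which does not guarantee the tool will accept the file.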

To check the tabular file, you could try cutting out the first two columns with the Cut tool. Or, use a tool like Count or Group or Unique to inspect the format further. More text manipulation tools are discussed in this tutorial → GTN Materials Search. Most are the same as common command-line utilities, so if you know those, try a keyword search on the tool panel to check that way, too!
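For the barcode file, the same kind of inspection can be sketched locally. This assumes the identifier[TAB]ACGT layout described above; the function name `check_barcodes`, the choice to skip blank lines and lines starting with `#`, and the uppercase-only rule are my assumptions, so adjust them to what the tool help actually states.

```python
from collections import Counter


def check_barcodes(lines):
    """Summarize format problems in a two-column (identifier TAB barcode) file."""
    problems = []
    barcode_counts = Counter()  # how often each barcode sequence appears
    lengths = Counter()         # how many barcodes have each length
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        # Assumption: blank lines and '#' comment lines are skipped, not reported.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            problems.append(
                (number, "expected 2 tab-separated columns, found %d" % len(fields))
            )
            continue
        name, barcode = fields
        if not name.strip():
            problems.append((number, "empty identifier"))
        if not barcode or set(barcode) - set("ACGT"):
            problems.append((number, "barcode is not plain uppercase ACGT"))
        barcode_counts[barcode] += 1
        lengths[len(barcode)] += 1
    duplicates = sorted(b for b, n in barcode_counts.items() if n > 1)
    return {"problems": problems, "duplicates": duplicates, "lengths": dict(lengths)}
```

Spaces used instead of a tab, trailing whitespace after the barcode, and repeated barcode sequences are the usual things this turns up. More than one entry under `lengths` is not necessarily an error, but it is worth a second look.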

I hope this helps! If you need more help, you are welcome to generate a share link to your history and share that back! You can unshare after we are done. :slight_smile: FAQ: Sharing your History

Hi @jennaj,

Thank you for the quick response and example history!

One clarification before I test: My library has 12K unique barcodes. I noticed the “Job generated >10000 output datasets” error. Is there a 10K output dataset limit in Barcode Splitter? If so, my library exceeds it (12K barcodes required).

If limited, are there alternatives for large barcode libraries (>10K) that don’t hit the dataset limit? Local tools or different Galaxy servers with higher quotas?

Thanks again,
Yael

Hi @Yael_Gershon

Ah, thank you for clarifying. When you stated that the tests with smaller data didn’t work, I had misunderstood and thought you meant the target file to parse out, not the barcodes themselves. The error message now seems clear!

This was captured by our error trapping at a higher level, and a correction has already been applied. It went live at UseGalaxy.org late last Friday, and will flow out to the other servers as they update to the new release 26.0 → 26.0 Galaxy Release (March 2026), Galaxy Project 26.0.1.dev0 documentation.

Now, that correction handled the error trapping better, so whether this allows your specific job to complete successfully or not is difficult to predict. Given that the FastX Barcode Splitter itself doesn’t have a hard limit, this seems worth a try. → https://bio.tools/cshl_princeton_fastx_barcode_splitter

However, I have a small warning: the web application version of Galaxy has to respect the basic limitations of every web browser, which makes navigating 12k datasets impractical. You can try, but I would suggest running a filter on the output collection before attempting to navigate the results. Meaning, avoid “clicking on the raw collection” and instead parse it with a filter first, then navigate the results that have content (your sequences sorted by barcode, plus the unmatched set).

  • Filter empty datasets from a collection
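To picture what the filtered result looks like (sequences sorted by barcode, plus the unmatched set, with empty bins dropped), here is a rough local sketch. It only does exact matching at the start of each sequence, which is a simplification I chose for illustration and not the algorithm FastX Barcode Splitter uses; the function name `split_by_barcode` and the `"unmatched"` key are hypothetical.

```python
from collections import defaultdict


def split_by_barcode(records, barcodes):
    """Group (title, sequence) records by the barcode found at the sequence start.

    records  -- iterable of (title, sequence) tuples
    barcodes -- dict mapping identifier -> barcode sequence
    Only bins that received at least one record are returned, so barcodes
    with no hits simply do not appear (the "filter empty" step).
    """
    # Index barcodes by length so each record needs one lookup per length.
    by_length = defaultdict(dict)
    for name, barcode in barcodes.items():
        by_length[len(barcode)][barcode] = name

    bins = defaultdict(list)
    for title, sequence in records:
        for length, lookup in by_length.items():
            name = lookup.get(sequence[:length])
            if name is not None:
                bins[name].append((title, sequence))
                break
        else:
            # No barcode matched the start of this sequence.
            bins["unmatched"].append((title, sequence))
    return dict(bins)
```

With 12k barcodes and ~9K sequences, most bins will necessarily be empty or tiny, which is why filtering before browsing matters so much here.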

Please give this a try and let us know how it works! Great stress test, so I am curious! :slight_smile:

Hi @Yael_Gershon

If you are still around – how did this work out in the end? Were you able to process all 12k barcodes?

Nice discussion