I’m using Barcode Splitter (v1.0.1) on usegalaxy.org but it fails every time with “Error: bad input file, expecting line with sequences”, “>10000 output datasets”, and exit code 255. The setup includes a barcode file with 12K unique barcodes in identifier[TAB]ACGT format alongside a FASTA input file containing ~9K sequence lines, which is an accepted format according to the tool. Even my 50-barcode test file fails with the identical “bad input file” error, which is particularly confusing since the barcode format matches the tool requirements exactly.
Can you help identify why even small tests fail or suggest alternatives for handling large barcode libraries?
Then, in general, you can check the FASTA file for format issues. The NormalizeFasta tool could also be used (please note the extra toggle to split the title line; doing this might be needed!).
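If you would rather look at the FASTA file outside of Galaxy first, here is a minimal Python sketch of that kind of check. The function name `check_fasta` and the list of things it flags (blank lines, non-Unix line endings, headers with no sequence, characters outside ACGTN) are my own assumptions about what commonly trips up FASTA parsers, not a description of what Barcode Splitter actually tests.

```python
def check_fasta(lines, alphabet="ACGTN"):
    """Return a list of (line_number, problem) tuples for FASTA text lines.

    `lines` is any iterable of strings, for example an open file handle.
    The checks are illustrative assumptions, not the tool's own rules.
    """
    problems = []
    allowed = set(alphabet.upper())
    seen_header = False
    pending_header = None  # line number of a header still waiting for a sequence

    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.endswith("\r"):
            problems.append((number, "carriage return (non-Unix line ending)"))
            line = line.rstrip("\r")
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        if line.startswith(">"):
            if pending_header is not None:
                problems.append((pending_header, "header with no sequence"))
            pending_header = number
            seen_header = True
            continue
        # Anything else is treated as a sequence line.
        if not seen_header:
            problems.append((number, "sequence before first header"))
        unexpected = set(line.upper()) - allowed
        if unexpected:
            found = "".join(sorted(unexpected))
            problems.append((number, "unexpected characters: " + found))
        pending_header = None

    if pending_header is not None:
        problems.append((pending_header, "header with no sequence"))
    return problems
```

To use it on a downloaded file, pass an open handle, e.g. `check_fasta(open("reads.fasta", newline=""))`, where `reads.fasta` is a placeholder name; `newline=""` keeps the original line endings visible to the check.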
To check the tabular file, you could try cutting out the first two columns with the Cut tool. Or use a tool like Count, Group, or Unique to inspect the format further. More text manipulation tools are discussed in this tutorial. → GTN Materials Search. Most are the same as common command-line utilities, so if you know those, try a keyword search on the tool panel to check that way, too!
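The same kind of inspection can be sketched locally in Python. This is only an illustration: `check_barcodes` is a hypothetical helper, and the rules it applies (exactly two tab-separated columns, barcodes made of A/C/G/T compared case-insensitively, no repeated identifiers or barcodes) are my assumptions based on the identifier[TAB]ACGT layout described above.

```python
def check_barcodes(lines):
    """Return (line_number, problem) tuples for an identifier<TAB>barcode table."""
    problems = []
    seen_names = {}     # identifier -> first line number it appeared on
    seen_barcodes = {}  # upper-cased barcode -> first line number

    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            message = "expected 2 tab-separated columns, found %d" % len(fields)
            problems.append((number, message))
            continue
        name, barcode = fields
        if name != name.strip() or barcode != barcode.strip():
            problems.append((number, "stray whitespace around a field"))
            name, barcode = name.strip(), barcode.strip()
        key = barcode.upper()
        if not key or set(key) - set("ACGT"):
            problems.append((number, "barcode is not plain ACGT"))
        if name in seen_names:
            problems.append((number, "identifier repeats line %d" % seen_names[name]))
        else:
            seen_names[name] = number
        if key in seen_barcodes:
            problems.append((number, "barcode repeats line %d" % seen_barcodes[key]))
        else:
            seen_barcodes[key] = number
    return problems
```

A line that uses spaces instead of a real tab shows up as "found 1" column, which is an easy problem to miss by eye.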
I hope this helps! If you need more help, you are welcome to generate a share link to your history and share that back! You can unshare after we are done. FAQ: Sharing your History
Thank you for the quick response and example history!
One clarification before I test: My library has 12K unique barcodes. I noticed the “Job generated >10000 output datasets” error. Is there a 10K output dataset limit in Barcode Splitter? If so, my library exceeds it (12K barcodes required).
If limited, are there alternatives for large barcode libraries (>10K) that don’t hit the dataset limit? Local tools or different Galaxy servers with higher quotas?
Ah, thank you for clarifying. When you said that the tests with smaller data didn’t work, I misunderstood: I thought you meant a smaller target file to parse, not a smaller set of barcodes. The error message now seems clear!
Now, that correction handled the error trapping better, so whether this allows your specific job to complete successfully or not is difficult to predict. Given that the FastX Barcode Splitter itself doesn’t have a hard limit, this seems worth a try. → https://bio.tools/cshl_princeton_fastx_barcode_splitter
However, I have a small warning: the web application version of Galaxy has to respect the basic limitations of every web browser! This makes navigating 12K datasets not very practical. You can try, but I would suggest running a filter on the output collection before attempting to navigate the results. In other words, avoid clicking on the raw collection; instead, filter it first, then navigate only the datasets that have content (your sequences sorted by barcode, plus the unmatched set).
Filter empty datasets from a collection
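If you end up trying this locally instead, the "split, then keep only non-empty results" idea can be sketched in a few lines of Python. Please treat this as a rough illustration under stated assumptions, not as the FastX Barcode Splitter algorithm: `split_by_barcode` is a hypothetical helper, it only does exact, case-insensitive matching at the start of each sequence, and it allows no mismatches.

```python
def split_by_barcode(records, barcodes):
    """Group (header, sequence) records by the barcode found at the sequence start.

    `barcodes` maps identifier -> barcode string. Groups are created only when
    a read lands in them, so barcodes with no reads never appear in the result.
    Reads matching no barcode go under the key "unmatched" (an assumed label;
    it would collide with a barcode identifier of the same name).
    """
    # Index barcodes by length so each read needs one dictionary lookup per length.
    by_length = {}
    for name, barcode in barcodes.items():
        by_length.setdefault(len(barcode), {})[barcode.upper()] = name

    groups = {}
    for header, sequence in records:
        prefix_source = sequence.upper()
        match = "unmatched"
        for length, lookup in by_length.items():
            name = lookup.get(prefix_source[:length])
            if name is not None:
                match = name
                break  # first barcode length that matches wins
        groups.setdefault(match, []).append((header, sequence))
    return groups
```

With 12K barcodes, the returned dictionary only has entries for barcodes that actually captured reads, which is the same effect as filtering the empty datasets out of the collection before browsing it.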
Please give this a try and let us know how it works! Great stress test, so I am curious!