A properly formatted BED dataset contains a score in the 5th column with a single numerical value between 0 and 1000.
Other columns in BED format do contain comma-separated data (the block sizes and block starts in columns 11 and 12 for “block” features; see the FAQ above).
If the goal is to extract BED lines that likely include a splice site (2 or more alignment blocks), then you can filter on column 10, the number of blocks. Keep in mind that this isn’t always reliable: sometimes the gaps between blocks are not really splices.
One simple tool choice would be "Filter data on any column using simple expressions". Use the expression c10>=2, then adjust for headers if you have any. A header line that is not excluded and has any non-numerical characters in the 10th column will cause the tool to fail, because this type of Python expression is a numerical comparison. Full examples are on the tool form for this and other types of Python-friendly expression filters.
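If it helps to see the logic outside Galaxy, here is a minimal Python sketch of what a c10>=2 filter does, including the header handling. This is not the tool's own code: the function name `filter_multiblock` and the example lines are made up for illustration, and it assumes tab-separated BED12 input.

```python
def filter_multiblock(lines, min_blocks=2):
    """Keep BED lines whose 10th column (block count) is >= min_blocks.

    Hypothetical helper for illustration; not part of Galaxy.
    """
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        # Skip header/comment lines and lines with fewer than 10 columns.
        if line.startswith(("track", "browser", "#")) or len(fields) < 10:
            continue
        try:
            block_count = int(fields[9])  # column 10, zero-based index 9
        except ValueError:
            # Non-numerical value in column 10: skip the line rather than fail.
            continue
        if block_count >= min_blocks:
            kept.append(line)
    return kept


# Made-up example data: one header, one 2-block line, one 1-block line.
example = [
    "track name=example",
    "chr1\t100\t500\tread1\t0\t+\t100\t500\t0\t2\t50,60\t0,340",
    "chr1\t600\t700\tread2\t0\t-\t600\t700\t0\t1\t100\t0",
]
result = filter_multiblock(example)
```

Here `result` holds only the `read1` line. Note that this sketch skips a bad header quietly, whereas a non-numerical header value makes the Galaxy tool fail, so remove or skip headers there first.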
If that is not what you want to do, then try the tool "Select". It accepts most regular expressions, but you’ll need to construct the pattern so that it filters values in the right column, using standard regular expression syntax (not Python, as with the first tool). That tool form also has help, and there is a lot of regular expression help online, including web tools to test and debug expressions quickly. Or just test in Galaxy with a few representative lines (some that should be kept and some that should not), then process the whole dataset once it is working.
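As a rough starting point, here is one way to anchor a pattern to the 10th column and check it on a few lines before running the whole dataset. The pattern and test lines are only an illustration. The check uses Python's `re` module purely as a convenient tester, so the syntax Select accepts may differ slightly (for example in how a tab is written), and the pattern may need adjusting on the tool form.

```python
import re

# Skip 9 tab-terminated columns, then require column 10 to be an
# integer >= 2: a single digit 2-9, or any number with 2+ digits.
PATTERN = re.compile(r"^([^\t]*\t){9}([2-9]|[1-9][0-9]+)(\t|$)")

# Made-up representative lines: some that should match, some that should not.
keep = [
    "chr1\t100\t500\tread1\t0\t+\t100\t500\t0\t2\t50,60\t0,340",
    "chr2\t100\t900\tread3\t0\t+\t100\t900\t0\t12\t5,5\t0,10",
]
reject = [
    "track name=example",
    "chr1\t600\t700\tread2\t0\t-\t600\t700\t0\t1\t100\t0",
]

matched_keep = [bool(PATTERN.search(line)) for line in keep]
matched_reject = [bool(PATTERN.search(line)) for line in reject]
```

Every entry of `matched_keep` should be true and every entry of `matched_reject` false. If either list looks wrong, fix the pattern on the small test set first.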
Hope that helps!