Filter data on any column using simple expressions

Hi all.
I have uploaded a file in which I need to select the last two columns. I tried the tool “Filter data on any column using simple expressions”, but I’m evidently missing something, because I don’t get what I want (it selects a single row).
To start, I tried to select only one column (the p-value) by writing c12=='pval', since it is the 12th column in the original table. Unfortunately, it doesn’t work.
Suggestions?

Thanks.

Can you provide a sample record?

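It sounds like you are filtering on a term in the header and expecting the entire column under that header to be returned. This filter tool won’t do that: it evaluates the expression against each line and keeps only the lines where it is true, so c12=='pval' matches just the header line.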

Try using Cut instead. It allows you to choose and/or rearrange entire columns of data.
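To make the difference concrete, here is a minimal Python sketch (not Galaxy's own code) that mimics both behaviours on a tiny, made-up three-column table. The table, its column names, and the helper names filter_rows and cut_columns are hypothetical. A Galaxy column number cN corresponds to Python index N-1.

```python
# Hypothetical three-column table; the first row is the header.
rows = [
    ["gene", "log2fc", "pval"],
    ["geneA", "1.5", "0.01"],
    ["geneB", "-0.3", "0.40"],
]

def filter_rows(table, predicate):
    """Keep whole lines for which the predicate is true (row-wise, like Filter)."""
    return [row for row in table if predicate(row)]

def cut_columns(table, indices):
    """Keep only the chosen columns from every line (column-wise, like Cut)."""
    return [[row[i] for i in indices] for row in table]

# Analogue of c3=='pval': only the header line matches, so a single row comes back.
print(filter_rows(rows, lambda r: r[2] == "pval"))

# Analogue of cutting the last two columns (c2 and c3 in this small table).
print(cut_columns(rows, [-2, -1]))
```

The first call returns only the header row, which is the "single row" result described in the question; the second returns the last two columns for every line.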

If that isn’t what you want to do, please explain a bit more and do share a small sample of your data. There are many, many ways to filter data in Galaxy.

Hi. I would like to filter a BED file on the score column (c5). I want to keep lines where that column contains values separated by one or more commas, e.g. 1,2 or 1,2,3. What expression can I use? Thanks.

Hi @MzwaneleN

A properly formatted BED dataset has a score in the 5th column: a single numerical value between 0 and 1000.
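As a rough illustration of that rule, a check on a single tab-separated BED line could look like the Python sketch below. The function name has_valid_score is hypothetical, and the sketch treats any number in the 0 to 1000 range as valid (an assumption; stricter tools may require an integer). A comma-separated value such as 1,2 is not a single number, so it fails the check.

```python
def has_valid_score(line: str) -> bool:
    """Hypothetical check: is column 5 (c5) a single number between 0 and 1000?"""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        return False  # the line has no score column at all
    try:
        score = float(fields[4])  # Galaxy's c5 is Python index 4
    except ValueError:
        return False  # e.g. "1,2" cannot be parsed as one number
    return 0 <= score <= 1000

print(has_valid_score("chr1\t100\t200\tfeat1\t500\t+"))
print(has_valid_score("chr1\t100\t200\tfeat1\t1,2\t+"))
```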

FAQ: Datatypes

Other columns in BED format do contain comma-separated data (the block sizes and block starts for “block” features; see the FAQ above).

If the goal is to extract BED lines that likely include a splice site (2 or more alignment blocks), you can filter on column 10, the number of blocks. Keep in mind that this is not always a reliable signal: sometimes those gaps are not really splices.

One simple tool choice is Filter data on any column using simple expressions. Use the expression c10>=2, then adjust for headers if you have any. A header line that is not excluded and has non-numerical characters in the 10th column will cause the tool to fail with this kind of numerical Python comparison. Full examples are on the tool form for this and other types of Python expression filters.
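The following Python sketch shows roughly what c10>=2 does line by line, and why an unexcluded header breaks a numerical comparison. It is not the Galaxy tool's implementation; the function name filter_multi_block, its header_lines parameter, and the sample BED12 lines are hypothetical.

```python
def filter_multi_block(lines, header_lines=0):
    """Keep BED12 lines whose block count (c10) is at least 2."""
    kept = []
    for n, line in enumerate(lines):
        if n < header_lines:
            continue  # skip header lines so they are never compared numerically
        fields = line.rstrip("\n").split("\t")
        # Galaxy's c10 is Python index 9; a text header here raises ValueError.
        block_count = int(fields[9])
        if block_count >= 2:
            kept.append(line)
    return kept

# Hypothetical BED12 input: one header, one two-block line, one single-block line.
bed = [
    "#chrom\tstart\tend\tname\tscore\tstrand\tthickStart\tthickEnd\titemRgb\tblockCount\tblockSizes\tblockStarts",
    "chr1\t100\t500\tread1\t0\t+\t100\t500\t0\t2\t50,60,\t0,340,",
    "chr1\t600\t700\tread2\t0\t+\t600\t700\t0\t1\t100,\t0,",
]
print(filter_multi_block(bed, header_lines=1))
```

With header_lines=1 only read1 is kept; with header_lines=0 the header's "blockCount" text cannot be converted to a number and the call raises ValueError, which mirrors the failure described above.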

If that is not what you want to do, try the Select tool instead. It accepts most regular expressions, but you’ll need to construct the pattern so that it matches values in the right column, using standard regular expression syntax (not Python, as with the first tool). That tool form also has help, and there is a lot of regular expression help online, including web tools to test and debug expressions quickly. Or just test in Galaxy on a few representative lines (both lines that should be kept and lines that should not), then process the whole dataset once the expression works.
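One way to anchor a regular expression to a specific column is to skip the preceding tab-separated fields explicitly. The Python sketch below does this for column 5 and keeps lines where that column holds numbers separated by one or more commas (e.g. 1,2 or 1,2,3), optionally with a trailing comma as in BED block columns. It uses Python's re module for testing only; the pattern, the name COMMA_LIST_IN_C5, and keep_comma_lines are hypothetical, and the Select tool's regex flavor may treat \t and \d differently, so check the tool form before reusing the pattern there.

```python
import re

# Skip the first four tab-separated fields, then require column 5 to be
# digits separated by commas, ending at a tab or at the end of the line.
COMMA_LIST_IN_C5 = re.compile(r"^(?:[^\t]*\t){4}\d+(?:,\d+)+,?(?:\t|$)")

def keep_comma_lines(lines):
    """Keep lines whose 5th column is a comma-separated list of numbers."""
    return [line for line in lines if COMMA_LIST_IN_C5.match(line)]

sample = [
    "chr1\t1\t2\ta\t1,2\t+",    # kept: c5 is 1,2
    "chr1\t1\t2\ta\t1,2,3",     # kept: c5 is 1,2,3 at the end of the line
    "chr1\t1\t2\ta\t500\t+",    # dropped: single value
    "chr1\t1\t2\t1,2\t5\t+",    # dropped: commas are in c4, not c5
]
print(keep_comma_lines(sample))
```

Because [^\t]* cannot cross a tab, the {4} repetition always lands at the start of column 5, so commas in other columns do not produce false matches.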

Hope that helps!