Filter data on any column using simple expressions

salvatore_d · February 15, 2019, 1:38pm

Hi all.
I have uploaded a file from which I need to select the last two columns. I tried the tool “Filter data on any column using simple expressions”, but evidently I’m missing something, because I don’t get what I want (it selects a single row).
At first I tried to select only one column (the p-value), writing c12=='pval' (since it is the 12th column in the original table). Unfortunately, it doesn’t work.
Any suggestions?

Thanks.

innovate-invent · February 15, 2019, 7:16pm

Can you provide a sample record?

jennaj · February 15, 2019, 9:07pm

It sounds like you are filtering terms in the header and expecting the entire column under that header to be returned as a result. This filter tool won’t do that.

Try using Cut instead. It allows you to choose and/or rearrange entire columns of data.

If that isn’t what you want to do, please explain a bit more and do share a small sample of your data. There are many, many ways to filter data in Galaxy.
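To illustrate the difference between filtering rows and cutting columns, here is a minimal Python sketch. It is not how the Galaxy tools are implemented; the helper names `filter_rows` and `cut_columns` and the three-column sample table are hypothetical and only mimic the behaviour described above.

```python
def filter_rows(lines, column, value):
    """Keep whole rows whose 1-based column equals value (like c3=='pval')."""
    return [
        line for line in lines
        if line.rstrip("\n").split("\t")[column - 1] == value
    ]


def cut_columns(lines, columns):
    """Keep only the listed 1-based columns from every row (like Cut)."""
    result = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        result.append("\t".join(fields[c - 1] for c in columns))
    return result


# Hypothetical tab-separated table with a header line.
table = [
    "gene\tlogFC\tpval",
    "A\t1.5\t0.01",
    "B\t-0.2\t0.70",
]

# Filtering on the header term returns only the header row.
print(filter_rows(table, 3, "pval"))

# Cutting returns the chosen columns for every row.
print(cut_columns(table, [2, 3]))
```

The filter matches only the one row whose cell is literally `pval`, which is why a single row comes back, while the cut keeps the last two columns of every row.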

MzwaneleN · March 23, 2021, 11:00pm

Hi. I would like to filter a bed file on the score column (c5). I want to keep lines whose values in that column are separated by one or more commas, e.g. 1,2 or 1,2,3. What expression can I use? Thanks.

jennaj · May 14, 2021, 3:19am

Update! See our Text Manipulation tutorials

All → https://training.galaxyproject.org/training-material/search2?query=olympics
General Tools (great for custom workflows) → Hands-on: Data Manipulation Olympics / Foundations of Data Science
R → Hands-on: Data visualisation Olympics - Visualization in R / Foundations of Data Science
SQL → Hands-on: Data Manipulation Olympics - SQL / Foundations of Data Science
JQ → Hands-on: Data Manipulation Olympics - JQ / Foundations of Data Science

Hi @MzwaneleN

A properly formatted bed dataset has a score in the 5th column: a single numerical value between 0 and 1000.

Datatypes - Galaxy Community Hub

Other columns in bed format do contain comma-separated data (the size/start for “block” features, see the FAQ above).

If the goal is to extract bed lines that likely include a splice site (2 or more alignment blocks) – although that isn’t always the case, sometimes those gaps are not really splices – then you can filter on column 10 – the number of blocks.

One simple tool choice would be: Filter data on any column using simple expressions. Use the expression c10>=2, then adjust for headers if you have any. A header line that is not skipped and that has any non-numerical characters in the 10th column will make the tool fail, because this expression is a numerical Python comparison. Full examples are on the tool form for this and other types of Python-friendly expression filters.
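As a rough illustration of what c10>=2 does and why an unskipped header breaks it, here is a minimal Python sketch. It is not the tool's actual code; the function name `keep_multiblock`, its parameters, and the sample lines are made up for the example.

```python
def keep_multiblock(lines, header_lines=0, min_blocks=2):
    """Keep tab-separated bed lines whose 10th column (block count) >= min_blocks.

    header_lines is the number of leading lines to skip before filtering.
    """
    kept = []
    for line in lines[header_lines:]:
        fields = line.rstrip("\n").split("\t")
        # int() raises ValueError on non-numerical text such as a header label.
        if int(fields[9]) >= min_blocks:
            kept.append(line)
    return kept


# Hypothetical 12-column bed lines: one header, one 2-block and one 1-block feature.
header = ("chrom\tstart\tend\tname\tscore\tstrand\tthickStart\tthickEnd"
          "\titemRgb\tblockCount\tblockSizes\tblockStarts")
two_blocks = "chr1\t100\t500\tfeatA\t0\t+\t100\t500\t0\t2\t50,60\t0,340"
one_block = "chr1\t600\t900\tfeatB\t0\t+\t600\t900\t0\t1\t300\t0"
bed = [header, two_blocks, one_block]

# Skipping the header keeps only the multi-block feature.
print(keep_multiblock(bed, header_lines=1))

# Not skipping it fails on the text in column 10.
try:
    keep_multiblock(bed)
except ValueError as err:
    print("failed on header:", err)
```

In the sketch, skipping one header line plays the role of adjusting for headers on the tool form.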

If that is not what you want to do, then try the tool Select. That will accept most regular expressions, but you’ll need to construct yours so that it filters values in the right column, using standard regular expression syntax (not Python as with the first tool). That tool form also has help, plus there is a lot of regular expression help online, including web tools to test out/debug expressions quickly. Or just test in Galaxy with a few representative lines (keep + not), then process the whole dataset once it is working.
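For example, one possible pattern for "column 5 holds two or more comma-separated values" can be tried out in Python first. This is only a sketch: the sample lines are invented, and whether Select accepts `\t` and the non-capturing groups exactly as written is an assumption, so test it on a few lines in Galaxy as well.

```python
import re

# Skip four tab-delimited fields, then require at least one comma inside the
# fifth field, with non-empty values on both sides of every comma.
COMMA_IN_C5 = re.compile(r"^(?:[^\t]*\t){4}[^\t,]+(?:,[^\t,]+)+(?:\t|$)")


def has_comma_list_in_c5(line):
    """Return True if the 5th tab-separated column looks like 1,2 or 1,2,3."""
    return COMMA_IN_C5.match(line.rstrip("\n")) is not None


# Hypothetical lines: two to keep, two to drop.
samples = [
    "chr1\t0\t10\ta\t1,2\t+",
    "chr1\t0\t10\tb\t1,2,3\t+",
    "chr1\t0\t10\tc\t5\t+",
    "chr1\t0\t10\td,e\t7\t+",  # comma is in column 4, not column 5
]

for line in samples:
    print(has_comma_list_in_c5(line), line)
```

Anchoring at the start of the line and counting tabs is what ties the match to column 5 rather than to a comma anywhere in the line.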

Hope that helps!
