Filter tabular data columns by arbitrary list

Hello, I’m trying to figure out the Galaxy way of subsetting columns by an arbitrary list of column names.

E.g. given a tabular file with irregular column names like so:

gene	A	B	sampleC	D
geneA	1	2	2	2
geneB	0	0	0	4

How could I pull out columns, assuming I have them listed in a single-column text file like so (or as a comma-separated string)?

gene
A
D

Yielding

gene	A	D
geneA	1	2
geneB	0	4

The issue is that the list of columns to keep can change and can be of arbitrary length, so it can’t be hardcoded as, say, columns 1, 2 and 5. This is to be part of a workflow.

I can’t find a tool that does this directly - perhaps I’ve missed it? (please do tell me I’ve missed it :))

My thinking is I could do it something like the following:

  1. Melt into long format with the ‘Table Compute’ (melt) tool
  2. Do an inner join of the long format with the desired column list with ‘Join two Datasets side by side on a specified field’
  3. ‘Pivot’ the filtered table wider with ‘Table Compute’ (pivot)
  4. But, how do I then put the columns back in a certain order?
    a. Maybe in this case I could use ‘column arrange’ to get ‘gene’ up front if I don’t particularly care about the rest.
    b. Is there a general solution, like if I just wanted to match the order of my columns-to-keep?
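
For readers following along outside Galaxy, steps 1 to 3 above can be sketched in plain Python. This is only an illustration of the melt, inner join, pivot logic, not how the Galaxy tools are implemented. It assumes the first column is the row identifier, that column names are unique, and that the table is rectangular; the names `melt`, `pivot`, `TABLE` and `KEEP` are made up for the example.

```python
import csv
import io

# Example data from the question (tab-separated); KEEP is the single-column list.
TABLE = "gene\tA\tB\tsampleC\tD\ngeneA\t1\t2\t2\t2\ngeneB\t0\t0\t0\t4\n"
KEEP = "gene\nA\nD\n"


def read_tsv(text):
    """Parse tab-separated text into a list of rows, skipping empty lines."""
    return [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]


def melt(rows):
    """Wide -> long: one (row_id, column_name, value) triple per data cell."""
    header, body = rows[0], rows[1:]
    return [(r[0], col, val) for r in body for col, val in zip(header[1:], r[1:])]


def pivot(long_rows, column_order, id_name):
    """Long -> wide, emitting the columns in the order given by column_order."""
    cells = {}
    row_ids = []
    for row_id, col, val in long_rows:
        if row_id not in cells:
            cells[row_id] = {}
            row_ids.append(row_id)  # remember first-seen row order
        cells[row_id][col] = val
    out = [[id_name] + list(column_order)]
    for row_id in row_ids:
        out.append([row_id] + [cells[row_id][c] for c in column_order])
    return out


rows = read_tsv(TABLE)
keep = [row[0] for row in read_tsv(KEEP)]
id_name = rows[0][0]

# Inner-join semantics: only names present in both the table and the keep list.
# The identifier column is not melted, so it is excluded here.
wanted = [c for c in keep if c in rows[0][1:]]

long_rows = melt(rows)                                  # step 1: melt
filtered = [t for t in long_rows if t[1] in wanted]     # step 2: inner join
result = pivot(filtered, wanted, id_name)               # step 3: pivot

for row in result:
    print("\t".join(row))
```

Because `pivot` takes the column order from the keep list, this sketch also covers question 4b: the output columns follow the order of the columns-to-keep. Whether a particular Galaxy pivot preserves that order is a separate matter that the sketch does not address.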

But that seems somewhat convoluted, so I think I’m missing something obvious. Can anyone point me in the right direction please?

Thanks,
Sarah.

Hi Sarah,
What about transpose > join > transpose?
Kind regards,
Igor
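
For anyone who wants to see the idea spelled out, here is a minimal Python sketch of transpose > join > transpose. It is an illustration of the logic only, not the Galaxy tools themselves, and it assumes a rectangular table with unique column names; `transpose` and `select_columns` are hypothetical helper names.

```python
# Example table from the question, already split into rows of fields.
TABLE = [
    ["gene", "A", "B", "sampleC", "D"],
    ["geneA", "1", "2", "2", "2"],
    ["geneB", "0", "0", "0", "4"],
]


def transpose(rows):
    """Swap rows and columns (assumes every row has the same length)."""
    return [list(col) for col in zip(*rows)]


def select_columns(rows, keep):
    """Keep only the named columns, in the order given by keep."""
    # Transpose: each original column becomes a row whose first field is its name.
    by_name = {row[0]: row for row in transpose(rows)}
    # Join: retain the rows whose name appears in the keep list.
    joined = [by_name[name] for name in keep if name in by_name]
    # Transpose back to the original orientation.
    return transpose(joined)


selected = select_columns(TABLE, ["gene", "A", "D"])
for row in selected:
    print("\t".join(row))
```

In this sketch the join step walks the keep list, so the output columns come out in keep-list order. A join tool that keeps the order of the transposed table instead would need a separate reordering step afterwards.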


Thanks Igor - Yes - transpose should do the trick!

And I’ll use column arrange to get ‘gene’ back up front.

(Side thought: wouldn’t a Galaxy table manipulation cheat sheet be nice, a la https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf or https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf ?)


Hi @swbioinf – jumping in – but that would be a fantastic addition to the “Data Olympics” series we already have going at the GTN :cowboy_hat_face:

See here for what we have so far. You could template off any. → GTN Materials Search

And here for how to contribute → Contributing to the Galaxy Training Material / Tutorial List. The GTN people would help (most are also community volunteers), and you’d get full attribution with stable public resource links!


Thanks jennaj - The cheatsheet part of that tutorial is exactly the sort of thing I’m looking for.

I’ll probably do a few things along this line (trying to automate some manual manipulation rubbish), so will keep notes.
