Hello, I’m trying to figure out the galaxy way of subsetting columns by an arbitray list of column names.
E.g. given a tabular file with irregular column names like so:
gene A B sampleC D
geneA 1 2 2 2
geneB 0 0 0 4
How could I pull out colums, assuming I have them in a single column text file like so; (Or a comma separated string )
gene
A
D
Yeilding
gene A D
geneA 1 2
geneB 0 4
The issue is that the list of columns to keep can change, can be of arbitrary length, and can’t be hardcoded into knowing its the 1,2,5 column. This is to be part of a workflow.
I can’t find a tool that does this directly - perhaps I’ve missed it? (please do tell me I’ve missed it :))
My thinking is I could do it something like the following:
- Melt into long format with the ‘Table Compute’ (melt) tool
- Do a inner join of long format with the desired column list with ‘Join two Datasets side by side on a specified field’
- ‘Pivot’ the filtered table wider with ‘Table compute’ (pivot)
- But, how do I then put the columns back in a certain order?
a. Maybe in this case I could use ‘column arrange’ to get ‘gene’ up front if I don’t particularly care about the rest.
b. Is there a general solution, like if I just wanted to match the order of my columns-to-keep?
But that seems somehwat convoluted, so I think I’m missing something obvious? Can anyone point me in the right direction please?
Thanks,
Sarah.