I want to create a workflow that includes the “Advanced Cut” and “Merge Columns” tools. To define which columns should be cut or merged, a “data_column” input is required, while for other tools a text input via “Parse parameter value” and “Compose text parameter value” works. But this route, using a txt or tabular output, does not work here because it is not recognized as a “data_column” input. I would be grateful if someone could tell me how to make a “data_column” input out of a tabular file in a Galaxy workflow.
Addition: I also have issues with the “Cut” tool in the workflow when it takes “Compose text parameter value” as input. It always cuts only the first defined column. If I delete and re-type the last defined column in the history and re-run, it works as intended. Is there any way to avoid that?
I guess I could use awk, but I am not familiar with it. Of course, it would be the most convenient way.
Yes, for all the tools mentioned below I click the double-edged arrow :).
My situation is the following: I have input datasets that differ in size. Because of that, I calculate the columns that have to be cut from a prepared file, cut the relevant column, and concatenate these datasets (step 1). This results in a tabular file (or any type I can select in the configure output section) that contains, in column 1, the column numbers to be cut, one per row.
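For illustration, the list produced by step 1 might look like this (hypothetical column numbers, one per row in column 1):

3
7
12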
To use Cut, I then prepare this file to be the input for the “cut columns” parameter of the Cut tool (step 2). As explained above, this somehow does not work: it always cuts only the first defined column (error 1). When I go to the history, click re-run, delete one number, type it in manually, and run the Cut tool manually, it works. So I thought I could use Advanced Cut instead, but I can’t get the right format for the input, which has to be “data_column”, from that concatenated list of columns to cut (error 2).
After I cut the columns, I would like to merge a subset of them, which is also defined by step 1. But as in error 2, the input requires a “data_column” (error 3).
Using gawk might be the best way, but I am still curious how I can transform a tabular file into a “data_column” input in case I face the same issue in the future.
Thanks a lot for any suggestions on how to solve that.
Hi @Ben
This is very complicated!
As for the first issue (only one column removed by Cut): I cannot reproduce it. I created a small workflow and it cuts any number of columns. For example, when I use c3,c2 as the variable, it cuts and swaps columns 2 and 3. The workflow: Galaxy | Australia
History: Galaxy | Australia
Dataset #2 was the input, dataset #5 the output. Ignore #10 and #34.
I also cannot get the right “columns to cut” format for Advanced Cut. Looks like a bug.
On the gawk/awk part: you have two input files, one with the column IDs and one with the data to be transformed, right? This is an interesting question. While I can transform a tabular file into a text string suitable for Cut, the txt datatype is not compatible with the column input for Cut.
Hi Ben,
I created a workflow that cuts columns using values from the 1st column of a tabular dataset. It builds a string compatible with Cut from the numbers in column 1 using awk. The datatype of the awk output was changed to “txt”. Are you after something like this?
History Galaxy | Australia
Workflow Galaxy | Australia
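For readers without access to the linked workflow, a minimal sketch of that awk step could be (assuming the column numbers sit in column 1 of a tab-separated file; the program in the workflow itself may differ):

{printf "%sc%s", (NR > 1 ? "," : ""), $1}   # emit c<number>, comma-separated, no trailing comma
END {print ""}                              # finish the line

On an input column 3, 7, 12 this produces the string c3,c7,c12, which is the kind of value Cut accepts in its “cut columns” field.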
For merging arbitrary columns, awk might be a good option. For example, {print $1, $2, $1 $2} creates a three-column file with c1, c2, and merged c1 and c2. If you need a space as a separator in the merged column, {print $1, $2, $1 " " $2} should do the job.
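As a concrete illustration (assuming the default output field separator, a single space), a row with fields a and b becomes:

{print $1, $2, $1 $2}       ->  a b ab
{print $1, $2, $1 " " $2}   ->  a b a b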
If you are after a merge of all columns, maybe consider using Replace (replace tab with nothing). It works on the txt datatype, so in the workflow change the datatype from tabular to txt.
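If you prefer to stay in awk, a rough equivalent of that all-columns merge (assuming tab-separated input) would be:

{gsub(/\t/, ""); print}   # delete every tab, concatenating all columns of the row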
I cannot figure out how a workflow can handle a varying number of columns for Column Merge.
I tried to use your first workflow, but it did not work with a tabular input. It’s a bit weird because if I re-run the job everything looks good, but it still cuts only the first column.
But your second workflow works.
Concerning merging columns: It also works well with the awk code.
Thanks so much! I will implement both suggestions!
I’m glad it works for you. Thank you for the interesting question.
If you don’t like awk, an alternative option can be:
cut the 1st column using Cut
add “c” in front of all numbers using Replace
transform the single column into a single-line table using Transpose
(in this step change the output datatype to “txt”)
substitute tabs with commas using Replace
(you can remove empty spaces as well in this step)
parse the result with Parse parameter value
It should produce the same functional text string as awk in the last workflow.
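For example (hypothetical values), a list with 3, 7, 12 in the 1st column would pass through the steps as:

after Cut:        3 / 7 / 12 (one number per row)
after Replace:    c3 / c7 / c12
after Transpose:  c3 <tab> c7 <tab> c12 (a single row)
after Replace:    c3,c7,c12

Parse parameter value then turns that final string into a workflow parameter.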
I don’t know awk, but I like it. The script prints the content of the 1st column ($1) and adds “c” in front of each value. In awk, the default Output Record Separator is a new line, but I used ORS=",", so, instead of a single column, all values were recorded on a single line separated by commas. This is similar to the data transformation described above.
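Based on that description, the awk step was presumably along the lines of (a sketch, not necessarily the exact program in the workflow):

BEGIN {ORS = ","}   # separate output records with commas instead of newlines
{print "c" $1}      # prepend "c" to each column number

On an input column 3, 7, 12 this prints c3,c7,c12 followed by a trailing comma, since ORS is also emitted after the last value.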
Thanks for the alternative route, but awk works well for me.
I actually use awk now to merge columns that are defined in a tabular file (each row contains a column number). To do that, I first use the awk code {printf "$" NR "\";\""} END {print ""}, and then on its output another awk code {print "{print " substr($0, 1, length($0)-3) "}"}. This output I put into “Parse parameter value” (text_param as output) and use it as the awk program input for awk text reformatting of the initial data. This works well.
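As a concrete trace (hypothetical three-row list): the first program prints $1";"$2";"$3";" on a single line, and the second trims the trailing ";" (the last three characters, hence length($0)-3) and wraps the rest in {print ...}, yielding the generated program

{print $1";"$2";"$3}

which, run on the original table, prints those columns joined by ";". Note that NR numbers the rows 1..N, so the generated program always addresses columns 1 to N; if the intent is to use the column numbers stored in the file itself, $1 in place of NR would read the values from column 1 (an assumption on my part, since the two coincide whenever the file simply lists 1..N).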