Create a new column in a table with specific patterns

Hi!
I have a csv file having two columns with the following series of values:

Column1  Column2
3,4,5    7,8,9

which sequence of tools can I use in Galaxy to create a new column merging the values of Column1 and 2 in this specific format:

     Column3
(3,7),(4,8),(5,9)

Welcome, @rmassei

Please see our Data Manipulation Olympics tutorials for the types of tools you can use.

One question your need to address first: how to match up lines between the two files? If they are already in the same order, you can use Paste. If there is a common key or set of common coordinates (even if you don’t want to include that in the final file), use a Join tool.

Once everything is in the same file, then you can use Cut to restrict columns, or maybe Sed or Awk to do the cut and merges. The tutorial above includes examples of what each do, and with real data, plus there are many guides online and the usage in Galaxy is the same as when using these tools directly. If you get stuck at a step, you can share your history and we can try to help with the syntax.

Remember that Galaxy will prefer to work with tabular data, just like the original tools do. So, start by converting your data to a tabular format and you’ll be ready to go. If you want the format in csv to download after, you can convert it back once done. Use the pencil icon on a dataset to reach the Edit Attributes → Datatypes form where simple format transformations like this are done.

At the end, consider extracting the tool series into a workflow for reuse.

Hope this helps! :slight_smile:

1 Like

Hi @jennaj,
thanks a lot for all the info and support!
I already thought to use the Awk or Sed but I was hoping there was a more step-by-step approach by just using a sequence of text manipulation tools. In the end, I just used the Awk and it worked out great :).
In the case below, the target columns are 3 and 4 from a larger tabular file:

BEGIN { FS=OFS="\t" }
{
    split($3, col3, ",")
    split($4, col4, ",")
    output = ""
    for(i = 1; i <= length(col3); i++) {
        output = output "(" col3[i] "," col4[i] ")"
        if (i < length(col3)) {
            output = output ","
        }
    }
    print output
}
1 Like