Specify column to group by -- Join two datasets -- Fixed

I am following along with a tutorial given in the Coursera Genomic Data Science course. It involves first joining two datasets on a common column to produce another dataset, then grouping the resulting dataset on a column. The video shows a drop-down list of column names from the dataset being grouped to pick from, but no such drop-down appears for me. When I tried typing in the column name from the dataset, it seems to have been ignored: the task failed with an error indicating that no column was specified. Any suggestions?

Welcome, @claudiofr!

Are you working at Galaxy Main https://usegalaxy.org?

If yes, this tool is currently problematic (and we expect to have it fixed very soon): Join two Datasets side by side on a specified field (Galaxy Version 2.1.3)

Meanwhile, an alternative tool to use is: Join two files (Galaxy Version 1.1.1).

Both tools have been upgraded a bit since this particular class was published. There will not be a drop-down list of columns. Instead, enter the column number into the form. Example: For Column 4, enter just: 4

Other text manipulation tools may require that columns are specified with different nomenclature and the form will note in the help section when it differs. Example for some other tools: For Column 4, enter: c4
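
To make the two conventions concrete, here is a minimal Python sketch. It is not Galaxy code, and `parse_column_spec` is a hypothetical helper name invented for this illustration. It accepts either "4" or "c4" and returns the matching 0-based index, assuming columns are counted from 1 as on the tool forms.

```python
import re


def parse_column_spec(spec: str) -> int:
    """Convert a 1-based column spec ("4" or "c4") to a 0-based index.

    Hypothetical helper for illustration only; not part of Galaxy.
    """
    match = re.fullmatch(r"[cC]?(\d+)", spec.strip())
    if match is None:
        raise ValueError(f"Unrecognized column spec: {spec!r}")
    number = int(match.group(1))
    if number < 1:
        raise ValueError("Column numbers start at 1")
    return number - 1
```

Note that a column name (a header label, for example) is rejected by this sketch, which mirrors the advice above to enter the column number rather than the name.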

If you are not sure, feel free to ask.

For questions or help with troubleshooting, it also helps if you specify the following in posts:

  1. Note where you are working. Examples: Your own Galaxy (what release and where it was sourced) or if a public Galaxy, include the server’s URL in the post.
  2. Copy the full tool name and version from the top of the tool form and paste that into your question.
  3. Capture the exact error message and also paste that into your question. Sometimes just expanding the dataset and copying the comments is enough. Other times, try clicking on the bug icon for the red dataset and capture those remarks as well. Sometimes reviewing those remarks will help you to solve the problem on your own.

You don’t need to actually submit the bug report unless you think it is a true server-side issue. Getting usage help or finding out about known issues here is usually much faster. Should you decide to do both, including the Galaxy Help post link directly in the bug report comments is very helpful. Then come back and update your post to state that you submitted the bug report.

The more information you can provide, linked together, the better and quicker we can help. Your question now happens to be a known issue, if I am understanding correctly, but please clarify as needed.

Thanks!

Hi,
I am also having similar problems. However, neither Join two Datasets nor Join two files works for me. Any further suggestions, please?

I am working at Galaxy Main https://usegalaxy.org

The tool I am using is:

Group data by a column and perform aggregate operation on other columns. (Galaxy Version 2.1.4)

Specifying the column to group by as a number only (i.e., 4) resolved my problem.

The help text associated with the tool did not indicate the format to use for specifying columns.

Thank you

@claudiofr

Thanks for clarifying. Yes, the Group tool is also impacted. We are working on a correction. Update: Group was NOT impacted or related to the other known dependency issues. Problems with Group are likely usage related. See more in follow-up posts below.

Ticket: https://github.com/galaxyproject/usegalaxy-playbook/issues/288 https://github.com/galaxyproject/usegalaxy-playbook/issues/289

Thanks!

@KeeleyB

The alternative Join tool is working at Galaxy Main https://usegalaxy.org.

  • Join two files (Galaxy Version 1.1.1)

It is possible that the tool is not working at some other public Galaxy server, or that there is a usage problem.

The Group tool does not have an alternative. Update: And is working as expected.

Text Manipulation functions impacted by dependency issues are a priority correction. The ticket will close out once fixed.

Thanks to all for your patience! Galaxy is undergoing some significant changes at this time but we expect the server-side issues with tool dependencies to be resolved very soon.

Thanks - I used Join and got everything I needed done - thanks very much!

@claudiofr

Ah, I reread your reply more carefully. And Group was determined this morning to NOT be involved with the other tools waiting for the dependency correction.

In general, free-text tool form entry is moving away from the “c4” nomenclature (a Python-derived way of naming data columns) to the more end-user friendly “4” nomenclature (which creates a better user experience for those not familiar with coding).

When in doubt, use the simple nomenclature of just the actual column number. Example: “4” (as you have done).
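
For anyone who wants to see what "join on a column, then group by a column" means in terms of plain column numbers, here is a rough Python sketch. It is not how the Galaxy tools are implemented. The function names `join_on_column` and `group_count` and the sample rows are made up for illustration, and counting is used as just one possible aggregate operation.

```python
from collections import Counter


def join_on_column(left, right, left_col, right_col):
    """Inner-join two lists of rows; columns are 1-based numbers."""
    # Index the right-hand rows by their join value.
    index = {}
    for row in right:
        index.setdefault(row[right_col - 1], []).append(row)
    joined = []
    for row in left:
        # Rows without a partner on the right are dropped.
        for match in index.get(row[left_col - 1], []):
            joined.append(row + match)
    return joined


def group_count(rows, group_col):
    """Count rows per distinct value in the 1-based group column."""
    return Counter(row[group_col - 1] for row in rows)


# Made-up sample data: intervals with a gene name in column 4,
# and a second dataset keyed by gene name in column 1.
intervals = [
    ["chr1", "100", "200", "geneA"],
    ["chr1", "300", "400", "geneB"],
    ["chr2", "50", "80", "geneA"],
]
labels = [["geneA", "kinase"], ["geneB", "receptor"]]

joined = join_on_column(intervals, labels, 4, 1)
counts = group_count(joined, 4)  # group by column 4, entered as just 4
```

With this sample data, `counts` holds 2 for "geneA" and 1 for "geneB".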

Tools that still require a special nomenclature do state that on the tool form for all cases I have examined. This is an example tool: Compute an expression on every row (Galaxy Version 1.2.0). The tool form explains expected column nomenclature (“c4”). Since the possible expressions that can be entered can become quite complicated with this tool, and that expression is processed directly as entered, this is how the tool “works” (at least for now!).
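
As a loose illustration of why an expression-based tool might need names like “c4”, here is a small Python sketch. This is an assumption about the general idea only, not the actual code of the Compute tool, and `compute_expression` is a hypothetical name. Each column value is bound to a name c1, c2, and so on, and the expression is then evaluated as entered.

```python
def compute_expression(expression, row):
    """Evaluate an expression in which c1, c2, ... name the row's columns.

    Illustration only: eval() on untrusted input is unsafe, and this is
    not the implementation used by any Galaxy tool.
    """
    # Bind c1..cN to the column values (1-based, matching the form).
    names = {f"c{i}": value for i, value in enumerate(row, start=1)}
    return eval(expression, {"__builtins__": {}}, names)


row = [10, 20, 30, 5]
result = compute_expression("c4 * 2 + c1", row)
```

Here a bare “4” in the expression would mean the number four, so the “c” prefix is what distinguishes a column reference from a literal value.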

Standardizing the nomenclature across all tools would be ideal, of course :sunglasses:

More about Galaxy development

Galaxy is a popular, evolving, open-source project. Translating feedback from end-users into UX improvements is one of our core missions: Making Galaxy accessible (easy to use) for all.

We rely on community participation and contributions, both feedback and development. Perhaps someone from the community interested in a new development project with a clearly defined scope will take this on.

Should anyone reading this post, now or later, be interested in tackling this, or other UX improvements, or really anything else (!), this is how to get involved: https://galaxyproject.org/develop/

Thanks!

Update

Resolved, along with similar tools that had the same dependency issue.

Thanks! I would be much obliged if you could add this link to the Genomic Data Science specialization too.