Clearer Documentation for 'Filter List'

As a new user, I found the Filter List documentation confusing. The documentation states that the function will:

take an input list and a text file with identifiers to filter the list with

This led me to believe that partial identifiers could be used, such as “Rep1” matching all data sets in the list with “Rep1” in their names. I found, however, that the filter function will only work if entire file names are used.

If the intended behavior is to match only complete file names and not support partial matches, I think the documentation would be clearer if it used “file names” instead of “identifiers”.

If the intended behavior is to support partial matches, then I think there may be a bug, as I was not able to get partial matches using either a plain string or a proper regular expression as identifiers in my text file.


Thanks, you are correct. The Filter List tool only works with complete identifiers.

The code that implements Filter List is in the class FilterFromFileTool in the galaxy.tools module. Reading that code shows that a “filter match” occurs only when an identifier from the input collection matches an identifier from the input list of identifiers exactly. In other words, it is a complete match, not a substring match.
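To illustrate the difference, here is a minimal Python sketch of exact-match filtering. It is not the actual FilterFromFileTool code. It assumes one identifier per line in the text file and that surrounding whitespace on each line is ignored. The function name `filter_by_identifiers` and the sample element names are hypothetical.

```python
def filter_by_identifiers(element_names, identifiers_text):
    """Hypothetical sketch: split element names into (filtered, discarded)
    by exact match against identifiers, one per line (assumed format)."""
    # Assumption: each non-empty line is one complete identifier; whitespace is stripped.
    wanted = {line.strip() for line in identifiers_text.splitlines() if line.strip()}
    # Exact set membership: "Rep1" does not match "sample_Rep1.fastq".
    filtered = [name for name in element_names if name in wanted]
    discarded = [name for name in element_names if name not in wanted]
    return filtered, discarded


names = ["sample_Rep1.fastq", "sample_Rep2.fastq", "control_Rep1.fastq"]

# A partial identifier matches nothing, so every element is discarded.
print(filter_by_identifiers(names, "Rep1\n"))
# ([], ['sample_Rep1.fastq', 'sample_Rep2.fastq', 'control_Rep1.fastq'])

# Complete element names do match.
print(filter_by_identifiers(names, "sample_Rep1.fastq\ncontrol_Rep1.fastq\n"))
# (['sample_Rep1.fastq', 'control_Rep1.fastq'], ['sample_Rep2.fastq'])
```

Supporting partial matches would mean replacing the set-membership test with something like a substring or regular-expression check, which is the behavior the original question expected but the tool does not provide.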

The documentation is in the tool XML file. Do you have a suggestion for better wording?


I would probably change

This tool will take an input list and a text file with identifiers to
filter the list with. It will build two new lists - one “filtered” to
contain only the supplied identifiers and one of the discarded elements.

To something like

This tool will take an input list and a text file with the names of list members to split the list by. It will build two new lists - one “filtered” to contain only the supplied names and one of the discarded elements.

Another potential way to reduce confusion might be to call the method Split List instead of Filter List, since a filter usually implies some sort of pattern matching while splitting does not.

Another option might be to tweak the interface for the method. You could show all the file/element names in the list in a pane and allow the user to select the ones they want to keep (just like a batch file selection field) instead of using a text file. This would make the method easier to use and eliminate potential sources of confusion.


I understand your confusion and have opened a PR (https://github.com/galaxyproject/galaxy/pull/7127) to update the help text. I think changing the text is better than changing the tool name.

I think it is best to keep using a text file so that this tool can be part of a workflow without requiring user intervention.


The new instructions look good, thanks for updating things!