Hi!
As input, I have a single tabular file with several columns, as in this example:
col1 col2 col3
1 red dog
2 red cat
3 blue cat
4 blue dog
5 green dog
I would like to create a collection from this single tabular file. The collection would consist of individual datasets created from the input, split according to the shared value in a chosen column.
As an example for column 2, the collection will look like:
Have you tried using this tool? Split file to dataset collection (link at ORG).
You could pre-filter the input, or do more rearrangement with the other collection tools as needed, but I think this operation should be possible. You can put all of that into your workflow for reuse.
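For anyone who wants to see the intended grouping spelled out, here is a minimal Python sketch of the same operation outside Galaxy. It is not what the Galaxy tool runs internally. The function name `split_by_column` is made up for illustration, and the space delimiter is an assumption based on how the example is pasted above (a real tabular file would normally use tabs).

```python
import csv
import io
from collections import defaultdict


def split_by_column(text, column, delimiter=" "):
    """Group the rows of a delimited table by the value found in `column`.

    Returns a dict mapping each distinct value to a list of rows,
    with the header row repeated at the top of every group.
    (Hypothetical helper, for illustration only.)
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    idx = header.index(column)  # position of the column to split on
    groups = defaultdict(list)
    for row in reader:
        if not row:  # skip blank lines
            continue
        groups[row[idx]].append(row)
    return {key: [header] + rows for key, rows in groups.items()}


# The example table from the question (space-delimited here by assumption).
table = """col1 col2 col3
1 red dog
2 red cat
3 blue cat
4 blue dog
5 green dog
"""

groups = split_by_column(table, "col2")
for name, rows in groups.items():
    print(name, rows[1:])  # group name, then its data rows without the header
```

Splitting on `col2` gives three groups, `red`, `blue` and `green`, each of which would correspond to one dataset in the collection.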
We have some great guides about using data manipulation utilities. Most are the same as those used on the command line, so if that is something you are familiar with, searching the tool panel will find them. If this is new to you, the tutorial guides can help you get started.
The Split tool is included in several tutorials, too. Find those linked at the bottom of the tool form, and also here → Split file to dataset collection
You have a nice puzzle to solve but I think this is definitely possible! If you get stuck, you can share back the file, your workflow, and the history with the current outputs, and we can try to give some more specific tips.
Hi @jennaj,
thanks for your nice and comprehensive answer!
While searching for Split file to dataset collection, I also came across the Split to Group tool, which does exactly what I need.
Thank you for all the training material, I will check it thoroughly!