I have a collection with a large number of datasets that I need to combine into a single file.
The datasets in the collection have two columns (c1 = count data; c2 = key) and are labeled with sample names.
The goal is to combine them into one file where the columns are the sample names (the names of the individual datasets in the original collection) and the rows hold the counts for each key.
Most, but not all, of the keys in c2 of the original files are shared between datasets.
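For reference, the result described above is an outer join on the key column: every key that appears in any sample gets a row. Below is a minimal Python sketch of that operation, not a description of what the Galaxy tool does internally. It assumes each dataset is tab-separated with no header, the count in column 1 and the key in column 2, and that a key absent from a sample should be filled with 0. The `join_counts` and `read_counts` names and the zero fill are illustrative assumptions.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple


def read_counts(lines: Iterable[str]) -> Dict[str, str]:
    """Parse two-column lines (c1 = count, c2 = key) into {key: count}."""
    counts: Dict[str, str] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        count, key = row[0], row[1]
        counts[key] = count
    return counts


def join_counts(samples: List[Tuple[str, Iterable[str]]],
                fill: str = "0") -> List[List[str]]:
    """Outer-join per-sample count tables on the key column.

    Returns a table whose header is ["key", sample1, sample2, ...] and whose
    rows hold one key each; a key missing from a sample gets `fill`
    (an assumption: 0 here, but an empty string may suit other uses).
    """
    names = [name for name, _ in samples]
    tables = [read_counts(lines) for _, lines in samples]
    # Union of all keys across samples = outer join; sorted for stable output.
    all_keys = sorted(set().union(*tables))
    header = ["key"] + names
    rows = [[key] + [t.get(key, fill) for t in tables] for key in all_keys]
    return [header] + rows


if __name__ == "__main__":
    a = io.StringIO("5\tgeneA\n3\tgeneB\n")
    b = io.StringIO("7\tgeneA\n2\tgeneC\n")
    for row in join_counts([("sample1", a), ("sample2", b)]):
        print("\t".join(row))
```

With the toy input, `geneB` and `geneC` each appear in only one sample, so the other sample's cell is filled with 0 rather than the row being dropped; every sample name stays in the header.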
When I run the “column join” tool from the collection operations, the resulting dataset has some problems, and I’m not sure what’s going on.
The issues are:
- Not all of the samples from the input collection are included in the joined dataset; more than 100 are missing.
- Some columns in the joined dataset contain count data but have no sample name. (Note: the number of unlabeled columns does not equal the number of samples missing from the collection.)
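One way to make both symptoms concrete is to compare the header row of the joined file against the list of element names in the collection. The following is a minimal sketch. It assumes the joined file is tab-separated, with the key in the first column and one header cell per sample column. `check_header` is a hypothetical helper, not part of Galaxy.

```python
from typing import Dict, List


def check_header(header_line: str, data_line: str,
                 expected: List[str]) -> Dict[str, object]:
    """Compare a joined table's header with the expected sample names.

    Assumes tab-separated text where column 0 holds the key, so only
    columns 1..n are treated as sample columns.
    """
    header = header_line.rstrip("\n").split("\t")[1:]
    n_data_cols = len(data_line.rstrip("\n").split("\t")) - 1
    named = [h for h in header if h.strip()]
    return {
        # collection elements that never made it into the header
        "missing_samples": [s for s in expected if s not in named],
        # header names that are not collection elements
        "unexpected_names": [h for h in named if h not in expected],
        # data columns with no non-empty header cell above them
        "unlabeled_columns": n_data_cols - len(named),
    }


if __name__ == "__main__":
    print(check_header("key\ts1\t\ts3\n", "g1\t1\t2\t3\t4\n", ["s1", "s2", "s3"]))
```

Running it on the first data row and the header gives the names that are actually missing and the number of unlabeled columns, which is more precise than counting them by eye.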
Any insights would be greatly appreciated!