Hi @roselucia
Odd. But to just manually split up the data values, try this (uses tools from the “GENERAL TEXT MANIPULATION” tool groups):
- Remove the header with the tool Select using “not matching” and the regular expression
^@. That could be reversed to be “matching” if you want to keep the header instead for some other use in another file. You probably don’t want that in your Excel spreadsheet, or at least not in the same sheet. But if you want it, then add it back after the rest of the data is manipulated: Tool: Concatenate.
- Change the datatype on the “not matching” result to be
tabular by clicking on the pencil icon (upper right corner of the dataset) to reach the Edit Attribute forms. Tab into “Datatypes” and assign the datatype tabular.
- Break up data values using the tool Convert delimiters to TAB. Run once with semicolons
; and (optionally) run once with pipes |.
Converting pipes to tabs is going to produce some lines that don’t exactly match others. Some of those values are NULL in your data lines (expected), plus some data lines have repeating data (also expected), so that is optional.
You might want to experiment with other tools to deal with the pipes/empty values. Examples:
-
Replace Text in entire line – could convert
|| values to be something like |NULL|, then run the Convert delimiters to TAB
-
Text transformation with sed – you can play around with queries, but something like this will fill in empty values with “NULL”, but you can pick whatever you want as a placeholder value: dot
. etc, anything that is one “word” and won’t conflict with values are you converting to tabs: s/||/|NULL|/g
-
sed is very powerful and can do much more. There are guides/posts all over the web if you are not familiar with the tool (is a unix utility).
Now, I don’t think this tool will get you the formatting you want (will create repeating lines), but still could try it and review:
-
Unfold columns from a table – split on pipes
The vcf-to-tab tool is from a 3rd party – we just wrapped it to work through the web in Galaxy. The underlying tool’s functionality is determined by the original authors. A link to them is on the tool form. You might want to show them your data and ask if they would consider upgrading the tool to work with it. I don’t think the tool is buggy – rather, it simply has trouble parsing one of your versions of vcf data (format/content).
Why you are getting a different result at .org versus your docker – not sure, could be many factors: tool dependencies, the Galaxy software version, or some semi-random “decision” behavior that the tool performs when given non-standard inputs. The latter is the most likely reason and probably involves a cascade of “rules” the program goes through. That process may not be necessarily reproducible across different compute environments.
Inconsistencies like that somewhat common (not just this tool) are one of the reasons why there is such a big push that reproducible analysis needs to be described in publications very precisely. Not only the data/tools/versions/paramaters used, but also the entire exact compute environment used. Part of why Docker containers are so useful is because entire environments can be saved/published/used by others.
Hope this works out!