Best practice for tabular data: headers or not?

tool-dev

#1

I’m fairly new to Galaxy and would like to make our software available to the Galaxy ecosystem. The first tool I created analyzes a data set and produces tabular data. I ended up choosing the tabular format over ‘tsv’ or ‘csv’, as the visualization tools did not seem to work with those.

Because the number of columns, their order, and their meaning are not knowable before running the tool, the first row contains the column headers for the rest of the table. That made sense to me, but Galaxy does not recognize them as headers. When I view the data, new headers with the column numbers are generated, and the visualization tools produce artifacts for the first row (e.g., scatter plots start with ‘null’).

So the question: would it be better not to return header information? Currently, I have to remove the first line with the text manipulation tools before plotting. Since googling does not turn up much information on this topic, what are other people’s experiences?

Thanks!


#2

Does the header line start with a #? That will make Galaxy recognize it as a header in tabular format.
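As a rough illustration of that convention, here is a minimal Python sketch that writes tab-separated output whose first line begins with #. The function name `write_tabular` and the sample columns are made up for this example, and it only shows one way a tool script might emit such a file; how Galaxy treats the result is as described above.

```python
import csv
import io


def write_tabular(columns, rows, out):
    """Write rows as tab-separated text with a '#'-prefixed header line.

    Hypothetical helper: the name and signature are illustrative only.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Prefix only the first column name so the whole header line starts with '#'.
    writer.writerow(["#" + columns[0]] + list(columns[1:]))
    writer.writerows(rows)


# Example with made-up column names and values.
buf = io.StringIO()
write_tabular(["sample", "x", "y"], [["a", 1, 2.5], ["b", 3, 4.0]], buf)
tabular_text = buf.getvalue()
print(tabular_text, end="")
```

In a real tool the `out` argument would be the opened output file rather than an in-memory buffer.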

Here is a very simple history that shows three variations of the content, which end up as just two cases once the format of that content is interpreted:

https://usegalaxy.org:/u/jen/h/tab-headers

Whether or not to include headers is up to you, but I would vote for including them if you can. Maybe make this an option on the tool form: Include header? Y/N. The help section of the form should also include the column labels and descriptions, but those would not be much fun to cut and paste into a file if someone wanted the data labeled for some reason (graphing?).
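On the script side, such an option could be a simple command-line switch that the tool form passes through. The sketch below assumes a hypothetical `--no-header` flag and a hypothetical `format_table` helper; neither comes from Galaxy itself, and the wiring from the tool form to the flag is left out.

```python
import argparse


def build_parser():
    """Command-line parser for a hypothetical tool script."""
    parser = argparse.ArgumentParser(description="Hypothetical tool script")
    # The header is written by default; --no-header (an assumed flag name)
    # switches it off.
    parser.add_argument(
        "--no-header",
        dest="include_header",
        action="store_false",
        help="omit the column header line",
    )
    return parser


def format_table(columns, rows, include_header):
    """Return tab-separated text, optionally led by a '#'-prefixed header."""
    lines = []
    if include_header:
        lines.append("#" + "\t".join(columns))
    lines.extend("\t".join(str(value) for value in row) for row in rows)
    return "\n".join(lines) + "\n"


# Made-up data, rendered once with the default and once with the flag set.
columns, rows = ["sample", "x"], [["a", 1], ["b", 2]]
with_header = format_table(
    columns, rows, build_parser().parse_args([]).include_header
)
without_header = format_table(
    columns, rows, build_parser().parse_args(["--no-header"]).include_header
)
```

Defaulting to "header on" matches the vote above for including headers when possible, while still letting users produce a bare table for tools that cannot handle one.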

The plotting tool you are using could maybe also use an upgrade. You could contact the author and see if they would add something like: Does the input contain a header? Y/N (a commonly used option).
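To make the idea concrete, this is roughly what a "Does the input contain a header?" option could do on the reading side. It is only a sketch: `read_table` and its `has_header` parameter are hypothetical names, not part of any existing Galaxy plotting tool.

```python
import csv
import io


def read_table(text, has_header):
    """Return (column_names, data_rows) from tab-separated text.

    Hypothetical helper: the name and the has_header flag are illustrative.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if has_header and rows:
        header, data = rows[0], rows[1:]
        if header:
            # Drop a leading '#' if the header uses that convention.
            header[0] = header[0].lstrip("#")
        return header, data
    # No header: fall back to 1-based column numbers as names.
    width = len(rows[0]) if rows else 0
    return [str(i) for i in range(1, width + 1)], rows


# With the flag set, the first row supplies labels and is kept out of the data.
names, data = read_table("#x\ty\n1\t2\n3\t4\n", has_header=True)
```

With the flag set, the header row never reaches the plot as a data point, which is the kind of first-row artifact described in the question.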

Hope that helps!