vtk file type and file extension

Hi All,

In our project, we generate VTK poly data output and VTK rectilinear grid output.
The official file extensions for these file types (e.g. understood by VTK’s ParaView software) are vtp and vtr, respectively.
Both of these files are stored in the VTK XML format.

Looking at the official Galaxy file types, this is what I find:

<datatype extension="vtkascii" type="galaxy.datatypes.constructive_solid_geometry:VtkAscii" display_in_upload="true"/>
<datatype extension="vtkbinary" type="galaxy.datatypes.constructive_solid_geometry:VtkBinary" display_in_upload="true"/>
<datatype extension="vtpascii" type="galaxy.datatypes.constructive_solid_geometry:VtpAscii" display_in_upload="true"/>
<datatype extension="vtpbinary" type="galaxy.datatypes.constructive_solid_geometry:VtpBinary" display_in_upload="true"/>
<datatype extension="vtkxml" type="galaxy.datatypes.constructive_solid_geometry:VtkXml" display_in_upload="true"/>

Now, I am not really sure what to use.
Following your logic, the extensions would be vtpxml and vtrxml, which you don’t offer.
When I download the resulting files, none of your file extensions are understood by the official VTK software (a.k.a. ParaView).

I would like to have a file extension that is understood by galaxy and by the official software.
What is the proper way to overcome this problem?

Thank you, Elmar

unlisted as spam? : (

Sorry for the spam flag. That’s an automated process and not always correct.

I have absolutely zero knowledge about vtk files, but what is wrong with choosing vtkxml as the datatype?

Hi @wm75,

Thank you for letting me know that this was an automated process.

VTK is a software framework for 3D graphics.

The basic file format has changed over the years from a now legacy text format to an XML-based file format (similar to Microsoft’s doc → docx or xls → xlsx). So it is correct that these files are vtkxml files.

And now comes the big but. There are different types of graphics:
mesh-like polydata that we use to draw cells (the correct extension is vtp), rectilinear grid data that we use to draw substrate gradients (the correct extension is vtr), and many more.
Your file format does not distinguish between these graphics types.
If you follow your logic, these should at least be vtpxml and vtrxml.
But this is bad as well, because then the official software does not recognize what these files are or how to handle them. It’s as if you changed the Microsoft extensions: docx → docxml or xlsx → xlsxml. I don’t think Word or Excel could handle that. And even if they could, ParaView (which is the official VTK software) cannot handle your file extensions (e.g. when you download the file from Galaxy to analyze it further locally in ParaView), and most probably neither can much other software that could otherwise handle the file format. Because of your extension change, the software does not know what type of file this really is.

That is the problem.

Hi @bb-8

I’m curious about the use case since I can think of a few different solutions for this.

What software in Galaxy is creating these files? I couldn’t find any tools in the ToolShed.

If no tools in Galaxy are interpreting the sub-type, and the file data is only being used as an XML file (to do text manipulations?), then I guess you could keep track of the specific XML sub-type in the dataset name (or a tag) and Galaxy wouldn’t need to understand or know about it at that sub level of detail.

  • Upload file name: myfile.vtp
  • Galaxy dataset name: myfile.vtp and format datatype: vtpxml (or even just text, or xml)
  • Downloaded file name: myfile.vtp.vtpxml or myfile.vtp.xml or myfile.vtp.txt

A downloaded dataset could preserve the original uploaded file name in the output if you use a workflow. Alternatively, you could capture the original name in a tag, sort or rearrange batches of datasets based on tags, do manipulations specific to that type, and then download it all. You might need to remove the extra extension that Galaxy adds later on, but a simple shell script could mostly automate that part of it too.
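The clean-up step mentioned above could look roughly like the following. This is only a minimal sketch in Python rather than shell; the set of extensions and the function names `strip_galaxy_extension` and `rename_downloads` are assumptions made for illustration, not something Galaxy provides.

```python
from pathlib import Path

# Assumption: extensions that Galaxy may append on download.
# Adjust this set to whatever your datasets actually use.
GALAXY_EXTENSIONS = {".vtkxml", ".vtpxml", ".xml", ".txt"}


def strip_galaxy_extension(path):
    """Return the path without the trailing Galaxy extension.

    The extension is only stripped when an original extension remains,
    so 'myfile.vtp.vtkxml' becomes 'myfile.vtp' but 'notes.txt' is kept.
    """
    path = Path(path)
    if path.suffix in GALAXY_EXTENSIONS and Path(path.stem).suffix:
        return path.with_suffix("")
    return path


def rename_downloads(folder):
    """Rename every matching file in folder and return the new paths."""
    renamed = []
    for path in sorted(Path(folder).iterdir()):
        target = strip_galaxy_extension(path)
        # Skip directories, unchanged names, and names already taken.
        if path.is_file() and target != path and not target.exists():
            path.rename(target)
            renamed.append(target)
    return renamed
```

Calling `rename_downloads("downloads")` on a folder of downloaded datasets would then turn `myfile.vtp.vtkxml` back into `myfile.vtp`, which ParaView should recognize by its extension.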

But if you plan on creating tools to either produce or use the sub-type, you could propose new datatypes. Uploaded files would be compared to those definitions and the format datatype assigned. Let us know if this is what you would like to do and I can point you to some resources.

The other solution would be on the ParaView side. I’m not sure how it works exactly or if it has defined datatypes in a way analogous to Galaxy. (Galaxy runs a “guess” on the file content to assign a format datatype.) If ParaView did something similar, and could distinguish between the sub-types of the format, the “datatype” could be assigned at that other import step, and the more general intermediate type that Galaxy assigned wouldn’t matter (and wouldn’t matter at all unless the data is being transferred via a handshake URL).

Galaxy doesn’t use the original file extension in the name of the file for anything important (with a few exceptions). Instead, the actual content of the file is checked against potential types and a guess is made; the user can then adjust it if the guess was wrong. The file name itself is totally up to you to format any way you want. Galaxy only cares about the datatype, which assigns a general class (“xml” in this case). Either that general class is used on its own, or a more specific sub-type is used by tool-specific functions that rely on that sub-type’s characteristics. Since the latter isn’t done (unless I missed it!), you have a lot of flexibility!
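To illustrate what guessing a sub-type from file content can mean for VTK XML files, here is a minimal sketch. It is not Galaxy’s sniffer and not ParaView’s logic. It assumes the file follows the VTK XML layout, where the root element is `VTKFile` and its `type` attribute names the dataset kind; the function name `guess_vtk_xml_extension` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Dataset kind in the VTKFile "type" attribute -> conventional extension.
VTK_XML_EXTENSIONS = {
    "PolyData": "vtp",
    "RectilinearGrid": "vtr",
    "UnstructuredGrid": "vtu",
    "ImageData": "vti",
    "StructuredGrid": "vts",
}


def guess_vtk_xml_extension(path):
    """Return 'vtp', 'vtr', ... based on the root element, or None."""
    try:
        with open(path, "rb") as handle:
            # The first "start" event belongs to the root element, so we
            # can decide without walking the rest of the document.
            for _event, element in ET.iterparse(handle, events=("start",)):
                if element.tag != "VTKFile":
                    return None
                return VTK_XML_EXTENSIONS.get(element.get("type"))
    except ET.ParseError:
        # Not well-formed XML, so not a VTK XML file we can classify.
        return None
    return None
```

A check like this could be used after download to pick the extension ParaView expects, independent of the generic datatype Galaxy assigned.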

In short, the file name is abstracted away from the file content. Then the datatype is based on the file contents. Since no tools or functions interpret the sub-types, the data being “xml” is all that matters (that I can find, please correct me!). But if someone was going to write a new tool or function that does use the sub-type, more specific datatypes could be developed. The other Vtk types were likely stricter, and might be used, but I’m not sure where to look without a few more clues about your use case.

Let me know what you think! We can certainly follow up. :)

Hi @jennaj,

On Galaxy Europe there is a new interactive tool called PhysiCell Studio.
Here:

This is agent-based modeling software that we usually run locally or on HPC.

For downstream analysis of PhysiCell output, there exists a pip-installable Python package called physicell data loader (pcdl). All pcdl does is transform PhysiCell output into common data formats, most of them used by bioinformaticians for downstream analysis, namely h5ad, csv, json, gml (for graphs), txt, jpeg, tiff, png, ome.tiff, vtp and vtr (for 3D graphics). My aim was to port pcdl (more exactly, the about 20 command line commands pcdl provides) to Galaxy. So it will be my tool that produces those files, and if there are no tools on Galaxy to process them, the plan is simply to download and process them locally.
On the big 3 (Windows, Mac, Linux) pcdl is super easy to install and works flawlessly. So I thought it would be a piece of cake to port pcdl to Galaxy ~ and frankly, I was totally proven wrong.
Anyhow, I understand now that I have to modify the filename.ext to filename.ext.galaxyext and will do so.
At this moment, I don’t plan to implement tools for Galaxy to process the file types that Galaxy as of today is unfamiliar with (gml and most probably vtr). I leave this to other folks in the empire : ).
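The filename.ext to filename.ext.galaxyext renaming could be sketched as follows. The mapping and the helper name `galaxy_output_name` are hypothetical and only illustrate the idea; which Galaxy extension fits each output type is an assumption to verify against the datatypes actually available.

```python
from pathlib import Path

# Hypothetical mapping from an output extension to the Galaxy datatype
# extension chosen for it. These values are illustrative assumptions.
GALAXY_EXT_FOR = {
    ".vtp": "vtkxml",
    ".vtr": "vtkxml",
    ".gml": "txt",
}


def galaxy_output_name(filename, default="txt"):
    """Append the Galaxy extension while keeping the original one.

    Only the file name is returned; any directory part is dropped.
    """
    original = Path(filename)
    galaxy_ext = GALAXY_EXT_FOR.get(original.suffix.lower(), default)
    return f"{original.name}.{galaxy_ext}"
```

Keeping the original extension inside the name means the sub-type (vtp or vtr) is still visible after download, even though Galaxy only sees the generic datatype.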

Thank you for your explanation! It is now clear to me what I have to do.
This should get me a step further to finalize the pcdl galaxy port.

Best wishes, bb-8

Great! I’m glad we could come up with something for now, and that you have some options for what might be needed later on if this is expanded (new datatypes! from you or others). Thanks for following up and explaining. What you are doing is exactly how the project was built up: review what is there, identify what is needed, develop new integrations. :)