Hi @bb-8
I’m curious about the use case since I can think of a few different solutions for this.
What software in Galaxy is creating these files? I couldn’t find any tools in the ToolShed.
If no tools in Galaxy interpret the sub-type, and the data is only being used as an XML file (to do text manipulations?), then you could keep track of the specific XML sub-type in the dataset name (or a tag), and Galaxy wouldn’t need to know about it at that level of detail.
- Upload file name: `myfile.vtp`
- Galaxy dataset name: `myfile.vtp` and format datatype: `vtpxml` (or even just `text`, or `xml`)
- Downloaded file name: `myfile.vtp.vtpxml` or `myfile.vtp.xml` or `myfile.vtp.txt`
A downloaded dataset can preserve the original uploaded file name if you use a workflow. Or you could capture the original name in a tag, sort/rearrange batches of datasets based on tags, do manipulations specific to that type, then download it all. You might need to remove the extra extension that Galaxy adds later on, but a simple shell script could mostly automate that part too.
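As a rough illustration of that clean-up step, here is a minimal Python sketch (a shell script would work just as well). The function names and the list of extensions are my own assumptions based on the example names above, so adjust them to whatever your downloads actually look like.

```python
from pathlib import Path

# Assumed set of extra extensions appended on download; adjust as needed.
EXTRA_EXTENSIONS = {".vtpxml", ".xml", ".txt"}


def strip_extra_extension(name: str) -> str:
    """Return name without a trailing extra extension, if one is present."""
    path = Path(name)
    # Only strip when another extension remains underneath,
    # e.g. myfile.vtp.xml -> myfile.vtp, but notes.txt stays notes.txt.
    if path.suffix in EXTRA_EXTENSIONS and Path(path.stem).suffix:
        return path.stem
    return name


def rename_downloads(directory: str) -> None:
    """Rename every file in directory that carries an extra extension."""
    for path in Path(directory).iterdir():
        new_name = strip_extra_extension(path.name)
        if path.is_file() and new_name != path.name:
            path.rename(path.with_name(new_name))
```

Calling `rename_downloads("downloads")` on a folder of downloaded datasets (the folder name is just a placeholder) would turn `myfile.vtp.xml` back into `myfile.vtp` and leave files with a single extension alone.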
But if you plan on creating tools to either produce or use the sub-type, you could propose new datatypes. Uploaded files would be compared to those definitions and the format datatype assigned. Let us know if this is what you would like to do and I can point you to some resources.
The other solution would be on the ParaView side. I’m not sure exactly how it works, or whether it defines datatypes in a way analogous to Galaxy (Galaxy runs a “guess” on the file content to assign a format datatype). If ParaView did something similar and could distinguish between the sub-types of the format, the “datatype” could be assigned at that import step, and the more general intermediate type that Galaxy assigned wouldn’t matter (it wouldn’t matter at all unless the data is being transferred via a handshake URL).
Galaxy doesn’t use the original file extension for anything important (with a few exceptions). Instead, the actual content of the file is checked against potential types and a guess is made; the user can then adjust it if the guess was wrong. The file name itself is totally up to you to format any way you want. Galaxy only needs the datatype to assign a general class (“xml” in this case). Either that class alone is used, or a more specific sub-type is used by tool-specific functions that rely on that sub-type’s characteristics. Since the latter isn’t done (unless I missed it!), you have a lot of flexibility!
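To make the idea of a content-based guess concrete, here is a rough Python sketch. It is not Galaxy’s actual sniffing code. The function name `guess_vtk_xml_subtype` is hypothetical, and it assumes that VTK XML files declare their sub-type in a `type` attribute on a root `VTKFile` element (for example `PolyData` for `.vtp`), which is my understanding of the VTK XML formats.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def guess_vtk_xml_subtype(path: str) -> Optional[str]:
    """Return the root VTKFile 'type' attribute (e.g. 'PolyData'), or None.

    Hypothetical helper: looks only at the file content, never the file name.
    """
    try:
        with open(path, "rb") as handle:
            # The first "start" event is the root element, so we can stop
            # there without reading the rest of the file.
            for _event, element in ET.iterparse(handle, events=("start",)):
                if element.tag == "VTKFile":
                    return element.get("type")
                return None  # well-formed XML, but not a VTK XML file
    except ET.ParseError:
        return None  # not well-formed XML
    return None
```

A file that fails this check could still be treated as plain `xml` or `text`, which matches the fallback described above.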
In short, the file name is abstracted away from the file content, and the datatype is based on the file content. Since no tools or functions interpret the sub-types, the data being “xml” is all that matters (as far as I can find, please correct me!). But if someone were to write a new tool or function that does use the sub-type, more specific datatypes could be developed. The other VTK types were likely stricter, and might be used, but I’m not sure where to look without a few more clues about your use case.
Let me know what you think! We can certainly follow up. 