I’m working on a project to create a library (in Java) that can validate various biological file formats such as GFF, FASTA, and OBO.
Since I’m not from this field, I’m a little confused about what kind of validation the validator program should perform.
There are some online tools, like Genome Tools, that can validate the GFF file format. Can anyone help me understand what kind of validation rules should be applied to each of these files?
Most Galaxy tools now use conda for dependency resolution. So, check with the IUC about why this was dropped or to find out what alternative they might suggest. IUC Gitter chat: https://gitter.im/galaxy-iuc/iuc
Galaxy does check the format of many datatypes as part of autodetect/assignment of datatype (Upload, Edit Attributes). Those checks are not (yet) comprehensive validators that collect all the potential problems into a report, but you might be able to repurpose parts of that functionality into wrapped, standalone tools. https://github.com/galaxyproject/galaxy
There are also several wrapped Galaxy tools that check dataset formats (and produce a report). Picard > ValidateSamFile, which assesses the validity of a SAM/BAM dataset, is one example. Searching the ToolShed would be the best way to find them all.
In the end, a format validator would need to both test for compatibility with the public file specifications and meet whatever custom format requirements (sometimes stricter, sometimes not) the target tools expect.
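To make that two-layer idea concrete, here is a rough Java sketch for GFF3. It is only a starting point, not a complete validator: it checks a small subset of the public GFF3 rules (version header, nine tab-separated columns, start/end coordinates, strand, phase) and collects every problem instead of stopping at the first one. The class name `Gff3Check`, the method `validate`, and the `requireKnownStrand` flag are all made up for illustration; the flag stands in for a hypothetical target tool that is stricter than the specification and only accepts `+` or `-`. Skipping blank lines is also an assumption on my part.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: checks a subset of GFF3 rules and reports all problems found.
final class Gff3Check {
    private static final Set<String> STRANDS = Set.of("+", "-", ".", "?");
    private static final Set<String> PHASES = Set.of("0", "1", "2", ".");

    /** Returns one message per problem; an empty list means no problem was found. */
    static List<String> validate(List<String> lines, boolean requireKnownStrand) {
        List<String> problems = new ArrayList<>();
        // Specification layer: the first line must declare GFF version 3.
        if (lines.isEmpty() || !lines.get(0).matches("##gff-version\\s+3(\\..*)?")) {
            problems.add("line 1: missing '##gff-version 3' header");
        }
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            int n = i + 1; // 1-based line number for messages
            if (line.startsWith("##FASTA")) break;   // sequence section, not checked here
            if (line.isEmpty() || line.startsWith("#")) continue; // blank, comment, directive
            String[] f = line.split("\t", -1);       // -1 keeps trailing empty columns
            if (f.length != 9) {
                problems.add("line " + n + ": expected 9 tab-separated columns, found " + f.length);
                continue;
            }
            checkCoordinates(f[3], f[4], n, problems);
            if (!STRANDS.contains(f[6])) {
                problems.add("line " + n + ": strand must be one of + - . ?");
            } else if (requireKnownStrand && !f[6].equals("+") && !f[6].equals("-")) {
                // Custom layer: stricter than the specification (assumed tool requirement).
                problems.add("line " + n + ": target tool requires strand + or -");
            }
            if (!PHASES.contains(f[7])) {
                problems.add("line " + n + ": phase must be 0, 1, 2 or .");
            } else if (f[2].equals("CDS") && f[7].equals(".")) {
                problems.add("line " + n + ": CDS features require a phase of 0, 1 or 2");
            }
        }
        return problems;
    }

    // Start and end must be positive integers with start <= end.
    private static void checkCoordinates(String start, String end, int n, List<String> problems) {
        try {
            long s = Long.parseLong(start);
            long e = Long.parseLong(end);
            if (s < 1 || e < 1) {
                problems.add("line " + n + ": start and end must be positive (1-based)");
            } else if (s > e) {
                problems.add("line " + n + ": start is greater than end");
            }
        } catch (NumberFormatException ex) {
            problems.add("line " + n + ": start and end must be integers");
        }
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "##gff-version 3",
            "chr1\tsrc\tgene\t100\t200\t.\t.\t.\tID=g1");
        // Same data, two verdicts: spec-only versus the stricter hypothetical tool profile.
        System.out.println("spec only: " + validate(lines, false));
        System.out.println("strict:    " + validate(lines, true));
    }
}
```

The same pattern (a base rule set taken from the published specification, plus optional per-tool rules) should carry over to FASTA or OBO, with the rules themselves coming from each format's own specification.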
How to “validate format” effectively for all the formats and issues that can come up, so that they are accepted/interpreted correctly across computational tools, can vary and is part of the reason this forum for Galaxy exists. Even with all the format validation Galaxy includes, problems still come up due to incompatible format variations (sometimes intentional on the part of whoever hosts the data, sometimes due to user-introduced error).
Frankly, this is one of the most complicated components of doing work in this field. If you do create a set of new format validators, those would be welcomed as new wrapped tools in the ToolShed.