xlsx upload failed in usegalaxy.org

In Firefox 66.0.5 (64-bit), usegalaxy.org does not recognize my xlsx file:

can't recognize the xlsx file

But when I upload via Chrome, all goes well.

Could anyone help me fix this issue? Thank you very much.

This is probably a corner case bug. I’ll report the issue. But you don’t need to wait for any corrections.

For almost all use cases, it is best to export data from Excel in tabular format first. Then upload the data in tabular format to Galaxy (using “autodetect”). Very few tools work with data that has the “xlsx” datatype assigned, and none that are installed at https://usegalaxy.org (that I am aware of).

Thank you very much. We have some new tools on our local Galaxy that work with “xlsx” input. So we really need “xlsx” input, and we hope it can work in Firefox as well as it does in Chrome.

When running your own Galaxy server, sometimes a new datatype needs to be defined and added to your server.

The browser anyone is using to access your Galaxy web interface is not a factor. This is a server-side configuration.

The xlsx datatype has not been custom created/defined in the ToolShed https://usegalaxy.org/toolshed >> Custom datatypes. It is included in Galaxy but it can be problematic to use for a variety of reasons.

There is a converter that you might be interested in: Excel to Tabular. I haven’t used it, and it may also be problematic because of how much Excel data content can vary, but it still might be worth reviewing: https://toolshed.g2.bx.psu.edu/view/ylebras/xls2tabular/48995f55bb5b

In short, the column order/content of Excel data can vary greatly. Hidden characters are a particular problem and are difficult to detect until you specifically look for them (“soft returns” inside of cells are one example).
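As a rough illustration of "specifically looking" for hidden characters, here is a minimal Python 3 sketch that scans a tab-delimited export for control and invisible format characters inside cells. The function name `find_hidden_characters` is made up for this example, and the choice of which characters count as suspicious is an assumption, not a Galaxy rule.

```python
import csv
import io
import unicodedata


def find_hidden_characters(text, delimiter="\t"):
    """Yield (record, column, codepoint) for cells holding suspicious characters.

    Record and column numbers are 1-based. `text` is the full content of a
    delimited export, for example a tab-delimited file saved from Excel.
    """
    # newline="" keeps line endings untouched so the csv module can handle
    # quoted cells that contain embedded line breaks ("soft returns").
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    for record_number, row in enumerate(reader, start=1):
        for column_number, cell in enumerate(row, start=1):
            for char in cell:
                # "Cc" covers control characters such as \n and \r inside a cell,
                # "Cf" covers invisible format characters such as zero-width space.
                # The non-breaking space (U+00A0) is added as an assumption.
                if unicodedata.category(char) in ("Cc", "Cf") or char == "\u00a0":
                    yield record_number, column_number, "U+%04X" % ord(char)
```

To check a real export, read the file with `open(path, newline="", encoding="utf-8").read()` and loop over the results. The encoding is a guess and may need adjusting for your files.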

You should create your own datatype, perhaps made more specific than a general Excel format, although this depends on how your tools expect inputs to be formatted. The more specific the datatype, the more robust your tools/workflows will be, and the fewer errors you’ll have reported by your end-users due to simple input format problems.

If you plan on publishing your tools to the ToolShed, definitely be as specific as possible about datatypes. Defining and using datatypes (at Upload and when selected as inputs on tools forms) are very powerful ways to make your tools easier to use.

To learn about how to create/define/add-in new or existing datatypes to your server, search with the keyword “datatype” in the Admin Docs: https://docs.galaxyproject.org/

Thanks!

Dear jennaj, thank you for your help. I can see xlsx in lib/galaxy/datatypes/binary.py:

class Xlsx(Binary):
    """Class for Excel 2007 (xlsx) files"""
    file_ext = "xlsx"
    compressed = True

    def sniff(self, filename):
        # Xlsx is compressed in zip format and must not be uncompressed in Galaxy.
        try:
            if zipfile.is_zipfile(filename):
                tempzip = zipfile.ZipFile(filename)
                if "[Content_Types].xml" in tempzip.namelist() and tempzip.read("[Content_Types].xml").find(b'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml') != -1:
                    return True
            return False
        except Exception:
            return False
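For testing a file outside of Galaxy, the two checks in the quoted sniff() method can be reproduced in a small standalone function. This is only a sketch in Python 3: the name `looks_like_xlsx` is hypothetical, and it mirrors the logic above without importing anything from Galaxy.

```python
import io
import zipfile

# Marker string copied from the sniff() method quoted above.
XLSX_MARKER = b"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"


def looks_like_xlsx(path_or_file):
    """Return True if the input passes the same two checks as the quoted sniff().

    Accepts a file path or a seekable binary file object.
    """
    try:
        # Check 1: the file must be a readable zip archive.
        if not zipfile.is_zipfile(path_or_file):
            return False
        with zipfile.ZipFile(path_or_file) as archive:
            # Check 2: [Content_Types].xml must exist and name the spreadsheet type.
            if "[Content_Types].xml" not in archive.namelist():
                return False
            return XLSX_MARKER in archive.read("[Content_Types].xml")
    except Exception:
        # Same fallback as the original: any read error means "not xlsx".
        return False
```

Running this on both the original file and the stored dataset file shows whether the copy on the server still passes the first check.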

The “registration” and “sniffers” entries also seem to be OK in config/datatypes_conf.xml:

<?xml version="1.0"?>
<datatypes>
  <registration converters_path="lib/galaxy/datatypes/converters" display_path="display_applications">
    <datatype extension="xlsx" type="galaxy.datatypes.binary:Xlsx" display_in_upload="true"/>
    .....
  </registration>
  <sniffers>
    <sniffer type="galaxy.datatypes.binary:Xlsx"/>
  </sniffers>
</datatypes>

I restarted my local Galaxy and uploaded an xlsx file, but got the same warning in Firefox:

Warning: The file 'Type' was set to 'xlsx' but the file does not appear to be of that type

I also tested the uploaded file as stored by Galaxy, and it does not seem to be a zip file:

Python 2.7.15 (default, May 26 2018, 12:12:51)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import zipfile
>>> tempzip = zipfile.ZipFile("/galaxy-dist/database/files/000/dataset_6.dat")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/software/python-2.7/lib/python2.7/zipfile.py", line 770, in __init__
    self._RealGetContents()
  File "/usr/local/software/python-2.7/lib/python2.7/zipfile.py", line 811, in _RealGetContents
    raise BadZipfile, "File is not a zip file"
zipfile.BadZipfile: File is not a zip file
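A BadZipfile error only says that the content is not a zip archive, not what the content is. One way to find out is to look at the first bytes of the stored file. The following is a minimal Python 3 sketch; the helper names `describe_head` and `describe_file` are made up for this example.

```python
import binascii

# Signature found at the start of a normal (non-empty) zip archive, including xlsx.
ZIP_MAGIC = b"PK\x03\x04"


def describe_head(data, length=16):
    """Summarise the size and first bytes of some file content."""
    head = data[:length]
    return {
        "size": len(data),
        "starts_with_zip_magic": head.startswith(ZIP_MAGIC),
        "head_hex": binascii.hexlify(head).decode("ascii"),
        "head_repr": repr(head),
    }


def describe_file(path, length=16):
    """Read a file in binary mode and summarise it with describe_head()."""
    # Reads the whole file, which is fine for spreadsheet-sized inputs.
    with open(path, "rb") as handle:
        return describe_head(handle.read(), length)
```

Calling `describe_file()` on the dataset path used in the session above and on the original file on the client, then comparing the two results, should show whether the stored file is truncated, empty, or starts with something other than the zip signature.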

All the files work fine in Chrome. I really don’t know what happens in Firefox when the xlsx file is uploaded, and I hope someone can help me solve it.

Thanks

Thanks for explaining and documenting your testing so well – very much helps to understand what is going on.

I don’t think this is a Galaxy problem. Instead, the flavour of zip compression you happen to be using is somehow incompatible with Firefox. Now, the OS/OS version could be a factor, or whatever version of zip/Firefox you are using, plus browser extensions. Also, be aware that Galaxy only handles single-file zip archives and will load just the first file if the archive contains more than one file – but that likely is not your issue considering the upload success with other browsers/same data.

BTW, Firefox is “locked down” a bit more than other browsers with respect to privacy settings and the like, more so than both Safari and (especially) Chrome and the rest of the Google ecosystem. But that is a can of worms and not for discussion here. You’ll find plenty of active debate online about the reasons and the security concerns if you are interested, or maybe you are already aware of it and that is why you would prefer to get this working with Firefox.

I’m going to ask you to do another test. I don’t think this is in your posts yet, but please correct me if I missed it, and quote or point me to the post/section in your reply.

What happens if you load an xlsx file that is already unzipped (uncompressed) with Firefox?

All my tests were done with Firefox Quantum (69.0, win64) and Galaxy release 18.09. Uploading the zip file with Firefox works fine, but uploading the unzipped (uncompressed) xlsx file fails. I think it is a Firefox problem, but I don’t know the details of the bug.

I wrote a file upload CGI script to test xlsx upload with Firefox Quantum 69.0. The upload of an unzipped (uncompressed) xlsx file from Windows to the remote server completed, but the file received on the server is not a zip file:

>>> import zipfile
>>> tempzip = zipfile.ZipFile("upload.xlsx")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/software/python-2.7/lib/python2.7/zipfile.py", line 770, in __init__
    self._RealGetContents()
  File "/usr/local/software/python-2.7/lib/python2.7/zipfile.py", line 811, in _RealGetContents
    raise BadZipfile, "File is not a zip file"
zipfile.BadZipfile: File is not a zip file

It seems Firefox changes something when an xlsx file is uploaded, but unfortunately we don’t know what it changes.
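One way to narrow down what changed is to compare the original file with the copy received on the server, byte for byte. Below is a minimal Python 3 sketch; the function names are made up for this example, and it assumes both files can be copied to the same machine.

```python
import hashlib


def first_difference(original, received):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (a, b) in enumerate(zip(original, received)):
        if a != b:
            return offset
    # No mismatch in the shared prefix: any difference is a length difference.
    if len(original) != len(received):
        return min(len(original), len(received))
    return None


def compare_bytes(original, received):
    """Summarise how two byte strings differ (sizes, checksums, first mismatch)."""
    return {
        "identical": original == received,
        "original_size": len(original),
        "received_size": len(received),
        "original_sha256": hashlib.sha256(original).hexdigest(),
        "received_sha256": hashlib.sha256(received).hexdigest(),
        "first_difference": first_difference(original, received),
    }


def compare_files(original_path, received_path):
    """Read both files in binary mode and compare their contents."""
    with open(original_path, "rb") as handle:
        original = handle.read()
    with open(received_path, "rb") as handle:
        received = handle.read()
    return compare_bytes(original, received)
```

A size mismatch with a first difference at offset 0 would point to the content being replaced or wrapped, while a difference later in the file would point to truncation or an altered byte range. Repeating the comparison for a Chrome upload of the same file gives a baseline.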