tools that require bigwig input cannot use bedgraph (as bigwig) file

There is a problem with computeMatrix (and maybe other tools as well) when you choose a bedgraph file as an input. Galaxy will let you choose a “bedgraph (as bigwig)” file as an input but will then fail to run and tell you that your input dataset is in an error state. At first I thought something was wrong with my bedgraph dataset that came out of MACS2, but I found that if I convert to bigwig format separately first (with Wig/BedGraph-to-bigWig), then computeMatrix runs fine. Now that I know this, it isn’t a big problem since the conversion from bedgraph is straightforward, but other people choosing “bedgraph (as bigwig)” files as inputs may be confused by the ambiguous “error state” message and not know how to resolve it. Previously (as recently as a few months ago), I could use a “bedgraph (as bigwig)” file as an input without any error, so something seems to have changed since then. It would be better if tools that require bigwig inputs did not show bedgraph files in the available score file list at all.
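
For reference, here is roughly what that separate conversion step does, sketched in Python with pyBigWig (the package, the file names, and the chrom.sizes path are my own placeholders, not anything Galaxy itself runs):

```python
# Rough sketch: convert a sorted 4-column bedgraph to bigwig with pyBigWig.
# File names are placeholders; a real run needs a chrom.sizes file that
# matches the genome build of the bedgraph.
import pyBigWig

def read_chrom_sizes(path):
    """Read a UCSC-style chrom.sizes file: one 'name<TAB>length' per line."""
    with open(path) as fh:
        return [(line.split()[0], int(line.split()[1])) for line in fh]

def bedgraph_to_bigwig(bdg_path, sizes_path, bw_path):
    bw = pyBigWig.open(bw_path, "w")
    bw.addHeader(read_chrom_sizes(sizes_path))   # declare chromosome lengths first
    chroms, starts, ends, values = [], [], [], []
    with open(bdg_path) as fh:
        for line in fh:
            if line.startswith(("track", "browser", "#")):
                continue                          # skip header/comment lines
            c, s, e, v = line.split()[:4]
            chroms.append(c)
            starts.append(int(s))
            ends.append(int(e))
            values.append(float(v))
    # Assumes intervals stay within the declared chromosome lengths;
    # MACS2 bedgraphs can occasionally extend past a chromosome end.
    bw.addEntries(chroms, starts, ends=ends, values=values)
    bw.close()

bedgraph_to_bigwig("macs2_treat_pileup.bdg", "hg38.chrom.sizes", "coverage.bw")
```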

Hi @TTP

This is interesting behavior. Is it still reproducible? I’m wondering if the converter tool is missing or has picked up a new bug!

If you want to isolate and share back an example history with the problem, we can confirm and reach out to the administrators to get it fixed.

As an aside, I’ve seen something similar, but it wasn’t a problem with the converter program specifically – it was a problem with a missing “database” assignment. The database key is what links a dataset to a fasta index, and that link is required for the auto-conversion.

If Galaxy doesn’t host your genome, you can create a custom database key (aka “custom build”) and it works the same way. https://training.galaxyproject.org/training-material/faqs/galaxy/#reference%20genomes

Yes, other people I know have had a similar problem. I just ran another job to illustrate this:

[Galaxy (usegalaxy.org)](https://usegalaxy.org/u/tpaull/h/hb155-to-hb174)

The other problem visible in this history is that I am getting a lot of “used more memory than it was allocated” errors these days, and I am not sure how to get around them. These are ChIP-seq experiments, and the problem seems to be the very large size of the ChIP input datasets.

Thanks,
-Tanya


Thanks, I’m reviewing.

Odd … this should be happening less frequently given recent boosts in resources. I see an example in your history, so I will check on that as well.

OK, this is what is going on:

  1. The version of the converter available from the tool panel includes an option to trim “overhanging coordinates” instead of failing. It is toggled on by default.
  2. The version of the converter used by the “auto-convert” between datatypes does not include that trimming by default. Why? Because it would introduce a data change that the user never explicitly chose. Converters are convenience functions and, on purpose, never directly change data content.
  3. MACS can output coordinates that extend past the chromosome ends. This is a known “feature” of MACS, and everyone has to adjust for it with certain functions or downstream tools; it is definitely not just a Galaxy issue. Review the discussion at the MACS forum if you are curious. The tool they recommend is the same one used in Galaxy, and the trim option is available in the version from the tool panel, but not in the auto-convert method’s version (again, on purpose). A small sketch of what that trimming step does follows this list.
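
Here is that sketch: a minimal Python version of “trim overhanging coordinates”, assuming a plain 4-column bedgraph and a UCSC-style chrom.sizes file (file names are placeholders):

```python
# Rough sketch of "trim overhanging coordinates": clip any interval that
# extends past its chromosome end instead of rejecting the whole file.

def read_chrom_sizes(path):
    """Read a UCSC-style chrom.sizes file: one 'name<TAB>length' per line."""
    sizes = {}
    with open(path) as fh:
        for line in fh:
            name, length = line.split()[:2]
            sizes[name] = int(length)
    return sizes

def clip_bedgraph(in_path, out_path, sizes):
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(("track", "browser", "#")):
                continue                       # skip header/comment lines
            chrom, start, end, value = line.split()[:4]
            start, end = int(start), int(end)
            limit = sizes.get(chrom)
            if limit is None:
                continue                       # chromosome not in this assembly
            end = min(end, limit)              # trim an overhanging right edge
            if start < end:                    # drop intervals entirely off the end
                dst.write(f"{chrom}\t{start}\t{end}\t{value}\n")

clip_bedgraph("macs2_treat_pileup.bdg", "clipped.bdg",
              read_chrom_sizes("hg38.chrom.sizes"))
```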

What to do:

  1. Extract a workflow from one of your successful histories, and make sure it includes the direct conversion step with the trimming enabled. Or, add that step directly by editing the workflow.
  2. Use your workflow. If you “favorite” it and hide the intermediate outputs, it will behave just like a single tool and will even show up in the tool panel.

Workflows are much easier than people assume and will save you even more time over time, especially since you are running the same steps across a bunch of samples. You could organize and tag all of your samples at once and run the workflow on the whole set, or keep it simple between pairs like you are doing now. Either is fine.
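
As an aside, if you ever want to drive the same workflow outside the web interface, BioBlend (the Python client for the Galaxy API) can do it. A rough sketch, where the usegalaxy.org URL is real but the API key, workflow name, history name, and the input label “bedgraph” are all placeholders for whatever your own workflow uses:

```python
# Hypothetical example of launching an extracted workflow via the Galaxy API.
# Everything except the usegalaxy.org URL is a placeholder.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")

# Find the workflow you extracted/edited in the web interface.
wf = gi.workflows.get_workflows(name="bedgraph-to-matrix")[0]

# Pick a bedgraph dataset out of an existing history to use as the input.
history = gi.histories.get_histories(name="hb155 to hb174")[0]
contents = gi.histories.show_history(history["id"], contents=True)
bdg = next(d for d in contents if d["name"].endswith(".bdg"))

# Map the workflow's input label to that dataset and run it in a new history.
gi.workflows.invoke_workflow(
    wf["id"],
    inputs={"bedgraph": {"id": bdg["id"], "src": "hda"}},
    inputs_by="name",                     # match inputs by workflow label
    history_name="bedgraph-to-matrix run",
)
```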

Now, your files are large. If the direct conversion fails for memory reasons when the trim option is used, and the database assignment is actually correct, then you can trust the error message about runtime memory. Some data will simply exceed public computational resources. A scaled-up private server is a common solution; see the Galaxy Platform directory for options, which include both free and pay-for-use choices (apply your grant funding, etc).

Hope that helps!