Fatal Error with "Pairwise intersection and heatmap for genomic intervals" tool.

Dear colleagues,

Would you please advise me on how to get the “Pairwise intersection and heatmap for genomic intervals (Galaxy Version 0.6.5+galaxy2)” tool to run correctly?

I have 3 BED files, each with “chr”, “start”, and “end” columns, and I would like to get the combined Fisher’s statistics.
The required “reference genome ID” in my case is hg19.

I am leaving all the remaining parameters as default.
The tool gives me the message below:

Traceback (most recent call last):
  File "/usr/local/bin/intervene", line 604, in <module>
    main()
  File "/usr/local/bin/intervene", line 424, in main
    pairwise.pairwise_intersection(label_names, options)
  File "/usr/local/lib/python3.7/site-packages/intervene/modules/pairwise/pairwise.py", line 486, in pairwise_intersection
    barplot(series, matrix, outfile, options, max_size=max(bed_sizes))
  File "/usr/local/lib/python3.7/site-packages/intervene/modules/pairwise/pairwise.py", line 136, in barplot
    cax, order = heatmap_triangle(matrix, ax, options)
  File "/usr/local/lib/python3.7/site-packages/intervene/modules/pairwise/pairwise.py", line 213, in heatmap_triangle
    Z = sch.linkage(D, method='average')
  File "/usr/local/lib/python3.7/site-packages/scipy/cluster/hierarchy.py", line 1038, in linkage
    y = _convert_to_double(np.asarray(y, order='c'))
  File "/usr/local/lib/python3.7/site-packages/scipy/cluster/hierarchy.py", line 1560, in _convert_to_double
    X = X.astype(np.double)
ValueError: could not convert string to float: 'on'

Tool Exit Code 1
Job Messages

desc: Fatal error: Exit code 1 ()
error_level: 3
exit_code: 1
type: exit_code

I sent a bug report to the tool admins on 22 April but have not received an answer yet.

Thanks in advance for your advice.
Regards,
Serdar

Hi @qcsciphi

I found your bug report.

ValueError: could not convert string to float: ‘on’

One of the BED files has decimal values for the 5th column, when “score” data usually needs to be a whole number. That is likely triggering this error.

That same file is also missing the “strand” 6th column. You could pad that with a placeholder or supply actual values. I am not sure whether this matters, so you could fix the score column first and then add the strand column if needed.
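If it helps, the two fixes above can be sketched in Python. This is only an illustration, not part of the tool: the function name `normalize_bed_line` is made up, and it assumes a tab-separated file where a missing name is padded with ".", a missing score with "0", and a missing strand with "." (the BED placeholder for "no strand"). Scores are rounded and kept within the 0 to 1000 range that the BED specification describes.

```python
def normalize_bed_line(line):
    """Return a 6-column BED line with an integer score and a strand column.

    Hypothetical helper for illustration; assumes tab-separated input.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, start and end")

    # Pad a missing name (column 4) with "." and a missing score (column 5) with "0".
    while len(fields) < 5:
        fields.append("." if len(fields) == 3 else "0")

    # Round a decimal score to a whole number and keep it within 0..1000.
    score = int(round(float(fields[4])))
    fields[4] = str(min(max(score, 0), 1000))

    # Pad a missing strand (column 6) with ".", meaning "no strand".
    if len(fields) < 6:
        fields.append(".")

    return "\t".join(fields)


print(normalize_bed_line("chr1\t100\t200\tpeak1\t12.7"))
```

In Galaxy itself the same result can be reached with the text manipulation tools covered in the tutorial linked below, so treat this only as a way to see what the corrected lines should look like.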

This is the file specification: Genome Browser FAQ (BED)

Tutorial Hands-on: Data Manipulation Olympics / Introduction to Galaxy Analyses

Not all bug reports get a reply; those are mostly for reporting actual server bugs. Next time, post a question here and include the topic link in the comments of the bug report. Just let us know, the same as you did for this one, so we can link up the two (I could find yours this time, but that won’t always be true). Note that this only works for certain servers and limits who can help you. Posting a history share link lets everyone help and works faster, and you can always unshare afterwards.

Hope this helps but let me know if something was missed or I misunderstood or reviewed the wrong bug report! :slight_smile: I would be curious about how this works out, too.

Dear Jennifer,
Thank you for your reply with instructions. You are on the correct bug report. Thanks. I went through the BED file specs and the tutorial.

I subsequently corrected the irregularities in col5 and col6 of the BED file concerned (dataset #272), and I still get the same error (datasets #273-276):
ValueError: could not convert string to float: 'on'

Would you please have a look and try to suggest a solution?
Please let me know if I need to share the history with you.
Thanks again!

@qcsciphi

The files look Ok now, so moving on to scientific troubleshooting…

  • Do the coordinates “overhang” the ends of the chromosomes? I see that you added extensions during prior steps.

    → This is my next best guess about why that error is coming up: trying to subtract a coordinate in the input from the length of a chromosome could result in a negative number that the tool cannot handle. The solution here is to “trim” the BED coordinates. You can get the hg19 chromosome lengths from the UCSC table browser for the manipulations.

  • And, I don’t think this tool will compute a matrix between samples in one run. Meaning, you should input pairs individually for three samples. I ran a quick test to see if inputting pairs was enough, and it wasn’t. I didn’t check all pairs to see if one particular file had the coordinate problem or not, but that is something you could explore to find which file has the overhang issue (if just one…).
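For the "trim" idea in the first bullet, here is a minimal sketch of clipping intervals to chromosome bounds. The function name `clip_interval` and the `chrom_sizes` dictionary are hypothetical, and the size shown is a toy value; the real hg19 lengths would come from the UCSC table browser as mentioned above. It assumes standard BED coordinates (0-based start, end not included).

```python
def clip_interval(chrom, start, end, chrom_sizes):
    """Clip a BED interval so it lies within [0, chromosome length].

    Returns None when nothing of the interval is left after clipping.
    Hypothetical helper for illustration only.
    """
    size = chrom_sizes[chrom]
    start = max(0, start)   # an extension upstream can push start below 0
    end = min(end, size)    # an extension downstream can run past the chromosome end
    if start >= end:
        return None
    return chrom, start, end


# Toy chromosome size for illustration; replace with the real hg19 lengths.
chrom_sizes = {"chrToy": 1000}

print(clip_interval("chrToy", -50, 120, chrom_sizes))    # start is raised to 0
print(clip_interval("chrToy", 900, 1100, chrom_sizes))   # end is lowered to 1000
```

Running each file through a check like this also shows which input, if any, has overhanging coordinates, because the clipped interval will differ from the original.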

Give that a try and let us know if it works or not.

Hello,
Thank you for kindly providing your time and for your suggestions.

In a separate history (link below) I checked 2 of my BED files for coordinates protruding past the chromosome ends.

I filled in the BED fields (columns) as needed and ran the tool with only 2 BED files. I still get the same error message.
Would you please have a look and provide your opinion?
Thanks again.

Hi @qcsciphi

Thanks for posting the history. Very helpful.

And, I figured out what is going wrong. The tool is interpreting the input file names (the dataset name) to print out the “sample” name on the result graphic. And, if that includes any whitespace, the tool gets stuck and fails.

I’ve ticketed an enhancement to handle that better. It will be implemented if possible, which isn’t guaranteed since it depends on how the underlying tool was written. But let’s see what the developers think. Enhancement: Adjust how intervene_pairwise interprets input file names to print samples names in graphics · Issue #5997 · galaxyproject/tools-iuc · GitHub

The workaround is in the ticket but also here → Replace or remove any whitespace in the input dataset names, and remember to include in the name only what you want to later show up in the graphic. More than two files seems to be Ok too.

The “on” in the error message was the clue – it is the first word after the first whitespace in the dataset names. Whew! Bioinformatics is fun :mechanic:
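To make the clue and the workaround concrete, here is a small Python sketch. The dataset name is a made-up example of the "... on data N" pattern, and `sanitize_name` is a hypothetical helper. It does not reproduce how the tool parses names internally; it only shows that the second whitespace-separated word of such a name is "on", that converting it to a number gives the same message as in the traceback, and what a whitespace-free name could look like.

```python
import re


def sanitize_name(name):
    """Replace each run of whitespace with one underscore (hypothetical helper)."""
    return re.sub(r"\s+", "_", name.strip())


# Made-up dataset name for illustration only.
dataset_name = "Peaks on data 12"

# The first word after the first whitespace is "on".
second_word = dataset_name.split()[1]

# Converting that word to a number fails with the message seen in the traceback.
try:
    float(second_word)
    message = None
except ValueError as err:
    message = str(err)

print(message)
print(sanitize_name(dataset_name))
```

In Galaxy the renaming is done by editing the dataset attributes, so no code is needed; the sketch is just to show why spaces matter here.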

Thanks for all the followup!

YES! Everything works now that there are no spaces in the dataset names.
What a history… it was an unfortunate one for me. Luckily you are here!

I am amazed by your patience, persistence and discipline in doing this kind of troubleshooting work as a job.
Your work helps a lot and keeps the Galaxy turning.

Thank you very much for your help in clarifying and solving the issue.
Best regards,
Serdar
