Compute distance matrix

LucasJennings · May 16, 2024, 3:11pm

When computing a distance matrix for gene orders, the results in the matrix appear to be inverted: a smaller number means more similar, a larger number means more dissimilar, and the diagonal is zero. This happens with all the distance measures. The previous version of CREx did the opposite: the higher the value, the more similar the genomes, and the diagonal was 1326. I checked this by rerunning old data that I had analyzed with the old version of CREx and I still get the same result, so I do not believe it is caused by my new dataset. I bring this up because it affects how I interpret the results when I go to publish any work.

Thanks for any help

jennaj · May 16, 2024, 6:27pm

Hi @LucasJennings

The tool author visits this forum, so let’s see how they recommend interpreting the data outputs. Ping @bernt-matthias

Meanwhile, you could share some examples. Which prior tool version? Do you have a simple comparison example that shows the difference? How to share → How to get faster help with your question

Recent Q&A about usage, in case that might be related → CREx: "missing header" error

LucasJennings · May 17, 2024, 1:13pm

Yes I can share all of that. I am posting a document showing the output I got yesterday using the compute distance matrix tool on Galaxy and a matrix that was made using the old CREx webserver. The new matrix is at the top and the old one is at the bottom of this document.

jennaj · May 17, 2024, 7:00pm

Hi @LucasJennings

Are you able to compare the command-line of the two runs, to see whether different options or methods were used?

The Galaxy command-line is under the “i” icon – scroll down on that page to find it.

Maybe you can also find this in the other web application?

Other than that, maybe a difference between versions of the tool itself is the root cause. That said, your differences seem more like a parameter change.

We can ping one of the tool authors again who would know much more of course! Hi @bernt-matthias would you be able to suggest what to try next? Thanks!

bernt-matthias · May 18, 2024, 9:23am

Dear @LucasJennings,

thanks for bringing this up. You are completely right. The old website had three measures that the user could choose from:

number of common intervals (a similarity measure)
number of breakpoints (a distance measure)
reversal distance (a distance measure)

For the coloring of the matrix we transformed the distance measures into a similarity measure (by n - dist, where n is the number of genes) – but this point is not relevant for your case, since you used the number of common intervals.

When I had to resurrect the functionality of the website I resorted to the program distmat that already had the same functionality … with the important difference that a common interval distance is computed, i.e. (n * (n-1) + 1) - X for linear genomes and ((n-1)*n + 1) - X for circular genomes, where n is the number of genes and X the number of common intervals. Here the first part of the difference is the maximum number of common intervals (which is achieved when comparing equal genomes, i.e. the value on the diagonal in your original tables).
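As an illustration of the relationship above, here is a minimal sketch of how a common interval distance matrix could be turned back into common interval counts by subtracting each entry from the maximum. It is not part of distmat or CREx. The function name and the small matrix are hypothetical, and the maximum is passed in explicitly, on the assumption that it is read from the diagonal of an older similarity table (1326 in the original post) rather than recomputed.

```python
def distance_to_similarity(dist_matrix, max_common_intervals):
    """Turn common interval distances back into common interval counts.

    Assumes distance = max_common_intervals - X, as described in the post,
    so X = max_common_intervals - distance.
    """
    similarity = []
    for row in dist_matrix:
        new_row = []
        for d in row:
            if d < 0 or d > max_common_intervals:
                # A distance outside [0, max] means the maximum is wrong
                # for this dataset (e.g. a different number of genes).
                raise ValueError(f"distance {d} outside 0..{max_common_intervals}")
            new_row.append(max_common_intervals - d)
        similarity.append(new_row)
    return similarity


# Hypothetical distance matrix for three genomes (made-up values).
dist = [
    [0, 10, 25],
    [10, 0, 40],
    [25, 40, 0],
]

# 1326 is the diagonal value reported for the old CREx tables in this thread.
sim = distance_to_similarity(dist, 1326)
for row in sim:
    print(row)
```

With this conversion the diagonal again holds the maximum and larger values again mean more similar gene orders. It only applies to the common interval measure, since the maximum must be known.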

The advantage of this change was, in my opinion, that the output consistently contains distance measures.

The second reason that made me do this change is that I added the CREx distance, i.e. the number of rearrangements computed by CREx. It was a huge oversight that the old CREx website used all sorts of distance measures to compute that matrix (that was only intended to select the actual pairwise CREx comparisons to be shown) but not the CREx distance. As I was thinking about the change (to compute the common interval distance instead of the number of common intervals) I now faced the problem that the CREx distance does not have a maximum that I can easily compute (probably it’s n, but I would need a formal proof) – so I could not make this a similarity measure.

So long story short: in the end I just made everything distance measures. And I certainly should have documented this better.

What I could implement is a boolean flag that would turn the breakpoint, inversion and reversal distances into similarity measures (and result in an error for CREx distances). What do you think? Would this be helpful?

Cheers,
Matthias

LucasJennings · May 20, 2024, 1:08pm

@bernt-matthias I think it would be good to have continuity with what was used in previous literature. Most of the literature I see follows the common intervals, where a higher number is more similar because the genomes share more common intervals. Would this be possible to implement? If not, I will just use the similarity computed by the common intervals on Galaxy.

bernt-matthias · May 21, 2024, 7:42am

I opened an issue and hope to find time soon to fix it: distmat distance / similarity measure (#4) · Issues · Matthias Bernt / revoluzer · GitLab

LucasJennings · May 21, 2024, 2:13pm

@bernt-matthias thank you very much!

Linno · June 30, 2024, 6:14am

Most of the literature I see follows the common intervals where a higher number is more similar as they share more common intervals. In the tables generated by the new website, is it still the case that higher values indicate greater similarity? Or rather, how should the results from the new tables, where the diagonal is zero, be interpreted?

jennaj · July 5, 2024, 6:25pm

New question moved to CREx: distance matrix.How should the results from the new tables, where the diagonal is zero, be interpreted?
