MOB-Typer unstable results

Hi all, I’m experiencing unstable MOB-Typer results. Identical sequences are giving different results (conjugative vs non-mobilizable vs no results at all on the same accession numbers). How do I know which results to trust?

Welcome, @cornek

Do you mean that the exact same inputs and parameters, when run through the tool at different times, produce different results? Or is something changing between the runs? Different tool version? Different parameters?

Would you like to share a history that demonstrates your observations for more feedback? How to do this is in the banner of this forum, also here → How to get faster help with your question

Xref → MOB-suite: software tools for clustering, reconstruction and typing of plasmids from draft assemblies - PMC

Hi Jenna, thanks for replying to my problem!

You can access my history here: Galaxy. Unfortunately I have already removed several files involved but others are still there.

To be more specific about what I did: first uploaded 32 fasta files as a collection (nr 3) and ran the tool. Results are still available (nr 36). Some of the results are empty files. Some yielded conjugative plasmids.

Then I uploaded the fasta files individually and ran the tool again. Now some showed non-mobilizable plasmids, others were empty (not identical to first run). I repeated some with different results (e.g. 120 vs 135 and 119 vs 136).

Today I ran the samples again and now got results that were more in line with what I expected (138 till 158).

All runs were in the last 2 weeks using the same version of the tool with default parameters. File 165 is an Excel file with an overview of the results that have been obtained.

Hope you can find the problem!

Kind regards, Corné

Small correction: today’s rerun was from the collection (but with every file started individually instead of as a collection).


Hi @cornek

Thanks for sharing the examples, very helpful!

This tool executes “per fasta file” whether in a collection or not. You’ll see a message like this when the collection is selected.

This is a batch mode input field. Individual jobs will be triggered for each dataset.

Now, some tools will have a toggle for “pooled” inputs and the message will change but that is not possible with the tool we are reviewing.

For batch use, you could instead merge multiple sequences into a single fasta file and enable the first toggle at the top of the advanced settings.

Treat each input sequence as an independant plasmid?
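If you want to try the merged-input route, a minimal Python sketch for building the multi-fasta file locally before upload could look like this. The function name `merge_fasta` and the file names in the usage note are made up for illustration; nothing here is part of MOB-suite or Galaxy.

```python
from pathlib import Path
from typing import Iterable


def merge_fasta(paths: Iterable[Path], merged: Path) -> int:
    """Concatenate FASTA files into one multi-FASTA file.

    Blank lines are dropped and every record ends with a newline, so a
    file without a trailing newline cannot glue two records together.
    Returns the number of records (header lines) written.
    """
    records = 0
    with Path(merged).open("w") as out:
        for path in paths:
            for line in Path(path).read_text().splitlines():
                line = line.rstrip()
                if not line:
                    continue  # skip blank lines between or after records
                if line.startswith(">"):
                    records += 1
                out.write(line + "\n")
    return records
```

For example, `merge_fasta([Path("LC586263.1.fasta"), Path("CP075568.fasta")], Path("merged.fasta"))` would write both records to `merged.fasta`, and the returned count can be compared against the number of files you expected to merge.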

Back to the difference in results for your example runs: these appear to be using slightly different inputs between the runs. Here is one example; the other is similar.

  • Output 120 used input 20, named: LC586263.1.fasta
  • Output 135 used input 89, named: LC586263.fa

But the fasta files are the same, just loaded at different times. And both produce the same results now. So that is not the problem.

However… for the empty result in output 120, what might be going wrong? I see this in the logs.

2024-12-26 20:20:17,503 root ERROR: makeblastdb on /data/jwd05e/main/077/064/77064905/tmp/tmp9dxaimd3/fixed.input.fasta had the following messages STDERR: b'BLAST Database creation error: mdb_env_open: No locks available\n' [in /usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/blast/__init__.py:62]
2024-12-26 20:20:17,506 root ERROR: Could not build blast database, check error messages…cannot continue [in /usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/mob_typer.py:343]

The output should have been “red” instead of green but empty, so that is something we could maybe improve on. For practical use – I think this tool should always output something – sorting the plasmid into a category – so any results that are empty likely had some processing problem and should be rerun.
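Until the tool turns red on its own in these cases, the empty reports can be picked out with a few lines of Python after downloading the outputs. This is only a sketch under assumptions: the function name `reports_without_data`, the `*.tsv` file pattern, and the idea that a usable report has at least one row below a header starting with `sample_id` are mine, not something the tool guarantees.

```python
from pathlib import Path
from typing import List


def reports_without_data(report_dir: Path, pattern: str = "*.tsv") -> List[Path]:
    """Return downloaded report files that contain no data row.

    A file counts as suspicious when it is empty or holds only the header
    line (assumed here to start with "sample_id"). Such jobs are
    candidates for a rerun.
    """
    flagged = []
    for report in sorted(Path(report_dir).glob(pattern)):
        rows = [
            line
            for line in report.read_text().splitlines()
            if line.strip() and not line.startswith("sample_id")
        ]
        if not rows:
            flagged.append(report)
    return flagged
```

Anything this returns should be treated as a failed job rather than as a "no plasmid features found" answer.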

Ping @bjoern.gruening – would you want to add anything else?

Hopefully this helps @cornek but please let us know if you have more questions! :slight_smile:

Dear Jenna,

Thank you for digging into this. Much appreciated! I understand that an empty file means something went wrong and the result should not be trusted (even better if the output is then red instead of green). However, what about the cases where a result is obtained but is apparently not correct? E.g. a non-mobilizable vs conjugative plasmid, as is the case for accession CP075568 (nrs 109 and 140)? Any clue there?

Hi @cornek

How well the classification “works” from a scientific perspective would be the same in Galaxy as anywhere else, since it is the same tool and same reference data. From reading about the reference data, I guess the result could be based on bad data from one of the publication sources the author pulled from. Or, you might need different parameters, not sure.

Resources are linked on the form with more details about how this algorithm works. Scroll down to the top of the Help section to find these.

Thanks! :scientist:

@jennaj, I get that everything relates to the reference data, but in this case the identical sequence yields a non-mobilizable plasmid and, on repeat, a conjugative plasmid. That still sounds like a glitch in the software/process somewhere …

@jennaj

In addition, I noticed this in the log file when the non-mobilizable sequence was reported (109):

2024-12-26 15:49:26,309 root INFO: Testing ETE3 taxonomy db /usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/databases/taxa.sqlite [in /usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:459]
2024-12-26 15:49:26,356 root WARNING: Lock file is already removed by some other process or read-only file system [in /usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:468]
2024-12-26 15:49:26,356 root WARNING: [Errno 2] No such file or directory: '/usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/databases/ETE3_DB.lock' [in /usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:469]

Doesn’t this indicate that something went wrong and the result should be considered invalid? In the repeat run (140), where a conjugative plasmid was found for the same sequence, this warning indeed did not occur.

Hi @cornek

Thank you so much for following up. I see the problem, and I think the tool should be failing at this step, not just produce a warning in the log!

So – there isn’t a good automatic way to pull out or to screen the logs since that should never really be needed.
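As a stopgap, the stderr text of a job can be copied out and screened by hand with a small script. The sketch below is an assumption-heavy illustration: the name `suspect_log_lines` is invented, and the patterns are simply the messages seen in the two problem logs in this thread, not an official or complete list of MOB-suite failure messages.

```python
import re
from typing import List

# Messages copied from the problem logs quoted in this thread.
# Assumption: their presence means the result should not be trusted.
SUSPECT_PATTERNS = [
    r"BLAST Database creation error",
    r"Could not build blast database",
    r"No locks available",
    r"Lock file is already removed",
]
_SUSPECT_RE = re.compile("|".join(SUSPECT_PATTERNS))


def suspect_log_lines(stderr_text: str) -> List[str]:
    """Return the log lines that match one of the known problem messages."""
    return [line for line in stderr_text.splitlines() if _SUSPECT_RE.search(line)]
```

A clean run, which logs "Lock file removed." at INFO level, returns an empty list, while either problem log above returns at least one line.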

Let’s ask one of the administrators of the EU server for advice at this point. This tool is a bit special and is only hosted at this specific server. Plus, issues of this type are rare, and when I have seen them they have been important to report.

Hi @bjoern.gruening – would this be Ok to report here? Am I misunderstanding what might be going wrong? Thank you!

Tool → toolshed.g2.bx.psu.edu/repos/nml/mob_suite/mob_typer/3.0.3+galaxy0
Dev → Issues · phac-nml/mob-suite · GitHub (didn’t see anything similar)

Original history → Galaxy
Testing history (subset) → Galaxy

  • data 2 (clear blast error, log has a warning)
  • data 4 (a less clear warning, user’s original run in data 109)
  • data 5 (correct run, user’s original run in data 140)

The input sequence is identical by checksum in the output reports, but the scientific results reported are very different. It seems the tool can have a problem, then report a warning, and happily report either a blank report or a flavor of “no hit”. I think the warnings in data 2 and data 4 should fail the tool instead. Thoughts?

Scientific results: the first is correct (data 5 result)

| Field | data 5 (correct run) | data 4 (“no hit” run) |
| --- | --- | --- |
| sample_id | CP075568.1 | CP075568 |
| num_contigs | 1 | 1 |
| size | - | - |
| gc | 58.62320425873259 | 58.62320425873259 |
| md5 | fb79a1f75200ff3cd97f5feb196b071e | fb79a1f75200ff3cd97f5feb196b071e |
| rep_type(s) | IncP,rep_cluster_398 | - |
| rep_type_accession(s) | 000167__AJ344068_00003,001720__CP003962 | - |
| relaxase_type(s) | MOBF | - |
| relaxase_type_accession(s) | NC_003350_00125 | - |
| mpf_type | MPF_T | - |
| mpf_type_accession(s) | NC_003350_00124,NC_003350_00132,NC_003350_00133,NC_003350_00134,NC_003350_00138 | - |
| orit_type(s) | MOBF | - |
| orit_accession(s) | NZ_AFYG01000108 | - |
| predicted_mobility | conjugative | non-mobilizable |
| mash_nearest_neighbor | MK671726 | MK671726 |
| mash_neighbor_distance | 0.0794837 | 0.0794837 |
| mash_neighbor_identification | Pseudomonas mendocina | Pseudomonas mendocina |
| primary_cluster_id | AF916 | AF916 |
| secondary_cluster_id | AP924 | AP924 |
| predicted_host_range_overall_rank | multi-phylla | - |
| predicted_host_range_overall_name | Proteobacteria,Actinobacteria | - |
| observed_host_range_ncbi_rank | multi-phylla | - |
| observed_host_range_ncbi_name | Proteobacteria,Actinobacteria | - |
| reported_host_range_lit_rank | phylum | - |
| reported_host_range_lit_name | Proteobacteria | - |
| associated_pmid(s) | 27895009; 25389419; 23980652; 54383; 28842132; 30619542; 20851899 | - |
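The same check, identical md5 but different predicted_mobility, can be automated across many downloaded reports. The following Python sketch assumes the reports are tab-separated files with a header row containing the `md5` and `predicted_mobility` columns shown above; the function name `conflicting_calls` is made up for illustration.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Set


def conflicting_calls(report_paths: Iterable[str]) -> Dict[str, Set[str]]:
    """Find sequences that received more than one mobility prediction.

    Reads each tab-separated report, groups the predicted_mobility values
    by the md5 of the input sequence, and returns only the md5 values
    that ended up with two or more distinct predictions.
    """
    calls = defaultdict(set)
    for path in report_paths:
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                calls[row["md5"]].add(row["predicted_mobility"])
    return {md5: values for md5, values in calls.items() if len(values) > 1}
```

Run on the two reports above, this would return the md5 `fb79a1f75200ff3cd97f5feb196b071e` mapped to both `conjugative` and `non-mobilizable`, which is exactly the pair of jobs worth inspecting in the logs.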

Parts of the job logs that seem relevant

stderr, data 4 (the “no hit” result):

2024-12-26 15:49:26,309 root INFO: Testing ETE3 taxonomy db /usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/databases/taxa.sqlite [in /usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:459]
2024-12-26 15:49:26,356 root WARNING: Lock file is already removed by some other process or read-only file system [in /usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:468]
2024-12-26 15:49:26,356 root WARNING: [Errno 2] No such file or directory: '/usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/databases/ETE3_DB.lock' [in /usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:469]

stderr, data 5 (the accurate result):

2025-01-06 12:07:29,084 root INFO: Testing ETE3 taxonomy db /usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/databases/taxa.sqlite [in /usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:459]
2025-01-06 12:07:29,224 root INFO: Lock file removed. [in /usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:466]

Thank you again for following up! Hopefully we can get to a good solution

@jennaj Again, thanks for your efforts, much appreciated!

@jennaj @bjoern.gruening I noticed that there is a newer version of MOB-Typer. Perhaps the issues I’m experiencing have already been addressed there. Would it be wise to upgrade the tool to the latest version first and see whether things work correctly then?