MOB-Typer unstable results

cornek · January 6, 2025, 11:40am

Hi all, I’m experiencing unstable MOB-Typer results. Identical sequences are giving different results (conjugative vs non-mobilizable vs no results at all on the same accession numbers). How do I know what to trust?

jennaj · January 6, 2025, 7:44pm

Welcome, @cornek

Do you mean that the same exact inputs and parameters, when run through the tool at different times, produce different results? Or is something changing between the runs? Different tool version? Different parameters?

Would you like to share a history that demonstrates your observations for more feedback? How to do this is in the banner of this forum, also here → How to get faster help with your question

Xref → MOB-suite: software tools for clustering, reconstruction and typing of plasmids from draft assemblies - PMC

cornek · January 6, 2025, 8:31pm

Hi Jenna, thanks for replying to my problem!

You can access my history here: Galaxy. Unfortunately I have already removed several files involved but others are still there.

To be more specific about what I did: I first uploaded 32 fasta files as a collection (nr 3) and ran the tool. Results are still available (nr 36). Some of the results are empty files. Some yielded conjugative plasmids.

Then I uploaded the fasta files individually and ran the tool again. Now some showed non-mobilizable plasmids, others were empty (not identical to first run). I repeated some with different results (e.g. 120 vs 135 and 119 vs 136).

Today I ran the samples again and now got results that were more in line with what I expected (138 till 158).

All runs were in the last 2 weeks using the same version of the tool with default parameters. File 165 is an excel file with an overview of the results that have been obtained.

Hope you can find the problem!

Kind regards, Corné

cornek · January 6, 2025, 8:45pm

Small correction: today’s rerun was from the collection (but every file was started individually instead of as a collection).

jennaj · January 6, 2025, 10:13pm

Hi @cornek

Thanks for sharing the examples, very helpful!

This tool executes “per fasta file” whether in a collection or not. You’ll see a message like this when the collection is selected.

This is a batch mode input field. Individual jobs will be triggered for each dataset.

Now, some tools will have a toggle for “pooled” inputs and the message will change but that is not possible with the tool we are reviewing.

For batch use, you could instead input multiple sequences by merging them into the same fasta file and using the first toggle at the top of the advanced settings.

Treat each input sequence as an independant plasmid?
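A minimal sketch of the merge step, assuming the plasmid FASTA files are available locally. The function name `merge_fasta` and the file layout are hypothetical; the only thing it guards against is a file that lacks a trailing newline, which would otherwise glue the next header onto the last sequence line.

```python
from pathlib import Path
from typing import Iterable


def merge_fasta(inputs: Iterable[Path], merged: Path) -> int:
    """Concatenate FASTA files into one file; return the number of records written."""
    n_records = 0
    with merged.open("w") as out:
        for path in inputs:
            text = Path(path).read_text()
            if not text.strip():
                continue  # skip empty files instead of writing blank lines
            # Guarantee a trailing newline so the next header starts on its own line.
            if not text.endswith("\n"):
                text += "\n"
            n_records += sum(1 for line in text.splitlines() if line.startswith(">"))
            out.write(text)
    return n_records
```

For example, `merge_fasta(sorted(Path("plasmids").glob("*.fa")), Path("merged.fasta"))` would write one multi-record file that can be uploaded once, with the toggle above switched on so each record is typed separately.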

Back to the difference in results for your example runs: These appear to be using slightly different inputs between the runs. Example of one of them. The other is similar.

Output 120 used input 20, named: LC586263.1.fasta
Output 135 used input 89, named: LC586263.fa

But the fasta files are the same, just loaded at different times. And both produce the same results now. So that is not the problem.
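If you want to confirm locally that two uploads carry the same sequence, a checksum over the sequence content is enough. This is a sketch under one assumption: it hashes only the sequence lines (headers, case, and line wrapping are ignored), which is not necessarily how MOB-suite computes the `md5` column in its report.

```python
import hashlib
from pathlib import Path


def sequence_md5(fasta_path: Path) -> str:
    """MD5 of the sequence content only (headers, case, and line wrapping ignored)."""
    digest = hashlib.md5()
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith(">"):
                continue  # skip blank lines and FASTA headers
            digest.update(line.upper().encode("ascii"))
    return digest.hexdigest()
```

Comparing `sequence_md5(Path("LC586263.1.fasta"))` with `sequence_md5(Path("LC586263.fa"))` should give the same value when the two files hold the same sequence.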

However… for the empty result in output 120, what might be going wrong? I see this in the logs.

2024-12-26 20:20:17,503 root ERROR: makeblastdb on /data/jwd05e/main/077/064/77064905/tmp/tmp9dxaimd3/fixed.input.fasta had the following messages STDERR: b’BLAST Database creation error: mdb_env_open: No locks available\n’ [in /usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/blast/__init__.py:62]
2024-12-26 20:20:17,506 root ERROR: Could not build blast database, check error messages…cannot continue [in /usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/mob_typer.py:343]

The output should have been “red” instead of green but empty, so that is something we could maybe improve on. For practical use – I think this tool should always output something – sorting the plasmid into a category – so any results that are empty likely had some processing problem and should be rerun.
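Until the tool turns red on its own, empty outputs can be screened after downloading the reports. The sketch below assumes the reports were saved as `*.txt` files in one folder (both the folder and the pattern are hypothetical) and flags any file that has no data row at all.

```python
from pathlib import Path
from typing import List


def empty_reports(report_dir: Path, pattern: str = "*.txt") -> List[Path]:
    """Return report files that are empty or contain only a header line."""
    suspicious = []
    for report in sorted(report_dir.glob(pattern)):
        lines = [ln for ln in report.read_text().splitlines() if ln.strip()]
        if len(lines) < 2:  # nothing at all, or a header without a data row
            suspicious.append(report)
    return suspicious
```

Anything returned by `empty_reports(Path("mobtyper_reports"))` is a candidate for a rerun rather than a result to interpret.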

Ping @bjoern.gruening – would you want to add anything else?

Hopefully this helps @cornek but please let us know if you have more questions!

cornek · January 7, 2025, 8:24am

Dear Jenna,

Thank you for digging into this. Much appreciated! I can understand that when an empty file is produced, something went wrong and the result should not be trusted (even better if the output is then red instead of green). However, what about the cases where a result is obtained but is apparently not correct? E.g. a non-mobilizable vs conjugative plasmid, as is the case for accession CP075568 (nrs 109 and 140)? Any clue there?

jennaj · January 7, 2025, 9:12pm

Hi @cornek

How well the classification “works” from a scientific perspective would be the same in Galaxy as anywhere else, since it is the same tool and same reference data. From reading about the reference data, I guess the result could be based on bad data from one of the publication sources the author pulled from. Or, you might need different parameters, not sure.

Resources are linked on the form with more details about how this algorithm works. Scroll down to the top of the Help section to find these.

GitHub - phac-nml/mob-suite: MOB-suite: Software tools for clustering, reconstruction and typing of plasmids from draft assemblies
See the Issues section ^^ since I see other comments about the data interpretation. You could ask the author as well about your result case.
There are two publications you can review, too.
Explore the output report itself. Maybe you can notice where the public annotation was incorrect and report that to the author for curation/improvement purposes. Any changes at the source would eventually flow out to Galaxy.

Thanks!

cornek · January 7, 2025, 9:25pm

@Jennaj, I get that everything relates to the reference data, but in this case the identical sequence yields a non-mobilizable plasmid and on repeat a conjugative plasmid. That still sounds like a glitch in the software/process somewhere …

cornek · January 8, 2025, 8:05am

@jennaj

In addition, I noticed this in the log file when the non-mobilizable sequence was reported (109):

2024-12-26 15:49:26,309 root INFO: Testing ETE3 taxonomy db /usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/databases/taxa.sqlite [in /usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:459]
2024-12-26 15:49:26,356 root WARNING: Lock file is already removed by some other process or read-only file system [in /usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:468]
2024-12-26 15:49:26,356 root WARNING: [Errno 2] No such file or directory: ‘/usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/databases/ETE3_DB.lock’ [in /usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:469]

Doesn’t this indicate that something went wrong and the result should be considered invalid? In the repeat run (140), where a conjugative plasmid was found for the same sequence, this warning indeed did not occur.

jennaj · January 8, 2025, 10:55pm

Hi @cornek

Thank you so much for following up. I see the problem, and I think the tool should be failing at this step, not just producing a warning in the log!

So – there isn’t a good automatic way to pull out or to screen the logs since that should never really be needed.

Let’s ask one of the administrators of the EU server for advice at this point. This tool is a bit special and is only hosted at this specific server. Plus I haven’t seen this type of issue in a very long time without it being both rare and something important to report.

Hi @bjoern.gruening – would this be Ok to report here? Am I misunderstanding what might be going wrong? Thank you!

Tool → toolshed.g2.bx.psu.edu/repos/nml/mob_suite/mob_typer/3.0.3+galaxy0
Dev → GitHub · Where software is built (didn’t see anything similar)

Original history → Galaxy
Testing history (subset) → Galaxy

data 2 (clear blast error, log has a warning)
data 4 (a less clear warning, user’s original run in data 109)
data 5 (correct run, user’s original run in data 140)

The input sequence is identical by checksum in the output reports, but the scientific results reported are very different. It seems the tool can have a problem, then report a warning, and happily report either a blank report or a flavor of “no hit”. I think the warnings in data 2 and data 4 should fail the tool instead. Thoughts?

Scientific results: the first is correct (data 5 result)

sample_id	num_contigs	size	gc	md5	rep_type(s)	rep_type_accession(s)	relaxase_type(s)	relaxase_type_accession(s)	mpf_type	mpf_type_accession(s)	orit_type(s)	orit_accession(s)	predicted_mobility	mash_nearest_neighbor	mash_neighbor_distance	mash_neighbor_identification	primary_cluster_id	secondary_cluster_id	predicted_host_range_overall_rank	predicted_host_range_overall_name	observed_host_range_ncbi_rank	observed_host_range_ncbi_name	reported_host_range_lit_rank	reported_host_range_lit_name	associated_pmid(s)
CP075568.1	1	-	58.62320425873259	fb79a1f75200ff3cd97f5feb196b071e	IncP,rep_cluster_398	000167__AJ344068_00003,001720__CP003962	MOBF	NC_003350_00125	MPF_T	NC_003350_00124,NC_003350_00132,NC_003350_00133,NC_003350_00134,NC_003350_00138	MOBF	NZ_AFYG01000108	conjugative	MK671726	0.0794837	Pseudomonas mendocina	AF916	AP924	multi-phylla	Proteobacteria,Actinobacteria	multi-phylla	Proteobacteria,Actinobacteria	phylum	Proteobacteria	27895009; 25389419; 23980652; 54383; 28842132; 30619542; 20851899
CP075568	1	-	58.62320425873259	fb79a1f75200ff3cd97f5feb196b071e	-	-	-	-	-	-	-	-	non-mobilizable	MK671726	0.0794837	Pseudomonas mendocina	AF916	AP924	-	-	-	-	-	-	-
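Conflicts like the one above can be found without reading the reports by eye. The sketch below groups report rows by the `md5` column and lists every checksum that received more than one `predicted_mobility` call. It relies only on the column names shown in the table; the function name and the idea of passing each report as a text string are assumptions for illustration.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Set


def conflicting_calls(report_texts: Iterable[str]) -> Dict[str, Set[str]]:
    """Map md5 -> set of predicted_mobility values, for md5s with more than one call."""
    calls = defaultdict(set)
    for text in report_texts:
        # Reports are tab-separated with a header row, as in the table above.
        for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
            calls[row["md5"]].add(row["predicted_mobility"])
    return {md5: seen for md5, seen in calls.items() if len(seen) > 1}
```

Feeding it the two reports above would return the checksum fb79a1f75200ff3cd97f5feb196b071e with both conjugative and non-mobilizable attached, which marks that sequence for a rerun or a closer look.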

Parts of the job logs that seem relevant

stderr data 4 (e.g. “no hit” result)
2024-12-26 15:49:26,309 root INFO: Testing ETE3 taxonomy db /usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/databases/taxa.sqlite [in /usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:459]
2024-12-26 15:49:26,356 root WARNING: Lock file is already removed by some other process or read-only file system [in /usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:468]
2024-12-26 15:49:26,356 root WARNING: [Errno 2] No such file or directory: ‘/usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/databases/ETE3_DB.lock’ [in /usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:469]

stderr data 5 (e.g. accurate result)
2025-01-06 12:07:29,084 root INFO: Testing ETE3 taxonomy db /usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/databases/taxa.sqlite [in /usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:459]
2025-01-06 12:07:29,224 root INFO: Lock file removed. [in /usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:466]
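For anyone who wants to screen saved job logs themselves, here is a rough sketch that pulls out ERROR-level lines. It assumes the log layout seen in this thread (date, time, `root`, level, message) and that the stderr text has been copied out of Galaxy by hand. Note that it would only catch the blast database failure from data 2: the data 4 run logged WARNING lines only, so the level alone does not separate it from a good run.

```python
import re
from typing import List

# Matches the log layout seen in this thread: "<date> <time> root <LEVEL>: <message>"
LOG_LINE = re.compile(r"^\S+ \S+ root (?P<level>[A-Z]+): (?P<message>.*)$")


def error_lines(stderr_text: str) -> List[str]:
    """Return the messages of all ERROR-level lines in a job's stderr."""
    found = []
    for line in stderr_text.splitlines():
        match = LOG_LINE.match(line)
        if match and match.group("level") == "ERROR":
            found.append(match.group("message"))
    return found
```

A non-empty return value from `error_lines(...)` is a reason to rerun the job regardless of whether the dataset turned green.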

Thank you again for following up! Hopefully we can get to a good solution

cornek · January 9, 2025, 7:54am

@jennaj Again, thanks for your efforts, much appreciated!

@jennaj @bjoern.gruening I noticed that there is a newer version of MOB-Typer. Perhaps the issues I’m experiencing have already been addressed there. Would it be wise to first upgrade the tool to the latest version and see if things are working correctly then?

kbessonov1984 · February 19, 2025, 4:27pm

In the newer 3.1.9 release we have improved the stability of the output results by ordering contigs. The randomness was due to the contig sorting algorithm and should be resolved in release 3.1.9, which I am currently working on and which should be released the week of Feb 24th if all goes well.
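To illustrate the general idea of a stable contig order (this is not the MOB-suite code, only a generic sketch with a hypothetical function name): sorting on a key that has a tie-breaker gives the same order no matter how the contigs arrived.

```python
from typing import Dict, List, Tuple


def ordered_contigs(contigs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (contig_id, sequence) pairs in a reproducible order: longest first, ties by id."""
    return sorted(contigs.items(), key=lambda item: (-len(item[1]), item[0]))
```

Sorting by length alone would leave equal-length contigs in their input order, which is one way run-to-run differences can creep in; the contig id as a second key removes that dependency.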

kbessonov1984 · February 19, 2025, 4:31pm

WARNING: [Errno 2] No such file or directory: ‘/usr/local/tools/_conda/envs/__mob_suite@3.0.3/lib/python3.8/site-packages/mob_suite/databases/ETE3_DB.lock

This error message is OK. It just means that the database lock file is gone, which is normal: otherwise the tool would wait for 15 min before trying to proceed. This was done to prevent previous versions of the tool from killing the server with the database initialization step if multiple samples were initialized. Now the database initialization step, together with the ete3 taxonomy database, is done during the conda install stage, so this lock/release database mechanism is redundant. Please disregard this message.

kbessonov1984 · February 20, 2025, 5:10pm

A new Pull Request has been submitted for the updated Galaxy MOB-Suite v3.1.9 recipe. It brings more robust results and a biomarker report for MOB-Typer, which lets you find the key marker coordinates in a plasmid for plasmid map rendering or easier parsing. Mob suite v3.1.9 Galaxy tool release by kbessonov1984 · Pull Request #273 · phac-nml/galaxy_tools · GitHub