COMO: Closed-Loop Optical Molecule Recognition with Minimum Risk Training

About

Optical chemical structure recognition (OCSR) translates molecular images into machine-readable representations such as SMILES strings or molecular graphs, but it remains challenging on real-world documents because of the effectively unbounded variety of chemical structures, shorthand conventions, and visual noise. Most existing deep-learning approaches rely on teacher forcing with token-level Maximum Likelihood Estimation (MLE). This training paradigm suffers from exposure bias: models are trained on ground-truth prefixes but must condition on their own previous predictions at inference time. Moreover, token-level MLE objectives do not directly optimize molecule-level evaluation criteria such as chemical validity and structural similarity. Here we introduce Minimum Risk Training (MRT) to OCSR and propose COMO (Closed-loop Optical Molecule recOgnition), a closed-loop framework that mitigates exposure bias by iteratively sampling and evaluating the model's own predictions, which lets it directly optimize molecule-level, non-differentiable objectives. Experiments on ten benchmarks, covering synthetic and real-world chemical diagrams from patents and scientific literature, demonstrate that COMO substantially outperforms existing rule-based and learning-based methods while using less training data. Ablation studies further show that MRT is architecture-agnostic, indicating its potential for broad application to end-to-end OCSR systems.
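To make the MRT idea above concrete, here is a minimal sketch of a sample-based expected-risk objective in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names (`toy_cost`, `mrt_risk`), the sharpness parameter `alpha`, and the character-level cost are all hypothetical. A real OCSR system would score candidates with molecule-level criteria such as chemical validity and structural similarity, and would backpropagate through the model's log-probabilities.

```python
import math
from difflib import SequenceMatcher


def toy_cost(pred: str, target: str) -> float:
    """Stand-in cost in [0, 1]; 0 means the strings match exactly.

    Assumption: a real system would compare molecules (validity,
    structural similarity), not raw SMILES characters.
    """
    return 1.0 - SequenceMatcher(None, pred, target).ratio()


def mrt_risk(candidates, log_probs, target, alpha=1.0, cost=toy_cost):
    """Expected cost of the model's own samples.

    The model's sequence log-probabilities are scaled by alpha and
    renormalized over the sampled candidates only, giving weights Q(y).
    The risk is the Q-weighted average cost against the target.
    """
    scaled = [alpha * lp for lp in log_probs]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    q = [w / total for w in weights]
    return sum(qi * cost(c, target) for qi, c in zip(q, candidates))


# Hypothetical samples for one image whose ground truth is ethanol ("CCO").
samples = ["CCO", "CCN"]
even = mrt_risk(samples, [math.log(0.5), math.log(0.5)], "CCO")
confident = mrt_risk(samples, [math.log(0.9), math.log(0.1)], "CCO")
```

In this toy setup, `confident` is lower than `even`: shifting probability mass toward the low-cost candidate reduces the risk, which is the signal a closed-loop training step would follow. Because the cost is only evaluated, never differentiated, it can be any non-differentiable molecule-level metric.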

Zhuoqi Lyu, Qing Ke • 2026

Related benchmarks

Task                                    Dataset        Metric                  Result  Rank
Molecular structure recognition         JPO (450)      Accuracy                89.1    19
Optical Chemical Structure Recognition  CLEF 992       Exact Match Accuracy    95      12
Optical Chemical Structure Recognition  UOB (5740)     Exact Match Accuracy    98.5    12
Optical Chemical Structure Recognition  USPTO (5719)   Exact Match Accuracy    93.4    12
Optical Chemical Structure Recognition  USPTO-10K      Accuracy (Exact Match)  96.1    12
Optical Chemical Structure Recognition  WildMol-10K    Exact Match Accuracy    77.2    12
Optical Chemical Structure Recognition  Indigo 5719    Exact Match Accuracy    98.8    9
Optical Chemical Structure Recognition  ChemDraw 5719  Exact Match Accuracy    96.5    9
Optical Chemical Structure Recognition  Staker 50000   Exact Match Accuracy    87.5    9
Optical Chemical Structure Recognition  ACS 331        Exact Match Accuracy    88.2    9