COMO: Closed-Loop Optical Molecule Recognition with Minimum Risk Training
About
Optical chemical structure recognition (OCSR) translates molecular images into machine-readable representations such as SMILES strings or molecular graphs, but it remains challenging on real-world documents because of the practically unlimited variation in chemical structures, shorthand conventions, and visual noise. Most existing deep-learning-based approaches rely on teacher forcing with token-level Maximum Likelihood Estimation (MLE). This training paradigm suffers from exposure bias: models are trained on ground-truth prefixes but must condition on their own previous predictions during inference. Moreover, token-level MLE objectives do not directly optimize molecule-level evaluation criteria such as chemical validity and structural similarity. Here we introduce Minimum Risk Training (MRT) to OCSR and propose COMO (Closed-loop Optical Molecule recOgnition), a closed-loop framework that mitigates exposure bias by iteratively sampling and evaluating the model's own predictions, and thereby directly optimizes molecule-level, non-differentiable objectives. Experiments on ten benchmarks, covering synthetic and real-world chemical diagrams from patents and scientific literature, demonstrate that COMO substantially outperforms existing rule-based and learning-based methods while using less training data. Ablation studies further show that MRT is architecture-agnostic, indicating its potential for broad application to end-to-end OCSR systems.
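To make the MRT idea concrete, the following is a minimal sketch of the expected-risk objective over a set of sampled candidates. It follows the commonly used MRT formulation (probabilities renormalized over the sample set, weighted by a sequence-level cost) and is not taken from the COMO implementation. The names `mrt_risk`, `exact_match_cost`, and the sharpness parameter `alpha` are hypothetical, and plain string equality stands in for a real molecule-level cost, which would typically compare canonicalized structures.

```python
import math


def mrt_risk(log_probs, costs, alpha=1.0):
    """Expected cost over a sampled candidate set (illustrative sketch).

    log_probs: sequence-level log P(y | x) for each sampled candidate
    costs:     molecule-level cost for each candidate (lower is better)
    alpha:     assumed sharpness of the renormalized distribution

    Computes sum_y Q(y) * cost(y), where Q(y) is proportional to
    P(y | x) ** alpha, renormalized over the sampled candidates only.
    """
    if not log_probs or len(log_probs) != len(costs):
        raise ValueError("log_probs and costs must be non-empty and equal length")
    scaled = [alpha * lp for lp in log_probs]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return sum(w / total * c for w, c in zip(weights, costs))


def exact_match_cost(predicted, target):
    """Hypothetical non-differentiable cost: 0 for a match, 1 otherwise."""
    return 0.0 if predicted == target else 1.0


# Toy example: three candidates sampled from the model for one image.
target = "CCO"
candidates = ["CCO", "CCN", "CC=O"]
log_probs = [math.log(0.5), math.log(0.3), math.log(0.2)]
costs = [exact_match_cost(c, target) for c in candidates]

risk = mrt_risk(log_probs, costs)

# Shifting probability mass toward the correct candidate lowers the risk.
better_log_probs = [math.log(0.9), math.log(0.05), math.log(0.05)]
lower_risk = mrt_risk(better_log_probs, costs)
```

In an actual training loop the log-probabilities would be differentiable tensors from the model, so minimizing this quantity pushes probability mass toward low-cost candidates even though the cost itself has no gradient. Because the candidates are the model's own samples, the model is trained on the same kind of prefixes it sees at inference time.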
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Molecular structure recognition | JPO (450) | Accuracy | 89.1 | 19 |
| Optical Chemical Structure Recognition | CLEF 992 | Exact Match Accuracy | 95 | 12 |
| Optical Chemical Structure Recognition | UOB (5740) | Exact Match Accuracy | 98.5 | 12 |
| Optical Chemical Structure Recognition | USPTO (5719) | Exact Match Accuracy | 93.4 | 12 |
| Optical Chemical Structure Recognition | USPTO-10K | Accuracy (Exact Match) | 96.1 | 12 |
| Optical Chemical Structure Recognition | WildMol-10K | Exact Match Accuracy | 77.2 | 12 |
| Optical Chemical Structure Recognition | Indigo 5719 | Exact Match Accuracy | 98.8 | 9 |
| Optical Chemical Structure Recognition | ChemDraw 5719 | Exact Match Accuracy | 96.5 | 9 |
| Optical Chemical Structure Recognition | Staker 50000 | Exact Match Accuracy | 87.5 | 9 |
| Optical Chemical Structure Recognition | ACS 331 | Exact Match Accuracy | 88.2 | 9 |