Fine-tuning DeepSeek-OCR-2 for Molecular Structure Recognition
About
Optical Chemical Structure Recognition (OCSR) is critical for converting 2D molecular diagrams from printed literature into machine-readable formats. While Vision-Language Models have shown promise in end-to-end OCR tasks, their direct application to OCSR remains challenging, and direct full-parameter supervised fine-tuning often fails. In this work, we adapt DeepSeek-OCR-2 for molecular optical recognition by formulating the task as image-conditioned SMILES generation. To overcome training instabilities, we propose a two-stage progressive supervised fine-tuning strategy: starting with parameter-efficient LoRA and then transitioning to selective full-parameter fine-tuning with split learning rates. We train our model on a large-scale corpus that combines synthetic renderings from PubChem with realistic patent images from USPTO-MOL to improve coverage and robustness. Our fine-tuned model, MolSeek-OCR, achieves exact-match accuracies comparable to those of the best-performing image-to-sequence model, but it remains inferior to state-of-the-art image-to-graph models. Furthermore, we explore reinforcement-style post-training and data-curation-based refinement, and find that neither improves the strict sequence-level fidelity required for exact SMILES matching.
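As a rough illustration of what "split learning rates" could look like in the second stage, the sketch below partitions parameter names into two groups, each with its own learning rate, in the list-of-dicts shape that common optimizers accept as per-group options. The abstract does not say which parameters are split or which rates are used, so the prefix `vision_encoder.`, the function name `build_param_groups`, and both learning-rate values are hypothetical placeholders and not the authors' configuration.

```python
from typing import Dict, Iterable, List


def build_param_groups(
    param_names: Iterable[str],
    encoder_prefix: str = "vision_encoder.",  # hypothetical module prefix
    encoder_lr: float = 1e-5,                 # hypothetical value
    decoder_lr: float = 1e-4,                 # hypothetical value
) -> List[Dict]:
    """Split parameter names into two groups with separate learning rates.

    Assumption: the split is between vision-side and language-side
    parameters. The source only states that split learning rates are used.
    """
    encoder_params: List[str] = []
    decoder_params: List[str] = []
    for name in param_names:
        if name.startswith(encoder_prefix):
            encoder_params.append(name)
        else:
            decoder_params.append(name)
    return [
        {"name": "encoder", "params": encoder_params, "lr": encoder_lr},
        {"name": "decoder", "params": decoder_params, "lr": decoder_lr},
    ]


# Usage with made-up parameter names.
groups = build_param_groups(
    ["vision_encoder.conv.weight", "decoder.layers.0.weight", "lm_head.weight"]
)
for group in groups:
    print(group["name"], group["lr"], group["params"])
```

In a real training script the `params` entries would hold the parameter tensors themselves rather than their names; names are used here only to keep the sketch dependency-free.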
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Molecular structure recognition | UOB Synthetic | Exact Matching Accuracy | 72.6 | 11 |
| Molecular structure recognition | CLEF Synthetic | Exact Match Accuracy | 63.3 | 10 |
| Molecular structure recognition | USPTO Realistic | Exact Match Accuracy | 61 | 10 |
| Molecular structure recognition | CLEFp Perturbed | Exact Matching Accuracy | 70.7 | 9 |
| Molecular structure recognition | Staker Realistic | Exact Matching Accuracy | 50.5 | 9 |
| Molecular structure recognition | Indigo Synthetic | Exact Matching Accuracy | 74.3 | 9 |
| Molecular structure recognition | ACS Realistic | Exact Matching Accuracy | 29.9 | 9 |
| Molecular structure recognition | ChemDraw Synthetic | Exact Matching Accuracy | 72.2 | 9 |
| Molecular structure recognition | USPTOp Perturbed | Exact Matching Accuracy | 65.6 | 8 |
| Molecular structure recognition | Stakerp Perturbed | Exact Matching Accuracy | 31.2 | 8 |
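
The exact-match metric reported above can be sketched as the percentage of predictions whose SMILES string equals the reference after normalization. The source does not specify how strings are normalized before comparison, so this minimal sketch only strips whitespace by default and accepts an optional `canonicalize` callable (a hypothetical parameter) where a cheminformatics toolkit could be plugged in. The function name `exact_match_accuracy` is likewise an illustration, not the authors' evaluation code.

```python
from typing import Callable, Optional, Sequence


def exact_match_accuracy(
    predictions: Sequence[str],
    references: Sequence[str],
    canonicalize: Optional[Callable[[str], str]] = None,
) -> float:
    """Return the percentage of predictions that exactly match the reference.

    Assumption: without a canonicalizer, strings are compared after
    stripping surrounding whitespace only.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    normalize = canonicalize if canonicalize is not None else str.strip
    hits = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return 100.0 * hits / len(references)


# Two of the three predictions match their reference string exactly.
score = exact_match_accuracy(
    ["CCO", "c1ccccc1", "CC(=O)O "],
    ["CCO", "C1=CC=CC=C1", "CC(=O)O"],
)
print(round(score, 1))
```

Note that plain string comparison counts `c1ccccc1` and `C1=CC=CC=C1` as a mismatch even though both denote benzene; passing a canonicalizing function would treat them as equal, which is why the choice of normalization matters for a sequence-level metric like this one.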