Nougat: Neural Optical Understanding for Academic Documents
About
Scientific knowledge is predominantly stored in books and scientific journals, often in the form of PDFs. However, the PDF format leads to a loss of semantic information, particularly for mathematical expressions. We propose Nougat (Neural Optical Understanding for Academic Documents), a Visual Transformer model that performs an Optical Character Recognition (OCR) task for processing scientific documents into a markup language, and demonstrate the effectiveness of our model on a new dataset of scientific documents. The proposed approach offers a promising solution to enhance the accessibility of scientific knowledge in the digital age, by bridging the gap between human-readable documents and machine-readable text. We release the models and code to accelerate future work on scientific text recognition.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text Recognition | SROIE Task 2 (test) | F1 Score | 33.64 | 19 |
| Document Image Retrieval | NL-DIR (test) | Recall@1 | 0.01 | 15 |
| Document Retrieval | OHR-Bench Retrieval | Accuracy (Text) | 59.1 | 14 |
| Document Text Generation | OHR-Bench Generation | Text Score | 36.7 | 14 |
| Textual RAG | OHR-Bench (Overall) | TXT Score | 0.335 | 14 |
| Markdown conversion | MD 82 images (val) | F1 Score | 86.71 | 5 |
| Document-level OCR | FUNSD 50 images (test) | F1 Score | 55.35 | 5 |
| Document-level OCR | SYN-L 200 images (val) | F1 Score | 66.76 | 5 |
| Document-level OCR | CORD 100 images (test) | F1 Score | 1.57 | 5 |
| Formula Recognition | Tiny-Doc-Math | BLEU | 58.97 | 3 |