Arctic-TILT. Business Document Understanding at Sub-Billion Scale
About
A vast portion of workloads employing LLMs involves answering questions grounded in PDF or scanned content. We introduce Arctic-TILT, which achieves accuracy on par with models 1000$\times$ its size on these use cases. It can be fine-tuned and deployed on a single 24GB GPU, lowering operational costs while processing Visually Rich Documents with up to 400k tokens. The model establishes state-of-the-art results on seven diverse Document Understanding benchmarks and provides reliable confidence scores and quick inference, which are essential for processing files in large-scale or time-sensitive enterprise environments.
Łukasz Borchmann, Michał Pietruszka, Wojciech Jaśkowski, Dawid Jurkiewicz, Piotr Halama, Paweł Józiak, Łukasz Garncarek, Paweł Liskowski, Karolina Szyndler, Andrzej Gretkowski, Julita Ołtusek, Gabriela Nowakowska, Artur Zawłocki, Łukasz Duhr, Paweł Dyda, Michał Turski • 2024
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Multi-page Document Question Answering | MP-DocVQA (test) | ANLS | 0.8122 | 30 |
| Document Question Answering | SlideVQA (test) | EM | 55.1 | 19 |
| Document Question Answering | DUDE | ANLS | 0.5809 | 12 |
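
Two of the rows above report ANLS (Average Normalized Levenshtein Similarity). As a rough illustration of what that number measures, here is a minimal sketch of the commonly used definition: each prediction is scored as one minus its normalized edit distance to the closest reference answer, scores whose normalized distance reaches a threshold (0.5 is assumed here) are zeroed, and the results are averaged. The function names `levenshtein` and `anls`, the lowercasing and whitespace stripping, and the threshold value are assumptions of this sketch, not taken from the paper or from any benchmark's official evaluation script, which may normalize answers differently.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls(predictions: Sequence[str],
         references: Sequence[List[str]],
         threshold: float = 0.5) -> float:
    """Average over questions of the best thresholded similarity.

    The 0.5 threshold and the lowercase/strip normalization are assumptions.
    """
    if not predictions:
        return 0.0
    total = 0.0
    for pred, golds in zip(predictions, references):
        p = pred.strip().lower()
        best = 0.0
        for gold in golds:
            g = gold.strip().lower()
            longest = max(len(p), len(g))
            distance = levenshtein(p, g) / longest if longest else 0.0
            # Answers that are too far from the reference earn no credit.
            score = 1.0 - distance if distance < threshold else 0.0
            best = max(best, score)
        total += best
    return total / len(predictions)
```

Under these assumptions, a near miss such as `"invoice 42"` against the reference `"invoice 43"` keeps most of its credit, while an unrelated answer scores zero. This is why ANLS is more forgiving of small OCR or transcription slips than exact match (EM).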