Arctic-TILT. Business Document Understanding at Sub-Billion Scale

About

A large share of LLM workloads involves answering questions grounded in PDF or scanned content. We introduce Arctic-TILT, which achieves accuracy on par with models 1000× its size on these use cases. It can be fine-tuned and deployed on a single 24GB GPU, lowering operational costs while processing Visually Rich Documents with up to 400k tokens. The model establishes state-of-the-art results on seven diverse Document Understanding benchmarks and provides reliable confidence scores and quick inference, which are essential for processing files in large-scale or time-sensitive enterprise environments.

Łukasz Borchmann, Michał Pietruszka, Wojciech Jaśkowski, Dawid Jurkiewicz, Piotr Halama, Paweł Józiak, Łukasz Garncarek, Paweł Liskowski, Karolina Szyndler, Andrzej Gretkowski, Julita Ołtusek, Gabriela Nowakowska, Artur Zawłocki, Łukasz Duhr, Paweł Dyda, Michał Turski • 2024

Related benchmarks

Task	Dataset	Result
Multi-page Document Question Answering	MP-DocVQA (test)	ANLS 0.8122	30
Document Question Answering	SlideVQA (test)	EM 55.1	19
Document Question Answering	DUDE	ANLS 0.5809	12
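
Two of the results above are reported as ANLS (Average Normalized Levenshtein Similarity). As a rough illustration of how such a score is typically computed, here is a minimal sketch. It assumes the conventional threshold of 0.5 and a simple normalization (lowercasing and whitespace collapsing); the official evaluation scripts of each benchmark may normalize differently, and the function names `levenshtein` and `anls` are hypothetical, not taken from any benchmark toolkit.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with a two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls(predictions, references, threshold=0.5):
    """Average, over questions, of the best thresholded similarity
    between the prediction and any accepted reference answer.

    predictions: list of predicted answer strings
    references:  list of lists of accepted answers, one list per question
    """
    scores = []
    for pred, golds in zip(predictions, references):
        # Assumed normalization: lowercase and collapse whitespace.
        pred_norm = " ".join(pred.lower().split())
        best = 0.0
        for gold in golds:
            gold_norm = " ".join(gold.lower().split())
            longest = max(len(pred_norm), len(gold_norm))
            nl = levenshtein(pred_norm, gold_norm) / longest if longest else 0.0
            # Answers that are too far from the reference score zero.
            similarity = 1.0 - nl if nl < threshold else 0.0
            best = max(best, similarity)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0
```

The threshold means that a near miss (for example, a single OCR-style character error) earns partial credit, while an unrelated answer earns none. EM (exact match), used for SlideVQA above, instead gives credit only when the prediction matches a reference exactly.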

