
UQE: A Query Engine for Unstructured Databases

About

Analytics on structured data is a mature field with many successful methods. However, most real-world data exists in unstructured form, such as images and conversations. We investigate the potential of Large Language Models (LLMs) to enable unstructured data analytics. In particular, we propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data collections. This engine accepts queries in a Universal Query Language (UQL), a dialect of SQL that provides full natural language flexibility in specifying conditions and operators. The new engine leverages the ability of LLMs to conduct analysis of unstructured data, while also allowing us to exploit advances in sampling and optimization techniques to achieve efficient and accurate query execution. In addition, we borrow techniques from classical compiler theory to better orchestrate the workflow between sampling methods and foundation model calls. We demonstrate the efficiency of UQE on data analytics across different modalities, including images, dialogs and reviews, and across a range of useful query types, including conditional aggregation, semantic retrieval and abstraction aggregation.
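To make the idea of sampling-based conditional aggregation concrete, here is a minimal sketch that estimates how many rows satisfy a natural-language condition by checking only a sample of them. It is not the authors' method: it uses plain uniform sampling, and the names `estimate_count` and `mock_is_negative` are hypothetical. In particular, `mock_is_negative` is a keyword check standing in for an LLM call, and the review data is made up for illustration.

```python
import random
from typing import Callable, Sequence


def estimate_count(
    rows: Sequence[str],
    predicate: Callable[[str], bool],
    sample_size: int,
    seed: int = 0,
) -> float:
    """Estimate how many rows satisfy `predicate` from a uniform random sample."""
    if not rows:
        return 0.0
    rng = random.Random(seed)
    n = min(sample_size, len(rows))
    # Sample without replacement so each row is evaluated at most once.
    sample = rng.sample(list(rows), n)
    hits = sum(1 for row in sample if predicate(row))
    # Scale the sample hit rate up to the full collection.
    return len(rows) * hits / n


def mock_is_negative(review: str) -> bool:
    # Assumption: a real engine would ask an LLM whether the review is negative.
    # A keyword check keeps this sketch self-contained and runnable.
    return "bad" in review.lower()


# Made-up data: 100 reviews, half of which contain the word "bad".
reviews = ["Bad plot", "Great film", "Really bad acting", "Loved it"] * 25

estimate = estimate_count(reviews, mock_is_negative, sample_size=40)
exact = estimate_count(reviews, mock_is_negative, sample_size=len(reviews))
print(f"estimated: {estimate:.1f}, exact: {exact:.1f}")
```

The sample size trades cost for accuracy: each sampled row corresponds to one predicate evaluation, which in a real system would be one model call, so a smaller sample is cheaper but gives a noisier estimate.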

Hanjun Dai, Bethany Yixin Wang, Xingchen Wan, Bo Dai, Sherry Yang, Azade Nova, Pengcheng Yin, Phitchaya Mangpo Phothilimthana, Charles Sutton, Dale Schuurmans • 2024

Related benchmarks

Task                                     Dataset              Metric                          Result  Rank
Semantic Retrieval                       AirDialog v1 (test)  Avg Cost per Query              0.01    20
Semantic Retrieval                       ABCD v1 (test)       Avg Cost per Query              0.03    10
Semantic Retrieval                       Clevr v1 (test)      Avg Cost per Query              0.08    10
Semantic Retrieval                       AudioMnist           F1 Score                        92.2    9
Semantic Retrieval                       IMDB v1 (test)       Avg Cost per Query              0.02    5
Conditional abstraction and aggregation  AirDialog            Cost                            0.04    3
Conditional abstraction and aggregation  ABCD                 Operational Cost                0.07    3
Conditional aggregation                  CLEVR (test)         Runtime (s)                     3.13    2
Conditional aggregation                  ABCD (test)          Runtime (s)                     3.34    2
Semantic Retrieval                       CLEVR (test)         Semantic Retrieval Latency (s)  46      2
Showing 10 of 15 rows
