
"What's important here?": Opportunities and Challenges of Using LLMs in Retrieving Information from Web Interfaces

About

Large language models (LLMs) trained on a corpus that includes a large amount of code exhibit a remarkable ability to understand HTML. As web interfaces are primarily constructed using HTML, we design an in-depth study to see how LLMs can be used to retrieve and locate important elements in a web interface for a user-given query (i.e., a task description). In contrast with prior works, which primarily focused on autonomous web navigation, we decompose the problem into a more atomic operation: can LLMs identify the important information in a web page for a user-given query? This decomposition enables us to scrutinize the current capabilities of LLMs and uncover the opportunities and challenges they present. Our empirical experiments show that while LLMs exhibit a reasonable level of performance in retrieving important UI elements, there is still substantial room for improvement. We hope our investigation will inspire follow-up work on overcoming the current challenges in this domain.

Faria Huq, Jeffrey P. Bigham, Nikolas Martelaro • 2023

Related benchmarks

Task | Dataset | Metric | Result | Rank
Web agent tasks | Mind2Web Cross-Task | Element Accuracy | 58 | 49
Conversational web navigation | MT-Mind2Web (Cross-Website) | Element Accuracy | 46.2 | 12
Conversational web navigation | MT-Mind2Web Cross-Subdomain | Element Accuracy | 47.4 | 12
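
The Element Accuracy results listed here are percentages. This page does not define the metric, so the following is only a minimal sketch under one assumption: a prediction counts as correct when the element chosen for a step is among the acceptable ground-truth elements for that step. The function name, the element IDs, and the data are hypothetical and are not taken from the paper or the benchmarks.

```python
from typing import Sequence, Set


def element_accuracy(predicted: Sequence[str], acceptable: Sequence[Set[str]]) -> float:
    """Return the percentage of steps whose predicted element ID is acceptable.

    Assumption (not stated on this page): a step is correct when the predicted
    element ID appears in that step's set of acceptable element IDs.
    """
    if len(predicted) != len(acceptable):
        raise ValueError("predicted and acceptable must have the same length")
    if not predicted:
        return 0.0
    hits = sum(1 for pred, ok in zip(predicted, acceptable) if pred in ok)
    return 100.0 * hits / len(predicted)


# Hypothetical element IDs, for illustration only.
preds = ["btn-search", "input-date", "link-home", "btn-submit"]
gold = [{"btn-search"}, {"input-city"}, {"link-home", "logo"}, {"btn-submit"}]
score = element_accuracy(preds, gold)  # 3 of 4 steps match
```

Using a set of acceptable elements per step, rather than a single ID, lets several equivalent targets (for example a logo and a home link) count as correct.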
