"What's important here?": Opportunities and Challenges of Using LLMs in Retrieving Information from Web Interfaces

About

Large language models (LLMs) trained on corpora that include a large amount of code exhibit a remarkable ability to understand HTML. As web interfaces are primarily constructed using HTML, we design an in-depth study of how LLMs can be used to retrieve and locate the elements of a web interface that are important for a user-given query (i.e., a task description). In contrast with prior work, which primarily focused on autonomous web navigation, we decompose the problem into a more atomic operation: can LLMs identify the important information in a web page for a user-given query? This decomposition enables us to scrutinize the current capabilities of LLMs and uncover the opportunities and challenges they present. Our empirical experiments show that while LLMs exhibit a reasonable level of performance in retrieving important UI elements, there is still substantial room for improvement. We hope our investigation will inspire follow-up work on overcoming the current challenges in this domain.

Faria Huq, Jeffrey P. Bigham, Nikolas Martelaro • 2023

Related benchmarks

Task	Dataset	Result
Web agent tasks	Mind2Web Cross-Task	Step Success Rate: 51.3	64
Conversational web navigation	MT-Mind2Web (Cross-Website)	Element Accuracy: 46.2	12
Conversational web navigation	MT-Mind2Web Cross-Subdomain	Element Accuracy: 47.4	12

