A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis

About

Pre-trained large language models (LLMs) have recently achieved better generalization and sample efficiency in autonomous web automation. However, performance on real-world websites still suffers from (1) open-domainness, (2) limited context length, and (3) a lack of inductive bias on HTML. We introduce WebAgent, an LLM-driven agent that learns from self-experience to complete tasks on real websites following natural language instructions. WebAgent plans ahead by decomposing instructions into canonical sub-instructions, summarizes long HTML documents into task-relevant snippets, and acts on websites via Python programs generated from those sub-instructions and snippets. We design WebAgent with Flan-U-PaLM for grounded code generation, and with HTML-T5, new pre-trained LLMs for long HTML documents that use local and global attention mechanisms and a mixture of long-span denoising objectives, for planning and summarization. We empirically demonstrate that our modular recipe improves success on real websites by over 50%, and that HTML-T5 is the best model for solving various HTML understanding tasks, achieving an 18.7% higher success rate than the prior method on the MiniWoB web automation benchmark and SoTA performance on Mind2Web, an offline task planning evaluation.
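
The plan, summarize, and act loop described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the function names, the `Step` record, the `"DONE"` stop signal, and the toy stand-ins for the models are all hypothetical. In the real system, planning and summarization are attributed to HTML-T5 and program generation to Flan-U-PaLM; here each is just a callable passed in by the caller.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    sub_instruction: str  # one decomposed sub-instruction
    snippet: str          # task-relevant HTML snippet
    program: str          # generated Python program (kept as text here)


def run_agent(
    instruction: str,
    get_html: Callable[[], str],
    plan: Callable[[str, str, List[str]], str],
    summarize: Callable[[str, str], str],
    synthesize: Callable[[str, str], str],
    execute: Callable[[str], None],
    max_steps: int = 10,
) -> List[Step]:
    """Hypothetical control loop: plan, summarize, synthesize, act."""
    history: List[Step] = []
    for _ in range(max_steps):
        html = get_html()
        past = [s.sub_instruction for s in history]
        sub = plan(instruction, html, past)
        if sub == "DONE":  # assumed stop signal, not from the paper
            break
        snippet = summarize(html, sub)
        program = synthesize(sub, snippet)
        execute(program)
        history.append(Step(sub, snippet, program))
    return history


def demo() -> List[Step]:
    """Run the loop with toy stand-ins instead of real models or a browser."""
    page = '<input id="q"><button id="go">Search</button>'
    queue = ["type 'shoes' into the search box", "click the search button"]
    executed: List[str] = []

    def toy_plan(instruction: str, html: str, past: List[str]) -> str:
        # Emit the next scripted sub-instruction, then signal completion.
        return queue[len(past)] if len(past) < len(queue) else "DONE"

    def toy_summarize(html: str, sub: str) -> str:
        # Pick the element that matters for this sub-instruction.
        return '<input id="q">' if sub.startswith("type") else '<button id="go">'

    def toy_synthesize(sub: str, snippet: str) -> str:
        # A real agent would emit browser-driving code; this is a placeholder.
        return f"# program for: {sub} | target: {snippet}"

    return run_agent(
        "search for shoes",
        lambda: page,
        toy_plan,
        toy_summarize,
        toy_synthesize,
        executed.append,
    )
```

Calling `demo()` walks through two scripted sub-instructions and stops when the toy planner signals completion. Passing each stage in as a callable mirrors the modular recipe: any stage can be swapped without touching the loop.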

Izzeddin Gur, Hiroki Furuta, Austin Huang, Mustafa Safdari, Yutaka Matsuo, Douglas Eck, Aleksandra Faust • 2023

Related benchmarks

Task	Dataset	Result
GUI Navigation	Multimodal-Mind2Web Cross-Website	Step Success Rate 62.2	37
GUI Navigation	Multimodal-Mind2Web Cross-Task	Step Success Rate 71.5	32
GUI Navigation	Multimodal-Mind2Web Cross-Domain	Step Success Rate 67.1	32
Question Answering	WebSRC (dev)	EM 76.91	26
Web automation	MiniWoB++ 56 tasks (test)	Success Rate 85.6	15
Action Prediction	MIND2WEB Cross-Task 1.0	Element Accuracy 60.6	11
Action Prediction	MIND2WEB Cross-Website 1.0	Element Accuracy 47.6	11
Description Generation	Description Generation (test)	Accuracy 98.9	9
Description Generation	Description Generation (dev)	Accuracy 98.4	9
Offline Action Prediction	Mind2Web Cross-Domain v1.0 (test)	Element Accuracy 50.2	4
