A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis

About

Pre-trained large language models (LLMs) have recently achieved better generalization and sample efficiency in autonomous web automation. However, performance on real-world websites still suffers from (1) open domainness, (2) limited context length, and (3) a lack of inductive bias on HTML. We introduce WebAgent, an LLM-driven agent that learns from self-experience to complete tasks on real websites following natural language instructions. WebAgent plans ahead by decomposing instructions into canonical sub-instructions, summarizes long HTML documents into task-relevant snippets, and acts on websites via Python programs generated from those sub-instructions and snippets. We design WebAgent with Flan-U-PaLM for grounded code generation and with HTML-T5 for planning and summarization. HTML-T5 is a new pre-trained LLM for long HTML documents that uses local and global attention mechanisms and a mixture of long-span denoising objectives. We empirically demonstrate that our modular recipe improves success on real websites by over 50%, and that HTML-T5 is the best model for solving various HTML understanding tasks, achieving an 18.7% higher success rate than the prior method on the MiniWoB web automation benchmark and SoTA performance on Mind2Web, an offline task planning evaluation.
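The plan, summarize, and act loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `run_episode`, `Step`, and the four callables (`plan`, `summarize`, `synthesize`, `execute`) are hypothetical names, and the models behind them (HTML-T5 for planning and summarization, Flan-U-PaLM for code generation) are represented only as plain functions supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One completed step: what was planned, what was kept, what was run."""
    sub_instruction: str
    snippet: str
    program: str


def run_episode(
    instruction: str,
    get_html: Callable[[], str],
    plan: Callable[[str, List[Step], str], Optional[str]],
    summarize: Callable[[str, str], str],
    synthesize: Callable[[str, str], str],
    execute: Callable[[str], None],
    max_steps: int = 10,
) -> List[Step]:
    """Hypothetical control loop; every callable is an assumed stand-in."""
    history: List[Step] = []
    for _ in range(max_steps):
        html = get_html()  # raw HTML of the current page
        # Planning: decompose the instruction into the next sub-instruction.
        sub_instruction = plan(instruction, history, html)
        if sub_instruction is None:  # planner signals that the task is done
            break
        # Summarization: keep only the task-relevant part of the long HTML.
        snippet = summarize(html, sub_instruction)
        # Program synthesis: produce a Python program for this step.
        program = synthesize(sub_instruction, snippet)
        # Acting: run the generated program against the website.
        execute(program)
        history.append(Step(sub_instruction, snippet, program))
    return history
```

Passing the models in as callables keeps the sketch modular in the same spirit as the recipe above: each stage can be swapped or stubbed out independently, for example with fixed functions in a unit test.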

Izzeddin Gur, Hiroki Furuta, Austin Huang, Mustafa Safdari, Yutaka Matsuo, Douglas Eck, Aleksandra Faust • 2023

Related benchmarks

Task                       Dataset                             Metric             Result  Rank
GUI Navigation             Multimodal-Mind2Web Cross-Website   Step Success Rate  62.2    32
GUI Navigation             Multimodal-Mind2Web Cross-Task      Step Success Rate  71.5    27
GUI Navigation             Multimodal-Mind2Web Cross-Domain    Step Success Rate  67.1    27
Question Answering         WebSRC (dev)                        EM                 76.91   26
Web automation             MiniWoB++ 56 tasks (test)           Success Rate       85.6    15
Action Prediction          MIND2WEB Cross-Task 1.0             Element Accuracy   60.6    11
Action Prediction          MIND2WEB Cross-Website 1.0          Element Accuracy   47.6    11
Description Generation     Description Generation (test)       Accuracy           98.9    9
Description Generation     Description Generation (dev)        Accuracy           98.4    9
Offline Action Prediction  Mind2Web Cross-Domain v1.0 (test)   Element Accuracy   50.2    4