
Trace is the Next AutoDiff: Generative Optimization with Rich Feedback, Execution Traces, and LLMs

About

We study a class of optimization problems motivated by automating the design and update of AI systems like coding assistants, robots, and copilots. AutoDiff frameworks, like PyTorch, enable efficient end-to-end optimization of differentiable systems. However, general computational workflows can be non-differentiable and involve rich feedback (e.g., console output or users' responses), heterogeneous parameters (e.g., prompts, code), and intricate objectives (beyond maximizing a score). We investigate end-to-end generative optimization -- using generative models such as LLMs within the optimizer to automatically update general computational workflows. We discover that workflow execution traces are akin to back-propagated gradients in AutoDiff and can provide key information for interpreting feedback and optimizing efficiently. Formally, we frame a new mathematical setup, Optimization with Trace Oracle (OPTO). In OPTO, an optimizer receives an execution trace along with feedback on the computed output and updates parameters iteratively. We provide a Python library, Trace, that efficiently converts a workflow optimization problem into an OPTO instance using PyTorch-like syntax. Using Trace, we develop a general LLM-based generative optimizer called OptoPrime. In empirical studies, we find that OptoPrime is capable of first-order numerical optimization, prompt optimization, hyper-parameter tuning, robot controller design, code debugging, etc., and is often competitive with specialized optimizers for each domain. We envision Trace as an open research platform for devising novel generative optimizers and developing the next generation of interactive learning agents. Website: https://microsoft.github.io/Trace/.
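To make the OPTO framing concrete, here is a minimal self-contained sketch of the idea that "execution traces are akin to back-propagated gradients": a workflow records a trace (a DAG of operations) as it runs, and the optimizer uses that trace to propagate scalar feedback on the output back to the parameters. This is an illustrative toy with a numeric update rule, not the Trace library's actual API (in Trace the optimizer is an LLM such as OptoPrime and parameters can be prompts or code); the `Node` class, operations, and update rule below are hypothetical.

```python
# Toy OPTO-style loop: the trace oracle returns (output feedback, execution
# trace); the optimizer walks the trace to turn feedback into updates.

class Node:
    """One vertex of the execution trace: a value plus its provenance."""
    def __init__(self, value, parents=(), grad_fns=()):
        self.value = value
        self.parents = parents    # upstream nodes in the trace
        self.grad_fns = grad_fns  # local sensitivity w.r.t. each parent
        self.grad = 0.0

def sub(a, b):
    return Node(a.value - b.value, (a, b), (lambda: 1.0, lambda: -1.0))

def square(a):
    return Node(a.value ** 2, (a,), (lambda: 2 * a.value,))

def backward(out, feedback=1.0):
    """Propagate scalar feedback on the output through the recorded trace."""
    out.grad = feedback
    stack = [out]
    while stack:
        node = stack.pop()
        for parent, g in zip(node.parents, node.grad_fns):
            parent.grad += node.grad * g()
            stack.append(parent)

# Workflow: y = (x - 3)^2; feedback says "make y smaller" (signal +1 on y).
x = Node(0.0)                     # the trainable parameter
for _ in range(100):
    x.grad = 0.0
    y = square(sub(x, Node(3.0)))  # running the workflow builds the trace
    backward(y, feedback=1.0)
    x.value -= 0.1 * x.grad        # optimizer update informed by the trace

print(round(x.value, 3))           # x approaches the minimizer 3.0
```

In the real library the same structure holds, but the trace nodes carry arbitrary Python values (strings, code, numbers) and the update step is performed by a generative model reading the trace, which is what lets one loop handle prompt tuning, hyper-parameter search, and debugging alike.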

Ching-An Cheng, Allen Nie, Adith Swaminathan • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Mathematical Reasoning | GSM8K | Success Rate: 82.5 | 16 |
| Prompt Optimization | HotpotQA, IFBench, HoVer, PUPA, AIME, and LiveBench-Math 2018-2025 (test) | HotpotQA Score: 60.33 | 8 |
| LLM Workflow Optimization | BIG-Bench Hard (BBH) (test) | BBH Overall Accuracy: 78.6 | 6 |
| Question Answering | Google-proof QA | Success Rate: 59.6 | 4 |
| College Physics | MMLU College Physics 1.0 (test) | Success Rate: 94.1 | 4 |
| Counting | BIG-Bench Hard Counting | Success Rate: 89.4 | 4 |
| Machine Learning | MMLU Machine Learning 1.0 (test) | Accuracy: 86.6 | 4 |
| Word Sorting | BIG-Bench Hard Word Sorting | Success Rate: 71.6 | 4 |

Other info

Code
