In-Place Feedback: Reliable Refinement for Multi-Turn Expert-LLM Collaboration
About
LLM-generated drafts often contain subtle factual or logical errors, yet prior work shows that models struggle to reliably integrate multi-turn feedback aimed at fixing them. We propose in-place feedback, an interaction paradigm in which the user directly edits the model's previous response and the model continues generation from the edited context. In-place feedback consistently outperforms standard multi-turn feedback across five reasoning-intensive benchmarks while requiring fewer tokens, and our fine-grained analysis shows that it applies corrections more reliably and propagates them to subsequent reasoning. A user study with domain experts refining LLM-generated summaries corroborates these findings: participants report higher final-output satisfaction and substantially lower fatigue with in-place feedback, and a mixed strategy combining in-place and multi-turn feedback scores highest on every measured dimension. These results suggest that editing errors directly is a more effective paradigm for expert-LLM collaboration.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Reasoning | MMLU-Pro | Accuracy85.4 | 241 | |
| Mathematical Reasoning | MATH Hard | Accuracy92.8 | 198 | |
| Graduate-level Science Reasoning | GPQA | Accuracy69 | 121 | |
| Knowledge Reasoning | MMLU-Pro | Accuracy83.4 | 120 | |
| Logical reasoning | ZebraLogic (test) | Grid Accuracy92.2 | 90 | |
| Logical reasoning | ZebraLogic v1.0 (test) | Cell Accuracy97.7 | 90 | |
| Code Generation | LiveCodeBench | Accuracy88.6 | 84 | |
| Science Reasoning | GPQA | Accuracy (GPQA)58.7 | 72 |