
Asynchronous Reasoning: Training-Free Interactive Thinking LLMs

About

Many state-of-the-art LLMs are trained to think before giving their answer. Reasoning can greatly improve language model capabilities, but it also makes them less interactive: given a new input, a model must stop thinking before it can respond. Real-world use cases such as voice-based or embodied assistants require an LLM agent to respond and adapt to additional information in real time, which is incompatible with sequential interactions. In contrast, humans can listen, think, and act asynchronously: we begin thinking about the problem while reading it and continue thinking while formulating the answer. In this work, we augment LLMs capable of reasoning to operate in a similar way without additional training. Our method uses the properties of positional embeddings to enable LLMs built for sequential generation to simultaneously think, listen, and write outputs. We evaluate our approach on math, commonsense, and safety reasoning: it allows models to generate accurate thinking-augmented answers while reducing the time to first non-thinking token from minutes to ≤5 s and the overall real-time delays by up to 12×.
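The abstract does not spell out the mechanism, but one way positional embeddings can decouple concurrent streams, sketched here as a hypothetical illustration rather than the paper's actual implementation, is to give each stream (e.g. "thinking" vs. "answer") its own independent position counter. Tokens may then arrive in any wall-clock order while each stream still sees contiguous sequential positions:

```python
# Hypothetical sketch, NOT the paper's implementation: each stream keeps
# its own position counter, so interleaved real-time generation still
# yields contiguous positional indices within every stream.

class AsyncStreams:
    def __init__(self, names):
        self.positions = {name: 0 for name in names}
        self.log = []  # (stream, token, position) in arrival order

    def append(self, stream, token):
        pos = self.positions[stream]
        self.positions[stream] += 1
        self.log.append((stream, token, pos))
        return pos


streams = AsyncStreams(["think", "answer"])

# Tokens arrive interleaved in wall-clock order...
arrival = [("think", "t0"), ("think", "t1"), ("answer", "a0"),
           ("think", "t2"), ("answer", "a1")]
for name, tok in arrival:
    streams.append(name, tok)

# ...yet each stream's position indices remain contiguous:
think_pos = [p for s, _, p in streams.log if s == "think"]
answer_pos = [p for s, _, p in streams.log if s == "answer"]
print(think_pos, answer_pos)  # → [0, 1, 2] [0, 1]
```

In a real model this per-stream indexing would be fed to the attention layers via the position argument of the forward pass, but the details of how the paper routes positions are not given in this page.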

George Yakushev, Nataliia Babina, Masoud Vahid Dastgerdi, Vyacheslav Zhdanovskiy, Denis Kuznedelev, Alina Shutova, Max Ryabinin • 2025

Related benchmarks

Task                    Dataset                        Accuracy  Rank
Mathematical Reasoning  AIME 2025                      68        54
Logic Reasoning         ZebraLogic                     93        15
Multistep Reasoning     SpokenMQA multistep reasoning  81        6
