Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models

About

Reinforcement learning (RL) has recently been widely applied in post-training for diffusion large language models (dLLMs) to enhance reasoning with block-wise semi-autoregressive generation. Block size has therefore become a vital factor in dLLMs, since it determines the parallel decoding granularity and affects the rollout trajectories during RL optimisation (e.g., GRPO). Instead of investigating the effect of block size during inference on individual domains, this paper studies block size from a domain conflict perspective for dLLM RL post-training in multi-domain scenarios. The main contributions are: (1) a formulation of domain block size conflict in multi-domain RL for dLLMs, which substantially affects the post-training effectiveness of rollout-based RL methods; (2) a novel dataset, Block-R1-41K, which records a best-improved training block size for each sample and induces a Block Size Conflict Score that quantitatively measures the domain conflict; (3) a new benchmark, Block-R1, for flexible RL post-training of dLLMs in both single-domain and cross-domain settings; and (4) a simple yet powerful cross-domain post-training method that uses sample-level best-improved training block sizes. Block-R1 covers extensive experiments on 13 distinct datasets, 7 recent RL algorithms, and diverse dLLM backbones. The benchmark is open-sourced at https://github.com/YanJiangJerry/Block-R1, and the dataset is released at https://huggingface.co/datasets/YanJiangJerry/Block-R1-41K.
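As a rough illustration of what a sample-level "best-improved training block size" could look like, the sketch below picks, for each sample, the candidate block size with the largest reward gain and then reports which block size each domain most often prefers. Everything here is an assumption made for illustration: the function names, the `gains` dictionary (reward after RL minus reward before), the tie-breaking rule, and the toy numbers are hypothetical and are not taken from the paper or the Block-R1 code. It does not compute the paper's Block Size Conflict Score, whose definition the abstract does not give.

```python
from collections import Counter, defaultdict


def best_improved_block_size(gains):
    """Return the block size with the largest reward gain for one sample.

    `gains` maps a candidate block size to a hypothetical reward improvement.
    Ties are broken in favour of the smaller block size (an assumption).
    """
    return min(gains, key=lambda b: (-gains[b], b))


def preferred_block_size_by_domain(samples):
    """For each domain, return the block size that is most often best-improved."""
    counts = defaultdict(Counter)
    for s in samples:
        counts[s["domain"]][best_improved_block_size(s["gains"])] += 1
    return {d: c.most_common(1)[0][0] for d, c in counts.items()}


# Toy, made-up numbers purely for illustration.
samples = [
    {"domain": "math", "gains": {4: 0.10, 16: 0.30, 32: 0.20}},
    {"domain": "math", "gains": {4: 0.05, 16: 0.25, 32: 0.25}},
    {"domain": "code", "gains": {4: 0.40, 16: 0.10, 32: 0.00}},
]

preferred = preferred_block_size_by_domain(samples)
```

When domains end up preferring different block sizes, as in this toy data, a single global training block size cannot suit all of them at once, which is the kind of conflict the abstract describes.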

Yan Jiang, Ruihong Qiu, Zi Huang • 2026

Related benchmarks

Task                    Dataset    Result                        Rank
Mathematical Reasoning  Countdown  --                            252
Code Generation         HumanEval  --                            217
Mathematical Reasoning  GSM8K      --                            204
General Capability      MMLU       MMLU Accuracy: 62.22          74
Code Generation         KodCode    --                            58
Logical Reasoning       KK         --                            28
Code Generation         MBPP       --                            20
Mathematical Reasoning  MATH500    --                            20
Puzzle Solving          Sudoku     --                            20
Sudoku Solving          Sudoku     Success Rate (pass@1): 26.95  12
Showing 10 of 13 rows
