Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models

About

Reinforcement learning (RL) has recently been widely applied in post-training of diffusion large language models (dLLMs) to enhance reasoning with block-wise semi-autoregressive generation. Block size has therefore become a vital factor in dLLMs, since it determines the parallel decoding granularity and affects the rollout trajectories during RL optimisation with methods such as GRPO. Instead of investigating the effect of block size during inference on individual domains, this paper studies block size from a domain conflict perspective for dLLM RL post-training in multi-domain scenarios. The main contributions are: (1) a formulation of domain block size conflict in multi-domain RL for dLLMs, which substantially affects the post-training effectiveness of rollout-based RL methods; (2) a novel dataset, Block-R1-41K, constructed with a best-improved training block size for each sample, which also induces a Block Size Conflict Score to quantitatively measure the domain conflict; (3) a new benchmark, Block-R1, for flexible RL post-training of dLLMs in both single-domain and cross-domain settings; and (4) a simple yet powerful cross-domain post-training method with sample-level best-improved training block sizes. Block-R1 covers extensive experiments on 13 distinct datasets, 7 recent RL algorithms and diverse dLLM backbones. The benchmark is open-sourced at https://github.com/YanJiangJerry/Block-R1 with the dataset released at https://huggingface.co/datasets/YanJiangJerry/Block-R1-41K.
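The idea of a per-sample best-improved training block size, and of domains preferring different block sizes, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the abstract does not define how improvement is measured or how the Block Size Conflict Score is computed, so the sketch assumes each sample carries a hypothetical reward improvement per candidate block size, and it uses total variation distance as a stand-in disagreement measure, which is not the paper's score. The toy numbers are invented.

```python
from collections import Counter

def best_improved_block_size(improvement_by_block_size):
    """Pick the block size with the largest improvement.

    Assumption: 'improvement' is some per-sample reward gain measured
    for each candidate block size. Ties go to the smaller block size.
    """
    candidates = sorted(improvement_by_block_size)
    return max(candidates, key=lambda b: improvement_by_block_size[b])

def domain_preferences(samples):
    """Count, per domain, how often each block size is the best one."""
    prefs = {}
    for sample in samples:
        best = best_improved_block_size(sample["improvement"])
        prefs.setdefault(sample["domain"], Counter())[best] += 1
    return prefs

def preference_disagreement(counts_a, counts_b):
    """Total variation distance between two domains' preferences.

    0.0 means identical preferences, 1.0 means no overlap at all.
    Illustrative stand-in only, not the Block Size Conflict Score.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    sizes = set(counts_a) | set(counts_b)
    return 0.5 * sum(
        abs(counts_a[b] / total_a - counts_b[b] / total_b) for b in sizes
    )

# Invented toy data: two domains, two candidate block sizes.
samples = [
    {"domain": "math", "improvement": {4: 0.10, 32: 0.30}},
    {"domain": "math", "improvement": {4: 0.05, 32: 0.20}},
    {"domain": "code", "improvement": {4: 0.25, 32: 0.10}},
    {"domain": "code", "improvement": {4: 0.15, 32: 0.15}},
]

prefs = domain_preferences(samples)
disagreement = preference_disagreement(prefs["math"], prefs["code"])
print(prefs, disagreement)
```

In this toy setup the two domains favour different block sizes, which is the kind of situation where training every domain with one shared block size would serve at least one of them poorly.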

Yan Jiang, Ruihong Qiu, Zi Huang • 2026

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	Countdown	--	252
Code Generation	HumanEval	--	224
Mathematical Reasoning	GSM8K	--	220
Code Generation	KodCode	--	94
General Capability	MMLU	MMLU Accuracy: 62.22	74
Puzzle Solving	Sudoku	--	42
Logical Reasoning	KK	--	28
Code Generation	MBPP	--	20
Mathematical Reasoning	MATH500	--	20
Sudoku Solving	Sudoku	Success Rate (pass@1): 26.95	12

Showing 10 of 13 rows
