
$\mathcal{B}^{3}$-Net: Controlled Posterior Bridge Learning for Multi-Task Dense Prediction

About

Multi-task dense prediction solves complementary pixel-level tasks in a unified model, such as semantic segmentation, depth estimation, surface normal estimation, and edge detection. Existing decoder-side interactions use attention, prompts, routing, diffusion, Mamba, or bridge features to exchange task evidence, but most of them organize this evidence implicitly. They usually fuse task features by similarity or affinity, without explicitly modeling that evidence reliability varies across tasks and spatial locations. As a result, unreliable evidence may contaminate the shared representation and intensify negative transfer. We propose $\mathcal{B}^{3}$-Net, a controlled posterior bridge learning framework for multi-task dense prediction. Our method decomposes decoder-side interaction into reliability estimation, posterior bridge construction, and bounded redistribution. The Precision Field Estimator estimates patch-wise evidence precision from task-reference alignment and local variation. The Posterior Bridge Operator builds a precision-weighted posterior bridge through heteroscedastic evidence fusion, yielding a shared state more reliable than uniform or heuristic mixtures. The Contractive Dispatch Operator redistributes the bridge to each task branch through a bounded update, reducing uncontrolled feature injection. Experiments on NYUD-v2, PASCAL-Context, and Cityscapes show that $\mathcal{B}^{3}$-Net achieves competitive or superior trade-offs over representative CNN-, Transformer-, diffusion-, Mamba-, and bridge-feature-based methods. Backbone-matched comparisons and extensive analyses further verify that the gains arise from controlled posterior bridge learning rather than backbone capacity or decoder scale.
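The abstract gives no equations, so the following is only a minimal sketch of how precision-weighted fusion followed by a bounded update could look for a single patch. It assumes the bridge is an inverse-variance-style weighted mean of the task features and that the dispatch is a convex interpolation toward that bridge. The function names `posterior_bridge` and `contractive_dispatch`, the `step` parameter, and the toy numbers are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def posterior_bridge(features: Sequence[Sequence[float]],
                     precisions: Sequence[float]) -> List[float]:
    """Precision-weighted mean of per-task feature vectors for one patch.

    Assumption: each task contributes in proportion to its scalar precision,
    as in Gaussian inverse-variance fusion. The paper may define this differently.
    """
    if len(features) != len(precisions):
        raise ValueError("one precision per task feature is required")
    total = sum(precisions)
    if total <= 0:
        raise ValueError("precisions must sum to a positive value")
    dim = len(features[0])
    return [
        sum(p * f[d] for f, p in zip(features, precisions)) / total
        for d in range(dim)
    ]


def contractive_dispatch(task_feature: Sequence[float],
                         bridge: Sequence[float],
                         step: float = 0.5) -> List[float]:
    """Move a task feature toward the bridge by a bounded fraction.

    Assumption: with 0 <= step <= 1 the result stays on the segment between
    the task feature and the bridge, so the update cannot overshoot.
    """
    if not 0.0 <= step <= 1.0:
        raise ValueError("step must lie in [0, 1]")
    return [f + step * (b - f) for f, b in zip(task_feature, bridge)]


# Toy example: two tasks, two feature channels, one patch.
features = [[1.0, 0.0], [3.0, 4.0]]
precisions = [3.0, 1.0]  # the first task is treated as more reliable here

bridge = posterior_bridge(features, precisions)
updated = contractive_dispatch(features[0], bridge, step=0.5)
print(bridge)   # [1.5, 1.0]
print(updated)  # [1.25, 0.5]
```

In this toy case the bridge sits closer to the first task's feature because that task carries three times the precision, and the dispatched feature moves only halfway toward the bridge, which illustrates what a bounded update is meant to prevent: replacing a branch's features outright.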

Meihua Zhou, Li Yang • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic segmentation | Cityscapes | mIoU | 93.95 | 494 |
| Depth Estimation | NYU V2 | RMSE | 0.4587 | 167 |
| Semantic segmentation | NYUD v2 | mIoU | 57.78 | 150 |
| Saliency Detection | Pascal Context | maxF Score | 86.11 | 45 |
| Surface Normal Estimation | Pascal Context | Mean Error (MAE) | 13.4 | 45 |
| Semantic segmentation | Pascal Context | mIoU | 80.81 | 42 |
| Surface Normal Estimation | NYUD | mErr | 17.22 | 38 |
| Human Parsing | Pascal Context | mIoU | 73.73 | 35 |
| Edge Detection | NYUD v2 | ODS | 83.18 | 33 |
| Edge Detection | Pascal Context | ODS F-score | 81.18 | 17 |

Showing 10 of 11 rows
