
Mixed-Precision Information Bottlenecks for On-Device Trait-State Disentanglement in Bipolar Agitation Detection

About

Continuous monitoring of agitation in bipolar disorder via voice biomarkers requires disentangling stable speaker traits from volatile affective states on resource-constrained edge devices. We introduce MP-IB, the first framework to treat mixed-precision quantization as an information bottleneck for clinical trait-state separation. The core insight is that numerical precision itself controls capacity: an FP16 trait head (1,024 bits) encodes speaker identity, while an INT4 state head (128 bits) captures agitation, yielding an 8× information asymmetry without adversarial training. We augment this with Dynamic Precision Scheduling and Multi-Scale Temporal Fusion. On Bridge2AI-Voice (N=833, 4 sessions/participant, strict speaker-independent CV), MP-IB achieves ρ = 0.117 (95% CI: [0.089, 0.145], p = 0.003 vs. chance), outperforming a 94M-parameter WavLM-Adapter with in-domain SSL continuation (ρ = -0.042), β-VAE disentanglement (ρ = 0.089), and hand-crafted prosody (ρ = 0.031) by 0.028–0.159 in absolute ρ (2.8–15.9 points). Zero-shot transfer to CREMA-D achieves AUC = 0.817. Identity leakage is suppressed to near-random (EER = 0.42, MIA-AUC = 0.52). End-to-end latency is 23.4 ms with a 617 KB footprint, enabling real-time monitoring on sub-$20 devices.
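The capacity argument above reduces to simple arithmetic: a head's raw capacity is its dimension times the bits stored per value. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each head is a plain vector with no stored scale or zero-point, which implies hypothetical dimensions of 64 (FP16) and 32 (INT4) to match the stated 1,024- and 128-bit budgets. It also uses symmetric per-tensor INT4 fake quantization as a stand-in for however MP-IB actually quantizes its state head; the names `head_bits`, `fake_quant_int4`, `TRAIT_DIM`, and `STATE_DIM` are hypothetical.

```python
"""Sketch: numerical precision as a capacity bottleneck (assumptions labeled)."""

FP16_BITS = 16
INT4_BITS = 4

# ASSUMPTION: heads are plain vectors with no extra scale/zero-point storage,
# so these hypothetical dimensions reproduce the abstract's bit budgets.
TRAIT_DIM = 64  # hypothetical: 64 x 16 bits = 1,024 bits
STATE_DIM = 32  # hypothetical: 32 x 4 bits = 128 bits


def head_bits(dim: int, bits_per_value: int) -> int:
    """Raw storage capacity of an embedding head, in bits."""
    return dim * bits_per_value


def fake_quant_int4(x: list[float]) -> list[float]:
    """Symmetric per-tensor INT4 fake quantization (quantize, then dequantize).

    Every output is an integer level q in [-7, 7] times a shared scale, so each
    dimension carries at most 4 bits of information downstream.
    """
    max_abs = max(abs(v) for v in x)
    if max_abs == 0.0:
        return [0.0 for _ in x]
    scale = max_abs / 7.0  # map the largest magnitude to level 7
    out = []
    for v in x:
        q = round(v / scale)    # nearest integer level
        q = max(-8, min(7, q))  # clamp to the signed INT4 range
        out.append(q * scale)   # dequantize back to float
    return out


trait_bits = head_bits(TRAIT_DIM, FP16_BITS)
state_bits = head_bits(STATE_DIM, INT4_BITS)
asymmetry = trait_bits / state_bits
print(trait_bits, state_bits, asymmetry)  # 1024 128 8.0
```

With these assumed dimensions the ratio is exactly 8, matching the stated 8× asymmetry. Fake quantization keeps the forward pass in floating point while restricting each value to a small set of levels, which is one common way to train through a low-precision bottleneck; whether MP-IB uses this scheme (or a straight-through estimator for gradients) is not stated here.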

Joydeep Chandra • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Agitation score prediction | Bridge2AI (speaker-independent CV) | Pearson correlation (ρ) | 0.117 | 21
Agitation prediction | Bridge2AI-Voice 5-fold speaker-independent CV v3.0.0 | Pearson correlation (ρ) | 0.117 | 16
Identity leakage | Bridge2AI 120-speaker | Top-1 accuracy | 8.3 | 11
Anger detection | CREMA-D anger detection | AUC-ROC | 0.83 | 7
Clinical speech monitoring | Bridge2AI-Voice speaker-independent CV | Spearman correlation (ρ) | 0.117 | 5
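Several of the benchmark rows rely on speaker-independent cross-validation, meaning no speaker's recordings appear in both the training and test split of any fold. The sketch below shows one way to build such folds with the standard library; it is an illustration under assumptions, not the paper's protocol. The exact fold assignment used for Bridge2AI-Voice is not given here, so this version shuffles speaker IDs with a fixed seed and assigns them round-robin to folds. The function name `speaker_independent_folds` and the toy speaker IDs are hypothetical. scikit-learn's `GroupKFold` provides the same guarantee if that library is available.

```python
"""Sketch: k-fold CV where folds are split by speaker, not by recording."""
import random


def speaker_independent_folds(speaker_ids, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs with no speaker shared across them.

    speaker_ids[i] is the speaker of recording i. All sessions of a speaker
    land in the same fold (ASSUMPTION: round-robin after a seeded shuffle).
    """
    speakers = sorted(set(speaker_ids))
    rng = random.Random(seed)
    rng.shuffle(speakers)
    fold_of = {spk: i % n_folds for i, spk in enumerate(speakers)}
    for k in range(n_folds):
        train = [i for i, s in enumerate(speaker_ids) if fold_of[s] != k]
        test = [i for i, s in enumerate(speaker_ids) if fold_of[s] == k]
        yield train, test


# Toy usage with hypothetical IDs: 10 speakers x 4 sessions each.
toy_ids = [f"spk{s}" for s in range(10) for _ in range(4)]
for k, (train, test) in enumerate(speaker_independent_folds(toy_ids)):
    print(k, len(train), len(test))  # each fold: 32 train, 8 test
```

Splitting by speaker rather than by recording matters here because multiple sessions from the same participant would otherwise let a model score well by recognizing the speaker instead of the agitation state.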
