SubQuad: Near-Quadratic-Free Structure Inference with Distribution-Balanced Objectives in Adaptive Receptor framework

About

Comparative analysis of adaptive immune repertoires at population scale is hampered by two practical bottlenecks: the near-quadratic cost of pairwise affinity evaluations and dataset imbalances that obscure clinically important minority clonotypes. We introduce SubQuad, an end-to-end pipeline that addresses these challenges by combining antigen-aware, near-subquadratic retrieval with GPU-accelerated affinity kernels, learned multimodal fusion, and fairness-constrained clustering. The system employs compact MinHash prefiltering to sharply reduce candidate comparisons, a differentiable gating module that adaptively weights complementary alignment and embedding channels on a per-pair basis, and an automated calibration routine that enforces proportional representation of rare antigen-specific subgroups. On large viral and tumor repertoires, SubQuad achieves measured improvements in throughput and peak memory usage while preserving or improving recall@k, cluster purity, and subgroup equity. By co-designing indexing, similarity fusion, and equity-aware objectives, SubQuad offers a scalable, bias-aware platform for repertoire mining and downstream translational tasks such as vaccine target prioritization and biomarker discovery.
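To make the prefiltering idea concrete, here is a minimal sketch of generic MinHash with LSH banding over amino-acid k-mers. It is not SubQuad's implementation: the abstract does not specify the shingling scheme, the hash family, or how antigen awareness enters the index, so the function names, the k-mer length, and the `num_hashes` and `bands` parameters below are all illustrative assumptions.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def kmers(seq, k=3):
    """Return the set of overlapping k-mers; a short sequence is its own shingle."""
    if len(seq) < k:
        return {seq}
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_signature(seq, num_hashes=32, k=3):
    """Keep the minimum 64-bit hash of the k-mer set under each seeded hash."""
    grams = kmers(seq, k)
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # one salted BLAKE2b per hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(g.encode(), digest_size=8, salt=salt).digest(),
                "little",
            )
            for g in grams
        ))
    return tuple(signature)


def candidate_pairs(seqs, num_hashes=32, bands=16, k=3):
    """Return index pairs (i, j), i < j, that share at least one LSH band."""
    rows = num_hashes // bands
    buckets = defaultdict(list)
    for idx, seq in enumerate(seqs):
        sig = minhash_signature(seq, num_hashes, k)
        for b in range(bands):
            # Sequences whose signatures agree on a whole band land in one bucket.
            buckets[(b, sig[b * rows:(b + 1) * rows])].append(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(members, 2))
    return pairs


if __name__ == "__main__":
    # Hypothetical CDR3-like strings, used only to exercise the sketch.
    demo = ["CASSLGQAYEQYF", "CASSLGQAYEQYF", "CAWSVGNTIYF"]
    print(sorted(candidate_pairs(demo)))
```

Only pairs returned by `candidate_pairs` would be passed on to an expensive affinity kernel, which is how a prefilter of this kind avoids scoring all n(n-1)/2 pairs. Fewer, wider bands make the filter stricter; more, narrower bands raise recall at the cost of more candidates.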

Rong Fu, Zijian Zhang, Kun Liu, Jiekai Wu, Xianda Li, Simon Fong • 2026

Related benchmarks

Task	Dataset	Metric	Value	
T-Cell Receptor (TCR) Similarity Search	VDJdb random slices (10K sequences)	Recall (AUC)	0.985	9
Rare Subpopulation Retrieval	McPAS-TCR database (test)	Recall@100 (Rare)	0.594	3
TCR Antigen Classification	VDJdb 2024.03 (test)	Macro-F1 (Antigen)	71.2	3
Similarity search (affinity computation)	10^7 sequences extrapolated from 10K-sequence component-level benchmark	Kernel throughput (k seq/s)	89.4	2
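
For readers unfamiliar with the retrieval metric in the table, the following is a small sketch of recall@k under its common definition: the fraction of relevant items that appear among the top k retrieved results. The listing does not state how the paper computes Recall@100 (Rare), so the function name and the handling of an empty relevant set are assumptions.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant items found in the first k retrieved items."""
    relevant = set(relevant)
    if not relevant:
        return 0.0  # assumption: define recall as 0 when nothing is relevant
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


if __name__ == "__main__":
    # Hypothetical ranked result list and ground-truth neighbours.
    ranked = ["a", "b", "c", "d"]
    truth = {"b", "d", "z"}
    print(recall_at_k(ranked, truth, k=2))
```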

