SubQuad: Near-Quadratic-Free Structure Inference with Distribution-Balanced Objectives in Adaptive Receptor framework
About
Comparative analysis of adaptive immune repertoires at population scale is hampered by two practical bottlenecks: the near-quadratic cost of pairwise affinity evaluations and dataset imbalances that obscure clinically important minority clonotypes. We introduce SubQuad, an end-to-end pipeline that addresses these challenges by combining antigen-aware, near-subquadratic retrieval with GPU-accelerated affinity kernels, learned multimodal fusion, and fairness-constrained clustering. The system employs compact MinHash prefiltering to sharply reduce candidate comparisons, a differentiable gating module that adaptively weights complementary alignment and embedding channels on a per-pair basis, and an automated calibration routine that enforces proportional representation of rare antigen-specific subgroups. On large viral and tumor repertoires, SubQuad achieves measured gains in throughput and reductions in peak memory usage while preserving or improving recall@k, cluster purity, and subgroup equity. By co-designing indexing, similarity fusion, and equity-aware objectives, SubQuad offers a scalable, bias-aware platform for repertoire mining and downstream translational tasks such as vaccine target prioritization and biomarker discovery.
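To illustrate how MinHash prefiltering can cut the number of pairwise comparisons, here is a minimal Python sketch. It is not the SubQuad implementation: the k-mer shingling, the `blake2b`-based hash family, the LSH banding scheme, and every function name and parameter value (`kmers`, `minhash_signature`, `candidate_pairs`, `bands`, `rows`, `k`) are assumptions chosen for illustration.

```python
import hashlib
from itertools import combinations


def kmers(seq, k=3):
    """Return the set of overlapping k-mers (the whole sequence if shorter than k)."""
    if len(seq) < k:
        return {seq}
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(token, seed):
    """Seeded 64-bit hash of a token (illustrative hash family, an assumption)."""
    data = f"{seed}:{token}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(seq, num_hashes=64, k=3):
    """MinHash signature: for each seed, the minimum hash over the k-mer set."""
    tokens = kmers(seq, k)
    return tuple(min(_hash(t, seed) for t in tokens) for seed in range(num_hashes))


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def candidate_pairs(seqs, bands=16, rows=4, k=3):
    """LSH banding: keep only index pairs that share at least one band bucket."""
    sigs = [minhash_signature(s, bands * rows, k) for s in seqs]
    buckets = {}
    for idx, sig in enumerate(sigs):
        for b in range(bands):
            key = (b, sig[b * rows:(b + 1) * rows])
            buckets.setdefault(key, []).append(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(members, 2))  # members are in ascending index order
    return pairs


# Hypothetical CDR3-like strings, used only to exercise the functions.
seqs = ["CASSLGQAYEQYF", "CASSLGQAYEQYF", "WWWWWWWWWWWWW"]
pairs = candidate_pairs(seqs)
```

Only the pairs returned by `candidate_pairs` would be passed on to a more expensive affinity kernel; the `bands` and `rows` values trade recall against the number of surviving candidates.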
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| T-Cell Receptor (TCR) Similarity Search | VDJdb random slices (10K sequences) | Recall (AUC) | 0.985 | 9 |
| Rare Subpopulation Retrieval | McPAS-TCR database (test) | Recall@100 (Rare) | 0.594 | 3 |
| TCR Antigen Classification | VDJdb 2024.03 (test) | Macro-F1 (Antigen) | 71.2 | 3 |
| Similarity search (affinity computation) | 10^7 sequences extrapolated from 10K-sequence component-level benchmark | Kernel throughput (k seq/s) | 89.4 | 2 |