Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

PARWiS: Winner determination under shoestring budgets using active pairwise comparisons

About

Determining a winner among a set of items using active pairwise comparisons under a limited budget is a challenging problem in preference-based learning. The goal of this study is to implement and evaluate the PARWiS algorithm, which shows spectral ranking and disruptive pair selection to identify the best item under shoestring budgets. This work have extended the PARWiS with a contextual variant (Contextual PARWiS) and a reinforcement learning-based variant (RL PARWiS), comparing them against baselines, including Double Thompson Sampling and a random selection strategy. This evaluation spans synthetic and real-world datasets (Jester and MovieLens), using budgets of 40, 60, and 80 comparisons for 20 items. The performance is measured through recovery fraction, true rank of reported winner, reported rank of true winner, and cumulative regret, alongside the separation metric \(\Delta_{1,2}\). Results show that PARWiS and RL PARWiS outperform baselines across all datasets, particularly in the Jester dataset with a higher \(\Delta_{1,2}\), while performance gaps narrow in the more challenging MovieLens dataset with a smaller \(\Delta_{1,2}\). Contextual PARWiS shows comparable performance to PARWiS, indicating that contextual features may require further tuning to provide significant benefits.

Shailendra Bhandari• 2026

Related benchmarks

TaskDatasetResultRank
Winner DeterminationSynthetic
Cumulative Regret13.067
15
Winner DeterminationJester
Cumulative Regret10.167
15
Winner DeterminationMovieLens
Cumulative Regret18.1
15
Best Arm IdentificationSynthetic
True Rank of Reported Winner3.533
15
Best Arm IdentificationJester
True Rank of Reported Winner2.067
15
Best Arm IdentificationMovieLens
True Rank6.633
15
Dueling BanditsSynthetic
Recovery Fraction36.7
15
Dueling BanditsMovieLens
Recovery Fraction16.7
15
Dueling BanditsJester
Recovery Fraction46.7
15
Dueling BanditSynthetic delta_1,2 = 0.0152 +/- 0.0190
Rank of True Winner4.233
9
Showing 10 of 12 rows

Other info

Follow for update