
Fair Algorithms for Multi-Agent Multi-Armed Bandits

About

We propose a multi-agent variant of the classical multi-armed bandit problem, in which there are $N$ agents and $K$ arms, and pulling an arm generates a (possibly different) stochastic reward for each agent. Unlike the classical multi-armed bandit problem, the goal is not to learn the "best arm"; indeed, each agent may perceive a different arm to be the best for her personally. Instead, we seek to learn a fair distribution over the arms. Drawing on a long line of research in economics and computer science, we use the Nash social welfare as our notion of fairness. We design multi-agent variants of three classic multi-armed bandit algorithms and show that they achieve sublinear regret, which is now measured in terms of the lost Nash social welfare.

Safwan Hossain, Evi Micha, Nisarg Shah • 2020
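To make the objective in the abstract concrete, here is a minimal sketch of the Nash social welfare of a distribution over arms, taken as the product over agents of each agent's expected reward, together with the per-round regret measured as lost Nash social welfare. The function names, the two-agent reward matrix, and the grid search are illustrative assumptions, not the authors' algorithms; the bandit algorithms in the paper must estimate the mean rewards from samples, whereas this sketch assumes they are known.

```python
import math


def nash_social_welfare(policy, mean_rewards):
    """Product over agents of the expected reward under `policy`.

    policy: list of K probabilities (a distribution over arms).
    mean_rewards: N x K nested list; mean_rewards[i][j] is the mean
        reward agent i receives when arm j is pulled.
    """
    if any(p < 0 for p in policy) or abs(sum(policy) - 1.0) > 1e-9:
        raise ValueError("policy must be a probability distribution")
    return math.prod(
        sum(p * mu for p, mu in zip(policy, agent_row))
        for agent_row in mean_rewards
    )


def best_two_arm_policy(mean_rewards, steps=100):
    """Brute-force grid search over distributions on exactly two arms.

    Illustrative only: it assumes K = 2 and known mean rewards.
    """
    candidates = [[k / steps, 1 - k / steps] for k in range(steps + 1)]
    return max(candidates, key=lambda p: nash_social_welfare(p, mean_rewards))


# Hypothetical instance: each agent values a different arm.
MU = [
    [1.0, 0.0],  # agent 0 only benefits from arm 0
    [0.0, 1.0],  # agent 1 only benefits from arm 1
]

fair_policy = best_two_arm_policy(MU)
fair_welfare = nash_social_welfare(fair_policy, MU)

# Per-round regret of always pulling arm 0: the lost Nash social welfare.
regret_of_arm_0 = fair_welfare - nash_social_welfare([1.0, 0.0], MU)
```

In this toy instance, committing to either single arm leaves one agent with zero expected reward, so the product is zero, while the uniform mixture gives each agent an expected reward of 0.5. This is why the target is a distribution over arms rather than a single "best arm".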

Related benchmarks

| Task | Dataset | Result | Rank |
| Multiobjective Optimization | DTLZ2 (train) | IGD 0.0467 | 28 |
| Multi-Objective Optimization | WFG4 M=3 (train) | IGD 0.0717 | 4 |
| Multi-Objective Optimization | WFG8 M=3 (test) | IGD 0.0961 | 4 |
| Multi-Objective Optimization | WFG6 M=5 (train) | IGD 0.3359 | 4 |
| Multi-Objective Optimization | DTLZ4 M=3 (test) | IGD 0.0601 | 4 |
| Multi-Objective Optimization | WFG5 M=3 (test) | IGD 0.0612 | 4 |
| Multi-Objective Optimization | WFG5 M=5 (test) | IGD 0.3036 | 4 |
| Multi-Objective Optimization | WFG8 M=5 (test) | IGD 0.3956 | 4 |
| Multi-Objective Optimization | WFG4 M=5 (train) | IGD 0.2859 | 4 |
| Multi-Objective Optimization | WFG4 M=7 (train) | IGD 0.3868 | 4 |
Showing 10 of 24 rows
