Fair Algorithms for Multi-Agent Multi-Armed Bandits
About
We propose a multi-agent variant of the classical multi-armed bandit problem, in which there are $N$ agents and $K$ arms, and pulling an arm generates a (possibly different) stochastic reward for each agent. Unlike the classical multi-armed bandit problem, the goal is not to learn the "best arm"; indeed, each agent may perceive a different arm to be the best for her personally. Instead, we seek to learn a fair distribution over the arms. Drawing on a long line of research in economics and computer science, we use the Nash social welfare as our notion of fairness. We design multi-agent variants of three classic multi-armed bandit algorithms and show that they achieve sublinear regret, which is now measured in terms of the lost Nash social welfare.
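To make the objective concrete, here is a minimal sketch. It assumes the standard definition of Nash social welfare as the product, over agents, of each agent's expected reward under a distribution over arms; the abstract itself does not state the formula. The names `nash_social_welfare`, `mu`, and `p` are illustrative, and the mean rewards are taken as known, whereas a bandit algorithm would have to estimate them.

```python
import math

def nash_social_welfare(p, mu):
    """Product over agents of the expected reward under arm distribution p.

    p  : list of K probabilities (a distribution over arms)
    mu : N x K nested list, mu[i][j] = mean reward of arm j for agent i
         (assumed known here purely for illustration)
    """
    return math.prod(
        sum(p_j * mu_ij for p_j, mu_ij in zip(p, row)) for row in mu
    )

# Toy instance (hypothetical): two agents, two arms, each agent values only "her" arm.
mu = [[1.0, 0.0],
      [0.0, 1.0]]

always_arm_0 = [1.0, 0.0]   # best arm for agent 0, worthless to agent 1
uniform = [0.5, 0.5]        # splits pulls evenly between the arms

nsw_arm_0 = nash_social_welfare(always_arm_0, mu)   # 1.0 * 0.0 = 0.0
nsw_uniform = nash_social_welfare(uniform, mu)      # 0.5 * 0.5 = 0.25

# Per-round regret of a policy, measured as lost Nash social welfare
# relative to the uniform distribution (which is optimal for this toy instance).
regret_arm_0 = nsw_uniform - nsw_arm_0
```

In this toy instance no single arm is acceptable to both agents, so committing to either arm yields zero Nash social welfare, while the uniform distribution maximizes the product of expected rewards. This is the sense in which the target is a fair distribution rather than a best arm.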
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multiobjective Optimization | DTLZ2 (train) | IGD | 0.0467 | 28 |
| Multi-Objective Optimization | WFG4 M=3 (train) | IGD | 0.0717 | 4 |
| Multi-Objective Optimization | WFG8 M=3 (test) | IGD | 0.0961 | 4 |
| Multi-Objective Optimization | WFG6 M=5 (train) | IGD | 0.3359 | 4 |
| Multi-Objective Optimization | DTLZ4 M=3 (test) | IGD | 0.0601 | 4 |
| Multi-Objective Optimization | WFG5 M=3 (test) | IGD | 0.0612 | 4 |
| Multi-Objective Optimization | WFG5 M=5 (test) | IGD | 0.3036 | 4 |
| Multi-Objective Optimization | WFG8 M=5 (test) | IGD | 0.3956 | 4 |
| Multi-Objective Optimization | WFG4 M=5 (train) | IGD | 0.2859 | 4 |
| Multi-Objective Optimization | WFG4 M=7 (train) | IGD | 0.3868 | 4 |