Title :
Multi-player multi-armed bandits: Decentralized learning with IID rewards
Author :
Kalathil, Dileep ; Nayyar, Naumaan ; Jain, R.
Author_Institution :
Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
Abstract :
We consider the decentralized multi-armed bandit problem with distinct arms for each player. Each player picks one arm at each time instant and receives a random reward from an unknown distribution with an unknown mean. The arms give different rewards to different players. If more than one player selects the same arm, every player that selected it gets zero reward. There is no dedicated control channel for communication or coordination among the users. We propose an online learning algorithm called dUCB4 which achieves near-O(log^2 T) regret. The motivation comes from opportunistic spectrum access by multiple secondary users in cognitive radio networks, wherein they must pick among various wireless channels that look different to different users.
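The reward model in the abstract can be illustrated with a minimal sketch of a single time step. This is not the dUCB4 algorithm, only the environment it would face. The function name `play_round`, the `means` matrix, and the choice of Bernoulli rewards are assumptions made for illustration; the abstract states only that each reward distribution and its mean are unknown.

```python
import random
from collections import Counter


def play_round(means, choices, rng=random):
    """Simulate one time step of the collision model (illustrative only).

    means[i][k] : hypothetical mean reward of arm k as seen by player i
    choices[i]  : index of the arm picked by player i at this step
    Rewards are drawn as Bernoulli(means[i][k]); that distribution is an
    assumption, since the abstract leaves the distribution unspecified.
    """
    picks = Counter(choices)  # how many players chose each arm
    rewards = []
    for player, arm in enumerate(choices):
        if picks[arm] > 1:
            # Collision: every player on this arm gets zero reward.
            rewards.append(0.0)
        else:
            # Sole occupant: draw a player-specific random reward.
            rewards.append(1.0 if rng.random() < means[player][arm] else 0.0)
    return rewards
```

With `means = [[1.0, 0.0], [0.0, 1.0]]`, the call `play_round(means, [0, 1])` gives each player their best arm without a collision, while `play_round(means, [0, 0])` makes both players collide on arm 0 and receive zero.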
Keywords :
computational complexity; game theory; learning (artificial intelligence); multi-agent systems; statistical distributions; IID rewards; cognitive radio networks; dUCB4; decentralized learning; decentralized multiarmed bandit problem; multiagent systems; multiplayer multiarmed bandits; online learning algorithm; opportunistic spectrum access; random reward; unknown distribution; unknown mean; Abstracts; Algorithm design and analysis; Cognitive radio; Indexes; Radiation detectors; Vectors; Distributed adaptive control; multi-agent systems; multi-armed bandits; online learning;
Conference_Title :
Communication, Control, and Computing (Allerton), 2012 50th Annual Allerton Conference on
Conference_Location :
Monticello, IL
Print_ISBN :
978-1-4673-4537-8
DOI :
10.1109/Allerton.2012.6483307