Publication date: Available online 26 April 2014
Source: Physical Communication
Author(s): Chaima Dhahri, Tomoaki Ohtsuki
This paper addresses the problem of cell selection in dynamic open-access femtocell networks. We model this problem as a decentralized restless multi-armed bandit (MAB) problem with unknown dynamics and multiple players. Each channel is modelled as an arbitrary finite-state Markov chain with its own state space and statistics. Each user tries to learn the best channel, that is, the one that maximizes its capacity and reduces its number of handovers. This is a classic exploration/exploitation problem in which the reward of each channel is Markovian. In addition, the reward process is restless because the state of each Markov chain evolves independently of the user's actions. This leads to a decentralized restless bandit problem. To solve it, we adopt the decentralized restless upper confidence bound (RUCB) algorithm, which achieves logarithmic regret over time for the MAB problem (proposal 1). We then extend this algorithm to cope with a dynamic environment by applying a change-point detection test based on the Page–Hinkley test (PHT) (proposal 2). However, this test can waste time when a detected change point is actually a false alarm. To address this problem, we extend the previous proposal with a meta-bandit algorithm (proposal 3) that resolves the exploration/exploitation dilemma after a change point is detected. Simulation results show that
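The abstract names two concrete building blocks: a UCB-style index over per-cell rewards (proposal 1) and a Page–Hinkley change-point test that restarts learning when the environment shifts (proposal 2). The Python sketch below is a minimal illustration of how such pieces could fit together, under simplifying assumptions: it uses a plain UCB1-style index with i.i.d. stand-in rewards rather than the paper's decentralized RUCB over restless Markovian channels, it omits the meta-bandit of proposal 3, and all names and parameter values (delta, lam, the exploration constant c, the reward means) are illustrative choices, not values from the paper.

    import math
    import random

    class PageHinkley:
        # Two-sided Page-Hinkley change-point test (PHT) on a reward stream.
        # delta: magnitude of change tolerated before deviations accumulate.
        # lam:   alarm threshold; larger values give fewer false alarms but
        #        slower detection. Both defaults here are illustrative.
        def __init__(self, delta=0.005, lam=50.0):
            self.delta, self.lam = delta, lam
            self.reset()

        def reset(self):
            self.n, self.mean = 0, 0.0
            self.m_up = self.m_down = 0.0
            self.min_up = self.max_down = 0.0

        def update(self, x):
            # Feed one reward sample; return True when a change is flagged.
            self.n += 1
            self.mean += (x - self.mean) / self.n      # running mean
            self.m_up += x - self.mean - self.delta    # accumulates upward drift
            self.m_down += x - self.mean + self.delta  # accumulates downward drift
            self.min_up = min(self.min_up, self.m_up)
            self.max_down = max(self.max_down, self.m_down)
            return (self.m_up - self.min_up > self.lam or
                    self.max_down - self.m_down > self.lam)

    def ucb_index(mean_reward, pulls, t, c=2.0):
        # UCB1-style index: empirical mean plus an exploration bonus.
        return mean_reward + math.sqrt(c * math.log(t) / pulls)

    # Usage sketch: one user choosing among K candidate femtocells. The
    # i.i.d. Gaussian rewards stand in for the Markovian channel capacity
    # of the paper; the mean shift at t = 1000 emulates a dynamic network.
    K, T = 4, 2000
    means = [0.3, 0.5, 0.6, 0.4]                  # hypothetical cell qualities
    pht = PageHinkley()
    pulls, est = [0] * K, [0.0] * K
    for t in range(1, T + 1):
        if t == 1000:
            means = [0.7, 0.4, 0.2, 0.5]          # environment change
        if min(pulls) == 0:                       # sample each cell once first
            k = pulls.index(0)
        else:
            k = max(range(K), key=lambda a: ucb_index(est[a], pulls[a], t))
        r = means[k] + random.gauss(0.0, 0.05)    # stand-in reward
        pulls[k] += 1
        est[k] += (r - est[k]) / pulls[k]         # incremental mean per cell
        if pht.update(r):                         # proposal 2: restart on alarm
            pulls, est = [0] * K, [0.0] * K
            pht.reset()

Note that resetting all statistics on every alarm is wasteful when the alarm is false; this is precisely the dilemma the paper's meta-bandit layer (proposal 3, omitted from this sketch) is introduced to resolve.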