![]() |
CiteULike | ![]() |
bsilverthorn's CiteULike | ![]() |
![]() |
|
![]() |
Register | ![]() |
Log in | ![]() |
Using upper confidence bounds for online learningby: Peter Auer
Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on In 41st Annual Symposium on Foundations of Computer Science (FOCS 2000) (2000), pp. 270-279.
|
Reviews
[Write a review of this article]
Notes for this article
Find related articles from these CiteULike users
Find related articles with these CiteULike tags
Posting History
AbstractWe show how a standard tool from statistics, namely confidence bounds, can be used to elegantly deal with situations which exhibit an exploitation/exploration trade-off. Our technique for designing and analyzing algorithms for such situations is very general and can be applied when an algorithm has to make exploitation-versus-exploration decisions based on uncertain information provided by a random process. We consider two models with such an exploitation/exploration trade-off. For the adversarial bandit problem our new algorithm suffers only O˜(T<sup>1/2</sup>) regret over T trials which improves significantly over the previously best O˜(T<sup>2/3</sup>) regret. We also extend our results for the adversarial bandit problem to shifting bandits. The second model we consider is associative reinforcement learning with linear value functions. For this model our technique improves the regret from O˜(T<sup>3/4</sup>) to O˜(T<sup>1/2</sup>)
BibTeX record
RIS record