Garrett van Ryzin
Jeff McGill
Management Science
A stochastic approximation method with max-norm projections and its applications to the Q-learning algorithm