A stochastic approximation method with max-norm projections and its applications to the Q-learning algorithm
Exploiting the Structural Properties of the Underlying Markov Decision Problem in the Q-Learning Algorithm
Learning Algorithms for Separable Approximations of Discrete Stochastic Optimization Problems
An algorithm for approximating piecewise linear concave functions from sample gradients
Operations Research Letters
ACM Transactions on Modeling and Computer Simulation
Mathematics of Operations Research
INFORMS Journal on Computing
Sumit Kunnumkal
Warren B. Powell
Andrzej Ruszczyński
Warren Powell