A stochastic approximation method with max-norm projections and its applications to the Q-learning algorithm
Exploiting the Structural Properties of the Underlying Markov Decision Problem in the Q-Learning Algorithm
ACM Transactions on Modeling and Computer Simulation
INFORMS Journal on Computing
Huseyin Topaloglu