Pieter Abbeel & Andrew Y. Ng (2004):
Apprenticeship learning via inverse reinforcement learning.
In: International Conference on Machine Learning,
p. 1,
doi:10.1145/1015330.1015430.
Umut A. Acar (2005):
Self-adjusting computation.
Ph.D. thesis, Carnegie Mellon University,
Pittsburgh, PA, USA.
David Andre & Stuart Russell (2001):
Programmable Reinforcement Learning Agents.
In: Advances in Neural Information Processing Systems,
pp. 1019–1024.
David Andre & Stuart Russell (2002):
State Abstraction for Programmable Reinforcement Learning Agents.
In: Eighteenth National Conference on Artificial Intelligence,
pp. 119–125.
Peter Auer, Nicolò Cesa-Bianchi & Paul Fischer (2002):
Finite-time Analysis of the Multiarmed Bandit Problem.
Machine Learning 47,
pp. 235–256,
doi:10.1023/A:1013689704352.
Tim Bauer, Martin Erwig, Alan Fern & Jervis Pinto (2011):
Adaptation-Based Programming in Java.
In: ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation (PEPM '11),
pp. 81–90,
doi:10.1145/1929501.1929518.
Tim Bauer, Martin Erwig, Alan Fern & Jervis Pinto:
ABP.
http://web.engr.oregonstate.edu/~bauertim/abp/.
Christopher Bishop (2006):
Pattern Recognition and Machine Learning.
Springer.
Thomas Dietterich (1998):
The MAXQ Method for Hierarchical Reinforcement Learning.
In: International Conference on Machine Learning,
pp. 118–126.
Michail Lagoudakis & Michael Littman (2000):
Algorithm Selection using Reinforcement Learning.
In: International Conference on Machine Learning,
pp. 511–518.
T. Lai & H. Robbins (1985):
Asymptotically efficient adaptive allocation rules.
Advances in Applied Mathematics 6,
pp. 4–22,
doi:10.1016/0196-8858(85)90002-8.
K. Levenberg (1944):
A method for the solution of certain non-linear problems in least squares.
Quarterly of Applied Mathematics 2,
pp. 164–168.
Michael Littman (1994):
Markov Games as a Framework for Multi-Agent Reinforcement Learning.
In: International Conference on Machine Learning,
pp. 157–163.
R. Maclin, J. Shavlik, L. Torrey, T. Walker & E. Wild (2005):
Giving Advice about Preferred Actions to Reinforcement Learners Via Knowledge-Based Kernel Regression.
In: Proceedings of the Twentieth National Conference on Artificial Intelligence,
pp. 819–824.
D. Marquardt (1963):
An algorithm for least-squares estimation of nonlinear parameters.
SIAM Journal on Applied Mathematics 11,
pp. 431–441.
H. B. Nielsen (2000):
UCTP - Test Problems for Unconstrained Optimization.
Technical Report,
Technical University of Denmark.
H. Robbins (1952):
Some Aspects of the Sequential Design of Experiments.
Bulletin of the American Mathematical Society 58,
pp. 527–535,
doi:10.1090/S0002-9904-1952-09620-8.
Paul Ruvolo, Ian R. Fasel & Javier R. Movellan (2008):
Optimization on a Budget: A Reinforcement Learning Approach.
In: Advances in Neural Information Processing Systems,
pp. 1385–1392.
T. Schrijvers, S. Peyton Jones, M. Chakravarty & M. Sulzmann (2008):
Type Checking with Open Type Functions.
In: ACM International Conference on Functional Programming,
pp. 51–62,
doi:10.1145/1411203.1411215.
Christopher Simpkins, Sooraj Bhat, Michael Mateas & Charles Isbell (2008):
Toward Adaptive Programming: Integrating Reinforcement Learning into a Programming Language.
In: ACM Conference on Object-Oriented Programming Systems, Languages and Applications,
pp. 603–614,
doi:10.1145/1449955.1449811.
Richard Sutton & Andrew Barto (1998):
Reinforcement Learning: An Introduction.
MIT Press.
S. Thompson (1991):
Type Theory and Functional Programming.
Addison-Wesley,
Redwood City, CA, USA.