Pieter Abbeel & Andrew Y. Ng (2004):
Apprenticeship learning via inverse reinforcement learning.
In: International Conference on Machine Learning,
p. 1,
doi:10.1145/1015330.1015430.
Umut A. Acar (2005):
Self-adjusting computation.
Ph.D. thesis, Carnegie Mellon University,
Pittsburgh, PA, USA.
David Andre & Stuart Russell (2001):
Programmable Reinforcement Learning Agents.
In: Advances in Neural Information Processing Systems,
pp. 1019–1024.
David Andre & Stuart Russell (2002):
State Abstraction for Programmable Reinforcement Learning Agents.
In: Eighteenth National Conference on Artificial Intelligence,
pp. 119–125.
Peter Auer, Nicolò Cesa-Bianchi & Paul Fischer (2002):
Finite-time Analysis of the Multiarmed Bandit Problem.
Machine Learning 47,
pp. 235–256,
doi:10.1023/A:1013689704352.
Tim Bauer, Martin Erwig, Alan Fern & Jervis Pinto (2011):
Adaptation-Based Programming in Java.
In: ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation (PEPM '11),
pp. 81–90,
doi:10.1145/1929501.1929518.
Tim Bauer, Martin Erwig, Alan Fern & Jervis Pinto:
ABP.
http://web.engr.oregonstate.edu/~bauertim/abp/.
Christopher Bishop (2006):
Pattern Recognition and Machine Learning.
Springer.
Thomas Dietterich (1998):
The MAXQ Method for Hierarchical Reinforcement Learning.
In: International Conference on Machine Learning,
pp. 118–126.
Michail Lagoudakis & Michael Littman (2000):
Algorithm Selection using Reinforcement Learning.
In: International Conference on Machine Learning,
pp. 511–518.
T. Lai & H. Robbins (1985):
Asymptotically efficient adaptive allocation rules.
Advances in Applied Mathematics 6,
pp. 4–22,
doi:10.1016/0196-8858(85)90002-8.
K. Levenberg (1944):
A method for the solution of certain non-linear problems in least squares.
Quarterly of Applied Mathematics 2,
pp. 164–168.
Michael Littman (1994):
Markov Games as a Framework for Multi-Agent Reinforcement Learning.
In: International Conference on Machine Learning,
pp. 157–163.
R. Maclin, J. Shavlik, L. Torrey, T. Walker & E. Wild (2005):
Giving Advice about Preferred Actions to Reinforcement Learners Via Knowledge-Based Kernel Regression.
In: Proceedings of the Twentieth National Conference on Artificial Intelligence,
pp. 819–824.
D. Marquardt (1963):
An algorithm for least-squares estimation of nonlinear parameters.
SIAM Journal on Applied Mathematics 11,
pp. 431–441.
H. B. Nielsen (2000):
UCTP - Test Problems for Unconstrained Optimization.
Technical Report,
Technical University of Denmark.
H. Robbins (1952):
Some Aspects of the Sequential Design of Experiments.
Bulletin of the American Mathematical Society 58,
pp. 527–535,
doi:10.1090/S0002-9904-1952-09620-8.
Paul Ruvolo, Ian R. Fasel & Javier R. Movellan (2008):
Optimization on a Budget: A Reinforcement Learning Approach.
In: Advances in Neural Information Processing Systems,
pp. 1385–1392.
T. Schrijvers, S. Peyton Jones, M. Chakravarty & M. Sulzmann (2008):
Type Checking with Open Type Functions.
In: ACM International Conference on Functional Programming,
pp. 51–62,
doi:10.1145/1411203.1411215.
Christopher Simpkins, Sooraj Bhat, Michael Mateas & Charles Isbell (2008):
Toward Adaptive Programming: Integrating Reinforcement Learning into a Programming Language.
In: ACM Conference on Object-Oriented Programming Systems, Languages and Applications,
pp. 603–614,
doi:10.1145/1449955.1449811.
Richard Sutton & Andrew Barto (1998):
Reinforcement Learning: An Introduction.
MIT Press.
S. Thompson (1991):
Type Theory and Functional Programming.
Addison-Wesley,
Redwood City, CA, USA.