Awesome Reinforcement Learning

A curated list of resources dedicated to reinforcement learning.

We have pages for other topics: awesome-rnn, awesome-deep-vision, awesome-random-forest

Maintainers: Hyunsoo Kim, Jiwon Kim

We are looking for more contributors and maintainers!

Contributing

Please feel free to send pull requests.

Codes

Codes for examples and exercises in Richard Sutton and Andrew Barto's Reinforcement Learning: An Introduction
Simulation code for Reinforcement Learning Control Problems
- Pole-Cart Problem
- Q-learning Controller
MATLAB Environment and GUI for Reinforcement Learning
Reinforcement Learning Repository - University of Massachusetts, Amherst
Brown-UMBC Reinforcement Learning and Planning Library (Java)
Reinforcement Learning in R (MDP, Value Iteration)
Reinforcement Learning Environment in Python and MATLAB
RL-Glue (standard interface for RL) and RL-Glue Library
PyBrain Library - Python-Based Reinforcement learning, Artificial intelligence, and Neural network
RLPy Framework - Value-Function-Based Reinforcement Learning Framework for Education and Research
Maja - Machine learning framework for problems in Reinforcement Learning in Python
TeachingBox - Java-based Reinforcement Learning framework
Policy Gradient Reinforcement Learning Toolbox for MATLAB
PIQLE - Platform Implementing Q-LEarning and other RL algorithms
BeliefBox - Bayesian reinforcement learning library and toolkit
Deep Q-Learning with TensorFlow - A deep Q-learning demonstration using Google TensorFlow

Theory

Lectures

[UCL] COMPM050/COMPGI13 Reinforcement Learning by David Silver
[UC Berkeley] CS188 Artificial Intelligence by Pieter Abbeel
[Udacity (Georgia Tech.)] Machine Learning 3: Reinforcement Learning (CS7641)
[Stanford] CS229 Machine Learning - Lecture 16: Reinforcement Learning by Andrew Ng

Books

Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction [Book] [Code]
Csaba Szepesvari, Algorithms for Reinforcement Learning [Book]
David Poole and Alan Mackworth, Artificial Intelligence: Foundations of Computational Agents [Book Chapter]
Dimitri P. Bertsekas and John N. Tsitsiklis, Neuro-Dynamic Programming [Book (Amazon)] [Summary]
Mykel J. Kochenderfer, Decision Making Under Uncertainty: Theory and Application [Book (Amazon)]

Surveys

Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore, Reinforcement Learning: A Survey, JAIR, 1996. [Paper]
S. S. Keerthi and B. Ravindran, A Tutorial Survey of Reinforcement Learning, Sadhana, 1994. [Paper]
Matthew E. Taylor, Peter Stone, Transfer Learning for Reinforcement Learning Domains: A Survey, JMLR, 2009. [Paper]
Jens Kober, J. Andrew Bagnell, Jan Peters, Reinforcement Learning in Robotics, A Survey, IJRR, 2013. [Paper]
Michael L. Littman, "Reinforcement learning improves behaviour from evaluative feedback." Nature 521.7553 (2015): 445-451. [Paper]
Marc P. Deisenroth, Gerhard Neumann, Jan Peters, A Survey on Policy Search for Robotics, Foundations and Trends in Robotics, 2014. [Book]

Papers / Thesis

Foundational Papers
- Marvin Minsky, Steps toward Artificial Intelligence, Proceedings of the IRE, 1961. [Paper]
  - discusses issues in RL such as the "credit assignment problem"
- Ian H. Witten, An Adaptive Optimal Controller for Discrete-Time Markov Environments, Information and Control, 1977. [Paper]
  - earliest publication on temporal-difference (TD) learning rule.
Methods
- Dynamic Programming (DP):
  - Christopher J. C. H. Watkins, Learning from Delayed Rewards, Ph.D. Thesis, Cambridge University, 1989. [Thesis]
- Monte Carlo:
  - Andrew Barto, Michael Duff, Monte Carlo Inversion and Reinforcement Learning, NIPS, 1994. [Paper]
  - Satinder P. Singh, Richard S. Sutton, Reinforcement Learning with Replacing Eligibility Traces, Machine Learning, 1996. [Paper]
- Temporal-Difference:
  - Richard S. Sutton, Learning to Predict by the Methods of Temporal Differences, Machine Learning 3: 9-44, 1988. [Paper]
- Q-Learning (Off-policy TD algorithm):
  - Chris Watkins, Learning from Delayed Rewards, Cambridge, 1989. [Thesis]
- Sarsa (On-policy TD algorithm):
  - G.A. Rummery, M. Niranjan, On-line Q-learning using connectionist systems, Technical Report, Cambridge Univ., 1994. [Report]
  - Richard S. Sutton, Generalization in Reinforcement Learning: Successful examples using sparse coding, NIPS, 1996. [Paper]
- R-Learning (learning of relative values)
  - Andrew Schwartz, A Reinforcement Learning Method for Maximizing Undiscounted Rewards, ICML, 1993. [Paper-Google Scholar]
- Function Approximation methods (Least-Squares Temporal Difference, Least-Squares Policy Iteration)
  - Steven J. Bradtke, Andrew G. Barto, Linear Least-Squares Algorithms for Temporal Difference Learning, Machine Learning, 1996. [Paper]
  - Michail G. Lagoudakis, Ronald Parr, Model-Free Least Squares Policy Iteration, NIPS, 2001. [Paper] [Code]
- Policy Search / Policy Gradient
  - Richard Sutton, David McAllester, Satinder Singh, Yishay Mansour, Policy Gradient Methods for Reinforcement Learning with Function Approximation, NIPS, 1999. [Paper]
  - Jan Peters, Sethu Vijayakumar, Stefan Schaal, Natural Actor-Critic, ECML, 2005. [Paper]
  - Jens Kober, Jan Peters, Policy Search for Motor Primitives in Robotics, NIPS, 2009. [Paper]
  - Jan Peters, Katharina Mulling, Yasemin Altun, Relative Entropy Policy Search, AAAI, 2010. [Paper]
  - Freek Stulp, Olivier Sigaud, Path Integral Policy Improvement with Covariance Matrix Adaptation, ICML, 2012. [Paper]
  - Nate Kohl, Peter Stone, Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion, ICRA, 2004. [Paper]
  - Marc Deisenroth, Carl Rasmussen, PILCO: A Model-Based and Data-Efficient Approach to Policy Search, ICML, 2011. [Paper]
  - Scott Kuindersma, Roderic Grupen, Andrew Barto, Learning Dynamic Arm Motions for Postural Recovery, Humanoids, 2011. [Paper]
- Hierarchical RL
  - Richard Sutton, Doina Precup, Satinder Singh, Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, Artificial Intelligence, 1999. [Paper]
  - George Konidaris, Andrew Barto, Building Portable Options: Skill Transfer in Reinforcement Learning, IJCAI, 2007. [Paper]
- Deep Learning + Reinforcement Learning (A sample of recent works on DL+RL)
  - V. Mnih, et al., Human-level Control through Deep Reinforcement Learning, Nature, 2015. [Paper]
  - Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard Lewis, Xiaoshi Wang, Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning, NIPS, 2014. [Paper]
  - Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel, End-to-End Training of Deep Visuomotor Policies, ArXiv, 16 Oct 2015. [ArXiv]
  - Tom Schaul, John Quan, Ioannis Antonoglou, David Silver, Prioritized Experience Replay, ArXiv, 18 Nov 2015. [ArXiv]
  - Hado van Hasselt, Arthur Guez, David Silver, Deep Reinforcement Learning with Double Q-Learning, ArXiv, 22 Sep 2015. [ArXiv]
  - Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, Asynchronous Methods for Deep Reinforcement Learning, ArXiv, 4 Feb 2016. [ArXiv]
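
The Methods list above labels Q-Learning as an off-policy TD algorithm and Sarsa as an on-policy TD algorithm. As a rough illustration of that distinction (not code taken from any of the listed papers), here is a minimal tabular sketch in Python. The function names, the list-of-lists table layout, and the `alpha` / `gamma` defaults are assumptions chosen only for this example.

```python
def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy TD update: bootstrap from the best action value in s_next."""
    td_target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (td_target - q[s][a])
    return q


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy TD update: bootstrap from the action actually chosen in s_next."""
    td_target = r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (td_target - q[s][a])
    return q


# Toy action-value table: 2 states x 2 actions (numbers are illustrative only).
q_init = [[0.0, 0.0], [1.0, 2.0]]

# Same transition (s=0, a=0, r=1.0, s_next=1) fed to both rules.
q_ql = q_learning_update([row[:] for row in q_init], s=0, a=0, r=1.0, s_next=1)
q_sa = sarsa_update([row[:] for row in q_init], s=0, a=0, r=1.0, s_next=1, a_next=0)

print(q_ql[0][0])  # uses max over actions in state 1
print(q_sa[0][0])  # uses the value of the action taken (a_next=0) in state 1
```

The two rules differ only in the bootstrap term: Q-learning uses the maximum over next actions regardless of what the agent does next, while Sarsa uses the value of the next action the behaviour policy actually selected.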

Applications

Game Playing

Traditional Games
- Backgammon - "TD-Gammon" game play using TD(λ) (Tesauro, ACM 1995) [Paper]
- Chess - "KnightCap" program using TD(λ) (Baxter, arXiv 1999) [arXiv]
- Chess - Giraffe: Using deep reinforcement learning to play chess (Lai, arXiv 2015) [arXiv]
Computer Games
- Human-level Control through Deep Reinforcement Learning (Mnih, Nature 2015) [Paper] [Code] [Video]
- Flappy Bird Reinforcement Learning [Video]
- MarI/O - learning to play Mario with evolutionary reinforcement learning using artificial neural networks (Stanley, Evolutionary Computation 2002) [Paper] [Video]

Robotics

Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion (Kohl, ICRA 2004) [Paper]
Robot Motor Skill Coordination with EM-based Reinforcement Learning (Kormushev, IROS 2010) [Paper] [Video]
Generalized Model Learning for Reinforcement Learning on a Humanoid Robot (Hester, ICRA 2010) [Paper] [Video]
Autonomous Skill Acquisition on a Mobile Manipulator (Konidaris, AAAI 2011) [Paper] [Video]
PILCO: A Model-Based and Data-Efficient Approach to Policy Search (Deisenroth, ICML 2011) [Paper]
Incremental Semantically Grounded Learning from Demonstration (Niekum, RSS 2013) [Paper]
Efficient Reinforcement Learning for Robots using Informative Simulated Priors (Cutler, ICRA 2015) [Paper] [Video]

Control

An Application of Reinforcement Learning to Aerobatic Helicopter Flight (Abbeel, NIPS 2006) [Paper] [Video]
Autonomous Helicopter Control using Reinforcement Learning Policy Search Methods (Bagnell, ICRA 2001) [Paper]

Operations Research

Scaling Average-reward Reinforcement Learning for Product Delivery (Proper, AAAI 2004) [Paper]
Cross Channel Optimized Marketing by Reinforcement Learning (Abe, KDD 2004) [Paper]

Human Computer Interaction

Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System (Singh, JAIR 2002) [Paper]

Tutorials / Websites

Mance Harmon and Stephanie Harmon, Reinforcement Learning: A Tutorial
Short introduction to some Reinforcement Learning algorithms
C. Igel, M.A. Riedmiller, et al., Reinforcement Learning in a Nutshell, ESANN, 2007. [Paper]
UNSW - Reinforcement Learning
ROS Reinforcement Learning Tutorial
POMDP for Dummies
Scholarpedia articles on:
- Reinforcement Learning
- Temporal Difference Learning
Repository with useful MATLAB Software, presentations, and demo videos
Bibliography on Reinforcement Learning
UC Berkeley - CS 294: Deep Reinforcement Learning, Fall 2015 (John Schulman, Pieter Abbeel) [Class Website]
Blog posts on Reinforcement Learning, Parts 1-4 by Travis DeWolf
The Arcade Learning Environment - Atari 2600 games environment for developing AI agents
Deep Reinforcement Learning: Pong from Pixels by Andrej Karpathy
Demystifying Deep Reinforcement Learning

Online Demos

Real-world demonstrations of Reinforcement Learning
Deep Q-Learning Demo - A deep Q learning demonstration using ConvNetJS
Deep Q-Learning with TensorFlow - A deep Q-learning demonstration using Google TensorFlow
Reinforcement Learning Demo - A reinforcement learning demo using reinforcejs by Andrej Karpathy
