All downloadable documents are Adobe Acrobat PDF files. You can obtain Acrobat Reader for free by following the link from the Adobe icon.
Note: This syllabus will be modified continuously to accommodate the progress and interests of the course participants!
Date | Topic | Handouts |
Sept. 3 | Introduction to Reinforcement Learning | Slides, Sutton Book Chapters 1-5 |
Sept. 10 | Function Approximation in Reinforcement Learning, Optimal control along trajectories: LQR, LQG and DDP | Sutton Book Chapter 8, Todorov2005 |
Sept. 17 | Research on DDP and Function Approximation for RL | Tassa2007, Slides |
Sept. 24 | Research on DDP and Function Approximation in RL | Doya2000, Morimoto2003 |
Oct. 1 | Gaussian Processes for Reinforcement Learning, Value function learning along trajectories (fitted Q iteration), Least Squares Temporal Difference Methods | Deisenroth2009, Lagoudakis2002, Ernst2005 |
Oct. 8 | Policy Gradient Methods: REINFORCE, GPOMDP, Natural Gradients | Williams1992, Sutton2000, Peters2008, Slides |
Oct. 15 | Research on Policy Gradient Methods, Introduction to Path Integral Methods | Tedrake2005, Bagnell2003 |
Oct. 22 | Path Integral Methods for Reinforcement Learning | Theodorou2010, Todorov2009, Kober2009 |
Oct. 29 | Path Integral Methods for Reinforcement Learning (continued) | Slides |
Nov. 5 | Sketch of Planned Projects, Modular Learning Control | Tedrake2009, Todorov2009 |
Nov. 12 | Inverse reinforcement learning | Dvijotham2009, Abbeel2009, Ratliff2009 |
Nov. 19 | Dynamic Bayesian networks for reinforcement learning | Toussaint2006, Vlassis2009 |
Dec. 3 | Project presentations. | |
Tentative Syllabus:
- Introduction to reinforcement learning [1]
- Dynamic programming methods [1, 2]
- Optimal control methods [2, 3]
- Temporal difference methods [1]
- Q-Learning [1]
- Problems of value-function-based RL methods
- Function Approximation for RL [1]
- Incremental Function Approximation Methods for RL [4, 5]
- Least Squares Methods [6]
- Direct Policy Learning: REINFORCE [7]
- Modern policy gradient methods: GPOMDP and the Policy Gradient Theorem [8, 9]
- Natural Policy Gradient Methods [9]
- Probabilistic Reinforcement Learning with Reward-Weighted Averaging [10, 11]
- Q-Learning on Trajectories [12]
- Path Integral Approaches to Reinforcement Learning I [13]
- Path Integral Approaches to Reinforcement Learning II
- Dynamic Bayesian Networks for RL [14]
- Gaussian Processes in Reinforcement Learning [5]
Readings:
- [1] (:titlesearch Reinforcement learning : An introduction :)
- [2] (:titlesearch The computation and theory of optimal control :)
- [3] (:titlesearch Differential dynamic programming :)
- [4] (:titlesearch Constructive incremental learning from only local information :)
- [5] (:titlesearch Gaussian processes for machine learning :)
- [6] J. Boyan, "Least-squares temporal difference learning," in Proceedings of the Sixteenth International Conference on Machine Learning. Morgan Kaufmann, 1999, pp. 49-56.
- [7] (:titlesearch Simple statistical gradient-following algorithms for connectionist reinforcement learning :)
- [8] (:titlesearch Reinforcement learning of motor skills with policy gradients :)
- [9] (:titlesearch Natural actor critic :)
- [10] (:titlesearch Reinforcement learning by reward-weighted regression for operational space control :)
- [11] (:titlesearch Policy Search for Motor Primitives in Robotics :)
- [12] (:titlesearch Fitted Q-iteration by advantage weighted regression :)
- [13] (:titlesearch Path integral stochastic optimal control for rigid body dynamics :)
- [14] (:titlesearch Probabilistic inference for solving discrete and continuous state Markov Decision Processes :)