
Course catalog: Sample-Based Learning Methods Training
4,401 followers
Course outline:

    Sample-Based Learning Methods Training

Welcome to the Course!
Welcome to the second course in the Reinforcement Learning Specialization:
Sample-Based Learning Methods, brought to you by the University of Alberta,
Onlea, and Coursera.
In this pre-course module, you'll be introduced to your instructors,
and get a flavour of what the course has in store for you.
Make sure to introduce yourself to your classmates in the "Meet and Greet" section!
Monte Carlo Methods for Prediction & Control
This week you will learn how to estimate value functions and optimal policies,
using only sampled experience from the environment.
This module represents our first step toward incremental learning methods
that learn from the agent’s own interaction with the world,
rather than a model of the world.
You will learn about on-policy and off-policy methods for prediction
and control, using Monte Carlo methods---methods that use sampled returns.
You will also revisit the exploration problem, now in the general RL
setting rather than only in bandits.
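
To make the idea concrete, here is a minimal sketch of first-visit Monte Carlo prediction on a toy five-state random walk (a Sutton-and-Barto-style setup). The environment, policy, and episode count are assumptions made for this illustration, not the course's graded environment.

```python
import random
from collections import defaultdict

def run_episode(policy):
    """Generate one episode as a list of (state, reward) pairs."""
    state = 3                               # start in the middle
    trajectory = []
    while state not in (0, 6):              # states 0 and 6 are terminal
        next_state = state + policy(state)  # action is -1 (left) or +1 (right)
        reward = 1.0 if next_state == 6 else 0.0
        trajectory.append((state, reward))
        state = next_state
    return trajectory

def mc_prediction(policy, num_episodes=5000, gamma=1.0):
    """First-visit Monte Carlo estimate of v_pi from sampled returns."""
    returns = defaultdict(list)
    for _ in range(num_episodes):
        episode = run_episode(policy)
        G = 0.0
        # Walk the episode backwards, accumulating the return G.
        for t in reversed(range(len(episode))):
            state, reward = episode[t]
            G = gamma * G + reward
            # First-visit check: record G only at the earliest visit of state.
            if all(s != state for s, _ in episode[:t]):
                returns[state].append(G)
    return {s: sum(g) / len(g) for s, g in sorted(returns.items())}

random_policy = lambda s: random.choice((-1, 1))
print(mc_prediction(random_policy))   # true values are 1/6, 2/6, ..., 5/6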
Temporal Difference Learning Methods for Prediction
This week, you will learn about one of the most fundamental concepts in reinforcement learning:
temporal difference (TD) learning.
TD learning combines some of the features of both Monte Carlo and Dynamic Programming (DP) methods.
TD methods are similar to Monte Carlo methods in that they can learn from the agent’s interaction with the world,
and do not require knowledge of the model.
TD methods are similar to DP methods in that they bootstrap,
and thus can learn online---no waiting until the end of an episode.
You will see how TD can learn more efficiently than Monte Carlo, due to bootstrapping.
This module focuses on TD for prediction; TD for control is the topic of the next module.
This week, you will implement TD to estimate the value function for a fixed policy, in a simulated domain.
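
For comparison, here is a minimal sketch of tabular TD(0) on the same kind of toy random walk as above. Note that the update happens at every step, online, rather than at the end of an episode; the step size and episode count are arbitrary illustrative choices.

```python
import random

def td0_prediction(policy, num_episodes=5000, alpha=0.05, gamma=1.0):
    """Estimate v_pi with one-step TD updates, learning online."""
    V = {s: 0.0 for s in range(7)}           # states 0..6; 0 and 6 terminal
    for _ in range(num_episodes):
        state = 3
        while state not in (0, 6):
            next_state = state + policy(state)
            reward = 1.0 if next_state == 6 else 0.0
            # TD(0): bootstrap toward the target r + gamma * V(s').
            td_target = reward + gamma * V[next_state]
            V[state] += alpha * (td_target - V[state])
            state = next_state
    return V

random_policy = lambda s: random.choice((-1, 1))
print(td0_prediction(random_policy))         # compare to 1/6 ... 5/6
```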
Temporal Difference Learning Methods for Control
This week, you will learn about using temporal difference learning for control,
as a generalized policy iteration strategy.
You will see three different algorithms based on bootstrapping and Bellman equations for control: Sarsa,
Q-learning and Expected Sarsa. You will see some of the differences between
the methods for on-policy and off-policy control, and that Expected Sarsa is a unified algorithm for both.
You will implement Expected Sarsa and Q-learning on Cliff World.
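
As a rough illustration, here is a sketch of both update targets on a textbook-style 4x12 cliff-walking grid (an assumption: this is not the course's Cliff World implementation). The only difference between Q-learning and Expected Sarsa below is the target used in the update.

```python
import random

ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right

def step(state, action):
    """Cliff-walking dynamics: -1 per step, -100 and a reset for the cliff."""
    r, c = state
    dr, dc = ACTIONS[action]
    r = min(max(r + dr, 0), ROWS - 1)
    c = min(max(c + dc, 0), COLS - 1)
    if r == 3 and 0 < c < 11:                  # stepped into the cliff
        return START, -100.0, False
    return (r, c), -1.0, (r, c) == GOAL

def eps_greedy(q_values, eps):
    if random.random() < eps:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

def learn(expected=False, episodes=500, alpha=0.5, gamma=1.0, eps=0.1):
    Q = {(r, c): [0.0] * 4 for r in range(ROWS) for c in range(COLS)}
    for _ in range(episodes):
        state, done = START, False
        while not done:
            action = eps_greedy(Q[state], eps)
            next_state, reward, done = step(state, action)
            if expected:
                # Expected Sarsa: expectation of Q under the eps-greedy policy.
                probs = [eps / 4] * 4
                probs[max(range(4), key=Q[next_state].__getitem__)] += 1 - eps
                target = sum(p * q for p, q in zip(probs, Q[next_state]))
            else:
                # Q-learning: max over next actions (off-policy target).
                target = max(Q[next_state])
            Q[state][action] += alpha * (reward + gamma * target
                                         - Q[state][action])
            state = next_state
    return Q

q_learning_Q = learn(expected=False)
expected_sarsa_Q = learn(expected=True)
```

On this grid, the off-policy max target pulls Q-learning toward the shortest path along the cliff edge, while the on-policy expectation accounts for exploratory slips, which is why Expected Sarsa typically accumulates less penalty during learning.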
Planning, Learning & Acting
Up until now, you might think that learning with and without a model are two
distinct, and in some ways competing, strategies: planning with
Dynamic Programming versus sample-based learning via TD methods.
This week we unify these two strategies with the Dyna architecture.
You will learn how to estimate the model from data and then use this model
to generate hypothetical experience (a bit like dreaming)
to dramatically improve sample efficiency compared to sample-based methods like Q-learning.
In addition, you will learn how to design learning systems that are robust to inaccurate models.
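
Here is a minimal tabular Dyna-Q sketch, assuming a deterministic episodic environment exposed through a step function like the one in the previous sketch; `n_planning` is the knob that trades extra computation per real step for sample efficiency.

```python
import random
from collections import defaultdict

def dyna_q(step_fn, start, n_actions, episodes=50, n_planning=10,
           alpha=0.5, gamma=0.95, eps=0.1):
    """Tabular Dyna-Q: direct RL + model learning + planning."""
    Q = defaultdict(lambda: [0.0] * n_actions)
    model = {}                                  # (s, a) -> (reward, s')
    for _ in range(episodes):
        state, done = start, False
        while not done:
            if random.random() < eps:
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=Q[state].__getitem__)
            next_state, reward, done = step_fn(state, action)
            # (a) Direct RL: ordinary Q-learning update from real experience.
            Q[state][action] += alpha * (reward + gamma * max(Q[next_state])
                                         - Q[state][action])
            # (b) Model learning: remember the observed transition.
            model[(state, action)] = (reward, next_state)
            # (c) Planning: extra updates from model-simulated experience.
            for _ in range(n_planning):
                (s, a), (r, s2) = random.choice(list(model.items()))
                Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            state = next_state
    return Q
```

With the cliff-walking `step` from the previous sketch, `Q = dyna_q(step, START, 4)` runs the full loop; setting `n_planning=0` recovers plain Q-learning, which makes the sample-efficiency gain from planning easy to measure.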
