To discount or not to discount in reinforcement learning: A case study comparing R learning and Q learning

https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96a-html/node26.html

【平均-打折奖励】

Schwartz [106] examined the problem of adapting Q-learning to an average-reward framework. Although his R-learning algorithm seems to exhibit convergence problems for some MDPs, several researchers have found the average-reward criterion closer to the true problem they wish to solve than a discounted criterion and therefore prefer R-learning to Q-learning [69].

To discount or not to discount in reinforcement learning: A case study comparing R learning and Q learning的更多相关文章

(转) Deep Learning Research Review Week 2: Reinforcement Learning
Deep Learning Research Review Week 2: Reinforcement Learning 转载自: https://adeshpande3.github.io/ad ...
(转) Using the latest advancements in AI to predict stock market movements
Using the latest advancements in AI to predict stock market movements 2019-01-13 21:31:18 This blog ...
强化学习(Reinfment Learning) 简介
本文内容来自以下两个链接: https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/ https: ...
(zhuan) Deep Reinforcement Learning Papers
Deep Reinforcement Learning Papers A list of recent papers regarding deep reinforcement learning. Th ...
(转) Deep Learning in a Nutshell: Reinforcement Learning
Deep Learning in a Nutshell: Reinforcement Learning Share: Posted on September 8, 2016by Tim Dettm ...
(转) Deep Reinforcement Learning: Pong from Pixels
Andrej Karpathy blog About Hacker's guide to Neural Networks Deep Reinforcement Learning: Pong from ...
Deep Reinforcement Learning: Pong from Pixels
这是一篇迟来很久的关于增强学习(Reinforcement Learning, RL)博文.增强学习最近非常火!你一定有所了解,现在的计算机能不但能够被全自动地训练去玩儿ATARI(译注:一种游戏机) ...
[DQN] What is Deep Reinforcement Learning
已经成为DL中专门的一派,高大上的样子 Intro: MIT 6.S191 Lecture 6: Deep Reinforcement Learning Course: CS 294: Deep Re ...
Deep Reinforcement Learning 基础知识
Introduction 深度增强学习Deep Reinforcement Learning是将深度学习与增强学习结合起来从而实现从Perception感知到Action动作的端对端学习的一种全新的算 ...

随机推荐

页面css代码
博主原来的页面css代码 (这个是原来的那种效果,差不多弄出来会是这种效果http://www.cnblogs.com/thmyl/) /*simplememory*/ #google_ad_c1, ...
JS版汉字与拼音互转终极方案，附简单的JS拼音
前言网上关于JS实现汉字和拼音互转的文章很多,但是比较杂乱,都是互相抄来抄去,而且有的不支持多音字,有的不支持声调,有的字典文件太大,还比如有时候我仅仅是需要获取汉字拼音首字母却要引入200kb的字 ...
记录下我的阿里云centos服务器之路
以下内容都已经过试验,边走边记,懒得排版安装aphach yum install -y httpd systemctl start httpd netstat -tulp 安装桌面尽量不用桌 ...
sublime去除空白行和重复行
去除空白行 edit -> line -> delete blank lines 去除重复行打开正则模式 1 edit-> sort lines 2 command+option+ ...
linux命令lsattr、chattr、man
1.man命令,可以查看手册配置位置/etc/man.conf MANPATH决定手册查询位置 MANSECT决定man查询的顺序 man的查询 linux man的常用用法: man sectio ...
linux下crontab使用笔记
1. 安装 service crond status yum install vixie-cron yum install crontabs 2. 实例每分钟 ...
java Excel导入、自适应版本、将Excel转成List<map>对象
转载:http://blog.csdn.net/u012662357/article/details/58593020 最近在web开发中遇到excel批量导入,在网上搜了下很少有将excel直接转成 ...
HDU 4355 Party All the Time（三分|二分）
题意:n个人,都要去參加活动,每一个人都有所在位置xi和Wi,每一个人没走S km,就会产生S^3*Wi的"不舒适度",求在何位置举办活动才干使全部人的"不舒适度&quo ...
Cloudera
官方文档: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/ 博客教程 http://www.wangyon ...
typedef 与 define 的区别
1.区别 (1)定义.执行时间.作用域定义.执行时间: #define pchar char * typedef char *pchar; 定义的格式差别,显而易见的,要注意,define 是不能存 ...

To discount or not to discount in reinforcement learning: A case study comparing R learning and Q learning

To discount or not to discount in reinforcement learning: A case study comparing R learning and Q learning的更多相关文章

随机推荐

热门专题