https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96a-html/node26.html

【平均-打折奖励】

Schwartz [106] examined the problem of adapting Q-learning to an average-reward framework. Although his R-learning algorithm seems to exhibit convergence problems for some MDPs, several researchers have found the average-reward criterion closer to the true problem they wish to solve than a discounted criterion and therefore prefer R-learning to Q-learning [69].

To discount or not to discount in reinforcement learning: A case study comparing R learning and Q learning的更多相关文章

  1. (转) Deep Learning Research Review Week 2: Reinforcement Learning

      Deep Learning Research Review Week 2: Reinforcement Learning 转载自: https://adeshpande3.github.io/ad ...

  2. (转) Using the latest advancements in AI to predict stock market movements

    Using the latest advancements in AI to predict stock market movements 2019-01-13 21:31:18 This blog ...

  3. 强化学习(Reinfment Learning) 简介

    本文内容来自以下两个链接: https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/ https: ...

  4. (zhuan) Deep Reinforcement Learning Papers

    Deep Reinforcement Learning Papers A list of recent papers regarding deep reinforcement learning. Th ...

  5. (转) Deep Learning in a Nutshell: Reinforcement Learning

    Deep Learning in a Nutshell: Reinforcement Learning   Share: Posted on September 8, 2016by Tim Dettm ...

  6. (转) Deep Reinforcement Learning: Pong from Pixels

    Andrej Karpathy blog About Hacker's guide to Neural Networks Deep Reinforcement Learning: Pong from ...

  7. Deep Reinforcement Learning: Pong from Pixels

    这是一篇迟来很久的关于增强学习(Reinforcement Learning, RL)博文.增强学习最近非常火!你一定有所了解,现在的计算机能不但能够被全自动地训练去玩儿ATARI(译注:一种游戏机) ...

  8. [DQN] What is Deep Reinforcement Learning

    已经成为DL中专门的一派,高大上的样子 Intro: MIT 6.S191 Lecture 6: Deep Reinforcement Learning Course: CS 294: Deep Re ...

  9. Deep Reinforcement Learning 基础知识

    Introduction 深度增强学习Deep Reinforcement Learning是将深度学习与增强学习结合起来从而实现从Perception感知到Action动作的端对端学习的一种全新的算 ...

随机推荐

  1. Hadoop OutputFormat浅析

    问题:reduce输出时,如果不是推测任务写结果时会先写临时目录最后移动到输出目录吗? 下面部分转自Hadoop官网说明 OutputFormat 描述Map/Reduce作业的输出样式. Map/R ...

  2. codevs——1044 拦截导弹(序列DP)

    1999年NOIP全国联赛提高组  时间限制: 1 s  空间限制: 128000 KB  题目等级 : 黄金 Gold 题解       题目描述 Description 某国为了防御敌国的导弹袭击 ...

  3. 如何让你的网页加载时间降低到 1s 内

    当初分析了定宽高值和定宽高比这两种常见的图片延迟加载场景,也介绍了他们的应对方案,还做了一点技术选型的工作. 经过一段时间的项目实践,在先前方案的基础上又做了很多深入的优化工作.最终将好奇心日报的网页 ...

  4. hdu2102 BFS

    这是一道BFS的搜索题目,只是搜索范围变为了三维.定义数组visit[x][y][z]来标记空间位置,x表示楼层,y和z表示相应楼层的平面坐标. #define _CRT_SECURE_NO_DEPR ...

  5. C#面试基础题1

    1.简述 private. protected. public. internal 修饰符的访问权限.(C++中没有internal) private : 私有成员, 在类的内部才可以访问 ,也就是类 ...

  6. 修改Linux基本配置

    1.修改主机名 vi /etc/sysconfig/network NETWORKING=yes HOSTNAME=server1.cn 2.修改ip地址 vi /etc/sysconfig/netw ...

  7. Java内存区域与模拟内存区域异常

    我把Java的内存区域画了一张思维导图,以及各区域的主要功能. 模拟Java堆溢出 Java堆用于存储对象实例.仅仅要不断地创建对象而且保证GC ROOTS到对象之间有可达路径避免被回收机制清除.就能 ...

  8. Linux内核——内存管理

    内存管理 页 内核把物理页作为内存管理的基本单位.内存管理单元(MMU,管理内存并把虚拟地址转换为物理地址)通常以页为单位进行处理.MMU以页大小为单位来管理系统中的页表. 从虚拟内存的角度看,页就是 ...

  9. Leetcode Array 11 Container With Most Water

    题目: Given n non-negative integers a1, a2, ..., an, where each represents a point at coordinate (i, a ...

  10. oracle 表压缩技术

    压缩表是我们维护管理中常常会用到的.以下我们看都oracle给我们提供了哪些压缩方式. 文章摘自"Oracle® Database Administrator's Guide11g Rele ...