Incentivizing exploration in reinforcement learning with deep predictive models

Stadie, Bradly C., Sergey Levine, and Pieter Abbeel. "Incentivizing exploration in reinforcement learning with deep predictive models." arXiv preprint arXiv:1507.00814 (2015).

作者通过模拟(状态，动作)的不确定性，从而修改reward，帮助agent进行探索。作者说用了他们的方法不用进行随机探索。该方法比较通用，适用于多种RL模型，但是要训练auto-encoder，所以也稍微有点繁琐。

实用指数：3颗星

理论指数：1颗星

创新指数：4颗星

Incentivizing exploration in reinforcement learning with deep predictive models的更多相关文章

(zhuan) Deep Reinforcement Learning Papers
Deep Reinforcement Learning Papers A list of recent papers regarding deep reinforcement learning. Th ...
深度学习国外课程资料(Deep Learning for Self-Driving Cars)+(Deep Reinforcement Learning and Control )
MIT(Deep Learning for Self-Driving Cars) CMU(Deep Reinforcement Learning and Control ) 参考网址: 1 Deep ...
18 Issues in Current Deep Reinforcement Learning from ZhiHu
深度强化学习的18个关键问题 from: https://zhuanlan.zhihu.com/p/32153603 85 人赞了该文章深度强化学习的问题在哪里?未来怎么走?哪些方面可以突破? 这两 ...
(转) Deep Learning Research Review Week 2: Reinforcement Learning
Deep Learning Research Review Week 2: Reinforcement Learning 转载自: https://adeshpande3.github.io/ad ...
(转) Deep Reinforcement Learning: Playing a Racing Game
Byte Tank Posts Archive Deep Reinforcement Learning: Playing a Racing Game OCT 6TH, 2016 Agent playi ...
(转) Deep Learning in a Nutshell: Reinforcement Learning
Deep Learning in a Nutshell: Reinforcement Learning Share: Posted on September 8, 2016by Tim Dettm ...
(转) Deep Reinforcement Learning: Pong from Pixels
Andrej Karpathy blog About Hacker's guide to Neural Networks Deep Reinforcement Learning: Pong from ...
论文笔记之：Asynchronous Methods for Deep Reinforcement Learning
Asynchronous Methods for Deep Reinforcement Learning ICML 2016 深度强化学习最近被人发现貌似不太稳定,有人提出很多改善的方法,这些方法有很 ...
论文笔记之：Deep Reinforcement Learning with Double Q-learning
Deep Reinforcement Learning with Double Q-learning Google DeepMind Abstract 主流的 Q-learning 算法过高的估计在特 ...

随机推荐

[Android开发那点破事]解决android.os.NetworkOnMainThreadException
[Android开发那点破事]解决android.os.NetworkOnMainThreadException 昨天和女朋友换了手机,我的iPhone 4S 换了她得三星I9003.第一感觉就是好卡 ...
【Oracle错误集锦】：ORA-12154: TNS: 无法解析指定的连接标识符
相信这个错误大家都不陌生,仅仅要安装使用过Oracle的预计都遇到过这个问题,一般出如今用PL/SQL连接Oracle数据库的时候发生的. 导致这个错误的原因以及解决方式都是多种多样的,我也是三番五次 ...
mysql 再查询结果的基础上查询（子查询）
SELECT A.wx_name, A.wx_litpic, B . * FROM ( SELECT uid, COUNT( * ) AS daticishu FROM statements , ) ...
jeecg多页签的选择切换
有时候我们的页面需要多页签,多页签又引起一个问题就是只会校验初始加载的页签,所以就有了一个需求,需要把所有的页签都加载一遍,之后所有页签中需要校验的内容都会校验了,切换页签代码如下: $(docume ...
μCOS-II系统之事件(event)的使用规则及Semaphore实例
**************************************************************************************************** ...
HDU 3395 Special Fish 最“大”费用最大流
求最大费用能够将边权取负以转化成求最小费用. 然而此时依旧不正确.由于会优先寻找最大流.可是答案并不一定出如今满流的时候.所以要加一些边(下图中的红边)使其在答案出现时满流. 设全部边的流量为1,花费 ...
js正则匹配中文
alert(/[\u4e00-\u9fa5]{4}/.test("司徒正美"))//true alert(/[\u4e00-\u9fa5]{4}/.test("司正美&q ...
Solr学习之四-Solr配置说明之二
上一篇的配置说明主要是说明solrconfig.xml配置中的查询部分配置,在solr的功能中另外一个重要的功能是建索引,这是提供快速查询的核心. 按照Solr学习之一所述关于搜索引擎的原理中说明了建 ...
MVC，MVP 和 MVVM 的图示转自阮一峰先生网络日志
MVC,MVP 和 MVVM 的图示作者: 阮一峰复杂的软件必须有清晰合理的架构,否则无法开发和维护. MVC(Model-View-Controller)是最常见的软件架构之一,业界有着广泛 ...
LeetCode: Evaluate Reverse Polish Notation 解题报告
Evaluate Reverse Polish Notation Evaluate the value of an arithmetic expression in Reverse Polish No ...

Incentivizing exploration in reinforcement learning with deep predictive models

Incentivizing exploration in reinforcement learning with deep predictive models的更多相关文章

随机推荐

热门专题