(Repost) Reinforcement Learning for Profit
July 17, 2016
Is RL being used in revenue generating systems today?
Recently, one of my Facebook friends, Cosmin Paduraru, an alumnus of the University of Alberta (with a PhD in Computing Science), posed a question:
Where is Reinforcement Learning used in revenue generating systems today?
I have been thinking about this a lot over the last month, as I attended two international conferences on Artificial Intelligence and Machine Learning (ICML and IJCAI) in NYC, USA. It is important to explore future prospects both inside and outside academia. In case you need to catch up, I am currently at the University of Alberta working on a PhD in Computing Science with a focus on Reinforcement Learning and Artificial Intelligence.
With the success of modern AI systems (out of the winter and into the spring), many companies have invested, and continue to invest, heavily in modern AI systems, backed by teams of leading researchers in the field (e.g. Facebook, Google, Microsoft, IBM, Twitter).
With that said, maybe Cosmin is right: Reinforcement Learning (Sutton and Barto 1998, and this killer intro by the fantastically talented Andrej Karpathy) seems publicly underrepresented in currently deployed systems making money in the real world. Or is it?
Adapted from Sutton and Barto 1998 and WALL-E
Luckily, I was at the International Joint Conference on Artificial Intelligence, attending a panel discussion on The Business of AI. The panel was composed of all the speakers from the industry day, which made it a good venue to solicit a wide variety of opinions from thought leaders in the field.
So I posed the question to them. Their responses went as follows:
Peter Norvig (Director of Research at Google): “well… AlphaGo made a million bucks and then gave it away”… a recent tweet from Demis Hassabis (Google DeepMind) confirms:
Pleased to confirm the recipients of the #AlphaGo $1m prize! @UNICEF_uk, @CodeClub, and the American, European and Korean Go associations
— Demis Hassabis (@demishassabis) June 6, 2016
Peter Stone (Founder and President, Cogitai; Professor at the University of Texas at Austin) gave lots of great examples of recent applications:
He said, “We are on the cusp of moving from the academic lab to the industry for RL, adaptation, and lifelong learning… We are at the cusp, and that is the main motivation from Cogitai.”
He also referenced work by Thomas G. Dietterich on invasive species management and wildfire suppression, by Joelle Pineau on applying RL in healthcare, and by Andrew Ng and Drew Bagnell on helicopter control. All of these could be seen as practical demonstrations of specific, developing industrial applications.
Hiroaki Kitano (President & CEO, Sony Computer Science Laboratories) said that this is a current research area for Sony, and that we should expect profitability from these and more advanced RL algorithms in 2-5 years. Almost 10 years after Sony’s last robotic venture, the Aibo, Sony CEO Kazuo Hirai recently (late June 2016) said “the robots we are developing can have emotional bonds with customers, giving them joy and becoming the objects of love”.
Guruduth Banavar (Chief Science Officer, Cognitive Computing, IBM Research) predicted that this is going to happen sooner rather than later, and that it will happen in the domain of conversational systems, dialog systems, and understanding the larger context of conversations. He also mentioned that the illustrious Gerald Tesauro (the man behind TD-Gammon) is working on these problems. Interesting that he did not mention Watson…
Some interesting answers from industry leaders. But I was surprised that no one mentioned recommender systems (like those on Amazon, Netflix, and Yelp, nicely formalized as an RL problem in 2005 by Shani et al.). Are these systems all collaborative filtering? Surely not.
No one mentioned the Google Reinforcement Learning Architecture (here is a quick summary), which I can only imagine could be behind some of the personal recommendations and rankings that Google does behind the scenes on Search, YouTube, and maybe … Maps?
No one mentioned contextual bandits, sometimes called associative RL (as discussed by Li et al. 2010 for news recommendation), for serving ads and news stories. These systems are surely deployed by publishers on large-scale news sites to maximize click-through rates and create a personalized experience. Microsoft recently announced the Multiworld Testing Decision Service for making context-based decisions… I guess there were no Microsoft representatives on the panel to toot this horn (thanks for the catch, Pardis).
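For readers who have not met contextual bandits before, here is a minimal sketch of the idea in Python. To be clear about what it is and is not: this is a plain tabular epsilon-greedy learner, not the LinUCB algorithm from Li et al. 2010 and not anything Microsoft or a news site actually runs. The class name, the reader segments, the story names, and the click probabilities are all made up for illustration.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Tabular contextual bandit: one running-mean reward per (context, arm)."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context, arm) -> times shown
        self.values = defaultdict(float)  # (context, arm) -> mean observed reward

    def select(self, context):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.values[(context, arm)])

    def update(self, context, arm, reward):
        # Incremental mean update: Q <- Q + (r - Q) / n
        key = (context, arm)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


def simulate(rounds=5000, seed=0):
    # Hypothetical click probabilities, invented for this example only.
    click_prob = {
        ("sports_fan", "sports_story"): 0.30,
        ("sports_fan", "tech_story"): 0.05,
        ("tech_fan", "sports_story"): 0.05,
        ("tech_fan", "tech_story"): 0.30,
    }
    env_rng = random.Random(seed + 1)
    bandit = EpsilonGreedyContextualBandit(["sports_story", "tech_story"], seed=seed)
    clicks = 0.0
    for _ in range(rounds):
        context = env_rng.choice(["sports_fan", "tech_fan"])   # who is visiting
        arm = bandit.select(context)                           # which story to show
        reward = 1.0 if env_rng.random() < click_prob[(context, arm)] else 0.0
        bandit.update(context, arm, reward)                    # learn from the click
        clicks += reward
    return bandit, clicks / rounds


if __name__ == "__main__":
    bandit, ctr = simulate()
    print("click-through rate:", ctr)
    for key in sorted(bandit.values):
        print(key, round(bandit.values[key], 3))
```

The point of the sketch is the loop: observe a context, pick an action, see a reward only for the action taken, and update. A real system would replace the lookup table with a model over rich features, which is where something like LinUCB comes in.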
With so much potentially out there, why was there no mention of these use cases for reinforcement learning? Where else could RL be hiding in the money-making wild? RL seems like an ideal candidate for systems of personalization on large-scale, sequential decision-making problems… so what am I missing?