How rival bots battled their way to poker supremacy

http://www.nature.com/news/how-rival-bots-battled-their-way-to-poker-supremacy-1.21580

Artificial-intelligence programs harness game-theory strategies and deep learning to defeat human professionals in two-player hold 'em.

02 March 2017


Juice/Alamy

Top professional poker players have been beaten by AI bots at no-limit hold ’em.

A complex variant of poker is the latest game to be mastered by artificial intelligence (AI). And it has been conquered not once, but twice, by two rival bots developed by separate research teams.

Both algorithms play a ‘no-limit’ two-player version of Texas Hold ’Em. And each has in recent months hit a crucial AI milestone: beating human professional players.

The game first fell in December to DeepStack, developed by computer scientists at the University of Alberta in Edmonton, Canada, with collaborators from Charles University and the Czech Technical University in Prague. A month later, Libratus, developed by a team at Carnegie Mellon University (CMU) in Pittsburgh, Pennsylvania, achieved the feat.

Over the past decade the groups have pushed each other to make ever-better bots, and now the team behind DeepStack has formally published details of its AI in Science¹. But the bots have yet to play each other.

Nature looks at how the two AIs stack up, what the accomplishments could mean for online casinos and what’s left for AI to conquer.

Why do AI researchers care about poker?

AIs have mastered several board games, including chess and the complex-strategy game Go. But poker has a key difference from board games that adds complexity: players must work out their strategies without being able to see all of the information about the game on the table. They must consider what cards their opponents might have and what the opponents might guess about their hand based on previous betting.

Games that have such ‘imperfect information’ mirror real-life problem-solving scenarios, such as auctions and financial negotiations, and poker has become an AI test bed for these situations.

Algorithms have already cracked simpler forms of poker: the Alberta team essentially solved a limited version of two-player hold ’em poker in 2015. The form played by DeepStack and Libratus is still a two-player game, but there are no limits on how much an individual player can bet or raise — which makes it considerably more complex for an AI to navigate.

How did the human-versus-AI games unfold?

Over 4 weeks beginning in November last year, DeepStack beat 10 of 11 professional players by a statistically significant margin, playing 3,000 hands against each.

Then, in January, Libratus beat four better professionals who are considered specialists at the game, over a total of around 120,000 hands. The computer ended up almost US$1.8 million up in virtual chips.

What are the mathematics behind the algorithms?

Game theory. Both AIs aim to find a strategy that is guaranteed not to lose, regardless of how an opponent plays. And because one-on-one poker is a zero-sum game — meaning that one player’s loss is always the opponent’s gain — game theory says that such a strategy always exists. Whereas a human player might exploit a weak opponent’s errors to win big, an AI with this strategy isn’t concerned by margins — it plays only to win. That means it also won't be thrown by surprising behaviour.
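The guarantee described above comes from von Neumann’s minimax theorem. A minimal sketch of the idea, using a toy 2×2 zero-sum game (matching pennies) rather than poker — this illustrates the mathematics, not either bot’s actual method:

```python
# In a two-player zero-sum game there is a mixed strategy whose
# expected payoff the opponent cannot reduce, no matter how they play.
# For a 2x2 payoff matrix [[a, b], [c, d]] (row player's winnings)
# with a fully mixed equilibrium, it can be found in closed form.

def solve_2x2_zero_sum(a, b, c, d):
    """Return (p, value): probability of the row player's first action
    and the guaranteed game value, assuming a fully mixed equilibrium."""
    denom = a - b - c + d
    p = (d - c) / denom
    value = (a * d - b * c) / denom
    return p, value

# Matching pennies: win 1 on a match, lose 1 otherwise.
p, value = solve_2x2_zero_sum(1, -1, -1, 1)
print(p, value)  # 0.5 0.0 — play 50/50; the opponent cannot do better than break even
```

Against any opponent, this 50/50 mixture loses nothing in expectation — which is exactly why such a strategy "won't be thrown by surprising behaviour".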

Previous poker-playing algorithms have generally tried to work out strategies ahead of time, computing massive ‘game trees’ that outline solutions for all the different ways that a game could unfold. But the number of possibilities is so huge — around 10^160 — that mapping all of them is impossible. So researchers settled for solving fewer possibilities. In a game, an algorithm compares a live situation to those that it has previously calculated. It finds the closest one and ‘translates’ the corresponding action to the table.
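The ‘translation’ step can be sketched very simply: the solver only precomputed strategies for a handful of bet sizes, so a live bet is mapped to the nearest one it knows. The sizes and names below are purely illustrative, not taken from either bot:

```python
import bisect

# Hypothetical bet sizes (in chips) for which strategies were solved offline.
PRECOMPUTED_BETS = [50, 100, 200, 400, 800]

def translate(live_bet):
    """Map an observed bet to the closest precomputed size."""
    i = bisect.bisect_left(PRECOMPUTED_BETS, live_bet)
    # Compare the neighbours on either side of the insertion point.
    candidates = PRECOMPUTED_BETS[max(0, i - 1):i + 1]
    return min(candidates, key=lambda b: abs(b - live_bet))

print(translate(130))  # 100 — a live bet of 130 is played as if it were 100
```

The gap between the live bet and its translated stand-in is exactly the kind of approximation error that real-time solving avoids.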

Now, however, both DeepStack and Libratus have found ways to compute solutions in real time — as is done by computers that play chess and Go.

How do the approaches of the two AIs compare?

Instead of trying to work out the whole game tree ahead of time, DeepStack recalculates only a short tree of possibilities at each point in a game.

The developers created this approach using deep learning, a technique that uses brain-inspired architectures known as neural networks (and that helped a computer to beat one of the world’s best players at Go).

By playing itself in more than 11 million game situations, and learning from each one, DeepStack gained an ‘intuition’ about the likelihood of winning from a given point in the game. This allows it to calculate fewer possibilities in a relatively short time — about 5 seconds — and make real-time decisions.
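A heavily simplified sketch of that idea — search only a few moves ahead, and trust a learned value estimate at the horizon. The real DeepStack re-solves an imperfect-information subgame using counterfactual values from its neural network; this toy uses a perfect-information minimax search for clarity, and `value_net` is a hypothetical stand-in for the trained network:

```python
def value_net(state):
    """Stand-in for the learned 'intuition': estimates how good a
    state is without searching any further (toy heuristic)."""
    return float(sum(state))

def lookahead(state, depth, maximizing, children):
    """Expand only `depth` plies of the game tree; beyond that,
    substitute the value estimate for the rest of the search."""
    kids = children(state)
    if depth == 0 or not kids:
        return value_net(state)
    values = [lookahead(k, depth - 1, not maximizing, children) for k in kids]
    return max(values) if maximizing else min(values)

# Toy game: each move appends +1 or -1 to the state, up to four moves.
children = lambda s: [s + (1,), s + (-1,)] if len(s) < 4 else []
print(lookahead((), 2, True, children))  # 0.0
```

The saving is exponential: a short tree plus a value estimate replaces the full expansion, which is what lets DeepStack act in about five seconds.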

The Libratus team has yet to publish its method, so it’s not as clear how the program works. What we do know is that early in a hand, it uses previously calculated possibilities and the ‘translation’ approach, although it refines that strategy as the game gives up more information. But for the rest of each hand, as the possible outcomes narrow, the algorithm also computes solutions in real time.

And Libratus also has a learning element. Its developers added a self-improvement module that automatically analyses the bot’s play to identify the weaknesses an opponent has exploited; they then use that information to permanently patch the holes in the AI’s approach.

The two methods require substantially different computing power. DeepStack trained using 175 core years — the equivalent of running a single processing core for 175 years, or a few hundred computers for a few months — and during games it can run on a single laptop. Libratus, by contrast, used the equivalent of around 2,900 core years and relies on a supercomputer both before and during the match.

Can they bluff?

Yes. People often see bluffing as something human, but to a computer, it has nothing to do with reading an opponent, and everything to do with the mathematics of the game. Bluffing is merely a strategy to ensure that a player’s betting pattern never reveals to an opponent the cards that they have.
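That mathematics can be made concrete with the classic river-bluffing calculation from poker theory: the bettor mixes bluffs into their betting range at exactly the frequency that leaves the opponent indifferent between calling and folding, so the betting pattern reveals nothing about the cards:

```python
def optimal_bluff_fraction(pot, bet):
    """Fraction of the betting range that should be bluffs.
    A caller risks `bet` to win `pot + bet`, so indifference requires
    bluff_frac * (pot + bet) = (1 - bluff_frac) * bet."""
    return bet / (pot + 2 * bet)

# With a pot-sized bet, roughly one bet in three should be a bluff.
print(optimal_bluff_fraction(pot=100, bet=100))  # 0.3333...
```

At that frequency the opponent gains nothing from calling more or folding more — bluffing becomes a defensive necessity, not a mind game.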

OK, so which result was more impressive?

It depends on whom you ask. Experts could quibble over the intricacies of both methods, but overall, both AIs played enough hands to generate statistically significant wins — and both did so against professional players.

Libratus played more hands, but DeepStack didn’t need to, because its team used a sophisticated statistical method that enabled them to prove a significant result from fewer games. Libratus beat much better professionals than did DeepStack, but on average, DeepStack won by a bigger margin.
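A rough z-test shows why hand count matters so much: per-hand winnings in poker are extremely noisy, so the same edge takes far more hands to distinguish from luck. The numbers below are made up for illustration; the DeepStack team additionally used a variance-reduction technique (AIVAT) so that fewer hands sufficed:

```python
import math

def z_score(mean_win, std_dev, hands):
    """Standard score of the observed average win per hand:
    how many standard errors the result sits above zero."""
    return mean_win / (std_dev / math.sqrt(hands))

# Same hypothetical edge (0.5 chips/hand) and noise (std 20 chips/hand):
print(z_score(mean_win=0.5, std_dev=20.0, hands=3_000))    # ~1.37: not yet significant
print(z_score(mean_win=0.5, std_dev=20.0, hands=120_000))  # ~8.66: overwhelmingly significant
```

Shrinking the effective standard deviation, as AIVAT does, raises the z-score without playing a single extra hand.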

Will the two AIs now face off?

Maybe. A sticking point is likely to be the big difference in computing power and so the speed of play between the AIs. This could make it difficult to find rules to which both sides can agree.

University of Alberta computer scientist Michael Bowling, one of the developers of DeepStack, says that his team is up for playing Libratus. But Libratus developer Tuomas Sandholm at CMU says that he first wants to see DeepStack beat Baby Tartanian8, one of his team’s earlier and weaker AIs.

Bowling stresses that the match would carry a big caveat: the winner might not be the better bot. Both are trying to play the perfect game, but the strategy closest to that ideal doesn’t always come out in head-to-head play. One program could accidentally hit on a hole in the opponent’s strategy, but that wouldn’t necessarily mean that the strategy overall has more or bigger holes. Unless one team wins by a substantial margin, says Bowling, “my feeling is it won’t be as informative as people would like it to be”.

Does this mean the end of online poker?

No. Many online poker casinos forbid the use of a computer to play in matches, although top players have started to train against machines.

Now that computers have passed another AI milestone, what’s left to tackle?

There are few mountains left for the AI community to climb. In part, this is because many of the games that remain unsolved, such as bridge, have more complicated rules, and so have made for less obvious targets.

The natural next move for both teams is to tackle multiplayer poker. This could mean almost starting from scratch because zero-sum game theory does not apply: in three-player poker, for instance, a bad move by one opponent can indirectly hinder, rather than always advantage, another player.

But the intuition of deep learning could help to find solutions even where the theory doesn’t apply, says Bowling. His team’s first attempts to apply similar methods in the three-player version of limited Texas Hold ’Em have turned out surprisingly well, he says.

Another challenge is training an AI to play games without being told the rules, and instead discovering them as it goes along. This scenario more realistically mirrors real-world problem-solving situations that humans face.

The ultimate test will be to explore how much imperfect-information algorithms really can help to tackle messy real-world problems with incomplete information, such as in finance and cybersecurity.

Nature 543, 160–161 (09 March 2017)
doi:10.1038/nature.2017.21580

References

  1. Moravčík, M. et al. Science http://dx.doi.org/10.1126/science.aam6960 (2017).

