A 2048 A.I. Based on the Monte Carlo Method

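There is a discussion about 2048 A.I. on Stack Overflow: http://stackoverflow.com/questions/22342854/what-is-the-optimal-algorithm-for-the-game-2048

The top-voted answer is based on a Min-Max-Tree with alpha-beta pruning, and its heuristic function is very well designed.

In fact, an A.I. can be written without designing any heuristic function at all. The approach I used is a classic from the field of computer Go: Monte Carlo position evaluation plus UCT search.

For an introduction to the algorithm, see a blog post I wrote a few years ago: http://www.cnblogs.com/qswang/archive/2011/08/28/2360489.html

In short, it comes down to two points: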

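Evaluate the score of a given position by playing random games from it;
When descending from a parent node of the game tree to pick a child, weigh both the child's historical score and the number of times it has been tried.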
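
The second point is commonly realized with the UCB1 formula. The following is a minimal sketch of that rule, not the project's actual MaxUcbMove: the names ChildStats, Ucb1, SelectChild and the exploration parameter are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical per-child statistics, similar in spirit to the post's NodeRecord.
struct ChildStats {
  double average_profit;  // mean playout result seen through this child
  int visited_times;      // number of times this child has been tried
};

// UCB1 score: the historical average plus an exploration bonus that
// shrinks as the child accumulates visits.
double Ucb1(const ChildStats& child, int parent_visits, double exploration) {
  if (child.visited_times == 0) {
    // Unvisited children are tried before any visited one.
    return std::numeric_limits<double>::infinity();
  }
  return child.average_profit +
         exploration * std::sqrt(std::log(static_cast<double>(parent_visits)) /
                                 child.visited_times);
}

// Returns the index of the child with the highest UCB1 score
// (the first such child on ties).
std::size_t SelectChild(const std::vector<ChildStats>& children,
                        double exploration) {
  assert(!children.empty());
  int parent_visits = 0;
  for (const ChildStats& child : children) parent_visits += child.visited_times;
  std::size_t best = 0;
  for (std::size_t i = 1; i < children.size(); ++i) {
    if (Ucb1(children[i], parent_visits, exploration) >
        Ucb1(children[best], parent_visits, exploration)) {
      best = i;
    }
  }
  return best;
}
```

With exploration set to 0 the rule is purely greedy; a larger value favors rarely tried children. If the score is an unnormalized quantity such as a move count, the exploration constant probably has to be scaled to match.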

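For 2048 I made one change to the algorithm: the Min-Max-Tree becomes a Random-Max-Tree. New numbers are added at random rather than by a rational opponent, so my guess is that a Min-Max-Tree tends toward an overly conservative strategy and does not dare to pursue bigger results.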
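
A toy illustration of why the two trees can disagree, with made-up numbers: each move leads to several possible tile spawns, and each spawn has a value. WorstCase models the Min-Max view, Average models the random view. All names here are hypothetical, and the project itself samples one random spawn per descent rather than averaging over every spawn.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// One value per possible tile spawn that can follow a move.
using Outcomes = std::vector<double>;

// Min-Max view: assume the spawn is chosen by an adversary.
double WorstCase(const Outcomes& outcomes) {
  assert(!outcomes.empty());
  return *std::min_element(outcomes.begin(), outcomes.end());
}

// Random view: every spawn is equally likely, so take the mean.
double Average(const Outcomes& outcomes) {
  assert(!outcomes.empty());
  return std::accumulate(outcomes.begin(), outcomes.end(), 0.0) /
         outcomes.size();
}

// Returns the index of the move whose outcomes score highest under `eval`.
template <class Eval>
std::size_t BestMove(const std::vector<Outcomes>& moves, Eval eval) {
  assert(!moves.empty());
  std::size_t best = 0;
  for (std::size_t i = 1; i < moves.size(); ++i) {
    if (eval(moves[i]) > eval(moves[best])) best = i;
  }
  return best;
}
```

For the moves {100, 100, 10} and {60, 60, 50}, WorstCase prefers the second move (50 beats 10) while Average prefers the first (70 beats about 56.7). That is the conservative bias described above.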

The UCT search code:

Orientation UctPlayer::NextMove(const FullBoard& full_board) const {
  int mc_count = 0;
  // Keep descending from the root until the playout budget is used up.
  while (mc_count < kMonteCarloGameCount) {
    FullBoard current_node;
    Orientation orientation = MaxUcbMove(full_board);
    current_node.Copy(full_board);
    current_node.PlayMovingMove(orientation);
    NewProfit(&current_node, &mc_count);
  }
  return BestChild(full_board);
}

NewProfit updates the records along the path from this node down to a leaf, and is implemented recursively:

float UctPlayer::NewProfit(board::FullBoard *node,
    int* mc_count) const {
  float result;
  HashKey hash_key = node->ZobristHash();
  auto iterator = transposition_table_.find(hash_key);
  if (iterator == transposition_table_.end()) {
    // First visit: evaluate the position with one random game.
    FullBoard copied_node;
    copied_node.Copy(*node);
    MonteCarloGame game(std::move(copied_node));
    if (!HasGameEnded(*node)) game.Run();
    result = GetProfit(game.GetFullBoard());
    ++(*mc_count);
    // Visit count 1 (assumed argument order: visits, profit).
    NodeRecord node_record(1, result);
    transposition_table_.insert(make_pair(hash_key, node_record));
  } else {
    // Seen before: descend one level and fold the result into the record.
    NodeRecord *node_record = &(iterator->second);
    int visited_times = node_record->VisitedTimes();
    if (HasGameEnded(*node)) {
      ++(*mc_count);
      result = node_record->AverageProfit();
    } else {
      AddingNumberRandomlyPlayer player;
      AddingNumberMove move = player.NextMove(*node);
      node->PlayAddingNumberMove(move);
      Orientation max_ucb_move = MaxUcbMove(*node);
      node->PlayMovingMove(max_ucb_move);
      result = NewProfit(node, mc_count);
      float previous_profit = node_record->AverageProfit();
      // Incremental mean over all results seen through this node.
      float average_profit = (previous_profit * visited_times + result) /
          (visited_times + 1);
      node_record->SetAverageProfit(average_profit);
    }
    node_record->SetVisitedTimes(visited_times + 1);
  }
  return result;
}
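
The listing relies on a Zobrist hash and a transposition table without showing them. Below is a minimal sketch of how such a pair can work, under stated assumptions: the board is a flat array of 16 tile exponents, and Zobrist, Record and Update are hypothetical names. The project's FullBoard::ZobristHash and NodeRecord may well differ, for example by also hashing whose turn it is.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <random>
#include <unordered_map>

constexpr int kCells = 16;      // 4x4 board
constexpr int kExponents = 18;  // 0 = empty, e = tile 2^e for e in 1..17

using Board = std::array<int, kCells>;  // each entry is a tile exponent

class Zobrist {
 public:
  // Fills the key table with pseudo-random 64-bit values.
  explicit Zobrist(std::uint64_t seed = 2048) {
    std::mt19937_64 rng(seed);
    for (auto& cell : keys_) {
      for (auto& key : cell) key = rng();
    }
  }

  std::uint64_t Key(int cell, int exponent) const {
    assert(cell >= 0 && cell < kCells);
    assert(exponent >= 0 && exponent < kExponents);
    return keys_[cell][exponent];
  }

  // The hash of a board is the XOR of one key per cell.
  std::uint64_t Hash(const Board& board) const {
    std::uint64_t hash = 0;
    for (int cell = 0; cell < kCells; ++cell) hash ^= Key(cell, board[cell]);
    return hash;
  }

 private:
  std::array<std::array<std::uint64_t, kExponents>, kCells> keys_;
};

// Hypothetical stand-in for the post's NodeRecord.
struct Record {
  double average_profit = 0.0;
  int visited_times = 0;
};

// Folds one playout result into the record stored under `hash`,
// using the same incremental mean as NewProfit.
void Update(std::unordered_map<std::uint64_t, Record>& table,
            std::uint64_t hash, double result) {
  Record& record = table[hash];
  record.average_profit =
      (record.average_profit * record.visited_times + result) /
      (record.visited_times + 1);
  ++record.visited_times;
}
```

Because XOR is its own inverse, changing one cell only requires XOR-ing out the old key and XOR-ing in the new one, so the hash can be maintained incrementally instead of being recomputed after every move.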

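At first I used the largest number at the end of the game as the score, but once the game reached 512, the Monte Carlo playouts no longer produced any larger number, and the nodes became indistinguishable from one another. So I switched the score to the number of moves, which improved things considerably.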
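
To make the move-count score concrete, here is a self-contained sketch of a random playout that returns how many moves it survived. It is not the project's board code: Grid, SlideRowLeft, Move, SpawnTile and RandomPlayoutLength are hypothetical names, and the 90%/10% split between new 2 and 4 tiles is an assumption about the spawn rule.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using Row = std::array<int, 4>;
using Grid = std::array<Row, 4>;  // 0 = empty cell, otherwise the tile value

// Slides and merges one row toward index 0; returns true if the row changed.
bool SlideRowLeft(Row& row) {
  Row out{};
  int n = 0;
  bool can_merge = false;
  for (int v : row) {
    if (v == 0) continue;
    if (can_merge && out[n - 1] == v) {
      out[n - 1] *= 2;
      can_merge = false;  // a merged tile cannot merge again in this move
    } else {
      out[n++] = v;
      can_merge = true;
    }
  }
  const bool changed = (out != row);
  row = out;
  return changed;
}

// Rotates the grid by a quarter turn; four rotations restore the original.
Grid Rotate(const Grid& g) {
  Grid r{};
  for (int i = 0; i < 4; ++i) {
    for (int j = 0; j < 4; ++j) r[3 - j][i] = g[i][j];
  }
  return r;
}

// Applies a move in one of four directions (0..3) by rotating, sliding
// left, and rotating back. Returns true if the board changed.
bool Move(Grid& g, int direction) {
  assert(direction >= 0 && direction < 4);
  for (int k = 0; k < direction; ++k) g = Rotate(g);
  bool changed = false;
  for (Row& row : g) {
    if (SlideRowLeft(row)) changed = true;
  }
  for (int k = 0; k < (4 - direction) % 4; ++k) g = Rotate(g);
  return changed;
}

// Puts a 2 (assumed 90%) or a 4 (assumed 10%) on a random empty cell.
void SpawnTile(Grid& g, std::mt19937& rng) {
  std::vector<int*> empty;
  for (Row& row : g) {
    for (int& v : row) {
      if (v == 0) empty.push_back(&v);
    }
  }
  if (empty.empty()) return;
  std::uniform_int_distribution<std::size_t> pick(0, empty.size() - 1);
  std::bernoulli_distribution spawn_four(0.1);
  int* cell = empty[pick(rng)];
  *cell = spawn_four(rng) ? 4 : 2;
}

// Plays uniformly random legal moves until none is left and returns the
// number of moves made, which serves as the score of the start position.
int RandomPlayoutLength(Grid g, std::mt19937& rng) {
  int moves = 0;
  for (;;) {
    std::array<int, 4> directions = {0, 1, 2, 3};
    std::shuffle(directions.begin(), directions.end(), rng);
    bool moved = false;
    for (int d : directions) {
      if (Move(g, d)) {
        moved = true;
        break;
      }
    }
    if (!moved) return moves;
    ++moves;
    SpawnTile(g, rng);
  }
}
```

Unlike the largest tile, the playout length keeps varying between positions late in the game, so it still separates good nodes from bad ones.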

The program is organized into three modules: board, player, and game. board handles the board logic, player handles the logic for moving or adding numbers, and game ties board and player together.

The declaration of the Game class:

class Game {
public:
  typedef std::unique_ptr<player::AddingNumberPlayer>
      AddingNumberPlayerUniquePtr;
  typedef std::unique_ptr<player::MovingPlayer> MovingPlayerUniquePtr;

  Game(Game &&game) = default;
  virtual ~Game();

  const board::FullBoard& GetFullBoard() const {
    return full_board_;
  }

  void Run();

protected:
  Game(board::FullBoard &&full_board,
      AddingNumberPlayerUniquePtr &&adding_number_player,
      MovingPlayerUniquePtr &&moving_player);

  // Hooks that subclasses may override; they do nothing by default.
  virtual void BeforeAddNumber() const {
  }
  virtual void BeforeMove() const {
  }

private:
  board::FullBoard full_board_;
  AddingNumberPlayerUniquePtr adding_number_player_unique_ptr_;
  MovingPlayerUniquePtr moving_player_unique_ptr_;

  DISALLOW_COPY_AND_ASSIGN(Game);
};

The implementation of Run:

void Game::Run() {
  while (!HasGameEnded(full_board_)) {
    // The two players alternate: a number is added after every move.
    if (full_board_.LastForce() == Force::kMoving) {
      BeforeAddNumber();
      AddingNumberMove move =
          adding_number_player_unique_ptr_->NextMove(full_board_);
      full_board_.PlayAddingNumberMove(move);
    } else {
      BeforeMove();
      Orientation orientation =
          moving_player_unique_ptr_->NextMove(full_board_);
      full_board_.PlayMovingMove(orientation);
    }
  }
}

This way, subclasses of Game can implement different constructors to assemble different kinds of Game. For example, the MonteCarloGame constructor:

MonteCarloGame::MonteCarloGame(FullBoard &&full_board) :
    Game(std::move(full_board),
        Game::AddingNumberPlayerUniquePtr(new AddingNumberRandomlyPlayer),
        Game::MovingPlayerUniquePtr(new MovingRandomlyPlayer)) {}

A new 2048 game starts with two numbers already on the board, and a new game should be easy to build. By default the two numbers are added at random. The builder class can be written like this:

template<class G>
class NewGameBuilder {
public:
  NewGameBuilder();
  ~NewGameBuilder() = default;

  NewGameBuilder& SetLastForce(board::Force last_force);
  NewGameBuilder& SetAddingNumberPlayer(game::Game::AddingNumberPlayerUniquePtr
      &&initialization_player);
  G Build() const;

private:
  game::Game::AddingNumberPlayerUniquePtr initialization_player_;
};

template<class G>
NewGameBuilder<G>::NewGameBuilder() :
    initialization_player_(game::Game::AddingNumberPlayerUniquePtr(
    new player::AddingNumberRandomlyPlayer)) {
}

template<class G>
NewGameBuilder<G>& NewGameBuilder<G>::SetAddingNumberPlayer(
    game::Game::AddingNumberPlayerUniquePtr &&initialization_player) {
  initialization_player_ = std::move(initialization_player);
  return *this;
}

template<class G>
G NewGameBuilder<G>::Build() const {
  board::FullBoard full_board;
  // Place the two initial numbers.
  for (int i = 0; i < 2; ++i) {
    board::AddingNumberMove move = initialization_player_->NextMove(full_board);
    full_board.PlayAddingNumberMove(move);
  }
  return G(std::move(full_board));
}

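A long time ago, efficient C++ code discouraged returning an object by value from a function (as opposed to handing back a heap-allocated pointer), because of the copy it could incur. With rvalue references this is now much more convenient.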
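
A minimal sketch of the idiom, independent of the project's classes: MoveOnlyGame and BuildGame are hypothetical names. The type cannot be copied, yet it can still be returned by value, because the return either moves the object or, since C++17, constructs it directly in the caller's storage.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical move-only type that owns its board through a unique_ptr.
class MoveOnlyGame {
 public:
  explicit MoveOnlyGame(std::vector<int>&& cells)
      : cells_(std::make_unique<std::vector<int>>(std::move(cells))) {}

  MoveOnlyGame(MoveOnlyGame&&) = default;
  MoveOnlyGame(const MoveOnlyGame&) = delete;
  MoveOnlyGame& operator=(const MoveOnlyGame&) = delete;

  const std::vector<int>& Cells() const { return *cells_; }

 private:
  std::unique_ptr<std::vector<int>> cells_;
};

// Builds a 4x4 board with two starting tiles and returns the game by value.
// No copy of the board is made on the way out.
MoveOnlyGame BuildGame() {
  std::vector<int> cells(16, 0);
  cells[0] = 2;
  cells[1] = 2;
  return MoveOnlyGame(std::move(cells));
}
```

A caller writes `MoveOnlyGame game = BuildGame();`, which has the same shape as the builder call in the main function of this post.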

The main function:

int main() {
  InitLogConfig();
  AutoGame game = NewGameBuilder<AutoGame>().Build();
  game.Run();
}

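Running ./fool2048:

This A.I. does not move as regularly as an A.I. built on a hand-crafted heuristic function, and it does not keep the largest number pinned in a corner, but it still ends up with fairly good results, and the game is more fun to watch.

Project: https://github.com/chncwang/fool2048

Finally, a recruiting link: http://www.kujiale.com/about/join

My work here is mainly on-site search, recommendation algorithms, and the like. Strong candidates are welcome to send a résumé to the HR mailbox.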
