First, install Python 3.6 and TensorFlow 1.2.
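Before going further, it is worth confirming that the interpreter and TensorFlow versions match what was installed (a quick sanity check, assuming both are already on your PATH):

python -c "import sys, tensorflow as tf; print(sys.version); print(tf.__version__)"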

Next, install NumPy. On Windows, use the prebuilt wheel:

numpy-1.13.1+mkl-cp36-cp36m-win_amd64.whl

On Linux, a plain pip install numpy is all that is needed.

Then install SciPy, again the 64-bit build for Windows 10:

scipy-0.19.1-cp36-cp36m-win_amd64.whl

Both wheel files can be downloaded from http://www.lfd.uci.edu/~gohlke/pythonlibs/, a site that provides unofficial Windows builds of many Python packages.
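Assuming both wheels were downloaded into the current directory (the file names below must match your downloads exactly), they can be installed and then verified with a quick import:

pip install numpy-1.13.1+mkl-cp36-cp36m-win_amd64.whl
pip install scipy-0.19.1-cp36-cp36m-win_amd64.whl
python -c "import numpy, scipy; print(numpy.__version__, scipy.__version__)"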

Next, install the gym module:

D:\AI\sample\tensorforce>pip install gym

Its PyPI page is https://pypi.python.org/pypi/gym/0.2.0.
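Before wiring in TensorForce, a minimal random-action rollout makes a useful smoke test for gym itself. This is just a sketch using the old gym 0.x API, where reset() returns only the observation and step() returns four values:

import gym

env = gym.make('CartPole-v0')
obs = env.reset()
total_reward = 0.0
done = False
while not done:
    # Sample a random action; no learning here, just an environment check.
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    total_reward += reward
print('Random policy episode reward:', total_reward)

A random policy typically ends an episode after roughly 20 steps, which gives a baseline the trained agent below should easily beat.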

Finally, install the TensorForce library:

git clone git@github.com:reinforceio/tensorforce.git
cd tensorforce
pip install -e .
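Note that the git@github.com: URL requires an SSH key registered with GitHub; the equivalent HTTPS URL https://github.com/reinforceio/tensorforce.git works without one. A quick import confirms the editable install succeeded:

python -c "import tensorforce"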

With everything in place, you can run one of the bundled reinforcement learning examples (-a selects the agent class, -c the agent configuration, and -n the network definition):

python examples/openai_gym.py CartPole-v0 -a TRPOAgent -c examples/configs/trpo_cartpole.json -n examples/configs/trpo_cartpole_network.json

The output is as follows. Watch the average of the last 100 rewards climb toward 200, the maximum episode length for CartPole-v0, which shows the TRPO agent is learning:

[2017-07-24 11:05:28,928] Making new env: CartPole-v0
2017-07-24 11:05:30.056045: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.056185: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.057611: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.059002: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.059684: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.060401: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.061129: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.061744: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.406967: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 1060 6GB
major: 6 minor: 1 memoryClockRate (GHz) 1.7845
pciBusID 0000:01:00.0
Total memory: 6.00GiB
Free memory: 4.99GiB
2017-07-24 11:05:30.407097: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:961] DMA: 0
2017-07-24 11:05:30.408846: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:971] 0:   Y
2017-07-24 11:05:30.409518: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0)
[2017-07-24 11:05:31,317] Starting TRPOAgent for Environment 'OpenAIGym(CartPole-v0)'
[2017-07-24 11:05:34,728] Finished episode 50 after 16 timesteps
[2017-07-24 11:05:34,728] Episode reward: 16.0
[2017-07-24 11:05:34,729] Average of last 500 rewards: 2.6
[2017-07-24 11:05:34,730] Average of last 100 rewards: 13.0
[2017-07-24 11:05:39,224] Finished episode 100 after 81 timesteps
[2017-07-24 11:05:39,224] Episode reward: 81.0
[2017-07-24 11:05:39,226] Average of last 500 rewards: 6.49
[2017-07-24 11:05:39,226] Average of last 100 rewards: 32.45
[2017-07-24 11:05:47,185] Finished episode 150 after 177 timesteps
[2017-07-24 11:05:47,185] Episode reward: 177.0
[2017-07-24 11:05:47,186] Average of last 500 rewards: 13.364
[2017-07-24 11:05:47,187] Average of last 100 rewards: 53.82
[2017-07-24 11:06:00,082] Finished episode 200 after 136 timesteps
[2017-07-24 11:06:00,083] Episode reward: 136.0
[2017-07-24 11:06:00,084] Average of last 500 rewards: 24.414
[2017-07-24 11:06:00,085] Average of last 100 rewards: 89.62
[2017-07-24 11:06:18,207] Finished episode 250 after 144 timesteps
[2017-07-24 11:06:18,208] Episode reward: 144.0
[2017-07-24 11:06:18,209] Average of last 500 rewards: 40.096
[2017-07-24 11:06:18,210] Average of last 100 rewards: 133.66
[2017-07-24 11:06:35,440] Finished episode 300 after 143 timesteps
[2017-07-24 11:06:35,440] Episode reward: 143.0
[2017-07-24 11:06:35,442] Average of last 500 rewards: 55.014
[2017-07-24 11:06:35,443] Average of last 100 rewards: 153.0
[2017-07-24 11:06:52,119] Finished episode 350 after 169 timesteps
[2017-07-24 11:06:52,119] Episode reward: 169.0
[2017-07-24 11:06:52,120] Average of last 500 rewards: 69.586
[2017-07-24 11:06:52,121] Average of last 100 rewards: 147.45
[2017-07-24 11:07:05,441] Finished episode 400 after 73 timesteps
[2017-07-24 11:07:05,441] Episode reward: 73.0
[2017-07-24 11:07:05,441] Average of last 500 rewards: 81.13
[2017-07-24 11:07:05,441] Average of last 100 rewards: 130.58
[2017-07-24 11:07:15,715] Finished episode 450 after 123 timesteps
[2017-07-24 11:07:15,715] Episode reward: 123.0
[2017-07-24 11:07:15,716] Average of last 500 rewards: 89.798
[2017-07-24 11:07:15,716] Average of last 100 rewards: 101.06
[2017-07-24 11:07:23,885] Finished episode 500 after 51 timesteps
[2017-07-24 11:07:23,885] Episode reward: 51.0
[2017-07-24 11:07:23,886] Average of last 500 rewards: 96.714
[2017-07-24 11:07:23,886] Average of last 100 rewards: 77.92
[2017-07-24 11:07:32,642] Finished episode 550 after 62 timesteps
[2017-07-24 11:07:32,642] Episode reward: 62.0
[2017-07-24 11:07:32,643] Average of last 500 rewards: 101.236
[2017-07-24 11:07:32,643] Average of last 100 rewards: 70.19
[2017-07-24 11:07:42,235] Finished episode 600 after 53 timesteps
[2017-07-24 11:07:42,235] Episode reward: 53.0
[2017-07-24 11:07:42,236] Average of last 500 rewards: 104.336
[2017-07-24 11:07:42,237] Average of last 100 rewards: 70.56
[2017-07-24 11:07:52,850] Finished episode 650 after 72 timesteps
[2017-07-24 11:07:52,851] Episode reward: 72.0
[2017-07-24 11:07:52,851] Average of last 500 rewards: 105.88
[2017-07-24 11:07:52,852] Average of last 100 rewards: 77.04
[2017-07-24 11:08:02,817] Finished episode 700 after 80 timesteps
[2017-07-24 11:08:02,817] Episode reward: 80.0
[2017-07-24 11:08:02,818] Average of last 500 rewards: 102.986
[2017-07-24 11:08:02,818] Average of last 100 rewards: 82.87
[2017-07-24 11:08:13,726] Finished episode 750 after 77 timesteps
[2017-07-24 11:08:13,726] Episode reward: 77.0
[2017-07-24 11:08:13,727] Average of last 500 rewards: 96.038
[2017-07-24 11:08:13,727] Average of last 100 rewards: 84.45
[2017-07-24 11:08:25,828] Finished episode 800 after 59 timesteps
[2017-07-24 11:08:25,828] Episode reward: 59.0
[2017-07-24 11:08:25,829] Average of last 500 rewards: 90.658
[2017-07-24 11:08:25,829] Average of last 100 rewards: 91.36
[2017-07-24 11:08:42,711] Finished episode 850 after 177 timesteps
[2017-07-24 11:08:42,711] Episode reward: 177.0
[2017-07-24 11:08:42,712] Average of last 500 rewards: 90.34
[2017-07-24 11:08:42,712] Average of last 100 rewards: 118.96
[2017-07-24 11:09:03,141] Finished episode 900 after 167 timesteps
[2017-07-24 11:09:03,141] Episode reward: 167.0
[2017-07-24 11:09:03,142] Average of last 500 rewards: 95.556
[2017-07-24 11:09:03,142] Average of last 100 rewards: 155.07
[2017-07-24 11:09:27,219] Finished episode 950 after 200 timesteps
[2017-07-24 11:09:27,219] Episode reward: 200.0
[2017-07-24 11:09:27,220] Average of last 500 rewards: 105.26
[2017-07-24 11:09:27,220] Average of last 100 rewards: 175.66
[2017-07-24 11:09:53,139] Finished episode 1000 after 200 timesteps
[2017-07-24 11:09:53,139] Episode reward: 200.0
[2017-07-24 11:09:53,141] Average of last 500 rewards: 118.344
[2017-07-24 11:09:53,141] Average of last 100 rewards: 191.86
[2017-07-24 11:10:19,722] Finished episode 1050 after 200 timesteps
[2017-07-24 11:10:19,722] Episode reward: 200.0
[2017-07-24 11:10:19,723] Average of last 500 rewards: 131.222
[2017-07-24 11:10:19,724] Average of last 100 rewards: 200.0
[2017-07-24 11:10:45,962] Finished episode 1100 after 200 timesteps
[2017-07-24 11:10:45,963] Episode reward: 200.0
[2017-07-24 11:10:45,963] Average of last 500 rewards: 144.198
[2017-07-24 11:10:45,963] Average of last 100 rewards: 199.83

