In this post, I will illustrate Markov Property, Markov Reward Process and finally Markov Decision Process, which are fundamental concepts in Reinforcement Learning.

Markov Property

'The state is independent of the past given the present'

Markov Process (Markov Chain)

Keywords: state, transition matrix

A Markov Process is defined by a Tuple(S,P), in which S is the state space, and P is the transition matrix. The following chart is an example.

A transition matrix demonstrates the probabilities of transitioning from one state to another.

In the example above, the transition matrix is:

Markov Reward Process: Markov Process with Value Judgement

Keywords: Reward, Return, Discount Factor, Value Function

MRP add two additional properties into Markov Chain: one is Reward, who represents the immediate feedback an agent can receive at time t+1 if he is in state s at time t; another property is Discount Factor γ∈[0,1]. So the representation tuple is [S,P,R,γ].

Formally, Reward is the immediate feedback, which means when agent gets to state s at time t, it can definetly receive this reward at time t+1. It is defined by:

Given reward and discount factor, we can calculate the Return for a given senario by this equation:

Example for Return calculation:

Senario: Class1->Class2->Class3->Pass->Sleep, and the agent is at state=Class1.

Case 1: when gamma=0, g=-2+(-2)*0+(-2)*0+10*0=-2

Case 2: when gamma=1, g=-2+(-2)*1+(-2)*1+10*1=4

Case 3: when gamma=0.8, g=-2+(-2)*0.8+(-2)*0.64+10*=-4-1.6+5.12=-0.48

From different γ, we know our agent can be exetremely short-sighted (far-sighted) only for immediate reward, or trying to seek balance between short and long term reward.

When an agent is in a certain state, the way to measure the total reward from this state over time is calculating expected Returns for all possible senarios. The function to calculate it is called Value Function:

Ex. If the agent is at Class3 state, it has 0.6 and 0.4 probabilities to transite to Pass and Pub respectively. Because there are loops inside the graph, it's difficult to directly derive expected return from value function. (Forget the red labeled value, they are result...)

Bellman Equation helps to solve this complexity:

It breaks the value function into two parts: Immediate Reward and Future Reward:The future reward is discounted by γ, and it has probabilities on different states, so actually the future reward is an expectation.

Now we can use Bellman Equation to solve value function:

Markov Decision Process: MRP with Actions

Keywords: Action

Markov Decision Process adds more complexity onto MRP, it is defined by a tuple(S,A,P,R,γ), in which:

S is state space, and γ is discount factor, they are same as MRP.

A is a finite set of Actions, which is new. Then because of the existense of Action, Transition Matrix and Reward Function are all conditional on both State and Action.

P is State Transition Matrix: it is conditional on state and action at time t, which means different actions would result in different distribution of state at time t+1.

R is Reward Function conditional upon state and action: also, different actions lead to different reward, despite of the same state s.

A graph(from Wikipedia) helps understanding the role of actions:

So by now, we have already had the model of the environment: all states, all possible actions and transition matrix conditional on state and actions.

Step-by-step from Markov Process to Markov Decision Process的更多相关文章

  1. Step by step Process of creating APD

    Step by step Process of creating APD: Business Scenario: Here we are going to create an APD on top o ...

  2. Step by Step Process of Migrating non-CDBs and PDBs Using ASM for File Storage (Doc ID 1576755.1)

    Step by Step Process of Migrating non-CDBs and PDBs Using ASM for File Storage (Doc ID 1576755.1) AP ...

  3. Tomcat Clustering - A Step By Step Guide --转载

    Tomcat Clustering - A Step By Step Guide Apache Tomcat is a great performer on its own, but if you'r ...

  4. [ZZ] Understanding 3D rendering step by step with 3DMark11 - BeHardware >> Graphics cards

    http://www.behardware.com/art/lire/845/ --> Understanding 3D rendering step by step with 3DMark11 ...

  5. e2e 自动化集成测试 架构 实例 WebStorm Node.js Mocha WebDriverIO Selenium Step by step (二) 图片验证码的识别

    上一篇文章讲了“e2e 自动化集成测试 架构 京东 商品搜索 实例 WebStorm Node.js Mocha WebDriverIO Selenium Step by step 一 京东 商品搜索 ...

  6. Code Understanding Step by Step - We Need a Task

      Code understanding is a task we are always doing, though we are not even aware that we're doing it ...

  7. enode框架step by step之saga的思想与实现

    enode框架step by step之saga的思想与实现 enode框架系列step by step文章系列索引: 分享一个基于DDD以及事件驱动架构(EDA)的应用开发框架enode enode ...

  8. 课程五(Sequence Models),第一 周(Recurrent Neural Networks) —— 1.Programming assignments:Building a recurrent neural network - step by step

    Building your Recurrent Neural Network - Step by Step Welcome to Course 5's first assignment! In thi ...

  9. 精通initramfs构建step by step

    (一)hello world  一.initramfs是什么  在2.6版本的linux内核中,都包含一个压缩过的cpio格式 的打包文件.当内核启动时,会从这个打包文件中导出文件到内核的rootfs ...

随机推荐

  1. [Codeforces 1205B]Shortest Cycle(最小环)

    [Codeforces 1205B]Shortest Cycle(最小环) 题面 给出n个正整数\(a_i\),若\(a_i \& a_j \neq 0\),则连边\((i,j)\)(注意i- ...

  2. 「Vue.js」Vue-Router + Webpack 路由懒加载实现

    一.前言 当打包构建应用时,Javascript 包会变得非常大,影响页面加载.如果我们能把不同路由对应的组件分割成不同的代码块,然后当路由被访问的时候才加载对应组件,这样就更加高效了.结合 Vue ...

  3. vue使用canvas生成海报图

    有个挺好用的插件能很好地实现vue生成海报图,虽然有一定的限制,但基本需求还是能实现的 1.安装 npm i vue-canvas-poster --save 2.全局配置 // or Global ...

  4. ORA-00911: invalid character 错误解决

    多数情况如下: 控制面板--系统和安全---系统--高级系统设置--高级--环境变量--系统变量中 变量名:NLS_LANG 变量值:SIMPLIFIED CHINESE_CHINA.ZHS16GBK ...

  5. 【知识强化】第二章 数据的表示和运算 2.4 算术逻辑单元ALU

    从本节开始我们就进入到本章的最后一节内容了,也就是我们算术逻辑单元的它的实现.这部分呢是数字电路的一些知识,所以呢,如果你没有学过数字电路的话,也不要慌张,我会从基础开始给大家补起.那么在计算机当中, ...

  6. 解析安装mysql

    大多数人在结束咱们前面学习的基础知识的时候,其实一脸懵逼,不过我们已经开始步入了另一个新的高度,针对基础知识还是必须巩固针对性的进行补充,可以分模块总结:比如基础知识的数据结构---->函数-- ...

  7. Linux下C语言实现ATM取款机,消息队列版本

    链接:https://pan.baidu.com/s/1oBavXBuZul7ZAEBL1eYfRA 提取码:ffhg Mybank ATM取款机实验,消息队列实现本实验用的是Centos71.在服务 ...

  8. SpringBoot 利用freemaker生成静态页面

    1. <!-- freemarker模板 --> <dependency> <groupId>org.springframework.boot</groupI ...

  9. django权限之二级菜单

    遗漏知识点 1.构建表结构时,谁被关联谁就是主表,在层级删除的时候,删除子表的时候,主表不会被删除,反之删除主表的话,字表也会被删除, 使用related_name=None   反向查询,起名用的 ...

  10. man arch

    ARCH(1)   Linux Programmer?. Manual/Linux程序员手册       ARCH(1) NAME/名称       arch - print machine arch ...