Machine Learning Algorithms: HMM

1. Model Definition

A hidden Markov model (HMM) is a probabilistic model of time series and a special kind of probabilistic graphical model. The model contains two sequences: a state sequence \(\{z_1, z_2, ..., z_T\}\) and an observation sequence \(\{x_1, x_2, ..., x_T\}\), whose elements take values in the state set \(Q=\{q_1, q_2, ..., q_N\}\) and the observation set \(V=\{v_1, v_2, ..., v_M\}\) respectively; by analogy with GMM or the EM algorithm, they play the roles of the latent variables and the observed variables. An HMM makes two basic assumptions:

  • Homogeneous Markov assumption (Assumption 1): the state at any time depends only on the state at the previous time and on nothing else, i.e. \(p(z_{t+1}|z_t, \cdot) = p(z_{t+1}|z_t)\)

  • Observation independence assumption (Assumption 2): the observation at any time depends only on the state at that time, i.e. \(p(x_t|z_t, \cdot)=p(x_t|z_t)\)

An HMM is described by three parameters (a concrete sketch follows the list):

  • State transition matrix \(A = [a_{ij}]_{N\times N}\): \(a_{ij}=p(q_j|q_i)\), the probability that the next state is \(q_j\) given that the current state is \(q_i\)
  • Observation matrix \(B=[b_{jk}]_{N\times M}\): \(b_{jk}=p(v_k|q_j)\), the probability of observing \(v_k\) while in state \(q_j\)
  • Initial state vector \(\pi = [\pi_i]_N\): \(\pi_i = p(z_1 = q_i)\), the probability that the initial state is \(q_i\)
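To make these parameters concrete, here is a small, purely hypothetical example with two states and three observation symbols (the numbers are made up for illustration; each row is a probability distribution):

import numpy as np

# Hypothetical example: N = 2 states, M = 3 observation symbols
A = np.array([[0.7, 0.3],          # a_ij = p(q_j | q_i); each row sums to 1
              [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5],     # b_jk = p(v_k | q_j); each row sums to 1
              [0.6, 0.3, 0.1]])
pi = np.array([0.6, 0.4])          # pi_i = p(z_1 = q_i); sums to 1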

The HMM class is therefore defined as follows:

import numpy as np

class Hmm():
    def __init__(self, state_num, obser_num,
                 state_transfer_matrix=None,
                 obser_matrix=None,
                 init_state_vec=None):
        '''
        state set: Q = {q_1, q_2, ..., q_N}
        observation set: V = {v_1, v_2, ..., v_M}
        state transfer matrix: a_ij = p(q_j|q_i)
        observation prob.: b_jk = p(v_k|q_j)
        initial prob.: pi_i = p(i_1 = q_i)
        state sequence: I = {i_1, i_2, ..., i_T}
        observation sequence: O = {o_1, o_2, ..., o_T}
        '''
        self.N = state_num              # number of states
        self.M = obser_num              # number of observation symbols
        self.A = state_transfer_matrix  # N x N state transition matrix
        self.B = obser_matrix           # N x M observation matrix
        self.pi = init_state_vec        # initial state distribution
        self.eps = 1e-6                 # small constant to avoid division by zero

2. Sequence Generation

Following the model \(\lambda=(A,B,\pi)\), we repeatedly generate the current state, the corresponding observation, and then the next state:

    def generate(self, T: int):
        '''
        Generate an observation sequence from the model parameters.
        T: number of observations to generate
        '''
        z = self._get_data_with_distribute(self.pi)    # draw the first state from the initial distribution
        x = self._get_data_with_distribute(self.B[z])  # generate the first observation
        result = [x]
        for _ in range(T-1):                           # generate the remaining states and observations in turn
            z = self._get_data_with_distribute(self.A[z])
            x = self._get_data_with_distribute(self.B[z])
            result.append(x)
        return np.array(result)

    def _get_data_with_distribute(self, dist):
        # draw a random index according to the given probability distribution
        return np.random.choice(np.arange(len(dist)), p=dist)
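A minimal usage sketch, assuming the hypothetical A, B, pi arrays from Section 1:

np.random.seed(0)                       # fix the seed for reproducibility
hmm = Hmm(state_num=2, obser_num=3,
          state_transfer_matrix=A,
          obser_matrix=B,
          init_state_vec=pi)
obs = hmm.generate(100)                 # 100 observation indices drawn from {0, 1, 2}
print(obs[:10])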

3. Probability Computation

The goal of probability computation is to compute, for a known model, the probability \(p(X\mid \lambda)\) of a given observation sequence. A direct approach \(\ref{prob_brute}\) enumerates all state sequences \(Z\) and sums over them:

\[\begin{equation}\label{prob_brute}
p(X\mid \lambda)=\sum_Z p(X,Z\mid \lambda) = \sum_{z_1, z_2, ...,z_T} \pi_{z_1}b_{z_1}(x_1) a_{z_1, z_2}b_{z_2}(x_2)...a_{z_{T-1},z_{T}}b_{z_T}(x_T)
\end{equation}
\]

But the direct computation costs \(O(TN^T)\), which is infeasible for all but the shortest sequences; the forward and backward algorithms bring this down to \(O(N^2T)\).
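For very small T, Eq. \(\ref{prob_brute}\) can still be evaluated directly, which is useful for testing. A brute-force sketch (brute_force_prob is a helper introduced here for verification only; it assumes the Hmm class above and has exponential cost):

import itertools
import numpy as np

def brute_force_prob(hmm, seq):
    # Enumerate all N^T state sequences and sum p(X, Z) per Eq. (prob_brute)
    total = 0.0
    for zs in itertools.product(range(hmm.N), repeat=len(seq)):
        p = hmm.pi[zs[0]] * hmm.B[zs[0], seq[0]]
        for t in range(1, len(seq)):
            p *= hmm.A[zs[t-1], zs[t]] * hmm.B[zs[t], seq[t]]
        total += p
    return total

The compute_prob method below dispatches to the efficient forward/backward implementations: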

    def compute_prob(self, seq, method='forward'):
        '''Compute the probability p(O) of an observation sequence O of length T.
        Parameters
        ----------
        seq : numpy.ndarray
            observation sequence of length T
        method : string, optional
            how to compute the probability; defaults to forward computation
            forward:  _forward()
            backward: _backward()
        Returns
        -------
        forward:  p(O), alpha
        backward: p(O), beta
        '''
        if method == 'forward':
            alpha = self._forward(seq)
            return np.sum(alpha[-1]), alpha   # p(O) = sum_i alpha_T(i)
        else:
            beta = self._backward(seq)
            # p(O) = sum_i pi_i * b_i(o_1) * beta_1(i)
            return np.sum(beta[0] * self.pi * self.B[:, seq[0]]), beta

3.1 Forward Computation

Define the forward probability \(\alpha_t(i)\):

\[\alpha_t(i) = p(x_1, x_2, ..., x_t, z_t = q_i\mid \lambda)\label{alpha}
\]

From definition \(\ref{alpha}\) we have:

\[\begin{equation}
\alpha_1(i)=p(x_1, z_1 = q_i\mid \lambda) = p(z_1=q_i\mid \lambda)p(x_1\mid z_1=q_i, \lambda)=\pi_i b_i(x_1)\\
\sum_{1\leq i\leq N}\alpha_T(i) = \sum_{1\leq i\leq N} p(x_1, x_2, ..., x_T, z_T = q_i\mid \lambda) = p(X\mid \lambda) \label{prob_forward}
\end{equation}
\]

Given \(\alpha_t(i)\) for all \(i\in\{1, 2, \ldots, N\}\), we can compute \(\alpha_{t+1}(j)\):

\[\begin{align}
\alpha_{t+1}(j) &=\sum_{i=1}^{N} P\left(x_{1}, \ldots, x_{t+1}, z_{t}= q_{i}, z_{t+1}=q_{j} \mid \lambda\right) \\
&=\sum_{i=1}^{N} \bbox[5px, border:2px solid red]{P\left(x_{t+1} \mid x_{1}, \ldots, x_{t}, z_{t}=q_{i}, z_{t+1}=q_{j}, \lambda\right)} \bbox[5px, border:2px solid blue]{P\left(x_{1}, \ldots, x_{t}, z_{t}=q_{i}, z_{t+1}=q_{j} \mid \lambda\right)}
\end{align}
\]

By Assumption 2, the red factor is:

\[P\left(x_{t+1} \mid x_{1}, \ldots, x_{t}, z_{t}=q_{i}, z_{t+1}=q_{j}, \lambda\right)=P\left(x_{t+1} \mid z_{t+1}=q_{j}\right)=b_{j}\left(x_{t+1}\right)
\]

By Assumption 1, the blue factor is:

\[\begin{align}
P\left(x_{1}, \ldots, x_{t},z_{t}=q_{i}, z_{t+1} = q_{j} \mid \lambda\right)
&=P\left(z_{t+1}=q_{j} \mid x_{1}, \ldots, x_{t}, z_{t}=q_{i}, \lambda\right) P\left(x_{1}, \ldots, x_{t}, z_{t}=q_{i} \mid \lambda\right) \\
&=P\left(z_{t+1}=q_{j} \mid z_{t}=q_{i}\right) P\left(x_{1}, \ldots, x_{t}, z_{t}=q_{i} \mid \lambda\right) \\
&=a_{i j} \alpha_{t}(i)
\end{align}
\]

Therefore:

\[\alpha_{t+1}(j)=\sum_{i=1}^{N} a_{i j} b_{j}\left(x_{t+1}\right) \alpha_{t}(i)\label{alpha_t+1}
\]

Equations \(\ref{prob_forward}\) and \(\ref{alpha_t+1}\) give the forward-computation code:

    def _forward(self, seq):
        T = len(seq)
        alpha = np.zeros((T, self.N))
        alpha[0, :] = self.pi * self.B[:, seq[0]]    # alpha_1(i) = pi_i * b_i(x_1)
        for t in range(T-1):
            # alpha_{t+1}(j) = (sum_i alpha_t(i) a_ij) * b_j(x_{t+1})
            alpha[t+1] = np.dot(alpha[t], self.A) * self.B[:, seq[t+1]]
        return alpha
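A quick sanity check, assuming the hmm and obs objects from the sketch in Section 2 and the brute_force_prob helper above:

p_fwd, alpha = hmm.compute_prob(obs[:6], method='forward')
assert np.isclose(p_fwd, brute_force_prob(hmm, obs[:6]))   # agree up to float error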

3.2 Backward Computation

Define the backward probability \(\beta_t(i)\):

\[\beta_{t}(i)=P\left(x_{T}, x_{T-1}, \ldots, x_{t+1} \mid z_{t}=q_{i}, \lambda\right)\label{beta}
\]

From definition \(\ref{beta}\) we get:

\[\begin{equation}
\beta_T(i) = 1 \quad \text{(by convention: there is nothing left to observe after time } T\text{)}\\
\begin{aligned}
P(X \mid \lambda)&=P\left(x_{1}, x_{2}, \ldots, x_{T} \mid \lambda\right) \\
&=\sum_{i=1}^{N} P\left(x_{1}, x_{2}, \ldots, x_{T}, z_{1}=q_{i} \mid \lambda\right)\\
&=\sum_{i=1}^{N} P\left(x_{1} \mid x_{2}, \ldots, x_{T}, z_{1}=q_{i}, \lambda\right) P\left(x_{2}, \ldots, x_{T}, z_{1}=q_{i} \mid \lambda\right) \\
&=\sum_{i=1}^{N} P\left(x_{1} \mid z_{1}=q_{i}\right) P\left(z_{1}=q_{i} \mid \lambda\right) P\left(x_{T}, x_{T-1}, \ldots, x_{2} \mid z_{1}=q_{i}, \lambda\right) \\
&=\sum_{i=1}^{N} b_{i}\left(x_{1}\right) \pi_{i} \beta_{1}(i)
\end{aligned}
\end{equation}
\]

One can verify that, at any time \(t\):

\[\begin{align}
P(X\mid \lambda) &= \sum_{i}P(X, z_t = q_i\mid \lambda)\\
&= \sum_i P(x_1, x_2, \ldots, x_t, x_{t+1}, \ldots, x_T, z_t = q_i)\\
&= \sum_{i} P(x_1, \ldots, x_t, z_t = q_i)\bbox[border:2px red]{P(x_{t+1}, \ldots, x_T\mid x_{1}, \ldots, x_t, z_t = q_i)}\\
\end{align}
\]

The red factor is computed as follows:

\[\begin{align}
P(x_{t+1}, \ldots, x_T\mid x_{1}, \ldots, x_t, z_t = q_i) &= \frac{P(X\mid z_t = q_i)}{P(x_1, \ldots, x_t\mid z_t = q_i)}\\
&= \frac{P(x_t\mid x_1, \ldots, x_{t-1}, x_{t+1}, \ldots, x_T, z_t = q_i)P(x_1, \ldots, x_{t-1}, x_{t+1}, \ldots, x_T\mid z_t = q_i)}{P(x_t\mid x_1, \ldots, x_{t-1}, z_t = q_i)P(x_1, \ldots, x_{t-1}\mid z_t = q_i)}\\
&= \frac{P(x_t\mid z_t = q_i)P(x_1,\ldots, x_{t-1}\mid z_t = q_i)P(x_{t+1}, \ldots, x_T\mid z_t = q_i)}{P(x_t\mid z_t = q_i)P(x_1, \ldots, x_{t-1}\mid z_t = q_i)}\\
&= P(x_{t+1}, \ldots, x_T\mid z_t = q_i)
\end{align}
\]

Combining definitions \(\ref{alpha}\) and \(\ref{beta}\) yields:

\[P(X\mid \lambda) = \sum_i \alpha_t(i)\beta_t(i), \quad \forall t\in\{1, 2, \ldots, T\}
\]

Given \(\beta_{t+1}(j)\) for all \(j\in\{1, 2, \ldots, N\}\), we can compute \(\beta_{t}(i)\):

\[\begin{align}
\beta_{t}(i) &=P\left(x_{T}, \ldots, x_{t+1} \mid z_{t}=q_{i}, \lambda\right) \\
&=\sum_{j=1}^{N} P\left(x_{T}, \ldots, x_{t+1}, z_{t+1}=q_{j} \mid z_{t}=q_{i}, \lambda\right) \\
&=\sum_{j=1}^{N} \bbox[border:red 2px solid ]{P\left(x_{T}, \ldots, x_{t+1} \mid z_{t+1}=q_{j}, z_{t}=q_{i}, \lambda\right)}\bbox[border:blue 2px solid] {P\left(z_{t+1}=q_{j} \mid z_{t}=q_{i}, \lambda\right)}
\label{beta_t+1}
\end{align}
\]

The blue factor is just \(a_{ij}\); the red factor is:

\[\begin{align}
P\left(x_{T}, \ldots, x_{t+1} \mid z_{t+1}=q_{j}, z_{t}=q_{i}, \lambda\right)
&=P\left(x_{T}, \ldots, x_{t+1} \mid z_{t+1}=q_{j}, \lambda\right) \\
&=P\left(x_{t+1} \mid x_{T}, \ldots, x_{t+2}, z_{t+1}=q_{j}, \lambda\right) P\left(x_{T}, \ldots, x_{t+2} \mid z_{t+1}=q_{j}, \lambda\right) \\
&=P\left(x_{t+1} \mid z_{t+1}=q_{j}\right) P\left(x_{T}, \ldots, x_{t+2} \mid z_{t+1}=q_{j}, \lambda\right) \\
&=b_{j}\left(x_{t+1}\right) \beta_{t+1}(j)\label{beta_red}
\end{align}

The first equality holds because:

\[\begin{align}
P(x_T, \ldots, x_{t+1}\mid z_{t+1}, z_t) &= \frac{P(x_T, \ldots, x_{t+1}, z_{t+1}, z_t)}{P(z_{t+1}, z_t)}\\
&= \frac{P(x_T, \ldots, x_{t+1}\mid z_{t+1})P(z_{t+1}\mid z_t)P(z_t)}{P(z_{t+1}, z_t)}\\
&= P(x_T, \ldots, x_{t+1}\mid z_{t+1})
\end{align}
\]

where the last step uses \(P(z_{t+1}, z_t) = P(z_{t+1}\mid z_t)P(z_t)\), so the factors cancel.

Substituting \(\ref{beta_red}\) into \(\ref{beta_t+1}\) gives:

\[\beta_{t}(i)=\sum_{j=1}^{N} a_{i j} b_{j}\left(x_{t+1}\right) \beta_{t+1}(j)
\]

This recursion gives the backward-computation code:

    def _backward(self, seq):
        T = len(seq)
        beta = np.ones((T, self.N))                    # beta_T(i) = 1
        for t in range(T-2, -1, -1):
            # beta_t(i) = sum_j a_ij * b_j(x_{t+1}) * beta_{t+1}(j)
            beta[t] = np.dot(self.A, self.B[:, seq[t+1]] * beta[t+1])
        return beta
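This allows a further sanity check of the identity \(P(X\mid\lambda)=\sum_i \alpha_t(i)\beta_t(i)\), again assuming the hmm and obs objects from the earlier sketches:

p_fwd, alpha = hmm.compute_prob(obs, method='forward')
p_bwd, beta = hmm.compute_prob(obs, method='backward')
assert np.isclose(p_fwd, p_bwd)                           # same probability either way
assert np.allclose(np.sum(alpha * beta, axis=1), p_fwd)   # constant over all t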

4. Learning

HMM learning infers the model parameters from an observation sequence. The learning algorithm is the EM algorithm, known in this setting as the Baum-Welch algorithm. The procedure is:

E-step:

\[\begin{align}
\mathcal{Q}(\lambda, \lambda^{\mathrm{old}}) &= \mathbb{E}_{Z\sim P(Z\mid X,\lambda^{\mathrm{old}})}\left[\log P(X, Z\mid \lambda)\right]\\
&= \sum_{Z} \log\big(\bbox[border:red 2px]{P(Z\mid \lambda)} \bbox[border:blue 2px]{P(X\mid Z, \lambda)}\big)\bbox[border:purple 2px]{P(Z\mid X, \lambda^{\mathrm{old}})}\\
\end{align}
\]

Red factor:

\[P(Z\mid \lambda) = \pi(z_1)\prod_{t=2}^T P(z_t\mid z_{t-1})
\]

Blue factor:

\[\begin{align}
P(X\mid Z, \lambda) &= P(x_1, x_2, \ldots, x_T\mid z_1, z_2, \ldots, z_T, \lambda)\\
&= P(x_T\mid x_1, x_2, \ldots, x_{T-1}, z_1, z_2, \ldots, z_T)P(x_1, \ldots, x_{T-1}\mid z_1,\ldots, z_T)\\
&= P(x_T\mid z_T)P(x_1, \ldots, x_{T-1}\mid z_1,\ldots, z_T)\\
&\vdots\\
&= \prod_{t=1}^T P(x_t\mid z_t)
\end{align}
\]

Purple factor:

\[\begin{align}
P(Z\mid X, \lambda^{\mathrm{old}}) &= \frac{P(X,Z\mid \lambda^{\mathrm{old}})}{P(X\mid \lambda^{\mathrm{old}})}
\end{align}
\]

The denominator is constant with respect to \(\lambda\), so it can be dropped without changing the maximizer. Substituting the three factors into the Q function gives:

\[\begin{align}
\lambda =& \arg\max_{\lambda} \sum_{Z} P(Z\mid X, \lambda^{\mathrm{old}})\log\big(P(Z\mid \lambda)P(X\mid Z, \lambda)\big) \\
=& \arg\max_{\lambda} \sum_Z P(X,Z\mid \lambda^{\mathrm{old}})\log \big(
\pi(z_1)\prod_{t=2}^T P(z_t\mid z_{t-1})
\prod_{t=1}^T P(x_t\mid z_t)
\big)\\
=& \arg\max_{\lambda} \Big[\bbox[6px, border:red 2px]{\sum_Z P(X,Z\mid \lambda^{\mathrm{old}})\log\pi(z_1)}\\
&+ \bbox[6px, border:blue 2px]{\sum_Z P(X,Z\mid \lambda^{\mathrm{old}})\log \prod_{t=2}^T P(z_t\mid z_{t-1})} \\
&+ \bbox[6px, border:purple 2px]{\sum_Z P(X,Z\mid \lambda^{\mathrm{old}}) \log \prod_{t=1}^T P(x_t\mid z_t)}\Big]
\end{align}
\]

The red, blue, and purple boxes involve only \(\pi\), \(A\), and \(B\) respectively; since the three parameter sets are disjoint, each term can be maximized independently.

4.1 Solving for \(\pi\)

The parameter \(\pi\) satisfies \(\sum_{i=1}^N\pi_i = 1\); the Lagrangian is:

\[\begin{align}
\mathcal{L}_1 &= \sum_Z P(X, Z\mid \lambda^{\mathrm{old}})\log \pi(z_1) +\mu (\sum_{i=1}^N \pi_i - 1)\\
&= \sum_{1\leq i \leq N} P(z_1 = q_i, X\mid \lambda^{\mathrm{old}})\log \pi_i + \mu (\sum_{i=1}^N \pi_i - 1)\\
\end{align}
\]

Setting the derivative to zero:

\[\nabla_{\pi_i} \mathcal{L}_1 = \frac{1}{\pi_i} P(z_1 = q_i, X \mid \lambda^{\mathrm{old}}) + \mu = 0 \\
\Longrightarrow \pi_i = \frac{P(z_1=q_i, X \mid \lambda^{\mathrm{old}})}{\sum_{1 \leq j \leq N} P(z_1=q_j, X \mid \lambda^{\mathrm{old}})}
\]

Define \(\gamma_t(i) = P(z_t = q_i, X\mid \lambda^{\mathrm{old}})=\alpha_t(i)\beta_t(i)\); then:

\[\pi_i = \frac{\gamma_1(i)}{\sum_j \gamma_1(j)}=\frac{\alpha_1(i)\beta_1(i)}{\sum_j \alpha_1(j)\beta_1(j)}
\]

4.2 Solving for \(A\)

Each row of \(A\) sums to 1, so the Lagrangian for \(A\) is:

\[\begin{align}
\mathcal{L}_2 &= \sum_Z P(X,Z\mid \lambda^{\mathrm{old}}) \log \prod_{t=2}^T P(z_t\mid z_{t-1}) + \mu (AE - \mathbb{1})\\
&= \sum_{2\leq t \leq T} \sum_{1\leq i,j \leq N}P(z_{t-1} = q_i, z_t = q_j, X \mid \lambda^{\mathrm{old}})\log P(z_t = q_j\mid z_{t-1} = q_i) + \mu (AE - \mathbb{1})\\
&= \sum_{2\leq t \leq T} \sum_{1\leq i,j \leq N}P(z_{t-1} = q_i, z_t = q_j, X \mid \lambda^{\mathrm{old}})\log a_{ij} + \mu (AE - \mathbb{1})
\end{align}
\]

where every element of \(E\) equals 1 (so \(AE\) collects the row sums of \(A\)). Setting the derivative to zero:

\[\begin{align}
\nabla_{a_{ij}}\mathcal{L}_2 = \frac{1}{a_{ij}} \sum_{1\leq t \leq T-1} P(z_t = q_i, z_{t+1}=q_j, X\mid \lambda^{\mathrm{old}}) + \mu_i = 0 \\
\Longrightarrow a_{ij} = \frac{\sum_{1\leq t \leq T-1} P(z_t = q_i, z_{t+1}=q_j, X\mid \lambda^{\mathrm{old}})}{\sum_j \sum_{1\leq t \leq T-1} P(z_t = q_i, z_{t+1}=q_j, X\mid \lambda^{\mathrm{old}})}
\end{align}
\]

Define \(\xi_t(i,j) = P(z_t=q_i, z_{t+1}=q_j, X\mid \lambda^{\mathrm{old}})\). Comparing with the definition of \(\gamma_t(i)\) shows \(\gamma_t(i) = \sum_j \xi_t(i,j)\) for \(t\leq T-1\). Substituting gives:

\[a_{ij} = \frac{\sum_{1\leq t \leq T-1} \xi_t(i,j)}{\sum_{1\leq t \leq T-1}\gamma_t(i)}
\]

Note that both the numerator and the denominator sum only up to \(T-1\), since the last state has no outgoing transition.

4.3 Solving for \(B\)

Each row of \(B\) sums to 1, so the Lagrangian for \(B\) is:

\[\begin{align}
\mathcal{L}_3 &= \sum_Z P(X,Z\mid \lambda^{\mathrm{old}}) \log \prod_{t=1}^T P(x_t\mid z_t) + \mu(BE-\mathbb{1})\\
&= \sum_{1\leq k \leq M} \sum_{1\leq t \leq T, x_t = v_k} \sum_{1\leq j \leq N} P(z_t = q_j, X \mid \lambda^{\mathrm{old}})\log P(x_t \mid z_t = q_j) + \mu(BE-\mathbb{1})
\end{align}
\]

Setting the derivative to zero:

\[\nabla_{b_{jk}} \mathcal{L}_3 = \frac{1}{b_{jk}} \sum_{1\leq t\leq T, x_t = v_k} P(z_t = q_j, X \mid \lambda^{\mathrm{old}}) + \mu_j = 0 \\
\Longrightarrow b_{jk} = \frac{\sum_{1\leq t\leq T, x_t = v_k} P(z_t = q_j, X \mid \lambda^{\mathrm{old}})}{\sum_{1\leq t\leq T} P(z_t = q_j, X \mid \lambda^{\mathrm{old}})} = \frac{\sum_{1\leq t \leq T, x_t = v_k}\gamma_t(j)}{\sum_{1\leq t \leq T} \gamma_t(j)}
\]

The learning code for the HMM is:

    def baum_welch(self, seq, max_Iter=500, verbos=10):
        '''HMM learning via the Baum-Welch algorithm.
        Parameters
        ----------
        seq : np.ndarray
            observation sequence of length T
        max_Iter : int, optional
            maximum number of training iterations, default 500
        verbos : int, optional
            interval for printing training information, default every 10 steps
        Returns
        -------
        None.
        '''
        # initialize the model parameters randomly
        self._random_set_parameters()
        # print the model information before training
        self.print_model_info()
        for cnt in range(max_Iter):
            # alpha[t,i] = p(o_1, o_2, ..., o_t, i_t = q_i)
            alpha = self._forward(seq)
            # beta[t,i] = p(o_t+1, o_t+2, ..., o_T | i_t = q_i)
            beta = self._backward(seq)
            # gamma[t,i] = p(i_t = q_i | O)
            gamma = self._gamma(alpha, beta)
            # xi[t,i,j] = p(i_t = q_i, i_t+1 = q_j | O)
            xi = self._xi(alpha, beta, seq)
            # update the model parameters
            self.pi = gamma[0] / np.sum(gamma[0])
            self.A = np.sum(xi, axis=0) / (np.sum(gamma[:-1], axis=0).reshape(-1, 1) + self.eps)
            for obs in range(self.M):
                mask = (seq == obs)
                self.B[:, obs] = np.sum(gamma[mask, :], axis=0)
            self.B /= np.sum(gamma, axis=0).reshape(-1, 1)
            self.A /= np.sum(self.A, axis=-1, keepdims=True)
            self.B /= np.sum(self.B, axis=-1, keepdims=True)
            # print training information
            if cnt % verbos == 0:
                logH = np.log(self.compute_prob(seq)[0] + self.eps)
                print("Iteration num: {0} | log likelihood: {1}".format(cnt, logH))

    def _gamma(self, alpha, beta):
        # normalize alpha_t(i) * beta_t(i) to the posterior p(z_t = q_i | X)
        G = np.multiply(alpha, beta)
        return G / (np.sum(G, axis=-1, keepdims=True) + self.eps)

    def _xi(self, alpha, beta, seq):
        T = len(seq)
        Xi = np.zeros((T - 1, self.N, self.N))
        for t in range(T - 1):
            # xi_t(i,j) proportional to alpha_t(i) * a_ij * b_j(x_{t+1}) * beta_{t+1}(j)
            Xi[t] = np.multiply(np.dot(alpha[t].reshape(-1, 1),
                                       (self.B[:, seq[t+1]] * beta[t+1]).reshape(1, -1)),
                                self.A)
            Xi[t] /= (np.sum(Xi[t]) + self.eps)
        return Xi

    def _random_set_parameters(self):
        # random row-stochastic initialization of A, B, and pi
        self.A = np.random.rand(self.N, self.N)
        self.A /= np.sum(self.A, axis=-1, keepdims=True)
        self.B = np.random.rand(self.N, self.M)
        self.B /= np.sum(self.B, axis=-1, keepdims=True)
        self.pi = np.random.rand(self.N)
        self.pi /= np.sum(self.pi)
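An end-to-end sketch under the earlier hypothetical setup: sample data from the example model, then fit a fresh model to it. This assumes the full class, including the print_model_info helper referenced above but not shown here:

np.random.seed(42)
obs = hmm.generate(1000)                 # data from the "ground-truth" model of Section 2

learner = Hmm(state_num=2, obser_num=3)  # parameters are randomized inside baum_welch
learner.baum_welch(obs, max_Iter=100, verbos=20)
print(learner.A)   # compare with hmm.A; states may come out permuted (label switching)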

5. Prediction

The prediction task of an HMM is: given an observation sequence \(X\), infer the most likely state sequence \(Z\). The approximate algorithm simply picks the individually most likely state at each step:

\[z_t^\star = \arg\max_i \gamma_t(i)
\]

The drawback of the approximate algorithm is that the resulting sequence is not guaranteed to be globally optimal.
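A minimal sketch of this approximate decoder, written as a hypothetical helper method (approx_decode is not part of the original class; it only assumes the _forward, _backward, and _gamma methods above):

    def approx_decode(self, obs):
        # hypothetical helper: pick z_t = argmax_i gamma_t(i) independently at each t
        alpha = self._forward(obs)
        beta = self._backward(obs)
        gamma = self._gamma(alpha, beta)   # gamma[t, i] = p(z_t = q_i | X)
        return np.argmax(gamma, axis=-1)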

In practice, the Viterbi algorithm, a dynamic-programming method, is used instead. The idea: if the globally optimal state sequence passes through state \(q_i\) at time \(t\), then its segment from \(x_1\) to \(x_t\) must itself be the best path ending in \(q_i\) at time \(t\), so the global optimum can be built up iteratively. Define:

\[\delta_t(i) = \max_{z_1, \ldots, z_{t-1}} P(z_t = q_i, z_{t-1}, \ldots, z_1, x_t, x_{t-1},\ldots, x_1)\\
\phi_t(i) = \arg\max_j \delta_{t-1}(j)a_{ji}
\]

From the definitions:

\[\begin{align}
\delta_{t+1}(i) &= \max_{z_1, \ldots, z_t} P(z_{t+1} = q_i, z_t, \ldots, z_1, x_{t+1}, x_{t},\ldots, x_1) \\
&= \Big[\max_j \delta_t(j)a_{ji}\Big] b_i(x_{t+1})\\
\phi_{t+1}(i) &= \arg\max_j \delta_{t}(j)a_{ji}
\end{align}
\]

The code is as follows:

    def viterbi(self, obs):
        '''Compute the most likely state sequence for a given observation sequence.
        Parameters
        ----------
        obs : numpy.ndarray
            observation sequence of length T
        Returns
        -------
        states : numpy.ndarray
            the optimal state sequence
        '''
        T = len(obs)
        states = np.zeros(T, dtype=int)
        psi = np.zeros((T, self.N), dtype=int)      # back-pointers phi_t(i)
        delta = self.pi * self.B[:, obs[0]]         # delta_1(i) = pi_i * b_i(x_1)
        for t in range(1, T):
            trans = delta.reshape(-1, 1) * self.A   # trans[j, i] = delta_{t-1}(j) * a_ji
            psi[t] = np.argmax(trans, axis=0)       # best previous state for each i
            delta = np.max(trans, axis=0) * self.B[:, obs[t]]
        states[T-1] = np.argmax(delta)              # end point of the best path
        for t in range(T-2, -1, -1):                # backtrack along the pointers
            states[t] = psi[t+1][states[t+1]]
        return states
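A quick comparison of the two decoders, assuming the hmm and obs objects from the earlier sketches and the hypothetical approx_decode helper above:

z_star = hmm.viterbi(obs)             # globally optimal state path
z_approx = hmm.approx_decode(obs)     # per-step marginal argmax
print(np.mean(z_star == z_approx))    # fraction of time steps where the two agree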
