MLPack之HMM学习

[导读]这个开源的机器学习的库使用到了boost中的很多库，包括tokenizer（分词器）；也用到了Armadillo也地方，这是一个基于c++语言开发的线性代数开发库，主要用于矩阵计算。一、目录结构：其中

这个开源的机器学习的库使用到了boost中的很多库，包括tokenizer（分词器）；也用到了Armadillo也地方，这是一个基于c++语言开发的线性代数开发库，主要用于矩阵计算。

一、目录结构：
其中 hmm_generate_main.cpp：随机的产生一个观测序列然后得到相应的隐藏的状态序列。 hmm_loglik_main.cpp：对于给定的观测序列计算其在给定的HMM参数下的loglikehood。 hmm_train_main.cpp：学习/训练HMM模型的参数，有两种方案，一种是有标记序列的监督式训练；一种是没有标记的非监督式训练，使用的Baum-Welch算法。 hm_viterbi_main.cpp：使用viterbi算法计算给定HMM模型参数和观测序列的情况下最有可能的隐藏状态序列。
二、算法：

1、Forward_Backward：

/**
 * The Forward procedure (part of the Forward-Backward algorithm).
 */
templatevoid HMM::Forward(const arma::mat& dataSeq,
                                arma::vec& scales,
                                arma::mat& forwardProb) const
{
  // Our goal is to calculate the forward probabilities:
  //  P(X_k | o_{1:k}) for all possible states X_k, for each time point k.
  forwardProb.zeros(transition.n_rows, dataSeq.n_cols);
  scales.zeros(dataSeq.n_cols);

  // Starting state (at t = -1) is assumed to be state 0.  This is what MATLAB
  // does in their hmmdecode() function, so we will emulate that behavior.
  for (size_t state = 0; state < transition.n_rows; state++)		//初始化时刻0时各个状态的概率
    forwardProb(state, 0) = transition(state, 0) *
        emission[state].Probability(dataSeq.unsafe_col(0));

  // Then normalize the column.
  scales[0] = accu(forwardProb.col(0));
  forwardProb.col(0) /= scales[0];

  // Now compute the probabilities for each successive observation.
  for (size_t t = 1; t < dataSeq.n_cols; t++)
  {
    for (size_t j = 0; j < transition.n_rows; j++)
    {
      // The forward probability of state j at time t is the sum over all states
      // of the probability of the previous state transitioning to the current
      // state and emitting the given observation.
      forwardProb(j, t) = accu(forwardProb.col(t - 1) %			//计算时刻t状态为观测序列为o1,o2,o3,...,ot且状态为j的前向概率
          trans(transition.row(j))) *					//所使用的公式可参考《统计学习方法》公式10.16
          emission[j].Probability(dataSeq.unsafe_col(t));		//emission表示的是观测概率分布
    }

    // Normalize probability.
    scales[t] = accu(forwardProb.col(t));				//归一化处理
    forwardProb.col(t) /= scales[t];
  }
}

templatevoid HMM::Backward(const arma::mat& dataSeq,
                                 const arma::vec& scales,
                                 arma::mat& backwardProb) const
{
  // Our goal is to calculate the backward probabilities:
  //  P(X_k | o_{k + 1:T}) for all possible states X_k, for each time point k.
  backwardProb.zeros(transition.n_rows, dataSeq.n_cols);

  // The last element probability is 1.
  backwardProb.col(dataSeq.n_cols - 1).fill(1);

  // Now step backwards through all other observations.
  for (size_t t = dataSeq.n_cols - 2; t + 1 > 0; t--)
  {
    for (size_t j = 0; j < transition.n_rows; j++)
    {
      // The backward probability of state j at time t is the sum over all state
      // of the probability of the next state having been a transition from the
      // current state multiplied by the probability of each of those states
      // emitting the given observation.
      for (size_t state = 0; state < transition.n_rows; state++)				
        backwardProb(j, t) += transition(state, j) * backwardProb(state, t + 1)//计算t时刻状态为j的前提下从t+1到T的部分观测序列为
            * emission[state].Probability(dataSeq.unsafe_col(t + 1));			//o(t+1),o(t+2),...,o(T)的后向概率
																				//可参见《统计学习方法》公式10.20		
      // Normalize by the weights from the forward algorithm.
      backwardProb(j, t) /= scales[t + 1];
    }
  }
}

2、Buam-Welch：其中transition表示状态转移概率分布矩阵，smission表示观测概率分布矩阵；dimensionality表示观测序列的维数；tolerance表示Baum-Welch表示迭代收敛时的差值。所使用的算法来自于Hidden Markov Models: Estimation and Control这篇论文。训练的过程：
step1：E-step：

      // Add the log-likelihood of this sequence.  This is the E-step.
      loglik += Estimate(dataSeq[seq], stateProb, forward, backward, scales);		//当完全数据的对数似然函数逐渐收敛时，
											//loglik的值增大会减小

先给定观测序列矩阵dataSeq[seq]，然后使用Estimate函数计算观测序列和隐藏状态序列的对数似然函数值，并累加到loglik中（判断算法是否结束使用）；同时计算出前向和后向概率分布矩阵分别为forward和backward；stateProb表示在某一时刻t处于某个状态j的概率；scales表示比例因子。
step2：M-step：

      // Now re-estimate the parameters.  This is the M-step.
      //   T_ij = sum_d ((1 / P(seq[d])) sum_t (f(i, t) T_ij E_i(seq[d][t]) b(i,
      //           t + 1)))
      //   E_ij = sum_d ((1 / P(seq[d])) sum_{t | seq[d][t] = j} f(i, t) b(i, t)
      // We store the new estimates in a different matrix.
      for (size_t t = 0; t < dataSeq[seq].n_cols; t++)
      {
        for (size_t j = 0; j < transition.n_cols; j++)
        {
          if (t < dataSeq[seq].n_cols - 1)
          {
            // Estimate of T_ij (probability of transition from state j to state
            // i).  We postpone multiplication of the old T_ij until later.
            for (size_t i = 0; i < transition.n_rows; i++)
              newTransition(i, j) += forward(j, t) * backward(i, t + 1) *			
                  emission[i].Probability(dataSeq[seq].unsafe_col(t + 1)) /
                  scales[t + 1];					//计算新的转移概率分布矩阵，使用了《统计学习方法》
          }																//中的10.39公式，不过这里推迟了乘法的计算

          // Add to list of emission observations, for Distribution::Estimate().
          emissionList.col(sumTime) = dataSeq[seq].col(t);		//所有的观测数据以列为单位都存入emissionList的col中
          emissionProb[j][sumTime] = stateProb(j, t);			//每个状态下的所有观测数据的概率值，不同观测
        }																//下的概率值不同
        sumTime++;
      }
    }

    // Assign the new transition matrix.  We use %= (element-wise
    // multiplication) because every element of the new transition matrix must
    // still be multiplied by the old elements (this is the multiplication we
    // earlier postponed).
    transition %= newTransition;

最终得到的transition即为状态转移矩阵，当然还要归一化和多次迭代。而观测概率分布在哪里呢？

    // Now estimate emission probabilities.
    for (size_t state = 0; state < transition.n_cols; state++)
      emission[state].Estimate(emissionList, emissionProb[state]);	//计算出每一个状态下的观测数据的分布，
									//就像计算每一个盒子内部不同颜色的球概率分布一样

emission就是要求的观测概率分布，即在这里求得不同状态state下数据的分布。 3、Viterbi：

  // The calculation of the first state is slightly different; the probability
  // of the first state being state j is the maximum probability that the state
  // came to be j from another state.
  logStateProb.col(0).zeros();													//初始化最初时刻，即t=0时各种状态state的概率
  for (size_t state = 0; state < transition.n_rows; state++)
    logStateProb[state] = log(transition(state, 0) *
        emission[state].Probability(dataSeq.unsafe_col(0)));

  // Store the best first state.
  arma::uword index;
  logStateProb.unsafe_col(0).max(index);
  stateSeq[0] = index;

  for (size_t t = 1; t < dataSeq.n_cols; t++)
  {
    // Assemble the state probability for this element.
    // Given that we are in state j, we state with the highest probability of
    // being the previous state.
    for (size_t j = 0; j < transition.n_rows; j++)
    {
      arma::vec prob = logStateProb.col(t - 1) + logTrans.col(j);//这里使用的是《统计学习方法》中的公式10.45，在这里对其求对数
      logStateProb(j, t) = prob.max() +				//计算t时刻状态为j时的所有单个路径概率最大值
          log(emission[j].Probability(dataSeq.unsafe_col(t)));
    }

    // Store the best state.                                     
    logStateProb.unsafe_col(t).max(index);			//总感觉这一点不正确，最终的状态序列应该回溯才能够知道；第一遍
    stateSeq[t] = index;					//计算的时候还不知道哪一个才是最终的终点使得整条状态序列出现的
  }								//概率最大，所以也就无法在这时候确定哪个状态是最优的
																
  return logStateProb(stateSeq(dataSeq.n_cols - 1), dataSeq.n_cols - 1);

感觉在最终计算stateSeq（用于返回最优状态序列）时感觉不对，有两点：1、在计算出来prob之后就应该从prob中获得t时刻状态为j的最优状态序列；2、最终的stateSeq需要回溯才能够得到，因为第一遍计算的时候还无法确定哪一个状态才是最终的最优状态序列要经过的，所以应该计算出来最优的最后一个状态之后回溯才能得到最优序列。