1. Gibbs Inequality
自然对数满足
\begin{align} & \ln x \leq x - 1 \\ &s.t. \quad x>0 \end{align}
仅当时等式成立
有两个概率分布 \begin{align} P = \{p_1,p_2,…,p_n\} \\ Q = \{q_1,q_2,…,q_n\} \end{align}
结合上面的不等式,有
\begin{align} -\sum_{i=1}^n p_i\ln \frac{q_i}{p_i} &\geq -\sum_{i=1}^np_i(\frac{q_i}{p_i}-1) \\ &= \sum_{i=1}^n(p_i - q_i) \\ &= 0 \end{align}
整理上式得到最终的Gibbs Inequality
\begin{align} -\sum_{i=1}^n p_i\ln p_i \leq -\sum_{i=1}^n p_i\ln q_i \end{align}
2. Description of EM
存在一个统计模型,其参数为,可以生成可观测到的数据集合,以及不可观测的数据集合,其的似然函数为
\begin{align} L(\theta;X) = p(X|\theta) = \sum_Zp(X,Z|\theta) \end{align}
当的取值范围很大时,上式的计算代价非常大,因此有迭代的EM方法
Expectation Step
给定当前的的估计值,给定X,计算当前对数似然函数的期望 \begin{align} Q(\theta|\theta^{(t)}) = E_{Z|X,\theta^{(t)}}L(\theta;X,Z) = \sum_Zp(Z|X,\theta^{(t)})\log p(X,Z|\theta) \end{align}
Maximization Step
\begin{align} \theta^{(t+1)}= \mathop{argmax}_{\theta}Q(\theta|\theta^{(t)}) \end{align}
3. Proof
根据贝叶斯定理 \begin{align} \log p(X|\theta)=\log p(X,Z|\theta) - \log p(Z|X,\theta) \end{align}
等式两边同时乘以,并对求和
\begin{align} \log p(X|\theta)&=\sum_Zp(Z|X,\theta^{(t)})\log p(X,Z|\theta) - \sum_Zp(Z|X,\theta^{(t)})\log p(Z|X,\theta) \\ &= Q(\theta|\theta^{(t)})- \sum_Zp(Z|X,\theta^{(t)})\log p(Z|X,\theta) \end{align}
为了符号简洁,定义
\begin{align} H(\theta|\theta^{(t)}) = - \sum_Zp(Z|X,\theta^{(t)})\log p(Z|X,\theta) \end{align}
于是
\begin{align} \log p(X|\theta) = Q(\theta|\theta^{(t)}) + H(\theta|\theta^{(t)}) \end{align}
由于
\begin{align} diff &=\log p(X|\theta^{(t+1)}) - \log p(X|\theta^{(t)}) \\ &= Q(\theta^{(t+1)}|\theta^{(t)})-Q(\theta^{(t)}|\theta^{(t)}) + H(\theta^{(t+1)}|\theta^{(t)})-H(\theta^{(t)}|\theta^{(t)}) \\ &= Q_{diff} + H_{diff} \end{align}
由于的定义 \begin{align} Q_{diff}&=Q(\theta^{(t+1)}|\theta^{(t)})-Q(\theta^{(t)}|\theta^{(t)}) \geq 0 \end{align}
由于Gibbs Inequality
\begin{align} H_{diff} &= H(\theta^{(t+1)}|\theta^{(t)})-H(\theta^{(t)}|\theta^{(t)}) \\ &= - \sum_Zp(Z|X,\theta^{(t)})\log p(Z|X,\theta^{t+1}) + \sum_Zp(Z|X,\theta^{(t)})\log p(Z|X,\theta^{(t)}) \\ &\geq 0 \end{align}
因此
\begin{align} \log p(X|\theta^{(t+1)}) \geq \log p(X|\theta^{(t)}) \end{align}
至此,EM算法的正确性得以证明