Optimal Prediction in Partially Observed Linear Gaussian Systems
Consider the following dynamical system
\[z_{k+1} = Az_k + fr_{k+1} + w_k \quad z_k = (p_k^{(1)} , \dot{p}^{(1)}_k, p_k^{(2)}, \dot{p}_k^{(2)}) \quad w_k \sim N(0, Q)\] \[A = \begin{bmatrix} 1 & \Delta & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & \Delta\\ 0 & 0 & 0 & 1 \end{bmatrix}\] \[f = \begin{bmatrix} \frac{\Delta^2}{2} & 0\\ \Delta & 0\\ 0 & \frac{\Delta^2}{2}\\ 0 & \Delta \end{bmatrix}\] \[r_{k+1} = \begin{bmatrix} q_k^{(1)}\\ q_k^{(2)} \end{bmatrix}\] \[y_k = Cz_k + v_k \quad v_k \sim N(0, R)\quad C = \begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 \end{bmatrix} \quad z_{k+1} \sim \pi_0 = N(\hat{z}_0, \Sigma_0)\]where \(y_k, v_k \in \mathbb{R}^2\) and \(r_k=r_{k+1}=r_{k+2}...\).
In this model, the state \(z_k\) represents a ship moving in a two-dimensional Euclidean space where \(\Delta\) is the sampling interval. During each time interval \([k\Delta, (k+1)\Delta]\) the ship undergoes acceleration \(q_k^{(1)}\) and \(q_k^{(2)}\) in the \(x\) and \(y\) coordinates. \(p_t^{(1)}\) and \(\dot{p}_t^{(1)}\) denote the position and the velocity of the ship in the \(x\)-direction and \(p_t^{(2)}\) and \(\dot{p}_t^{(2)}\) denote the position and the velocity of the ship in the \(y\)-direction, i.e.,
\[\frac{d}{dt}\dot{p}_t^{(1)} = q_k^{(1)}\] \[\frac{d}{dt}\dot{p}_t^{(2)} = q_k^{(2)}\]for \(t \in [k\Delta + (k+1)\Delta]\). In discrete time for \(i=1,2\), we have:
\[\dot{p}_{k+1}^{(i)} = \dot{p}_k^{(i)} + \int_{0}^{\Delta}\frac{d \dot{p}^{(i)}_{t}}{dt}d\tau=\dot{p}_k^{(i)} + \int_{0}^{\Delta}q_k^{(i)}d\tau =\dot{p}_k^{(i)} + \left[\tau q_k^{(i)}\right]^{\Delta}_{0} = \dot{p}_k^{(i)} + \Delta q_k^{(i)}\]Similarly, for the position we obtain
\[p_{k+1}^{(i)} = p_{k}^{(i)} + \int_{0}^{\Delta}\dot{p}_k^{(i)} + \tau q_k^{(i)}d\tau = p_{k}^{(i)} + [\tau \dot{p}_k^{(i)} + \frac{\tau^2}{2} q_k^{(i)}]^{\Delta}_{0} = p_{k}^{(i)} + \Delta \dot{p}_k^{(i)} + + \frac{\Delta^2}{2} q_k^{(i)}\]for \(k=1,2,..\).
At each step \(k\), only the \(x\) and \(y\) positions of the ship are observed, i.e., the velocities are hidden.
As opposed to filtering and smoothing, the prediction problem is to predict the state of the HMM at some time \(k\) when we only know the initial state distribution \(\pi_0\). The optimal predictor is the Chapman-Kolmogorov equation, which is a recursion for computing the density \(\pi_k\) based on \(\pi_{k-1}\), i.e.,
\[\pi_k(z) = \int_{Z}p(z_k = z \mid z_{k-1})\pi_{k-1}(z_{k-1})\mathop{dz_{k-1}}\]First note that since \(\pi_0\) is Gaussian, and Gaussians are closed under linear transformations, \(\pi_1, \pi_2, ..\) will also be Gaussian. The mean of \(\pi_{k+1}\) is:
\[\mathbb{E}[x_{k+1}] = \mathbb{E}[A_kz_k + fr_{k+1} + w_k]\]Since \(w_k\) is a zero-mean Gaussian noise and \(fr_{k+1}\) is constant, this simplifies to:
\[\mathbb{E}[x_{k+1}] = \mathbb{E}[A_kz_k + fr_{k+1} + w_k]\\ = \mathbb{E}[A_kz_k] + fr_{k+1} = A_k \mathbb{E}[z_k] + fr_{k+1}=A_k \widehat{z}_k + fr_{k+1}\]Similarly the variance is obtained as:
\[\operatorname{cov}[x_{k+1}] = \mathbb{E}[(z_{k+1} - \mathbb{E}[z_{k+1}])^2]\\ =\mathbb{E}[(z_{k+1} - \widehat{z}_{k+1})^2]\\ =\mathbb{E}[(z_{k+1} - \widehat{z}_{k+1})(x_{k+1} - \widehat{z}_{k+1})^T]\\ =\mathbb{E}[z_{k+1}z_{k+1}^T - z_{k+1}\widehat{z}_{k+1}^T - \widehat{z}_{k+1}z_{k+1}^T + \widehat{z}_{k+1}\widehat{z}_{k+1}^T]\]Now substituting \(z_{k+1} = A_kz_k + fr_{k+1} + w_k\), we get:
\[\operatorname{cov}[x_{k+1}]\\ =\mathbb{E}[z_{k+1}z_{k+1}^T -z_{k+1}\widehat{z}_{k+1}^T - \widehat{z}_{k+1}z_{k+1}^T + \widehat{z}_{k+1}\widehat{z}_{k+1}^T]\\ =\mathbb{E}\Big[(A_kz_k+w_k + fr_{k+1})(A_kz_k + w_k + fr_{k+1})^T -(A_kz_k+fr_{k+1} + w_k)\widehat{z}_{k+1}^T - \\ \widehat{z}_{k+1}(A_kz_k +fr_{k+1} + w_k)^T + \widehat{z}_{k+1}\widehat{z}_{k+1}^T\Big]\\ =\mathbb{E}\Big[(A_kz_k+w_k + fr_{k+1})(z^T_kA^T_k + w^T_k + r^T_{k+1}f^T) -(A_kz_k+fr_{k+1} + w_k)\widehat{z}_{k+1}^T - \\ \widehat{z}_{k+1}(z^T_kA^T_k +r^T_{k+1}f^T + w^T_k) + \widehat{z}_{k+1}\widehat{z}_{k+1}^T\Big]\\ =\mathbb{E}\Big[A_kz_kz^T_kA^T_k + A_kz_kw^T_k + A_kz_kr^T_{k+1}f^T + w_kz^T_kA^T_k + w_kw^T_k + w_kr^T_{k+1}f^T + fr_{k+1}z^T_kA^T_k+ fr_{k+1}w^T_k \\ + fr_{k+1}r^T_{k+1}f^T -A_kz_k\widehat{z}_{k+1}^T -fr_{k+1}\widehat{z}_{k+1}^T -w_k\widehat{z}_{k+1}^T - \widehat{z}_{k+1}z^T_kA^T_k -\widehat{z}_{k+1}r^T_{k+1}f^T -\widehat{z}_{k+1}w^T_k + \widehat{z}_{k+1}\widehat{z}_{k+1}^T\Big]\]Now using \(\widehat{z}_{k+1} = A_k \widehat{z}_k + fr_{k+1}\) we get:
\[\operatorname{cov}[x_{k+1}]\\ =\mathbb{E}\Big[A_kz_kz^T_kA^T_k + A_kz_kw^T_k + A_kz_kr^T_{k+1}f^T + w_kz^T_kA^T_k + w_kw^T_k + w_kr^T_{k+1}f^T + fr_{k+1}z^T_kA^T_k+ fr_{k+1}w^T_k \\ + fr_{k+1}r^T_{k+1}f^T -A_kz_k\widehat{z}_{k+1}^T -fr_{k+1}\widehat{z}_{k+1}^T -w_k\widehat{z}_{k+1}^T - \widehat{z}_{k+1}z^T_kA^T_k -\widehat{z}_{k+1}r^T_{k+1}f^T -\widehat{z}_{k+1}w^T_k + \widehat{z}_{k+1}\widehat{z}_{k+1}^T\Big]\\ =\mathbb{E}\Big[ A_kz_kz^T_kA^T_k + A_kz_kw^T_k + A_kz_kr^T_{k+1}f^T + w_kz^T_kA^T_k + w_kw^T_k + w_kr^T_{k+1}f^T + fr_{k+1}z^T_kA^T_k+ fr_{k+1}w^T_k\\ + fr_{k+1}r^T_{k+1}f^T -A_kz_k\widehat{z}^T_kA^T_k - A_kz_kr^T_{k+1}f^T - fr_{k+1}\widehat{z}^T_k A^T_k - fr_{k+1}r^T_{k+1}f^T - w_k\widehat{z}^T_kA^T_k - w_kr^T_{k+1} - A_k \widehat{z}_kz^T_kA^T_k \\ - fr_{k+1}z^T_kA^T_k- A_k \widehat{z}_kr^T_{k+1}f^T - fr_{k+1}r^T_{k+1}f^T -A_k \widehat{z}_kw^T_k -fr_{k+1}w^T_k + \\ A_k \widehat{z}_k\widehat{z}_k^TA^T_k + A_k \widehat{z}_kr^T_{k+1}f^T + fr_{k+1}\widehat{z}_k^TA^T_k + fr_{k+1}r^T_{k+1}f^T\Big]\]Rearranging terms:
\[\operatorname{cov}[x_{k+1}]\\ =\mathbb{E}\Big[A_kz_kz^T_kA^T_k + A_kz_kw^T_k + A_kz_kr^T_{k+1}f^T + w_kz^T_kA^T_k + w_kw^T_k + w_kr^T_{k+1}f^T + fr_{k+1}z^T_kA^T_k+ fr_{k+1}w^T_k\\ + fr_{k+1}r^T_{k+1}f^T -A_kz_k\widehat{z}^T_kA^T_k - A_kz_kr^T_{k+1}f^T - fr_{k+1}\widehat{z}^T_k A^T_k - fr_{k+1}r^T_{k+1}f^T - w_k\widehat{z}^T_kA^T_k - w_kr^T_{k+1}f^T - A_k \widehat{z}_kz^T_kA^T_k \\ - fr_{k+1}z^T_kA^T_k- A_k\widehat{z}_kr^T_{k+1}f^T - fr_{k+1}r^T_{k+1}f^T -A_k \widehat{z}_kw^T_k -fr_{k+1}w^T_k + \\ A_k \widehat{z}_k\widehat{z}_k^TA^T_k + A_k \widehat{z}_kr^T_{k+1}f^T + fr_{k+1}\widehat{z}_k^TA^T_k + fr_{k+1}r^T_{k+1}f^T\Big]\\ =\mathbb{E}\Big[A_k(z_kz^T_k - z_k\widehat{z}^T_k - \widehat{z}_kz^T_k + \widehat{z}_k\widehat{z}_k^T)A^T_k + A_k(z_kw^T_k + z_kr^T_{k+1}f^T- z_kr^T_{k+1}f^T - \widehat{z}_kr^T_{k+1}f^T - \widehat{z}_kw^T_k + \widehat{z}_kr^T_{k+1}f^T)\\ + (w_kz^T_k + fr_{k+1}z^T_k - fr_{k+1}\widehat{z}^T_k - w_k\widehat{z}^T_k - fr_{k+1}z^T_k + fr_{k+1}\widehat{z}_k^T)A^T_k + w_kw^T_k + w_kr^T_{k+1}f^T\\ + fr_{k+1}w^T_k + fr_{k+1}r^T_{k+1}f^T - fr_{k+1}r^T_{k+1}f^T - w_kr^T_{k+1}f^T - fr_{k+1}r^T_{k+1}f^T -fr_{k+1}w^T_k + fr_{k+1}r^T_{k+1}f^T\Big]\\ =\mathbb{E}\Big[A_k(z_kz^T_k - z_k\widehat{z}^T_k - \widehat{z}_kz^T_k + \widehat{z}_k\widehat{z}_k^T)A^T_k + A_k(z_kw^T_k + z_kr^T_{k+1}f^T- z_kr^T_{k+1}f^T - \widehat{z}_kr^T_{k+1}f^T - \widehat{z}_kw^T_k + \widehat{z}_kr^T_{k+1}f^T)\\ + (w_kz^T_k + fr_{k+1}z^T_k - fr_{k+1}\widehat{z}^T_k - w_k\widehat{z}^T_k - fr_{k+1}z^T_k + fr_{k+1}\widehat{z}_k^T)A^T_k + w_kw^T_k \Big]\\ =A_k\mathbb{E}[((z_k - \widehat{z}_k)(z_k-\widehat{z}_k))]A^T_k + A_k\mathbb{E}[(z_k-\widehat{z}_k)w^T_k] + \mathbb{E}[w_k(z^T_k - \widehat{z}^T_k)]A^T_k + \mathbb{E}[w_kw^T_k]\\ =A_k\Sigma_kA^T_k + A_k\mathbb{E}[(z_k-\widehat{z}_k)w^T_k] + \mathbb{E}[w_k(z^T_k - w_k\widehat{z}^T_k)]A^T_k + \mathbb{E}[w_kw^T_k]\]Since \(w_k\) and \(z_k\) are independent, we have \(\mathbb{E}[z_kx_k]=\mathbb{E}[z_k]\mathbb{E}[z_k]\), and since \(\mathbb{E}[w_k] = 0\) we get:
\[\operatorname{cov}[x_{k+1}]\\ =A_k\Sigma_kA^T_k + A_k\mathbb{E}[(z_k-\widehat{z}_k)w^T_k] + \mathbb{E}[w_k(z^T_k - w_k\widehat{z}^T_k)]A^T_k + \mathbb{E}[w_kw^T_k]\\ =A_k\Sigma_kA^T_k + A_k\mathbb{E}[(z_k-\widehat{z}_k)]\mathbb{E}[w^T_k] + \mathbb{E}[w_k]\mathbb{E}[(z^T_k - w_k\widehat{z}^T_k)]A^T_k + \mathbb{E}[w_kw^T_k]\\ =A_k\Sigma_kA^T_k + \mathbb{E}[w_kw^T_k]\\ =A_k\Sigma_kA^T_k + \mathbb{E}[(w_k^T - \mathbb{E}[w_k])(w_k^T - \mathbb{E}[w_k^T])]\\ =A_k\Sigma_kA^T_k + Q\]Hence, the explicit solution to the Chapman-Kolmogorov equation for optimal prediction is \(\pi_{k+1}=N(\widehat{z}_{k+1}, \Sigma_{k+1})\) where the predicted mean \(\widehat{z}_{k+1}\) and the predicted covariance \(\Sigma_{k+1}\) satisfy the following recursions:
\[\widehat{z}_{k+1} = A_k \widehat{z}_k + fr_{k+1} \\ \Sigma_{k+1} = A_k\Sigma_kA^T_k + Q\]