Sequential EM Algorithm
The EM algorithm discussed above is a batch algorithm. We next
briefly describe a sequential version of the EM algorithm [482, 563]. Suppose that y1, y2, ... is a sequence of observations
with marginal pdf f(y|q), where q
Cm is a static parameter vector. A class of
sequential estimators derived from the maximum-likelihood principle is given
by
Equation 9.42
where q(i) is the
estimate of q at
the ith step, P(yi+1,q(i)) is an
m x m matrix
defined later in this section, and
Equation 9.43
is the update score (i.e., the gradient of the log-likelihood
function). Let H(yi,
q(i)) denote the Hessian matrix of log f(yi|q(i)):
Equation 9.44
Let xi denote a "complete" data set related to yi for
i = 1,2, .... The complete data set xi is
selected in the (sequential) EM algorithm such that yi can
be obtained through a many-to-one mapping xi
yi, and so that its knowledge makes the estimation
problem easy [e.g., the conditional density f(xi|q) can easily be obtained]. Denote the Fisher
information matrices of the data yi and
xi, respectively, as
Different versions of sequential estimation algorithms are
characterized by different choices of the function P(yi+1,q(i)) in (9.42), as follows.
-
The sequential EM algorithm:
Equation 9.45
The consistency and asymptotic normality of this algorithm are
considered in [482].
Applications of the sequential EM algorithm to communications and signal
processing problems are reported in [123, 238, 239, 563, 601, 602].
-
The Newton–Raphson
algorithm:
Equation 9.46
-
A stochastic approximation
procedure:
Equation 9.47
Note that for i.i.d. observations {yi}, if
i in (9.47) is replaced by (i +
1), we obtain the maximum-likelihood estimator of q for exponential families [482]. The asymptotic distribution of
this procedure can be found in [115, 428].
-
If P(yi+1,q(i)) is a
constant diagonal matrix with small elements, then (9.42) is the conventional steepest-descent algorithm. Some
related choices of P(yi+1,q(i)) are
suggested in [482].
-
For time-varying parameters
{q(i)}, a conventional approach suggested in [123, 283] is to substitute the converging
series 1/i in (9.45) with a small positive constant l0. The new estimator is given by
Equation 9.48
where
(i) is
the estimate of q(i).