Header
Home | Sitemap Set as homepage | Add to favorites
  Search the Site     » Advanced Search
Sections



Sequential Monte Carlo Signal Processing

by

image

Sequential Monte Carlo Signal Processing

8.5.1 Sequential Importance Sampling

Importance sampling is one of the most well-known and elementary Monte Carlo techniques. Suppose that we want to make an inference about some random quantity X ~ p (X) using the Monte Carlo method. Sometimes drawing samples directly from p(X) is difficult, but it may be easier (or otherwise advantageous) to draw samples from a trial density, say, q(X). Note that the desired inference can be written as

Equation 8.70

graphics/08equ070.gif


Equation 8.71

graphics/08equ071.gif


where

Equation 8.72

graphics/08equ072.gif


is called the importance weight. In importance sampling, we draw samples X(1), X(2), ..., X(n) according to the trial distribution q(·). We then approximate the inference (8.71) by

Equation 8.73

graphics/08equ073.gif


This technique is widely used, for example, for reducing the sample-size requirements in BER estimation.

However, it is usually difficult to design a good trial density function in high-dimensional problems. One of the most useful strategies in these problems is to build up the trial density sequentially. Suppose that we can decompose X as X = (x1, ..., xd), where each of the xj may be either a scalar or a vector. Then our trial density can be constructed as

Equation 8.74

graphics/08equ074.gif


by which we hope to obtain some guidance from the target density while building up the trial density. Corresponding to the decomposition of X, we can rewrite the target density as

Equation 8.75

graphics/08equ075.gif


and the importance weight as

Equation 8.76

graphics/08equ076.gif


Equation (8.76) suggests a recursive way of computing and monitoring the importance weight. That is, by denoting Xt = x1, ..., (xt) (thus, XdX), we have

Equation 8.77

graphics/08equ077.gif


Then wd (Xd) is equal to w(X) in (8.76). Potential advantages of this recursion and (8.75) are that we can stop generating further components of X if the partial weight derived from the sequentially generated partial sample is too small, and that we can take advantage of pt(xt|Xt-1) in designing qt(xt|Xt-1). In other words, the marginal distribution p(Xt) can be used to guide the generation of X.

Although the idea above sounds interesting, the trouble is that the decomposition of p(X) as in (8.75) and that of w(X) as in (8.76) are not practical at all! The reason is that in order to get (8.75), one needs to have the marginal distribution

Equation 8.78

graphics/08equ078.gif


whose computation involves integrating out components xt+1, ... xd in p(X) and is as difficult as—or even more difficult than—the original problem.

To carry out the sequential sampling idea, we need to introduce another layer of complexity. Suppose that we can find a sequence of auxiliary distributions,

graphics/481equ01.gif

so that pt(Xt) is a reasonable approximation to the marginal distribution pt(Xt) for t = 1,..., d – 1 and pd(X) = p(X). We emphasize that {pt(Xt)} are required to be known only up to a normalizing constant, and they serve only as "guides" to our construction of the entire sample X = (x1,...,xd). The sequential importance sampling (SIS) method can then be defined as noted in the following recursive procedure.

Algorithm 8.7: [Sequential importance sampling (SIS)] For t = 2,...,d:

  • Draw xt from qt(xt|Xt-1) and let Xt = (Xt-1, xt).

  • Compute

Equation 8.79

graphics/08equ079.gif


and let wt = wt-1 ut.

In the SIS step, we call ut an incremental weight. It is easy to show that Xt is properly weighted by wt with respect to pt provided that Xt-1 is properly weighted by wt-1 with respect to pt-1. Thus, the entire sample X obtained in this sequential fashion is properly weighted by the final importance weight, wd, with respect to the target density p(X). One reason for the sequential buildup of the trial density is that it breaks a difficult task into manageable pieces. The SIS framework is particularly attractive, as it can use the auxiliary distributions p1, p2,..., pd to help construct more efficient trial distributions:

  • We can build qt in light of pt. For example, one can choose (if possible)

Equation 8.80

graphics/08equ080.gif


Then the incremental weight becomes

Equation 8.81

graphics/08equ081.gif


8.5.2 SMC for Dynamical Systems

Consider the following dynamical system modeled in state-space form as

Equation 8.82

graphics/08equ082.gif


where zt, yt, ut, and vt are, respectively, the state variable, the observation, the state noise, and the observation noise at time t. These quantities can be either scalars or vectors.

Let Zt = (z0, z1,..., zt) and let Yt = (y0, y1,..., yt). Suppose an online inference of Zt is of interest; that is, at current time t we wish to make a timely estimate of a function of the state variable Zt, say h(Zt), based on the currently available observation, Yt. From Bayes' formula, we realize that the optimal solution to this problem is

Equation 8.83

graphics/08equ083.gif


In most cases an exact evaluation of this expectation is analytically intractable because of the complexity of such a dynamical system. Monte Carlo methods provide us with a viable alternative to the required computation. Specifically, if we can draw m random samples graphics/482fig01.gif from the distribution p(Zt|Yt), we can approximate E{h(Zt)|Yt} by

Equation 8.84

graphics/08equ084.gif


Very often, direct simulation from p(Zt|Yt) is not feasible, but drawing samples from some trial distribution is easy. In this case we can use the idea of importance sampling discussed above. Suppose that a set of random samples graphics/482fig01.gif is generated from the trial distribution q(Zt|Yt). By associating the weight

Equation 8.85

graphics/08equ085.gif


with the sample graphics/482fig02.gif, we can approximate the quantity of interest, E{h(Zt)|Yt}, as

Equation 8.86

graphics/08equ086.gif


with

Equation 8.87

graphics/08equ087.gif


The pair (graphics/483fig01.gif), j = 1, ..., m, is called a properly weighted sample with respect to distribution p(Zt/Yt). A trivial but important observation is that the graphics/483fig02.gif (one of the components of graphics/483fig03.gif is also properly weighted by the graphics/483fig04.gif with respect to the marginal distribution p(zt/Yt).

Another possible estimate of E{h(Zt)|Yt} is

Equation 8.88

graphics/08equ088.gif


The main reasons for preferring the ratio estimate (8.86) to the unbiased estimate (8.88) in an importance sampling framework are that the estimate (8.86) usually has a smaller mean-square error than that of (8.88), and that the normalizing constants of both the trial and target distributions are not required in using (8.86) (where these constants are canceled out); in such cases the weights {graphics/483fig04.gif} are evaluated only up to a multiplicative constant. For example, the target distribution p(Zt|Yt) in a typical dynamical system (and many Bayesian models) can be evaluated easily up to a normalizing constant (e.g., the likelihood multiplied by a prior distribution), whereas sampling from the distribution directly and evaluating the normalizing constant analytically are impossible.

To implement Monte Carlo techniques for a dynamical system, a set of random samples properly weighted with respect to p(Zt|Yt) is needed for any time t. Because the state equation in system (8.82) possesses a Markovian structure, we can implement a recursive importance sampling strategy, which is the basis of all sequential Monte Carlo techniques [277]. Suppose that a set of properly weighted samples graphics/483fig05.gif with respect to p(Zt–1|Yt–1) is given at time (t–1).

A sequential Monte Carlo filter generates from the set a new one, graphics/483fig06.gif, which is properly weighted at time t with respect to p(Zt|Yt). The algorithm is described as follows.

Algorithm 8.8: [Sequential Monte Carlo filter for dynamical systems] For j = 1,... , m:

  • Draw a sample graphics/483fig02.gif from a trial distribution q (graphics/483fig07.gif) and let graphics/483fig08.gif.

  • Compute the importance weight:

Equation 8.89

graphics/08equ089.gif


The algorithm is initialized by drawing a set of i.i.d. samples graphics/483fig10.gif from p(z0/y0). When y0 represents the "null" information, p(z|y0) corresponds to the prior distribution of z0. The samples and weights drawn by the algorithm above are characterized by the following result, the proof of which is given in the Appendix (Section 8.7.3).

Proposition 8.1: The weighted samples generated by Algorithm 8.8 satisfy

Equation 8.90

graphics/08equ090.gif


Equation 8.91

graphics/08equ091.gif


The result above, together with the law of large numbers, implies that

Equation 8.92

graphics/08equ092.gif


There are a few important issues regarding the design and implementation of a sequential Monte Carlo filter, such as the choice of the trial distribution q(·) and the use of resampling (see Section 8.5.3). Specifically, a useful choice of the trial distribution graphics/484fig01.gif, for the state-space model (8.82) is of the form

Equation 8.93

graphics/08equ093.gif


where in (8.93) we used the facts that

Equation 8.94

graphics/08equ094.gif


Equation 8.95

graphics/08equ095.gif


both of which follow directly from the state-space model (8.82). For this trial distribution, the importance weight is updated according to

Equation 8.96

graphics/08equ096.gif


Equation 8.97

graphics/08equ097.gif


where (8.96) follows from the fact that

Equation 8.98

graphics/08equ098.gif


and the last equality is due to the conditional independence property of the statespace model (8.82). See [277] for the general SMC framework and a detailed discussion on various implementation issues. SMC techniques have been employed to tackle a number of problems in wireless communications and networks, including blind detection in fading channels [76], blind detection in OFDM systems [593], and joint mobility tracking and hand-off detection in cellular networks [595].

8.5.3 Resampling Procedures

The importance sampling weight graphics/485fig01.gif measures the quality of the corresponding imputed sequence graphics/485fig02.gif. A relatively small weight implies that the sample is drawn far from the main body of the posterior distribution and has a small contribution in the final estimation. Such a sample is said to be ineffective. If there are too many ineffective samples, the Monte Carlo procedure becomes inefficient. This can be detected by observing a large coefficient of variation in the importance weight. Suppose that graphics/485fig03.gif is a sequence of importance weights. Then the coefficient of variation, nt, is defined as

Equation 8.99

graphics/08equ099.gif


with

Equation 8.100

graphics/08equ100.gif


Note that if the samples are drawn exactly from the target distribution, all the weights are equal, implying that vt = 0. A large coefficient of variation in the importance weights indicates ineffective samples. It is shown in [232] that the importance weights resulting from an SMC filter form a martingale sequence. As more and more data are processed, the coefficient of variation of the weights increases—that is, the number of ineffective samples increases—rapidly.

A useful method for reducing ineffective samples and enhancing effective ones is resampling, which was suggested in [160, 276] under the SMC setting. Roughly speaking, resampling allows those "bad" samples (with small importance weights) to be discarded and those "good" ones (with large importance weights) to replicate so as to accommodate the dynamic change of the system. Specifically, let graphics/486fig01.gif be the original properly weighted samples at time t. A residual resampling strategy forms a new set of weighted samples graphics/486fig02.gif according to the following algorithm (assume that graphics/486fig03.gif).

Algorithm 8.9: [Residual resampling]

  • For j = 1,..., m, retain graphics/486fig04.gif copies of the sample (graphics/486fig05.gif). Denote graphics/486fig06.gif.

  • Obtain Kr i.i.d. draws from the original sample set graphics/486fig07.gif, with probabilities proportional to (graphics/486fig08.gif), j = 1,..., m.

  • Assign equal weight (i.e., set graphics/486fig09.gif) to each new sample.

The correctness of the residual resampling procedure above is stated by the following result, whose proof is given in the Appendix (Section 8.7.4).

Proposition 8.2: The samples drawn by the residual resampling procedure (Algorithm 8.9) are properly weighted with respect to p(Zt|Yt) for m

Alternatively, we can use the following simple resampling procedure, which also produces properly weighted samples with respect to p(Zt|Yt):

  • For j = 1,..., m, draw i.i.d. random numbers lj from the set {1,2,..., m}, with probability distribution

Equation 8.101

graphics/08equ101.gif


  • The m new samples and the corresponding weights graphics/486fig10.gif are given by

Equation 8.102

graphics/08equ102.gif


In practice, when small to modest m is used (we used m = 50 in our simulations), the resampling procedure can be seen as trading off between bias and variance. That is, the new samples with their weights resulting from the resampling procedure are only approximately proper, which introduces small bias in Monte Carlo estimation. On the other hand, however, resampling greatly reduces Monte Carlo variance for the future samples.

Resampling can be done at any time. However, resampling too often adds computational burden and decreases "diversities" of the Monte Carlo filter (i.e., it decreases the number of distinctive filters and loses information). On the other hand, resampling too rarely may result in a loss of efficiency. It is thus desirable to give guidance on when to do resampling. A measure of the efficiency of an importance sampling scheme is the effective sample size graphics/487fig01.gif, defined as

Equation 8.103

graphics/08equ103.gif


Heuristically, graphics/487fig01.gif reflects the equivalent size of a set of i.i.d. samples for the set of m weighted ones. It is suggested in [277] that resampling should be performed when the effective sample size becomes small (e.g., graphics/487fig02.gif). Alternatively, one can conduct resampling at every fixed-length time interval (say, every five steps).

8.5.4 Mixture Kalman Filter

Many dynamical system models belong to the class of conditional dynamical linear models (CDLMs) of the form

Equation 8.104

graphics/08equ104.gif


where ut ~ Nc(0,I), vt ~ Nc(0,I), and lt is a random indicator variable. The matrices Flt, Glt, Hlt, and Klt are known, given lt. In this model, the state variable zt corresponds to (xt,lt).

We observe that for a given trajectory of the indicator lt in a CDLM, the system is both linear and Gaussian, for which the Kalman filter provides the complete statistical characterization of the system dynamics. Recently, a novel SMC method, the mixture Kalman filter (MKF), was developed in [75] for online filtering and prediction of CDLMs; it exploits the conditional Gaussian property and uses a marginalization operation to improve the algorithmic efficiency. Instead of dealing with both xt and lt, the MKF draws Monte Carlo samples only in the indicator space and uses a mixture of Gaussian distributions to approximate the target distribution. Compared with the generic SMC method, the MKF is substantially more efficient (e.g., it gives more accurate results with the same computing resources). However, the MKF often needs more computational power for its proper implementation, as the formulas required are more complicated. Additionally, the MKF requires the CDLM structure, which is not applicable to all problems of interest.

Let Yt = (y0, y1,..., yt) and let Lt = (l0, l1,..., lt). By recursively generating a set of properly weighted random samples graphics/487fig03.gif to represent p(Lt|Yt), the MKF approximates the target distribution p(xt/Yt) by a random mixture of Gaussian distributions

Equation 8.105

graphics/08equ105.gif


where graphics/488fig01.gif is obtained by implementing a Kalman filter for the given indicator trajectory graphics/488fig02.gif. Thus, a key step in the MKF is the production at time t of the weighted samples of indicators, graphics/488fig03.gif, based on the set of samples, graphics/488fig04.gif, at the previous time (t - 1) according to the following algorithm.

Algorithm 8.10: [Mixture Kalman filter for conditional dynamical linear models] For j = 1,...m:

  • Draw a sample graphics/488fig05.gif from a trial distribution q (graphics/488fig06.gif).

  • Run a one-step Kalman filter based on graphics/488fig05.gif,graphics/488fig07.gif, and yt to obtain graphics/488fig08.gif.

  • Compute the weight:

Equation 8.106

graphics/08equ106.gif


The MKF can be extended to handle the partial CDLM, where the state variable has a linear component and a nonlinear component. See [75] for a detailed treatment of the MKF and the extended MKF.


169 times read

Related news

» Antenna Characteristics
by admin posted on Jul 15,2007
» Origins of VoIP
by admin posted on Aug 23,2007
» Basis-of-Voice-Coding
by admin posted on Oct 24,2006
» Grading and Selecting Technologies
by admin posted on May 20,2007
» Intelligent Paging Scheme
by admin posted on Oct 24,2006


More Top News
Cisco Wireless Networking
Most Popular
Featured Author