Gibbs Sampler
Let X = (x1,...,xd),
where xi is either a scalar or a vector. Suppose that we
want to draw samples of X from an
underlying density p(X). In the Gibbs sampler, one randomly or
systematically chooses a coordinate, say xi, and
then updates its value with a new sample xi'
drawn from the conditional distribution p(xi|X[-i]),
where we define
. Algorithmically, the Gibbs sampler
can be implemented as follows:
Algorithm 8.3: [Random-scan
Gibbs sampler] Suppose that the current sample is
. Then:
-
Select i randomly from the index
set {1,...,d} according to a given probability vector (p1,...,pd).
-
Draw
from the conditional distribution p(
), and
let
.
Algorithm 8.4:
[Systematic-scan Gibbs sampler] Let the current sample
be
. For i = 1,...,d, draw
from the conditional distribution
Equation 8.14
It is easy to check that every
individual conditional update leaves p(·)
invariant. Suppose that currently
. Then
follows
its marginal distribution under p(·). Thus,
Equation 8.15
implying that the joint distribution of (
) is
unchanged at p(·) after one update.
Under regularity conditions, in the steady state, the sequence
of vectors ..., X(t-1), X(t),
X(t+1),... is a realization of a homogeneous Markov
chain with the transition kernel from state X' to state X given by
Equation 8.16
The convergence behavior of the Gibbs sampler is investigated
in [71, 134, 137, 278, 436, 473] and general conditions are given
for the following two results:
-
The distribution of X(t)
converges geometrically to p(X), as t 
.
-
, for any integrable function f.
The Gibbs sampler requires an initial transient period to
converge to equilibrium. An initial period of length t0 is known as the burning-in period, and
the first t0 samples should always be
discarded. Convergence is usually detected in some ad hoc way; some methods for
this are found in [472]. One such method is to monitor a
sequence of weights that measure the discrepancy between the sampled and desired
distribution [472].
The samples generated by the Gibbs sampler are not independent, hence care needs
to be taken in assessing the accuracy of such estimators. By grouping variables
together (i.e., drawing samples of several elements of X simultaneously), one can usually accelerate the
convergence and generate less-correlated data [274, 278]. To reduce the dependence
between samples, one can extract every rth sample
to be used in the estimation procedure. When r is
large, this approach generates almost independent
samples.
Other Techniques
A main problem with all the MCMC algorithms is that they may,
for various reasons, move very slowly in the configuration space or may become
trapped in a local mode. This phenomenon is generally called slow mixing of the chain. When a chain is slow mixing,
estimation based on the resulting Monte Carlo samples becomes very inaccurate.
Some recent techniques suitable for designing more efficient MCMC samplers
include parallel tempering [141], the multiple-try method [279], and evolutionary
Monte Carlo [264].