Metropolis–Hastings Algorithm
Let p(x) = c exp{-f(x)} be the
target probability density from which we want to simulate random draws. The
normalizing constant c may be unknown to us.
Metropolis et al. introduced the fundamental idea of evolving a Markov process
to achieve the sampling of p(x) [318]. Hastings later provided a more
general algorithm, which is described below [174].
Starting with any configuration x(0), the algorithm evolves from the
current state x(t) = x to
the next state x(t+1) as follows.
Algorithm 8.1:
[Metropolis–Hastings algorithm—Form I]
-
Propose a random perturbation of the
current state (i.e., x
x'). More
precisely, x' is generated from a proposal transition T(x(t)
x') [i.e.,
for all x], which is nearly arbitrary (of course, some are better than
others in terms of efficiency) and is completely specified by the
user.
-
Compute the Metropolis
ratio:
Equation 8.12
-
Generate a random number u ~ uniform(0,1). Let x(t+1) = x'
if u
r(x, x'), and let x(t+1)
= x(t) otherwise.
A better-known form of the Metropolis algorithm is as follows.
At each iteration:
Algorithm 8.2:
[Metropolis–Hastings algorithm—Form II]
-
A small but random perturbation of the
current configuration is made.
-
The gain (or loss) of an objective
function [corresponding to log p(x) = f(x)] resulting from this perturbation is computed.
-
A random number u ~ uniform(0,1) is generated independently.
-
The new configuration is accepted
if log(u) is
smaller than or equal to the gain and rejected otherwise.
Heuristically, this algorithm is constructed based on a
trial-and-error strategy. Metropolis et al. restricted their choice of the
perturbation function to be the symmetric ones [318]. Intuitively, this means that
there is no trend bias at the proposal stage. That is, the chance of getting
x' from perturbing x is the same as that of getting x from perturbing x'. Since any proper random perturbation can be
represented by a Markov transition function T,
the symmetry condition can be written as T(x
x') = T(x'
x).
Hastings generalized the choice of T to all those that satisfy the property T(x
x') > 0
if and only if T(x'
x) > 0 [174]. It is easy to prove that the
Metropolis–Hasting transition rule results in an "actual" transition function
A(x, x') (it
is different from T because an
acceptance/rejection step is involved) that satisfies the detailed balance
condition
Equation 8.13
which necessarily leads to a reversible Markov chain with p(x) as its
invariant distribution. Thus, the sequence x(0), x(1),... is drawn (asymptotically)
from the desired distribution.
The Metropolis algorithm has been used extensively in
statistical physics over the past four decades and is the cornerstone of all
MCMC techniques recently adopted and generalized in the statistics community.
Another class of MCMC algorithms, the Gibbs sampler [137], differs from the Metropolis
algorithm in that it uses conditional distributions based on p(x) to construct
Markov chain moves.