QR-RLS Blind Linear MMSE Detector
Assume that Cr[i] is positive
definite. Let
Equation 2.54
be the Cholesky decomposition (i.e., C[i] is the unique upper triangular Cholesky factor with
positive diagonal elements). Define the following quantities:
Equation 2.55
Equation 2.56
Equation 2.57
At time i, the a posteriori least-squares (LS) estimate is given by
Equation 2.58
Equation 2.59
The a priori LS estimate at time
i is given by
Equation 2.60
It can be shown that x[i] and z[i] are related by [389]
Equation 2.61
Suppose that C[i – 1] and u[i – 1] are available from the previous recursion. At
time i the new observation r[i] becomes
available. We construct a block matrix consisting of C[i – 1], u[i – 1], and r[i] and apply an
orthogonal transformation as follows:
Equation 2.62
In (2.62) the matrix
Q[i] which zeros the
first N elements on the last row of the
partitioned matrix appearing on the left-hand side of (2.62), is an orthogonal matrix consisting of N Givens rotations,
Equation 2.63
where Qn[i] zeros the nth element
in the last row by rotating it with the (n + 1)th
row. An individual rotation is specified by two scalars, cn and sn (which can be regarded as the cosine and
sine, respectively, of a rotation angle fn), and affects only the last row and the
(n + 1)th row. The effects on these two rows
are
Equation 2.64
where the rotation factors are defined by
Equation 2.65
Equation 2.66
The correctness of (2.62) is shown in the Appendix (Section 2.8.1). It is seen
from (2.62) that the computed quantities
appearing on the right-hand side are C[i], u[i], v[i], at time n. It is also shown in the Appendix (Section 2.8.1) that the
quantities a[i], z[i], and x[i] can be updated
according to the equations
Equation 2.67
Equation 2.68
Equation 2.69
Note that g[i] in (2.62) is the last diagonal element of Q[i]. A direct calculation shows that [175, 316].
The initialization of the QR-RLS blind adaptive algorithm is
given by , and a[-1] = d, where d is a small number. This corresponds to the initial
condition Cr[-1] = dIN and m1[-1] = s1 (i.e., the adaptation starts with
the matched filter). At each time i, the
algorithm proceeds as follows.
Algorithm 2.4: [QR-RLS blind
linear MMSE detector—synchronous CDMA]
Equation 2.70
Equation 2.71
The orthogonal transformation (2.62) on the block matrix can be mapped onto a triangular
systolic array for highly efficient parallel implementation, which is discussed
next.
Parallel Implementation on Systolic Arrays
The QR-RLS blind adaptive algorithm derived above has good
numerical properties and is well suited for parallel implementation. Figure 2.1 shows systematically a systolic
array implementation of this algorithm, using a triangular array first proposed
in [316]. It
consists of three sections: the basic upper triangular array, which stores and
updates C[i]; the right-hand column of cells, which stores and
updates u[i]; and the final processing cell, which computes the
demodulated data bit. The system is initialized as and .
The received data r[i] are fed from the top and propagate to the bottom of
the array. The rotation angles fn are calculated in left boundary cells
and propagate from left to right. The internal cells update their elements by
Givens rotations using the angles received from the left. The factor g[i] is calculated along the left boundary cells, where a
dot (•) represents an extra delay. The final cell
extracts the signs of h[i] and g[i] and produces the demodulated differential data bit
according to (2.71). The computation at
each cell is also outlined in Fig. 2.1.
The QR-RLS algorithm may also be carried out using the square-root free Givens
rotation algorithm to reduce the computational complexity at each cell [158, 316]. For more details on the
systolic array implementations, see [175, 316].
The systolic array in Fig.
2.1 operates in a highly pipelined manner. The computational wavefront
propagates at the received data symbol rate. The demodulated data bits are also
output at the received data symbol rate. Note that the demodulated data bit
produced on a given clock corresponds to the received vector entered 2N clock cycles earlier.
If multiple synchronous user data streams need to be
demodulated, we can simply add more column arrays on the right-hand side and
initialize each of them by the corresponding signature vector of each user. It
is clear that by using the same triangular array, multiple users' data can be
demodulated simultaneously. This is also illustrated in Fig. 2.1 for the case of two users. (Also, multiple paths
of the same signal can be handled by adding appropriate linear arrays to Fig. 2.1.)
|