11.2 Rank

There is one general concept that plays a fundamental role in many decompositions: the rank. If \(\mathbf{X}\) is an \(n \times p\) matrix, then \(\mathbf{X}\) is of full column rank if \(\mathbf{Xb} = \mathbf{0}\) only if \(\mathbf{b} = \mathbf{0}\). Equivalently, we say that the columns of \(\mathbf{X}\) are linearly independent. Full row rank is defined analogously.
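As a small numerical illustration (not part of the original text; the matrices are made up), full column rank can be checked by comparing the rank to the number of columns — a dependent column lowers the rank:

```python
import numpy as np

X = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])  # 3 x 2

# X has full column rank iff Xb = 0 implies b = 0,
# i.e. iff rank(X) equals the number of columns.
print(np.linalg.matrix_rank(X) == X.shape[1])  # True

# Appending a column that is the sum of the first two
# destroys linear independence.
X2 = np.column_stack([X, X[:, 0] + X[:, 1]])
print(np.linalg.matrix_rank(X2) == X2.shape[1])  # False
```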

If \(\mathbf{X}\) can be written as the product \(\mathbf{AB}\), with \(\mathbf{A}\) an \(n \times k\) matrix of full column rank and \(\mathbf{B}\) a \(k \times p\) matrix of full row rank, then \(\mathbf{X}\) is said to have rank \(k\). The decomposition \(\mathbf{X} = \mathbf{AB}\) is called a full rank decomposition.
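A minimal sketch of such a full rank decomposition (the dimensions and random matrices are illustrative assumptions, not from the original text). Random Gaussian matrices have full rank with probability one, so the product \(\mathbf{AB}\) has rank \(k\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 6, 4, 2

# A: n x k with full column rank, B: k x p with full row rank.
A = rng.standard_normal((n, k))
B = rng.standard_normal((k, p))

# Their product is an n x p matrix of rank k.
X = A @ B
print(np.linalg.matrix_rank(X))  # 2
```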

The rank of a matrix is the minimum number of row or column vectors needed to generate all rows or columns of the matrix exactly through linear combinations. Geometrically, this algebraic concept corresponds to the dimensionality of the subspace spanned by the rows or columns. In practice, however, hardly any large data matrix is exactly of low rank, but we can approximate it “optimally” by a matrix of low rank and then view this approximate matrix in a low-dimensional space.

Suppose that \(\mathbf{X}\) is an \(n \times p\) matrix with rank \(r\). The idea of matrix approximation is then to find another \(n \times p\) matrix \(\mathbf{\hat{X}}\) of lower rank \(k < r\) that resembles \(\mathbf{X}\) “as closely as possible”. Closeness can be measured in any reasonable way, but least-squares approximation makes the solution of the problem particularly simple. Hence we want to find the matrix \(\mathbf{\hat{X}}\) that minimizes the following objective function over all possible rank \(k\) matrices:

\[ \operatorname{trace}\left[ (\mathbf{X} - \mathbf{\hat{X}})(\mathbf{X} - \mathbf{\hat{X}})^\mathsf{T} \right] \]
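This least-squares problem is solved by the truncated singular value decomposition (the Eckart–Young theorem). A minimal NumPy sketch, with an illustrative random matrix (an assumption, not from the original text):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((8, 5))
k = 2

# The rank-k truncation of the SVD is the least-squares optimal
# rank-k approximation of X (Eckart-Young).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_hat = U[:, :k] * s[:k] @ Vt[:k, :]

print(np.linalg.matrix_rank(X_hat))  # 2

# The objective trace[(X - X_hat)(X - X_hat)^T] equals the sum of
# the squared discarded singular values.
loss = np.trace((X - X_hat) @ (X - X_hat).T)
print(np.isclose(loss, np.sum(s[k:] ** 2)))  # True
```

Because the objective is the squared Frobenius norm of the residual, the minimum achievable loss is exactly \(\sum_{i>k} \sigma_i^2\), the energy in the discarded singular values.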