13.1 SVD Basics
The Singular Value Decomposition expresses any matrix, such as an \(n \times p\) matrix \(\mathbf{X}\), as the product of three other matrices:
\[ \mathbf{X = U D V^\mathsf{T}} \]
where:
\(\mathbf{U}\) is a \(n \times p\) column orthonormal matrix containing the left singular vectors.
\(\mathbf{D}\) is a \(p \times p\) diagonal matrix containing the singular values of \(\mathbf{X}\).
\(\mathbf{V}\) is a \(p \times p\) column orthonormal matrix containing the right singular vectors.
In terms of the shapes of the matrices, the SVD decomposition has this form:
\[ \begin{bmatrix} & & \\ & & \\ & \mathbf{X} & \\ & & \\ & & \\ \end{bmatrix} = \ \begin{bmatrix} & & \\ & & \\ & \mathbf{U} & \\ & & \\ & & \\ \end{bmatrix} \ \begin{bmatrix} & & \\ & \mathbf{D} & \\ & & \\ \end{bmatrix} \ \begin{bmatrix} & & \\ & \mathbf{V}^\mathsf{T} & \\ & & \\ \end{bmatrix} \]
The SVD says that we can factorize \(\mathbf{X}\) with the product of an orthonormal matrix \(\mathbf{U}\), a diagonal matrix \(\mathbf{D}\), and an orthonormal matrix \(\mathbf{V}\).
\[ \mathbf{X} = \ \begin{bmatrix} u_{11} & \cdots & u_{1p} \\ u_{21} & \cdots & u_{2p} \\ \vdots & \ddots & \vdots \\ u_{n1} & \cdots & u_{np} \\ \end{bmatrix} \ \begin{bmatrix} l_{1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & l_{p} \\ \end{bmatrix} \ \begin{bmatrix} v_{11} & \cdots & v_{p1} \\ \vdots & \ddots & \vdots \\ v_{1p} & \cdots & v_{pp} \\ \end{bmatrix} \]
13.1.1 SVD Properties
You can think of the SVD structure as the basic structure of a matrix. What does this mean? Well, to understand the meaning of basic structure, we need to say more things about what each of the matrices \(\mathbf{U}\), \(\mathbf{D}\), and \(\mathbf{V}\) represent. To be more precise:
About \(\mathbf{U}\)
The matrix \(\mathbf{U}\) is the orthonormalized matrix which is the most basic component. It’s like the skeleton of the matrix.
- \(\mathbf{U}\) is unitary, and its columns form a basis for the space spanned by the columns of \(\mathbf{X}\).
\[ \mathbf{U^\mathsf{T} U} = \mathbf{I}_{p} \]
- \(\mathbf{U}\) cannot be orthogonal (\(\mathbf{U U^\mathsf{T} = I_n}\)) unless \(r = p\)
About \(\mathbf{V}\)
The matrix \(\mathbf{V}\) is the orientation or correlational component.
- \(\mathbf{V}\) is unitary, and its columns form a basis for the space spanned by the rows of \(\mathbf{X}\).
\[ \mathbf{V^\mathsf{T} V} = \mathbf{I}_{p} \]
- \(\mathbf{V}\) cannot be orthogonal (\(\mathbf{V V^\mathsf{T} = I_p}\)) unless \(r = p = m\)
About \(\mathbf{D}\)
The matrix \(\mathbf{D}\) is referred to as the spectrum and it is a scale component. Note that all the values in the diagonal of \(\mathbf{D}\) are non-negative numbers.
This matrix is also unique. It is a like a fingerprint of a matrix. It is assumed that the singular values are ordered from largest to smallest.
All elements of \(\mathbf{D}\) can be taken to be positive, ordered from large to small (with ties allowed).
\(\mathbf{D}\) has non-negative real numbers on the diagonal (assuming \(\mathbf{X}\) is real).
The rank of \(\mathbf{X}\) is given by \(r\), the number of such positive values (which are called singular values). Furthermore, \(r(\mathbf{X}) \leq min(n,p)\).
13.1.2 Diagrams of SVD
Under the standard convention that \(n > p\), if we assume that \(\mathbf{X}\) is of full-column rank \(r(\mathbf{X}) = p\), then we can display the decomposition with the following diagram:

Figure 13.1: SVD Decomposition Diagram
In general, when the decomposed matrix \(\mathbf{X}\) is not of full column-rank, that is \(rank(\mathbf{X}) = r < p\), then the diagram of SVD could be depicted as follows

Figure 13.2: SVD Decomposition Diagram
13.1.3 Example
Here’s an example of SVD in R, via the function svd()
. First let’s create a matrix \(\mathbf{X}\) with random numbers:
# X matrix
set.seed(22)
X <- matrix(rnorm(20), 5, 4)
X
[,1] [,2] [,3] [,4]
[1,] -0.512 1.858 -0.764 -0.922
[2,] 2.485 -0.066 0.082 0.862
[3,] 1.008 -0.163 0.743 2.003
[4,] 0.293 -0.200 -0.084 0.937
[5,] -0.209 0.301 -0.793 -1.616
R comes with the function svd()
; it’s output is a list with three elements:
# singular value decomposition
SVD <- svd(X)
# elements returned by svd()
names(SVD)
[1] "d" "u" "v"
d
is a vector containing the singular values (i.e. values in the diagonal of \(\mathbf{D}\))u
is the matrix of left singular values.v
is the matrix of right singular values.
# vector of singular values
(d <- SVD$d)
[1] 3.952 2.022 1.475 0.432
# matrix of left singular vectors
(U = SVD$u)
[,1] [,2] [,3] [,4]
[1,] -0.425 -0.5391 -0.723 0.00979
[2,] 0.527 -0.7686 0.286 0.05610
[3,] 0.575 0.0500 -0.442 0.13107
[4,] 0.222 0.0527 -0.170 -0.95123
[5,] -0.402 -0.3366 0.413 -0.27337
# matrix of right singular vectors
(V = SVD$v)
[,1] [,2] [,3] [,4]
[1,] 0.571 -0.741 0.3386 0.104
[2,] -0.274 -0.530 -0.7680 0.234
[3,] 0.277 0.321 -0.0446 0.905
[4,] 0.723 0.261 -0.5418 -0.341
Let’s check that \(\mathbf{X} = \mathbf{U D V^\mathsf{T}}\)
# X equals U D V'
U %*% diag(d) %*% t(V)
[,1] [,2] [,3] [,4]
[1,] -0.512 1.858 -0.764 -0.922
[2,] 2.485 -0.066 0.082 0.862
[3,] 1.008 -0.163 0.743 2.003
[4,] 0.293 -0.200 -0.084 0.937
[5,] -0.209 0.301 -0.793 -1.616
# compare to X
X
[,1] [,2] [,3] [,4]
[1,] -0.512 1.858 -0.764 -0.922
[2,] 2.485 -0.066 0.082 0.862
[3,] 1.008 -0.163 0.743 2.003
[4,] 0.293 -0.200 -0.084 0.937
[5,] -0.209 0.301 -0.793 -1.616
Let’s also confirm that \(\mathbf{U}\) and \(\mathbf{V}\) are orthonormal:
# U orthonormal (U'U = I)
t(U) %*% U
[,1] [,2] [,3] [,4]
[1,] 1.00e+00 1.39e-16 2.78e-17 0.00e+00
[2,] 1.39e-16 1.00e+00 -2.78e-17 -8.33e-17
[3,] 2.78e-17 -2.78e-17 1.00e+00 5.55e-17
[4,] 0.00e+00 -8.33e-17 5.55e-17 1.00e+00
# V orthonormal (V'V = I)
t(V) %*% V
[,1] [,2] [,3] [,4]
[1,] 1.00e+00 -1.11e-16 -5.55e-17 1.11e-16
[2,] -1.11e-16 1.00e+00 8.33e-17 1.94e-16
[3,] -5.55e-17 8.33e-17 1.00e+00 -8.33e-17
[4,] 1.11e-16 1.94e-16 -8.33e-17 1.00e+00
13.1.4 Relation of SVD and Cross-Product Matrices
If you consider \(\mathbf{X}\) to be a data matrix of \(n\) individuals and \(p\) variables, you should be able to recall that we can obtain two cross-products: \(\mathbf{X^\mathsf{T}X}\) and \(\mathbf{X X^\mathsf{T}}\). It turns out that we can use the singular value decomposition of \(\mathbf{X}\) to find the corresponding factorization of each of these products.
The cross-product matrix of columns can be expressed as:
\[\begin{align*} \mathbf{X^\mathsf{T} X} &= (\mathbf{U D V^\mathsf{T}})^\mathsf{T} (\mathbf{U D V^\mathsf{T}}) \\ &= (\mathbf{V D U^\mathsf{T}}) (\mathbf{U D V^\mathsf{T}}) \\ &= \mathbf{V D} (\mathbf{U^\mathsf{T}} \mathbf{U}) \mathbf{D V^\mathsf{T}} \\ &= \mathbf{V D^2 V^\mathsf{T}} \end{align*}\]
The cross-product matrix of rows can be expressed as:
\[\begin{align*} \mathbf{X X^\mathsf{T}} &= (\mathbf{U D V^\mathsf{T}}) (\mathbf{U D V^\mathsf{T}})^\mathsf{T} \\ &= (\mathbf{U D V^\mathsf{T}}) (\mathbf{V D U^\mathsf{T}}) \\ &= \mathbf{U D} (\mathbf{V^\mathsf{T}} \mathbf{V}) \mathbf{D U^\mathsf{T}} \\ &= \mathbf{U D^2 U^\mathsf{T}} \end{align*}\]
One of the interesting things about SVD is that \(\mathbf{U}\) and \(\mathbf{V}\) are matrices whose columns are eigenvectors of product moment matrices that are derived from \(\mathbf{X}\). Specifically,
\(\mathbf{U}\) is the matrix of eigenvectors of (symmetric) \(\mathbf{XX^\mathsf{T}}\) of order \(n \times n\)
\(\mathbf{V}\) is the matrix of eigenvectors of (symmetric) \(\mathbf{X^\mathsf{T}X}\) of order \(p \times p\)
Of additional interest is the fact that \(\mathbf{D}\) is a diagonal matrix whose main diagonal entries are the square roots of \(\mathbf{U}^2\), the common matrix of eigenvalues of \(\mathbf{XX^\mathsf{T}}\) and \(\mathbf{X^\mathsf{T}X}\).
The EVD of the cross-product matrix of columns (or minor product moment) \(\mathbf{X^\mathsf{T} X}\) can be expressed as:
\[ \mathbf{X^\mathsf{T} X} = \mathbf{V \Lambda V^\mathsf{T}} \]
in terms of the SVD factorization of \(\mathbf{X}\):
\[ \mathbf{X^\mathsf{T} X} = \mathbf{V D^2 V^\mathsf{T}} \]
The EVD of the cross-product matrix of rows (or major product moment) \(\mathbf{X X^\mathsf{T}}\) can be expressed as:
\[ \mathbf{X X^\mathsf{T}} = \mathbf{U \Lambda U^\mathsf{T}} \]
in terms of the SVD factorization of \(\mathbf{X}\):
\[ \mathbf{X X^\mathsf{T}} = \mathbf{U D^2 U^\mathsf{T}} \]