Singular Value Decomposition
A matrix is just a table of numbers from which it is difficult to extract the information that is relevant for solving the problem at hand. A general and powerful strategy for revealing the salient features of a matrix is to decompose (or "factor") it into a product of other simpler matrices whose characteristics can be clearly identified and interpreted.
We have already seen three examples of this approach :
* The spectral decomposition of a symmetric matrix.
* The Cholesky factorization of a positive semidefinite matrix.
* The normal form of any matrix.
The most general and possibly most useful factorization of a matrix is the Singular Value Decomposition (SVD), which can be stated as follows :
* Let A be any mxn matrix of rank r. Then there exist :
- An mxm orthogonal matrix U,
- An nxn orthogonal matrix V,
- And an mxn (same size as A) pseudo-diagonal matrix Σ (all off-diagonal elements are 0, but the matrix need not be square),
such that :
A = UΣV'    or equivalently    U'AV = Σ
This decomposition may be represented as :
This figure illustrates the case r < n < m, but the decomposition is also valid for m ≤ n, and for r = min(m, n) (that is, when A is full rank).
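Numerically, this factorization can be obtained with NumPy's np.linalg.svd routine. A minimal sketch (the 3x2 matrix below is an arbitrary example, not part of the tutorial's notation) :

```python
import numpy as np

# An arbitrary 3x2 matrix (m = 3, n = 2), so U is 3x3, Sigma is 3x2, V is 2x2.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=True)  # s holds the singular values

# Rebuild the m x n pseudo-diagonal matrix Sigma from the vector s.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

# Check the factorization A = U Sigma V'.
print(np.allclose(A, U @ Sigma @ Vt))  # True
```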
The "diagonal" of Σ contains the singular values of A. They are real, non negative numbers. We assume that the columns of U and the rows of V' have been reordered so that the singular values appear in decreasing order as one moves down the diagonal.
* The upper part (red strip) contains the positive singular values. We'll show that :
- There are r positive singular values, where r is the rank of A. So the rank of a matrix is revealed by the number of its positive singular values.
- Each positive singular value is equal to the positive square root of one of the eigenvalues of A'A (or AA' ).
* The lower part of the diagonal (gray strip) contains the (n - r) zero, or "vanishing", singular values.
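Both claims are easy to verify numerically. A small sketch with an arbitrary rank-2 matrix (NumPy's eigvalsh returns the eigenvalues of a symmetric matrix in increasing order, so we reverse them) :

```python
import numpy as np

# An arbitrary 3x2 example of rank 2.
A = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0]])

s = np.linalg.svd(A, compute_uv=False)            # singular values, decreasing
eig = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]  # eigenvalues of A'A, decreasing

print(int(np.sum(s > 1e-10)))        # 2 : the rank equals the number of positive singular values
print(np.allclose(s, np.sqrt(eig)))  # True : each singular value is the square root of an eigenvalue of A'A
```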
Σ contains mostly 0s, and this suggests that there should exist another version of SVD with most of the 0s discarded. It is indeed the case, and the "reduced" Singular Value Decomposition is as follows :
A = U1Σ1V'1
It is obtained simply by :
* Amputating U of its (m - r) rightmost columns.
* Amputating V of its (n - r) rightmost columns (and therefore V' of its (n - r) lowest rows).
* Keeping the square rxr upper-left part of Σ containing the strictly positive singular values, and discarding the rest. The result is the matrix Σ1 in the above illustration.
This reduced form is equivalent to the full SVD, which is obtained from the reduced form :
* By adding (m - r) columns to U1. These columns have to be orthonormal vectors that are also orthogonal to the columns of U1 so as to make the complete U an orthogonal matrix. They can be built, for instance, by resorting to the Gram-Schmidt orthonormalization procedure. We'll show that, except for this orthonormality constraint, the actual choice of these complementary vectors does not affect the validity of the full decomposition, which is therefore not unique.
* Similarly, by adding (n - r) orthonormal vectors to the r (orthonormal) rows of V'1.
* By converting Σ1 into Σ by appropriately padding it with 0s.
It can be shown that the reduced Singular Value Decomposition is unique (up to the signs of the singular vectors) if and only if all the positive singular values are distinct.
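As a sketch of the reduced form, here is a hypothetical rank-2 matrix whose third column is the sum of the first two; the rank r is recovered by counting the singular values above a small tolerance :

```python
import numpy as np

# A rank-deficient 4x3 matrix : third column = first + second, so r = 2.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 0.0, 2.0]])

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))            # number of positive singular values

# Reduced form : keep only the first r columns of U and the first r rows of V'.
U1, s1, Vt1 = U[:, :r], s[:r], Vt[:r, :]
print(r)                                        # 2
print(np.allclose(A, U1 @ np.diag(s1) @ Vt1))   # True : the reduced form reproduces A
```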
When a matrix A is not square, the concepts of eigenvector and eigenvalue are meaningless, but they can be generalized by the concepts of singular vector and singular value.
A positive number s is said to be a singular value of A if there exist :
* A vector u in Rm, and
* A vector v in Rn,
such that :
Av = su and A'u = sv
* u is said to be a left singular vector.
* v is said to be a right singular vector.
We'll show that :
* A left singular vector u is an eigenvector of AA' associated to the eigenvalue s².
* A right singular vector v is an eigenvector of A'A associated to the eigenvalue s².
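These relations can be checked numerically on the leading singular triple of an arbitrary 3x2 matrix :

```python
import numpy as np

# An arbitrary 3x2 example.
A = np.array([[2.0, 0.0],
              [1.0, 3.0],
              [0.0, 1.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
u, v, sigma = U[:, 0], Vt[0, :], s[0]   # leading singular triple

print(np.allclose(A @ v, sigma * u))           # True : Av = su
print(np.allclose(A.T @ u, sigma * v))         # True : A'u = sv
print(np.allclose(A @ A.T @ u, sigma**2 * u))  # True : u is an eigenvector of AA' for s^2
print(np.allclose(A.T @ A @ v, sigma**2 * v))  # True : v is an eigenvector of A'A for s^2
```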
Singular Value Decomposition admits several geometric interpretations that make its somewhat abstract architecture more comprehensible. We now give two of these interpretations.
Any mxn matrix can be regarded as the matrix implementation of a linear operator from Rn into Rm.
Consider the SVD of A :
* As U is orthogonal, its columns form an orthonormal basis of Rm.
* As V is orthogonal, its columns (the rows of V' ) form an orthonormal basis of Rn.
The expression Av = su (see above) receives a geometric interpretation : the image in Rm of a right singular vector (in Rn) is equal to s times the left singular vector associated to the singular value s.
Let then x be a vector of Rn. For simplicity, we assume A to be full rank, with n ≤ m.
Then to obtain Ax, take the following steps :
* First project x on the orthonormal basis of Rn embodied by the columns of V (right singular vectors). The values of these projections are the coordinates of x in this basis.
* Then multiply ("stretch") each of these n coordinates by the corresponding singular value of A.
* Use these numbers as coordinates on the first n left singular vectors in Rm (i.e. the first n columns of U ). You thus define a vector y.
* Set the remaining (m - n) last coordinates of y (on the (m - n) last columns of U) to 0.
Then y is equal to Ax.
The following figure illustrates this process on the first coordinate v1 of a vector x (with n = 2 and m = 3).
For any vector x, the third coordinate in R3 (on u3 ) will always be 0. But it is so only because of the very particular choice of a basis in R3 (i.e. the columns of U).
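The four steps above can be sketched with NumPy (an arbitrary full-rank 3x2 matrix, so n = 2 and m = 3) :

```python
import numpy as np

# An arbitrary full-rank example with n = 2 <= m = 3.
A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [2.0, 0.0]])
x = np.array([1.0, -1.0])

U, s, Vt = np.linalg.svd(A)          # full SVD : U is 3x3

coords = Vt @ x                      # 1. coordinates of x in the V basis
stretched = s * coords               # 2. stretch each coordinate by a singular value
y = U[:, :2] @ stretched             # 3./4. rebuild in R^3 ; the coordinate on u3 is 0

print(np.allclose(y, A @ x))         # True : the three steps reproduce Ax
```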
We leave it to the reader to decide how this interpretation should be amended :
* If m < n.
* If A is not full rank.
This geometric interpretation helps to understand the exact roles of the U1 and V1 of the reduced form. We do not assume A to be full rank anymore.
* Let's put bases and matrices aside for a moment, and rather focus on the concept of "linear operator from Rn to Rm ".
- The set of vectors in Rn which are transformed into the null vector of Rm is a subspace of Rn called the kernel of the operator.
- The set of the transformed vectors of Rn is a certain subspace of Rm called the range of the operator. The dimension of the range is at most n, but it may be strictly smaller than n. In other words, the dimension of the range of an operator may be less than the dimension of the original space : linear operators may "crush" dimensions. The dimension of the range is called the rank of the operator.
It can be shown that :
dim(Range) + dim(Kernel) = n
- Any vector in Rn can be decomposed in a unique way into the sum of a vector in the kernel and a vector in the orthogonal complement of the kernel (the subspace of the vectors that are orthogonal to all the vectors in the kernel), which we denote (Kernel)⊥. A vector in Rn clearly has the same transform as its projection on (Kernel)⊥, which is therefore the only "active" subspace of Rn : the operator is completely defined by its action on (Kernel)⊥, which is sent in a one-to-one way onto (Range).
Any basis of (Kernel)⊥ and any basis of (Range) can then be used to define the operator in matrix form.
* But what the reduced form of SVD says is that there exist :
- A certain orthonormal basis of (Kernel)⊥ (of dimension r), embodied by the columns of V1,
- And a certain orthonormal basis of (Range) (also of dimension r), embodied by the columns of U1,
such that the action of the operator on any vector x takes the particularly simple form :
- Project x on the V1 basis of (Kernel)⊥.
- Multiply the r coordinates thus obtained by the corresponding singular values (which are non vanishing).
- Use these new values as the coordinates of Ax in the U1 basis of (Range).
- If you want the full SVD, just add vectors to the foregoing bases in any way that will preserve their orthonormality, and add 0s to the coordinates of the transform of x.
So the reduced SVD gives a complete description of the mechanism of a linear operator both analytically and functionally. It is truly the core of SVD, the full form being just the padded version of the reduced SVD.
In Rn, consider the set of all vectors of length 1 : the tips of these vectors form the n-dimensional unit sphere (see illustration below).
It can be shown that the tips of the A-transformed vectors :
* Lie on an r-dimensional ellipsoid.
* The directions of the principal axes of this ellipsoid are the columns of U1 (whose preimages are the columns of V1).
* The half-lengths of these principal axes are the singular values of A.
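A quick numerical check on an arbitrary 2x2 example : the image of each right singular vector lies along a principal axis, and its length is the corresponding singular value :

```python
import numpy as np

# An arbitrary 2x2 example.
A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

U, s, Vt = np.linalg.svd(A)

# The image A v_i of each right singular vector has length s_i,
# the half-length of the corresponding principal axis of the ellipse.
for i in range(2):
    print(np.isclose(np.linalg.norm(A @ Vt[i, :]), s[i]))  # True, True
```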
By expanding the reduced SVD of a matrix A, one obtains :
A = s1u1v'1 + s2u2v'2 + ... + srurv'r
with the singular values sorted in decreasing order.
Note the similarity with the expansion of the spectral decomposition of a symmetric matrix.
Every outer product uiv'i is a rank 1 matrix.
Suppose that the above sum is truncated so that only the first k (k < r) terms are retained. Then it can be shown that :
A* = s1u1v'1 + s2u2v'2 + ... + skukv'k
is the best approximation of A in the least-squares sense by a rank k matrix. No other mxn matrix of rank k can make the quadratic "approximation error" :
∑ij | a*ij - aij |²
any smaller.
This result is known as the Eckart-Young theorem. It is the basis of a method for compressing the information in a data matrix A (which is then considered as a data table and not as a linear operator).
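A sketch of rank-k truncation on an arbitrary 4x3 matrix. A standard consequence of the Eckart-Young theorem is that the squared approximation error equals the sum of the discarded squared singular values, which the last line verifies :

```python
import numpy as np

# An arbitrary 4x3 example.
A = np.array([[4.0, 0.0, 1.0],
              [2.0, 3.0, 0.0],
              [0.0, 1.0, 5.0],
              [1.0, 2.0, 2.0]])

U, s, Vt = np.linalg.svd(A)
k = 1
# Truncated expansion : keep only the k largest singular-value terms.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The quadratic approximation error equals the sum of the discarded
# squared singular values.
err = np.sum((A - A_k) ** 2)
print(np.isclose(err, np.sum(s[k:] ** 2)))  # True
```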
The spectral decomposition of a positive semidefinite matrix A is :
A = UDU'
* U is an orthogonal matrix.
* D is the diagonal matrix whose diagonal elements are the (non negative) eigenvalues of A.
The spectral decomposition of A is then identical to its Singular Value Decomposition : the diagonal elements of D are indeed the positive square roots of the eigenvalues of A'A, which are themselves the squares of the (non negative) eigenvalues of A.
The SVD of a positive semidefinite matrix is identical to its spectral decomposition.
This is not true of any symmetric matrix because then some eigenvalues might be negative.
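A numerical sketch : building a positive semidefinite matrix as B'B (an arbitrary choice of B) and comparing its singular values to its eigenvalues :

```python
import numpy as np

# A positive semidefinite matrix, built as B'B so its eigenvalues are >= 0.
B = np.array([[1.0, 2.0],
              [0.0, 1.0]])
A = B.T @ B

s = np.linalg.svd(A, compute_uv=False)      # singular values, decreasing
lam = np.sort(np.linalg.eigvalsh(A))[::-1]  # eigenvalues, decreasing

print(np.allclose(s, lam))  # True : for a PSD matrix the two coincide
```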
SVD is a truly universal tool.
In this site, SVD will be put to work in Principal Components Analysis (PCA) and in Ridge Regression. But it is almost indispensable whenever one needs to :
* Calculate the rank of a matrix, which is simply the number of nonzero singular values.
* Identify the null space of a matrix : it is the subspace of Rn spanned by the (n - r) right singular vectors corresponding to the vanishing singular values. These vectors form an orthonormal basis of the null space of A.
* Identify the range of a matrix (the subspace spanned by its columns) : it is the subspace of Rm spanned by the r left singular vectors corresponding to the nonzero singular values. These vectors form an orthonormal basis of the range of A.
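These three uses can be sketched together on a hypothetical rank-deficient matrix (the tolerance 1e-10 below is an arbitrary choice for deciding which singular values "vanish") :

```python
import numpy as np

# A rank-deficient 3x3 matrix : third row = first + second, so r = 2.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 3.0, 1.0]])

U, s, Vt = np.linalg.svd(A)
tol = 1e-10
r = int(np.sum(s > tol))       # rank = number of nonzero singular values

null_basis = Vt[r:, :].T       # right singular vectors for vanishing singular values
range_basis = U[:, :r]         # left singular vectors for nonzero singular values

print(r)                                                # 2
print(np.allclose(A @ null_basis, 0))                   # True : a basis of the null space
print(np.allclose(range_basis @ (range_basis.T @ A), A))  # True : the columns of A lie in the range
```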
Formally similar to, but less ambitious than, Singular Value Decomposition is the factorization of a matrix into a so-called normal form. Simpler than SVD, it also captures less information about the matrix, as only its rank is revealed by this factorization.
The factorization of a matrix into a normal form is addressed here.
The truly difficult part about SVD was to invent it. Once it is known to exist, it is not too difficult to demonstrate that indeed any matrix can be decomposed in the SVD form. This is what we do in this Tutorial (except for the uniqueness of the reduced form iff all singular values are distinct). So, although certainly not trivial, this demonstration is a low price to pay for such a powerful and universal result.