Chapter 9
COURANT-FISCHER MIN-MAX THEOREMS
A Variational Characterization of Eigen-Values
Let A be Hermitian n´ n and (l , x) be an eigen-pair of A. Then, using the standard inner product l (x,x) = (Ax,x) = (x,A*x) = (x,Ax) = l *(x,x), (l * denoting complex conjugate of l ), and so, as x ¹ 0, Þ l = l *, i.e., l is real. Hence the eigen-values values of a Hermitian matrix are real. Let the n eigen-values of A (counted after algebraic multiplicities) be indexed according to l 1 ³ l 2 ³ l 3 ³ … ³ l n and let u1, u2, u3, ... , un be the corresponding orthonormal eigen-vectors (obtained e.g., via unitary diagonalization). A variational characterization of eigen-values of a Hermitian matrix is as follows:
Theorem 1. Let U0 = {0} and Uk = span {u1,u2, ... ,uk} º <u1,u2, ... ,uk>. Then
(a) l k = max0¹ x^ Uk-1 (Ax,x)/(x,x), k = 1, 2, ... , n; &
(b) l k = min0¹ xÎ Uk (Ax,x)/(x,x), k = 1, 2, ... , n.
Proof: 0 ¹ x ^ Uk-1 Û x = S k£ j£ n a juj, with at least one a j non-zero. Hence
max0
¹ x^ Uk-1 (Ax,x)/(x,x) = maxa j S k£ j£ n l j|a j|2/S k£ j£ n |a j|2 = l k,(which is obtained e.g., for the vector x = uk). Similarly, 0 ¹ x Î Uk Û x = S k£ j£ n a juj, with at least one a j ¹ 0. Hence, min0¹ xÎ Uk (Ax,x)/(x,x) = mina j S k£ j£ n l j|a j|2/S k£ j£ n |a j|2 = l k, (the min. being attained e.g., by x = uk). #
Remark. If A is real symmetric, uj’s could be taken real (by orthogonal diagonalization) and since with x = S k£ j£ n a juj, (Ax,x)/(x,x) = S k£ j£ n l j|a j|2/S k£ j£ n |a j|2, it follows that all the values taken by the Rayleigh-quotient (Ax,x)/(x,x) for complex x are also attained for real vectors (note the presence of |a|2). Hence, in Theorem 1, and so in the resuts below, when A is real-symmetric the results hold with the confinement to real subspaces, i.e., working only w.r.t. R n.
Indeed, Theorem 1 heavily depends on certain particular subspaces and so is not suitable for many applications (see, e.g., the proof of Theorem 3 below). This difficulty is avoided through the following Courant-Fischer min-max theorem:
Theorem 2. (R. Courant & E. Fischer Min-Max Theorem). Let W stand for an arbitrary k-dimensional subspace of C . Let A be Hermitian. Then for k = 1, 2, 3, ... , n
(I) l k = minWk-1 max0¹ x^ Wk-1 (Ax,x)/(x,x),
(II) l k = minWn-k+1 max0¹ xÎ Wn-k+1 (Ax,x)/(x,x),
(III) l k = maxWn-k min0¹ x^ Wn-k (Ax,x)/(x,x),
(IV) l k = maxWk min0¹ xÎ Wk (Ax,x)/(x,x).
Proof: Let m k = minWk-1 max0¹ x^ Wk-1 (Ax,x)/(x,x). Restricting "min" to Wk-1 = Uk-1 only, by Theorem 1(a), m k < l k. For the reverse inequality, with x = S 1£ j£ n a juj, restricting the set on which "max" is taken
m
k ³ minWk-1 max0¹ x^ Wk-1,xÎ Uk (Ax,x)/(x,x) = minWk-1 max0¹ x^ Wk-1,xÎ Uk S 1£ j£ k l j|a j|2/S 1£ j£ n |a j|2 ³ minWk-1 max0¹ x^ Wk-1,xÎ Uk l k = l k,provided we can show that {x : 0 ¹ x ^ Wk-1 , x e Uk} is non-empty. For this, suppose Wk-1^ and Uk have no non-zero element in common. Then Z = Wk-1^ + Uk = Wk-1^ Å Uk is a direct sum. Now, Z being a subspace of C n, n ³ dim Z = dim Wk-1^ + dim Uk = n-(k-1) + k = n+1, a contradiction. This establishes (I).
Noting that (I) Û (II) and (III) Û (IV), it remains to prove (III) only. For this, put
n
k = maxWn-k min0¹ x^ Wn-k (Ax,x)/(x,x).Again, restricting Wn-k’s to Uk^ only, by Theorem 1(b), one has: n k ³ l k. Next, with x = S 1£ j£ n a juj,
n
k £ maxWn-k min0¹ x^ Wn-k,Uk-1 (Ax,x)/(x,x) = maxWn-k min0¹ x^ Wn-k,Uk-1 S k£ j£ n l j|a j|2/S 1£ j£ n |a j|2 £ maxWn-k min0¹ x^ Wn-k,Uk-1 l k = l k,provided again we can show that Wn-k^ and U k-1^ have a non-zero intersection. But if the intersection is {0}, Y = Wn-k^ + U k-1^ is a direct sum of dimension equaling k + n-(k-1) = n + 1, again a contradiction. This completes the proof of Courant-Fischer min-max theorem. #
Corollary. Let W k stand for an arbitrary subspace of dimension ³ k and w k for that of dimension £ k. Let A be hermitian m´ n. Then for 1 £ k £ n,
(I) l k = min {max {x*Ax/x*x : 0 ¹ x ^ w k-1} : w k-1};
(II) l k = min {max {x*Ax/x*x : 0 ¹ x Î W n-k+1} : W n-k+1};
(III) l k = max {min {x*Ax/x*x : 0 ¹ x ^ w n-k} : w n-k};
(IV) l k = max {min {x*Ax/x*x : 0 ¹ x Î W k} : W k}.
Proof: By decreasing a subspace we allow more x's to become orthogonal to it. Thus in (I) the max does not decrease while in (III) the min does not increase. Similarly, by increasing a subspace we allow more elements to become available so that in (II) the max does not decrease, while in (IV) the min does not increase. Hence the additional subspaces do not change the min in (I) and (II) and the max in (III) and (IV). #
Corollary. If A is n´ n hermitian matrix and B Î C n´ k, 1 £ k £ n-1, then: infB supB*x=0 x*Ax/x*x = l k+1, and, supB infB*x=0 x*Ax/x*x = l n-k.
Proof: The column space of B is atmost k-dimensional. Hence the results follow, respectively from (I) and (III) of the previous corollary. #
Theorem 2 may be used to prove the following Sturmian separation theorem:
Corollary. (Sturmian Separation Theorem). Let A be n´ n Hermitian and Ak its k-th principal sub-matrix (composed of the intersection of the first k rows and columns). Then the eigenvalues of Ai+1 interlace those of Ai, i.e.,
l
k+1(Ai+1) £ l k(Ai) £ l k(Ai+1), 1 £ k £ i; 1 £ i £ n-1.Proof: Since Ai’s themselves are Hermitian, it is sufficient to prove the inequalities for i = n-1. Let (Wk)n simultaneously stand for an arbitrary k-dimensional subspace of C n of elements with the n-th component xn = 0 and an arbitrary k-dimensional subspace of C n-1 . By Theorem 2(II),
l
k+1 = minWn-k max0¹ xÎ Wn-k (Ax,x)/(x,x) £ min(Wn-k)n max0¹ xÎ (Wn-k)n (Ax,x)/(x,x) £ min(Wn-k)n max0¹ xÎ (Wn-k)n (An-1x,x)/(x,x) £ min(W(n-1)-k+1)n max0¹ xÎ (W(n-1)-k+1)n (An-1x,x)/(x,x) £ l k(An-1).Also, by Theorem 2(IV), l k(An) = maxWk min0¹ xÎ Wk (Ax, x)/(x, x) ³ max(Wk)n min0¹ xÎ (Wk)n (Ax,x)/(x,x) ³ max(Wk)n min0¹ xÎ (Wk)n (An-1x, x)/(x, x) = l k(An-1). #
Corollary. (Poincare Separation Theorem). Let A be n´ n and B n´ k subunitary (B*B = I). Then:
l
i(B*AB) £ l i(A), 1 £ i £ k;l
k-j(B*AB) ³ l n-j(A), 0 £ j £ k-1.Proof: Augment B with n-k additional columns to get a n´ n unitary matrix V. Then l i(V*AV) = l i(A), as A and V*AV are similar. If we put V*AV = C, then Ck = B*AB. Hence
l
i(Ck) = l i(B*AB) £ l i(Ck+1) £ l i(Ck+2) £ … £ l i(Cn) = l i(A).Similarly, l k-j(B*AB) = l k-j(Ck) ³ l k+1-j(Ck+1) ³ l k+2-j(Ck+2) ³ … ³ l n-j(Cn) = l n-j(A). #
Aliter: Use l i = max {min {(Ax,x)/(x,x) : x Î Wi}: Wi}. Restricting Wi’s to C k Þ l i(A) ³ l i(B*AB), 1 £ i £ k. Similarly, using l n-j = min {max {(Ax,x)/(x,x) : x Î Wj+1}: Wj+1} and restricting Wj+1’s to C k, 0 £ j £ m-1, Þ l n-j(A) £ l k-j(B*AB). #
Theorem. Let S stand for a set of k mutually orthogonal non-zero n´ 1 vectors v1, v2, ... , vk and let A be a hermitian n´ n matrix with eigenvalues l i’s in decreasing fashion. Then:
supS {S 1£ i£ k ((vi)*Avi)/((vi)*vi)} = S 1£ i£ k l i;
infS {S 1£ i£ k ((vi)*Avi)/((vi)*vi)} = S n-k+1£ i£ k l i.
The "sup" and the "inf" are attained respectively for the vectors u1, u2, ... , uk and un-k+1, un-k+2, ... , un, where ui’s are the orthonormal eigenvectors of A corresponding to the eigenvalues l i.
Proof: Without loss of generality, vi’s may be assumed to be orthonormal. Let vi = S 1£ j£ n cijuj. Then
S
1£ i£ k ((vi)*Avi)/((vi)*vi) = S 1£ i£ k (S 1£ p£ k cipup)*(S 1£ j£ k l jcijuj) = S 1£ i£ k (S 1£ p£ k l p|cip|2) = S 1£ p£ k l p(S 1£ i£ k |cip|2) = S 1£ p£ k l p(S 1£ i£ k |(vi, up)|2) = S1£ p£ k l pa p,say. Note that 0 £ a p £ 1, they being the sum of squares of projection coefficients of up w.r.t. vi’s orthonormal and also that S 1£ p£ n a p = S 1£ i£ k ||vi||2 = k. The required estimates follow from this. #
In the following result, ||A||F denotes the Frobenius norm given by ||A||F = S i (S j |aij|2)1/2 = (tr A*A)1/2.
Theorem. Let A be p.d. and C stand for any matrix of rank k. Then infC ||A-C||F = (l k+12 + ... + l n2)1/2, and the infimum is attained when C = l 1u1(u1)* + ... + l kuk(uk)*.
Proof: Since the Frobenius norm is invariant under unitary transformations, let U = (u1 | u2 | ... | un) be a unitary diagonalizer of A, ui’s being normalized eigenvectors corresponding to the eigenvalues l i’s in decreasing order. Then, U*AU = diag (l i, l 2, ... , l n). A general matrix C can be written as C = UBU* and it is of rank k iff B is of rank k. Then ||A-C||F2 = S 1£ i£ n |l i – bii|2 + (S i¹ j |bij|2). Since r(B) = k, not more than k of bii’s can be non-zero. Hence the first sum on the r.h.s. above is ³ l k+12 + ... + l n2, showing that InfB ||A-B||F ³ (l k+12 + ... + l n2)1/2. Moreover the lower bound on the r.h.s. is attained if bij = 0, i ¹ j, and bii = l i , 1 £ i £ k, the B thus obtained is diag (0, 0, ... , l k+1, ... , l n) and is of rank k. Thus a C for which the bound is attained is given by C = UBU* = l 1u1(u1)* + ... + l kuk(uk)*, as stated. #
Let’s recall that the inertia of a matrix is defined by In(A) = (p (A), n (A), d (A)), p , n , and d denoting the number of eigenvalues of A with positive, negative and zero real parts, respectively, counted after their algebraic multiplicities. The Sylvester’s inertia theorem says that a non-singular conjugate linear transformation does not change the inertia of a hermitian matrix and can be proved easily by using the Courant-Fischer min-max theorem:
Sylvester's Inertia Theorem. If C is non-singular and H is hermitian, In(C*HC) = In(H) = (p (H), n (H), d (H)), p , n , and d denoting the number of eigenvalues with positive, negative and zero real parts, respectively, counted after their algebraic multiplicities).
Proof: With x ¹ 0 and y = Cx, we have (C*HCx, x)/(x, x) = ((Hy, y)/(y, y)) ((Cx, Cx)/(x, x)), so that l n(C*C)((Hy, y)/(y, y)) £ (C*HCx, x)/(x, x) £ l 1(C*C)((Hy, y)/(y, y)). Taking max over k-dimensional subspace Wk’s of min over x Î Wk, we get the inequality: l n(C*C)l k(H) £ l k(C*HC) £ l 1(C*C)l k(H), called Ostrowsky's quantitative formulation of the Sylvester’s inertia theorem. Since C is non-singular, C*C > 0 (i.e., p.d.) and so has positive eigenvalues. Hence l k(H) and l k(C*HC) have the same sign for all k from which the Sylvestor’s inertia theorem follows. #
Exercise. Prove that Sylvester's inertia theorem is equivalent to each of the following statements:
(a) If P > 0 and H is hermitian then In (PH) = In (H).
(b) If AH > 0 and H is hermitian then IN A = in H.
Generalization of Haynsworth Inertia Sum Formula: If H = (Hij)1£ i,j£ 2 is a partitioned hermitian matrix with H11 non-singular, In H = In H11 + In (H22 –H12*(H11)-1H12), which is known as Haynsworth inertia sum formula. If H is possibly singular but the column space M (H11) É M (H12), we have the generalization: In H = In H11 + In (H22 –H12*(H11)-H12), for any choice of g-inverse (H11)-.
Proof: If M (H11) É M (H12), for any choice of H11-, H12 = H11(H11)-H12. By Sylvester's inertia theorem
and the result follows. #
Remark. If H11 is non-singular then M (H11) É M (H12), so that the Haynsworth inertia sum formula is a special case of the above result. Moreover, when M (H11) É M (H12), H12 = H11C for some C and then In H = In H11 + In (H22 –C*H11C).
Given's Method for a Hermitian Matrix
In this method for A real-symmetric (hermitian) matrix, we use a succession of similarity transformations based on the Jacobi planar rotations to first turn the matrix into a tri-diagonal real-symmetric (hermitian) matrix similar to the given one. The procedure ia a variant of the iterative Jacobi method and is based on the rows and columns
(2, 3), (2, 4), ... , (2, n),
(3, 4), ... , (3, n),
………...,
(n-1, n),
to turn the elements in the following positions to zero
(1, 3), (1, 4), ... , (1, n),
(2, 4), ... , (2, n),
……......,
(n-2, n).
In this sequence the earlier created zeros continue to remain zeros and at the end of the process after atmost (n2 -3n+2))/2 similarity transformations the real-symmetric (hermitian) matrix A is turned into a tri-diagonal real-symmetric (hermitian) matrix. Subsequently, the method of Sturm sequences is used to locate the eigenvalues of the matrix.
Note that the iterative Jacobi method is based on (i, j)-th rows and columns to turn (i, j) and (j, i)-th entries to zero, whereas here in the Given’s method the same effect is produced by using (i+1, j)-th rows and columns. The advantage is that the previously created zeros do not revive and the process terminates in n-2 steps; the disadvantage is that one ends up with a tri-diagonal, rather than a diagonal matrix.
One could also use Householder reflections for the tri-diagonalization: in which for 1 £ i £ n-2, for the i-th column one reflects the (n-i)-component column vector of bottom (n-i)-entries in the direction of the first unit vector in K n-I, along with the corresponding conjugate row operations on the last (n-i)-entries of the i-th row. At the i-th stage the earlier processed i-1 rows and columns remain unchanged.
Eigenvalues of a Tridiagonal Hermitian Matrix
Consider a tridiagonal hermitian matrix
.
We consider the non-degenerate case where each cj is non-zero. If any cj is zero the eigenvalues of A are the eigenvaues of the j-th principal submatrix Aj and the (n-j)´ (n-j) submatrix of A obtained by deleting the first j-rows and the first j-columns of A, to which the method can be applied separately. Putting f0(x) = 1 and fk(x) = |xI-Ak|, 1 £ k £ n, we have fk+1(x) = (x-bk+1)fk(x) - |c|2fk-1(x), 1 £ k £ n-1. From this it follows that at a zero of fk(x), i.e., at an eigenvalue x of Ak, fk(x) and fk+1(x) have opposite signs. This implies that the eigenvalues of Ak+1 strictly interlace the eigenvalues of Ak. If fk(x) and fk+1(x) have opposite signs, it signifies that Ak+1 has one more eigenvalue on the right of x than Ak. Hence the number N(x) of sign changes of the sequence {f0(x), f1(x), f2(x), … , f3(x), ... ,fn(x)}, equals the number of eigenvalues of A on the right to x, which is the number of eigenvalues of A lying in the open interval (x, ¥ ). It follows that: N(a)-N(b) equals the number of eigenvalues of A lying in the interval (a,b]. If f (b) ¹ 0, it equals the number of eigenvalues in (a,b). If fn(a) ¹ 0, it equals the number of eigenvalues in [a,b]. Thus using different a’s and b’s all the eigenvalues of A could be located.