Chapter 8
Advanced Topics on Diagonalizability and Triangularization*

8.1 More on the Spectrum of a Matrix

We start this section with a few definitions and examples. It will be helpful to recall the notation used in Section 1.3.1 and a few results from Appendix 9.2.

Definition 8.1.1. [Principal Minor] Let A ∈ Mn(ℂ).

1.
Let S ⊆ [n]. Then, det(A[S, S]) is called the principal minor of A corresponding to S.
2.
By EMk(A), we denote the sum of all k × k principal minors of A.

Definition 8.1.2. [Elementary Symmetric Functions] Let k be a positive integer. Then, the kth elementary symmetric function of the numbers r1, …, rn, denoted Sk(r1, …, rn), is defined as

\[
S_k(r_1,\ldots,r_n) = \sum_{i_1 < \cdots < i_k} r_{i_1} \cdots r_{i_k}.
\]

Example 8.1.3. Let A = \begin{bmatrix} 1 & 2 & 3 & 4\\ 5 & 6 & 7 & 8\\ 9 & 8 & 7 & 6\\ 5 & 4 & 3 & 2 \end{bmatrix}. Then, note that

1.
EM1(A) = 1 + 6 + 7 + 2 = 16 and EM2(A) = det A[{1,2},{1,2}] + det A[{1,3},{1,3}] + det A[{1,4},{1,4}] + det A[{2,3},{2,3}] + det A[{2,4},{2,4}] + det A[{3,4},{3,4}] = -80.
2.
S1(1,2,3,4) = 10 and S2(1,2,3,4) = 1·(2 + 3 + 4) + 2·(3 + 4) + 3·4 = 9 + 14 + 12 = 35. (These values are verified numerically in the sketch below.)
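The computations above are easy to check on a machine. The following sketch (Python/NumPy; the helper names EM and elem_sym are ours, introduced only for this illustration) evaluates the sums of principal minors and the elementary symmetric functions directly from their definitions.

```python
import itertools
import numpy as np

A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 8, 7, 6],
              [5, 4, 3, 2]], dtype=float)

def EM(A, k):
    """Sum of all k x k principal minors of A (Definition 8.1.1)."""
    n = A.shape[0]
    return sum(np.linalg.det(A[np.ix_(S, S)])
               for S in itertools.combinations(range(n), k))

def elem_sym(k, r):
    """k-th elementary symmetric function of the numbers in r (Definition 8.1.2)."""
    return sum(np.prod(c) for c in itertools.combinations(r, k))

print(round(EM(A, 1), 6), round(EM(A, 2), 6))                # 16.0 -80.0
print(elem_sym(1, [1, 2, 3, 4]), elem_sym(2, [1, 2, 3, 4]))  # 10 35
```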

Theorem 8.1.4. Let A ∈ Mn(ℂ) and let σ(A) = {λ1, …, λn}. Then,

1.
the coefficient of t^{n-k} in P_A(t) = ∏_{i=1}^{n}(t − λ_i), the characteristic polynomial of A, is
\[
(-1)^k \sum_{i_1 < \cdots < i_k} \lambda_{i_1} \cdots \lambda_{i_k} = (-1)^k S_k(\lambda_1,\ldots,\lambda_n). \tag{8.1.1}
\]

2.
EMk(A) = Sk(λ1, …, λn).

Proof. Note that by definition,

\[
\begin{aligned}
P_A(t) = \prod_{i=1}^{n}(t-\lambda_i) &= t^n - S_1(\lambda_1,\ldots,\lambda_n)\,t^{n-1} + S_2(\lambda_1,\ldots,\lambda_n)\,t^{n-2} - \cdots + (-1)^n S_n(\lambda_1,\ldots,\lambda_n) && (8.1.2)\\
&= t^n - EM_1(A)\,t^{n-1} + EM_2(A)\,t^{n-2} - \cdots + (-1)^n EM_n(A). && (8.1.3)
\end{aligned}
\]
As the second part is just a re-writing of the first, we prove only the first part. To do so, let
\[
B = tI - A = \begin{bmatrix} t-a_{11} & \cdots & -a_{1n}\\ \vdots & \ddots & \vdots\\ -a_{n1} & \cdots & t-a_{nn} \end{bmatrix}.
\]
Then, using Definition 9.2.2 in the Appendix, note that det B = ∑_σ sgn(σ) ∏_{i=1}^{n} b_{iσ(i)}, and hence each S ⊆ [n] with |S| = n − k contributes to the coefficient of t^{n-k} in the following way:

For all i ∈ S, consider all permutations σ such that σ(i) = i. The idea is to select a 't' from each of the diagonal entries b_{ii} = t − a_{ii} with i ∈ S. Since we do not want any more factors of 't', we set t = 0 at every other diagonal position. So the contribution from S to the coefficient of t^{n-k} is det[−A(S|S)] = (−1)^k det A(S|S). Hence the coefficient of t^{n-k} in P_A(t) is

\[
(-1)^k \sum_{S\subseteq [n],\,|S| = n-k} \det A(S|S) = (-1)^k \sum_{T\subseteq [n],\,|T| = k} \det A[T,T] = (-1)^k EM_k(A).
\]
The proof is complete in view of Equation (8.1.2). _

As a direct application, we obtain Theorem 6.1.16 which we state again.

Corollary 8.1.5. Let A ∈ Mn(ℂ) and let σ(A) = {λ1, …, λn}. Then Tr(A) = ∑_{i=1}^{n} λ_i and det A = ∏_{i=1}^{n} λ_i.


Let A and B be similar matrices. Then, by Theorem 6.1.20, we know that σ(A) = σ(B). Thus, Part 2 of Theorem 8.1.4 directly gives the following result.

Corollary 8.1.6. Let A and B be two similar matrices of order n. Then, EMk(A) = EMk(B) for 1 ≤ k ≤ n.

So, for each k, the sums of the k × k principal minors of similar matrices are equal. In other words, the sums of principal minors are invariant under similarity.

Corollary 8.1.7. [Derivative of Characteristic Polynomial] Let A ∈ Mn(ℂ). Then
\[
\frac{d}{dt}P_A(t) = P_A'(t) = \sum_{i=1}^{n} P_{A(i|i)}(t).
\]

Proof. For 1 ≤ i ≤ n, let us denote A(i|i) by A_i. Then, using Equation (8.1.3), we have

\[
\begin{aligned}
\sum_{i=1}^{n} P_{A_i}(t) &= \sum_{i} t^{n-1} - \sum_{i} EM_1(A_i)\,t^{n-2} + \cdots + (-1)^{n-1}\sum_{i} EM_{n-1}(A_i)\\
&= n\,t^{n-1} - (n-1)\,EM_1(A)\,t^{n-2} + (n-2)\,EM_2(A)\,t^{n-3} - \cdots + (-1)^{n-1} EM_{n-1}(A)\\
&= P_A'(t),
\end{aligned}
\]
where the second equality uses the fact that each k × k principal minor of A is a principal minor of A_i for exactly the n − k indices i outside its index set, so that ∑_i EM_k(A_i) = (n − k) EM_k(A). This gives the desired result. _
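Corollary 8.1.7 is easy to test numerically. The following sketch (Python/NumPy; it relies only on np.poly, which returns the coefficients of the characteristic polynomial, and np.polyder) checks the identity for a random matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
n = A.shape[0]

# Coefficients of P_A(t) = det(tI - A), highest degree first, and of its derivative.
pA = np.poly(A)
dpA = np.polyder(pA)

# Sum of the characteristic polynomials of the principal sub-matrices A(i|i).
total = np.zeros(n)                      # a degree n-1 polynomial has n coefficients
for i in range(n):
    idx = [j for j in range(n) if j != i]
    total += np.poly(A[np.ix_(idx, idx)])

print(np.allclose(dpA, total))           # True: P_A'(t) = sum_i P_{A(i|i)}(t)
```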

Corollary 8.1.8. Let A ∈ Mn(ℂ). If Alg.Mul_λ(A) = 1 then Rank[A − λI] = n − 1.

Proof. As Alg.Mul_λ(A) = 1, P_A(t) = (t − λ)q(t), where q(t) is a polynomial with q(λ) ≠ 0. Thus P_A'(t) = q(t) + (t − λ)q'(t). Hence, P_A'(λ) = q(λ) ≠ 0. Thus, by Corollary 8.1.7, ∑_i P_{A(i|i)}(λ) = P_A'(λ) ≠ 0. Hence, there exists i, 1 ≤ i ≤ n, such that P_{A(i|i)}(λ) ≠ 0. That is, det[A(i|i) − λI] ≠ 0, so that Rank[A − λI] ≥ n − 1. As λ ∈ σ(A), we also have Rank[A − λI] < n, and hence Rank[A − λI] = n − 1. _

Remark 8.1.9. The converse of Corollary 8.1.8 is false. Note that for the matrix A = \begin{bmatrix} 0 & 1\\ 0 & 0 \end{bmatrix}, Rank[A − 0·I] = 1 = 2 − 1 = n − 1, but 0 has multiplicity 2 as a root of P_A(t) = 0.

As another application of Corollary 8.1.7, we now relate the multiplicity of an eigenvalue to the spectrum of a principal sub-matrix.

Theorem 8.1.10. [Multiplicity and Spectrum of a Principal Sub-Matrix] Let A ∈ Mn(ℂ), λ ∈ ℂ and let k be a positive integer. Then 1 ⇒ 2 ⇒ 3, where

1.
Geo.Mul_λ(A) ≥ k.
2.
If B is a principal sub-matrix of A of size m > n − k then λ ∈ σ(B).
3.
Alg.Mul_λ(A) ≥ k.

Proof. Part 1 ⇒ Part 2. Let {x1, …, xk} be linearly independent eigenvectors for λ and let B be a principal sub-matrix of A of size m > n − k. Without loss of generality, we may write A = \begin{bmatrix} B & *\\ * & * \end{bmatrix}. Let us partition the xi's accordingly, say x_i = \begin{bmatrix} x_{i1}\\ x_{i2} \end{bmatrix}, such that

\[
\begin{bmatrix} B & *\\ * & * \end{bmatrix}\begin{bmatrix} x_{i1}\\ x_{i2} \end{bmatrix} = \lambda \begin{bmatrix} x_{i1}\\ x_{i2} \end{bmatrix}, \quad \text{for } 1 \le i \le k.
\]
As m > n − k, the size of x_{i2} is less than k. Thus, the set {x_{12}, …, x_{k2}} is linearly dependent (see Corollary 3.3.6). So, there is a nonzero linear combination y = \begin{bmatrix} y_1\\ y_2 \end{bmatrix} of x_1, …, x_k such that y_2 = 0. Notice that y_1 ≠ 0 and B y_1 = λ y_1.

Part 2 ⇒ Part 3. By Corollary 8.1.7, we know that P_A'(t) = ∑_{i=1}^{n} P_{A(i|i)}(t). As each A(i|i) is a principal sub-matrix of size n − 1, we get P_{A(i|i)}(λ) = 0, for all i = 1, 2, …, n. Thus, P_A'(λ) = 0. A similar argument applied to each of the A(i|i)'s gives P_A^{(2)}(λ) = 0, where P_A^{(2)}(t) = \frac{d}{dt}P_A'(t). Proceeding along the above lines, we finally get P_A^{(i)}(λ) = 0, for i = 0, 1, …, k − 1. This implies that Alg.Mul_λ(A) ≥ k. _

Definition 8.1.11. [Moments] Fix a positive integer n and let α1, …, αn be n complex numbers. Then, for a positive integer k, the sum ∑_{i=1}^{n} α_i^k is called the k-th moment of the numbers α1, …, αn.

Theorem 8.1.12. [Newton's identities] Let P(t) = t^n + a_{n-1}t^{n-1} + ⋯ + a_0 have zeros λ1, …, λn, counted with multiplicities. Put μ_k = ∑_{i=1}^{n} λ_i^k. Then, for 1 ≤ k ≤ n,

\[
k\,a_{n-k} + \mu_1 a_{n-k+1} + \cdots + \mu_{k-1} a_{n-1} + \mu_k = 0. \tag{8.1.4}
\]

That is, the first n moments of the zeros determine the coefficients of P(t).

Proof. For simplicity of expression, let a_n = 1. Then, using Equation (8.1.4), we see that k = 1 gives us a_{n-1} = −μ_1. To compute a_{n-2}, put k = 2 in Equation (8.1.4) to verify that a_{n-2} = (μ_1² − μ_2)/2. This process can be continued to get all the coefficients of P(t). Now, let us prove the n given equations.

Define f(t) = ∑_{i=1}^{n} \frac{1}{t − λ_i} = \frac{P'(t)}{P(t)} and take |t| > max_i |λ_i|. Then, the left hand side can be re-written as

\[
f(t) = \sum_{i=1}^{n} \frac{1}{t-\lambda_i} = \sum_{i=1}^{n} \frac{1}{t\bigl(1 - \frac{\lambda_i}{t}\bigr)} = \sum_{i=1}^{n}\Bigl[\frac{1}{t} + \frac{\lambda_i}{t^2} + \cdots\Bigr] = \frac{n}{t} + \frac{\mu_1}{t^2} + \cdots. \tag{8.1.5}
\]

Thus, using P'(t) = f(t)P(t), we get

\[
n a_n t^{n-1} + (n-1)a_{n-1}t^{n-2} + \cdots + a_1 = P'(t) = \Bigl[\frac{n}{t} + \frac{\mu_1}{t^2} + \cdots\Bigr]\bigl[a_n t^n + \cdots + a_0\bigr].
\]

Now, equating the coefficient of t^{n-k-1} on both sides, we get

\[
(n-k)a_{n-k} = n\,a_{n-k} + \mu_1 a_{n-k+1} + \cdots + \mu_k a_n, \quad \text{for } 1 \le k \le n,
\]

which, since a_n = 1, is the required Newton's identity. _

Remark 8.1.13. Let P(t) = a_n t^n + ⋯ + a_1 t + a_0 with a_n = 1. Thus, we see that we need not find the zeros of P(t) in order to obtain the k-th moments of its zeros: the moments can be computed recursively using Newton's identities, as in the sketch below.
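The following sketch (Python; the function name moments_from_coeffs is ours, introduced for this illustration) implements the recursion of Equation (8.1.4) and checks it against a cubic with known zeros.

```python
def moments_from_coeffs(a):
    """Given P(t) = t^n + a[n-1] t^(n-1) + ... + a[0], return mu_1, ..., mu_n,
    the power sums of the zeros, using Newton's identities (8.1.4)."""
    n = len(a)
    mu = []
    for k in range(1, n + 1):
        s = k * a[n - k] + sum(mu[j - 1] * a[n - k + j] for j in range(1, k))
        mu.append(-s)
    return mu

# Check against P(t) = (t-1)(t-2)(t-3) = t^3 - 6t^2 + 11t - 6.
a = [-6.0, 11.0, -6.0]            # a[0], a[1], a[2]
print(moments_from_coeffs(a))      # [6.0, 14.0, 36.0] = 1+2+3, 1+4+9, 1+8+27
```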

Exercise 8.1.14. Let A, B ∈ Mn(ℂ). Then, prove that A and B have the same eigenvalues if and only if tr(A^k) = tr(B^k), for k = 1, …, n. (Use Exercise 6.1.8.1a.)

8.2 Methods for Tridiagonalization and Diagonalization

Let G = {A ∈ Mn(ℂ) : A*A = I}. Then, using Exercise 5.4.8, we see that

1.
for every A, B ∈ G, AB ∈ G (see Exercise 5.4.8.10).
2.
for every A, B, C ∈ G, (AB)C = A(BC).
3.
I_n is the identity element of G. That is, for any A ∈ G, AI_n = A = I_nA.
4.
for every A ∈ G, A^{-1} ∈ G.

Thus, the set G forms a group with respect to multiplication. We now define this group.

Definition 8.2.1. [Unitary Group] Let G = {A ∈ Mn(ℂ) : A*A = I}. Then, G forms a multiplicative group. This group is called the unitary group.

Proposition 8.2.2. [Selection Principle of Unitary Matrices] Let {Uk : k ≥ 1} be a sequence of unitary matrices. Viewing them as elements of ℂ^{n²}, let us assume that “for any ε > 0, there exists a positive integer N such that ∥Uk − U∥ < ε, for all k ≥ N”. That is, the matrices Uk converge to U as elements of ℂ^{n²}. Then, U is also a unitary matrix.

Proof. Let A = [a_ij] ∈ Mn(ℂ) be a unitary matrix. Then ∑_{i,j=1}^{n} |a_ij|² = tr(A*A) = n. Thus, the set of unitary matrices is a closed and bounded, hence compact, subset of ℂ^{n²}. Hence, any sequence of unitary matrices has a convergent subsequence (Bolzano-Weierstrass Theorem), whose limit is again unitary. Thus, the required result follows. _

For a unitary matrix U, we know that U-1 = U*. Our next result gives a necessary and sufficient condition on an invertible matrix A so that the matrix A-1 is similar to A*.

Theorem 8.2.3. [Generalizing a Unitary Matrix] Let A be an invertible matrix. Then A-1 is similar to A* if and only if there exists an invertible matrix B such that A = B-1B*.

Proof. Suppose A = B^{-1}B^*, for some invertible matrix B. Then

\[
A^* = B(B^{-1})^* = B(B^{-1})^* B B^{-1} = B(B^{-1}B^*)^{-1}B^{-1} = B A^{-1} B^{-1}.
\]
Conversely, let A* = SA^{-1}S^{-1}, for some invertible matrix S. We need to produce an invertible matrix B such that A = B^{-1}B^*.

We first show that there exists a nonsingular Hermitian matrix H_θ such that A^{-1} = H_θ^{-1}A^*H_θ, for some θ ∈ ℝ.

Note that for any θ ∈ ℝ, if we put S_θ = e^{iθ}S then

\[
S_\theta A^{-1} S_\theta^{-1} = A^* \quad\text{and}\quad S_\theta = A^* S_\theta A.
\]

Now, define H_θ = S_θ + S_θ^*. Then, H_θ is a Hermitian matrix and H_θ = A^*H_θA. Furthermore, there are infinitely many choices of θ such that det H_θ ≠ 0. To see this, suppose that H_θ is singular for some θ. Then, there exists x ≠ 0 such that H_θx = 0. So,

\[
S_\theta x = -S_\theta^* x = -e^{-i\theta}S^*x, \quad\text{or equivalently,}\quad -e^{2i\theta}x = S^{-1}S^*x.
\]

That is, −e^{2iθ} ∈ σ(S^{-1}S^*). Thus, if we choose θ_0 such that −e^{2iθ_0} ∉ σ(S^{-1}S^*) then H_{θ_0} is nonsingular.

To get our result, we finally choose B = β(αI − A^*)H_{θ_0}, where β ≠ 0 and α = e^{iγ} ∉ σ(A^*).

Note that with α and β chosen as above, B is invertible. Furthermore,

\[
BA = \alpha\beta H_{\theta_0}A - \beta A^*H_{\theta_0}A = \alpha\beta H_{\theta_0}A - \beta H_{\theta_0} = \beta H_{\theta_0}(\alpha A - I).
\]
As we need BA = B^*, and B^* = \bar{\beta}H_{\theta_0}(\bar{\alpha}I − A), we require βH_{θ_0}(αA − I) = \bar{β}H_{θ_0}(\bar{α}I − A), that is, \bar{β} = −αβ, which holds true if β = e^{i(π−γ)/2}. Thus, the required result follows. _

Exercise 8.2.4. Suppose that A is similar to a unitary matrix. Then, prove that A^{-1} is similar to A^*.

8.2.1 Plane Rotations

Definition 8.2.5. [Plane Rotations] For a fixed positive integer n, consider the vector space ℝⁿ with standard basis {e1, …, en}. Also, for 1 ≤ i, j ≤ n, let E_{i,j} = e_i e_j^T. Then, for θ ∈ ℝ and 1 ≤ i < j ≤ n, a plane rotation, denoted U(θ; i, j), is defined as

U (θ;i,j) = I - Ei,i - Ej,j + [Ei,i + Ej,j]cosθ - Ei,j sinθ + Ej,isinθ.
That is, U(θ; i, j) agrees with the identity matrix except in the rows and columns i and j: its (i, i) and (j, j) entries equal cos θ, its (i, j) entry equals −sin θ and its (j, i) entry equals sin θ; the unmentioned diagonal entries are 1 and the unmentioned off-diagonal entries are 0.

Remark 8.2.6. Note the following about the matrix U(θ; i, j), where θ ∈ ℝ and 1 ≤ i < j ≤ n.

1.
U(θ; i, j) is orthogonal.
2.
Geometrically, U(θ; i, j)x rotates x by the angle θ in the ij-plane.
3.
Geometrically (U (θ;i,j))T x rotates x by the angle -θ in the ij-plane.
4.
If y = U(θ;i,j)x then the coordinates of y are given by
(a)
yi = xi cosθ - xj sinθ,
(b)
yj = xi sinθ + xj cosθ, and
(c)
for l ≠ i, j, y_l = x_l.
5.
Thus, for x ∈ ℝⁿ, the choice of θ for which y_j = 0, where y = U(θ; i, j)x, equals
(a)
θ = 0, whenever x_j = 0. That is, U(0; i, j) = I.
(b)
θ = cot^{-1}(−x_i/x_j), whenever x_j ≠ 0.
6.
[Geometry] Imagine standing at 1 = (1,1,1)T 3. We want to apply a plane rotation U, so that v = UT 1 with v2 = 0. That is, the final point is on the xz-plane.

Then, we can either apply a plane rotation along the xy-plane or the yz-plane. For the xy-plane, we need the plane z = 1 (xy plane lifted by 1). This plane contains the vector 1. Imagine moving the tip of ⃗1 on this plane. Then this locus corresponds to a circle that lies on the plane z = 1, has radius √2- and is centred at (0,0,1). That is, we draw the circle x2 + y2 = 1 on the xy-plane and then lifted it up by so that it lies on the plane z = 1. Thus, note that the xz-plane cuts this circle at two points. These two points of intersections give us the two choices for the vector v (see Figure 8.1). A similar calculation can be done for the yz-plane.


[Figure 8.1: Geometry of plane rotations in ℝ³.]


7.
In general, in ℝⁿ, suppose that we want to apply a plane rotation to a vector a in the x₁x₂-plane so that the resulting vector has 0 in the 2-nd coordinate. In that case, our circle in the x₁x₂-plane has radius r = √(a₁² + a₂²) and it gets translated by [0, 0, a₃, …, aₙ]^T. So, there are two points x on this circle with x₂ = 0, namely x = [±r, 0, a₃, …, aₙ]^T.
8.
Consider three mutually orthogonal unit vectors, say x,y,z. Then, x can be brought to e1 by two plane rotations, namely by an appropriate U(θ1;1,3) and U(θ2;1,2). Thus,
U(θ2;1,2)U (θ1;1,3)x = e1.
In this process, the unit vectors y and z, get shifted to say,
ˆy = U (θ2;1,2)U (θ1;1,3)y  and ˆz = U (θ2;1,2)U(θ1;1,3)z.
As unitary transformations preserve angles, note that ŷ(1) = ẑ(1) = 0. Now, we can apply an appropriate plane rotation U(θ₃; 2, 3) so that U(θ₃; 2, 3)ŷ = e₂. Since e₃ is the only unit vector in ℝ³ orthogonal to both e₁ and e₂, it follows that U(θ₃; 2, 3)ẑ = e₃. Thus,
\[
I = \begin{bmatrix} e_1 & e_2 & e_3 \end{bmatrix} = U(\theta_3; 2,3)\,U(\theta_2; 1,2)\,U(\theta_1; 1,3)\begin{bmatrix} x & y & z \end{bmatrix}.
\]
Hence, any real orthogonal matrix A ∈ M₃(ℝ) is a product of three plane rotations.

We are now ready to give another method to get the QR-decomposition of a square matrix (see Theorem 5.2.1 that uses the Gram-Schmidt Orthonormalization Process).

Proposition 8.2.7. [QR Factorization Revisited: Square Matrix] Let A ∈ Mn(ℝ). Then there exists a real orthogonal matrix Q and an upper triangular matrix R such that A = QR.

Proof. We start by applying plane rotations to A so that the positions (2,1), (3,1), …, (n,1) of A become zero. This means that if a_{21} = 0, we multiply by I; otherwise, we use the plane rotation U(θ; 1, 2), where θ = cot^{-1}(−a_{11}/a_{21}). Then, we apply a similar technique so that the (3,1) entry becomes 0. Note that this plane rotation does not change the (2,1) entry. We continue this process till all the entries in the first column, except possibly the (1,1) entry, are zero.

We then apply plane rotations to make the positions (3,2), (4,2), …, (n,2) zero. Observe that this does not disturb the zeros in the first column. Thus, continuing the above process a finite number of times gives us the required result. _
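A minimal sketch of this Givens-rotation QR factorization, written in Python/NumPy (the function name givens_qr is ours; the loop order follows the proof, column by column, zeroing the entries below the diagonal):

```python
import numpy as np

def givens_qr(A):
    """QR factorization of a real square matrix via plane (Givens) rotations,
    in the spirit of Proposition 8.2.7."""
    R = A.astype(float).copy()
    n = R.shape[0]
    Q = np.eye(n)
    for j in range(n - 1):                  # zero out column j below the diagonal
        for i in range(j + 1, n):
            if R[i, j] != 0:
                r = np.hypot(R[j, j], R[i, j])
                c, s = R[j, j] / r, R[i, j] / r
                U = np.eye(n)
                U[[j, i], [j, i]] = c       # rotation acting in the (j, i)-plane
                U[j, i], U[i, j] = s, -s
                R = U @ R                   # the new (i, j) entry is exactly 0
                Q = Q @ U.T
    return Q, R                             # A = Q R, Q orthogonal, R upper triangular

A = np.random.default_rng(1).standard_normal((4, 4))
Q, R = givens_qr(A)
print(np.allclose(Q @ R, A), np.allclose(np.tril(R, -1), 0))   # True True
```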

Lemma 8.2.8. [QR Factorization Revisited: Rectangular Matrix] Let A ∈ M_{m,n}(ℝ). Then there exists a real orthogonal matrix Q and a matrix R ∈ M_{m,n}(ℝ) in upper triangular form such that A = QR.

Proof. If Rank A < m, add some columns to A to get a matrix, say Ã, such that Rank Ã = m. Now suppose that Ã has k columns. For 1 ≤ i ≤ k, let v_i = Ã[:, i]. Now, apply the Gram-Schmidt Orthonormalization Process to {v_1, …, v_k}. For example, suppose the result is a sequence of k vectors w_1, 0, w_2, 0, 0, …, 0, w_m, 0, …, 0, where Q = [w_1 ⋯ w_m] is real orthogonal. Then Ã[:, 1] is a linear combination of w_1, Ã[:, 2] is also a linear combination of w_1, Ã[:, 3] is a linear combination of w_1, w_2, and so on. In general, for 1 ≤ s ≤ k, the column Ã[:, s] is a linear combination of the w_i's that appear in the list up to the s-th position. Thus, Ã[:, s] = ∑_{i=1}^{m} w_i r_{is}, where r_{is} = 0 for all i > s. That is, Ã = QR, where R = [r_ij]. Now, remove the extra columns of Ã and the corresponding columns of R to get the required result. _

Note that Proposition 8.2.7 is also valid for any complex matrix. In this case the matrix Q will be unitary. This can also be seen from Theorem 5.2.1, as we then need to apply the Gram-Schmidt Orthonormalization Process to vectors in ℂⁿ.

To proceed further, recall that a matrix A = [a_ij] ∈ Mn(ℂ) is called a tri-diagonal matrix if a_ij = 0 whenever |i − j| > 1, 1 ≤ i, j ≤ n.

Proposition 8.2.9. [Tridiagonalization of a Real Symmetric Matrix: Givens' Method] Let A be a real symmetric matrix. Then, there exists a real orthogonal matrix Q such that QAQ^T is a tri-diagonal matrix.

Proof. If a_{31} ≠ 0, then put U_1 = U(θ_1; 2, 3), where θ_1 = cot^{-1}(−a_{21}/a_{31}). Notice that U_1^T[:, 1] = e_1 and so

\[
(U_1 A U_1^T)[:, 1] = (U_1 A)[:, 1].
\]

We already know that (U_1A)[3, 1] = 0. Hence, U_1AU_1^T is a real symmetric matrix with (3,1)-th entry 0. Now, proceed to make the (4,1)-th entry of U_1AU_1^T equal to 0. To do so, take U_2 = U(θ_2; 2, 4). Notice that U_2^T[:, 1] = e_1 and so

\[
\bigl(U_2 (U_1AU_1^T) U_2^T\bigr)[:, 1] = (U_2U_1AU_1^T)[:, 1].
\]

But by our choice of the plane rotation U_2, we have (U_2U_1AU_1^T)[4, 1] = 0. Furthermore, as U_2[3, :] = e_3^T, we have

\[
(U_2U_1AU_1^T)[3, 1] = U_2[3, :]\,(U_1AU_1^T)[:, 1] = (U_1AU_1^T)[3, 1] = 0.
\]
That is, the previous zeros are preserved.

Continuing this way, we can find a real orthogonal matrix Q such that QAQT is tri-diagonal. _

Proposition 8.2.10. [Almost Diagonalization of a Real Symmetric Matrix: Jacobi Method] Let A ∈ Mn(ℝ) be real symmetric. Then there exists a real orthogonal matrix S, a product of plane rotations, such that SAS^T is almost a diagonal matrix.

Proof. The idea is to reduce the off-diagonal entries of A to 0 as much as possible. So, we start by choosing (i, j) such that i < j and |a_ij| is maximum. Now, put

\[
\theta = \frac{1}{2}\cot^{-1}\Bigl(\frac{a_{ii} - a_{jj}}{2a_{ij}}\Bigr), \quad U = U(\theta; i, j), \quad\text{and}\quad B = U^TAU.
\]

Then, for all l, k ≠ i, j, we see that

\[
\begin{aligned}
b_{lk} &= U^T[l,:]\,A\,U[:,k] = e_l^T A e_k = a_{lk},\\
b_{ik} &= U^T[i,:]\,A\,U[:,k] = (\cos\theta\, e_i^T + \sin\theta\, e_j^T)A e_k = a_{ik}\cos\theta + a_{jk}\sin\theta,\\
b_{lj} &= U^T[l,:]\,A\,U[:,j] = e_l^T A(-\sin\theta\, e_i + \cos\theta\, e_j) = -a_{li}\sin\theta + a_{lj}\cos\theta,\\
b_{ij} &= U^T[i,:]\,A\,U[:,j] = (\cos\theta\, e_i^T + \sin\theta\, e_j^T)A(-\sin\theta\, e_i + \cos\theta\, e_j)\\
       &= \sin(2\theta)\,\frac{a_{jj} - a_{ii}}{2} + a_{ij}\cos(2\theta) = 0.
\end{aligned}
\]

Thus, using the above, we see that a_{lk}² = b_{lk}² whenever l, k ≠ i, j, and for l ≠ i, j we have

\[
b_{il}^2 + b_{lj}^2 = a_{il}^2 + a_{lj}^2.
\]

As U is orthogonal and B = U^TAU, we get ∑_{k,l}|a_{kl}|² = ∑_{k,l}|b_{kl}|². Further, b_{ij} = 0 implies that

\[
a_{ii}^2 + 2a_{ij}^2 + a_{jj}^2 = b_{ii}^2 + 2b_{ij}^2 + b_{jj}^2 = b_{ii}^2 + b_{jj}^2.
\]

As the rest of the diagonal entries have not changed, we observe that the sum of the squares of the off-diagonal entries has reduced by 2a_{ij}². Thus, a repeated application of the above process makes the matrix “close to diagonal”. _
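A minimal sketch of this Jacobi iteration, written in Python/NumPy (the function name jacobi_sweep, the tolerance and the iteration cap are our choices for illustration; the angle is computed with arctan2, which is equivalent to the cot⁻¹ formula above but also handles a_ii = a_jj):

```python
import numpy as np

def jacobi_sweep(A, tol=1e-12, max_iter=100):
    """Repeatedly annihilate the largest off-diagonal entry of a real symmetric
    matrix by a plane rotation, as in Proposition 8.2.10."""
    B = A.astype(float).copy()
    n = B.shape[0]
    S = np.eye(n)
    for _ in range(max_iter):
        off = np.abs(B - np.diag(np.diag(B)))
        i, j = np.unravel_index(np.argmax(off), off.shape)
        if off[i, j] < tol:
            break
        theta = 0.5 * np.arctan2(2 * B[i, j], B[i, i] - B[j, j])
        U = np.eye(n)
        U[i, i] = U[j, j] = np.cos(theta)
        U[i, j], U[j, i] = -np.sin(theta), np.sin(theta)   # U(theta; i, j)
        B = U.T @ B @ U                     # the (i, j) entry of B becomes 0
        S = S @ U
    return B, S                             # B = S^T A S is nearly diagonal

A = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 0.5],
              [2.0, 0.5, 1.0]])
B, S = jacobi_sweep(A)
print(np.round(np.sort(np.diag(B)), 6))                 # ~ eigenvalues of A
print(np.round(np.sort(np.linalg.eigvalsh(A)), 6))
```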

8.2.2 Householder Matrices

We will now look at another class of unitary matrices, commonly called the Householder matrices (see Exercise 1.3.7.11).

Definition 8.2.11. [Householder Matrix] Let w ∈ ℂⁿ be a unit vector. Then, the matrix U_w = I − 2ww^* is called a Householder matrix.

Remark 8.2.12. We observe the following about the Householder matrix Uw.

1.
U_w = I − 2ww^* is the sum of two Hermitian matrices and hence is also Hermitian.
2.
U_wU_w^* = (I − 2ww^*)(I − 2ww^*)^* = I − 2ww^* − 2ww^* + 4ww^*ww^* = I. Or equivalently, verify that ∥U_w x∥ = ∥x∥, for all x ∈ ℂⁿ. So U_w is unitary.
3.
If x ⊥ w then U_w x = x.
4.
If x = cw, for some c ∈ ℂ, then U_w x = −x.
5.
Thus, if v ∈ ℂⁿ then we can write v = x + y, where x ⊥ w and y = cw, for some c ∈ ℂ. In this case, U_w v = U_w(x + y) = x − y.
6.
Geometrically, U_w v reflects the vector v along the vector w. Thus, U_w is a reflection matrix along w (see Exercise 1.3.7.??).

Example 8.2.13. In ℝ², let w = e₂. Then w^⊥ is the x-axis. Consider the vector v = \begin{bmatrix} 1\\ 2 \end{bmatrix} = e₁ + 2e₂, where e₁ ⊥ w and 2e₂ ∈ LS(w). So

\[
U_w(e_1 + 2e_2) = U_w v = U_w(x + y) = x - y = e_1 - 2e_2.
\]

That is, U_w v is the reflection of v in the x-axis (= w^⊥).

Recall that if x, y ∈ ℝⁿ with x ≠ y and ∥x∥ = ∥y∥ then (x + y) ⊥ (x − y). This is not true in ℂⁿ, as can be seen from the following example. Take x = \begin{bmatrix} 1\\ 1 \end{bmatrix} and y = \begin{bmatrix} i\\ -1 \end{bmatrix}. Then
\[
\Bigl\langle \begin{bmatrix} 1+i\\ 0 \end{bmatrix}, \begin{bmatrix} 1-i\\ 2 \end{bmatrix} \Bigr\rangle = (1+i)^2 \ne 0.
\]
Thus, to pick the right choice for the matrix U_w, we need to be observant of the choice of the inner product space.

Example 8.2.14. Let x, y ∈ ℂⁿ with x ≠ y and ∥x∥ = ∥y∥. Then, which U_w should be used to reflect y to x?

Solution in the case of ℝⁿ: Imagine the line segment joining x and y. Now, place a mirror at its midpoint, perpendicular to the segment. Then, the reflection of y in that mirror is x. So, take w = (x − y)/∥x − y∥. Then,

\[
U_w y = (I - 2ww^T)y = y - 2ww^Ty = y - 2\,\frac{x-y}{\|x-y\|^2}\,(x-y)^Ty = y - 2\,\frac{x-y}{\|x-y\|^2}\cdot\frac{-\|x-y\|^2}{2} = x.
\]
Solution in the case of ℂⁿ: Suppose there is a unit vector w ∈ ℂⁿ such that (I − 2ww^*)y = x. Then y − x = 2ww^*y and hence w^*(y − x) = 2w^*ww^*y = 2w^*y. Thus,

\[
w^*(y + x) = 0, \quad\text{that is,}\quad w \perp (y + x). \tag{8.2.1}
\]

Furthermore, again using w^*(y + x) = 0, we get y − x = 2ww^*y = −2ww^*x. So,

\[
2(y - x) = 2ww^*(y - x) \quad\text{or}\quad y - x = ww^*(y - x).
\]

On the other hand, using Equation (8.2.1), we get ww^*(y + x) = 0. So,

\[
0 = [(y + x)^*ww^*](y - x) = (y + x)^*[ww^*(y - x)] = (y + x)^*(y - x).
\]

Therefore, if such a w exists, then (y − x) ⊥ (y + x).

But, in that case, w = (x − y)/∥x − y∥ will work since, as above, ∥x − y∥² = 2(y − x)^*y and

\[
U_w y = (I - 2ww^*)y = y - 2ww^*y = y - 2\,\frac{x-y}{\|x-y\|^2}\,(x-y)^*y = y - 2\,\frac{x-y}{\|x-y\|^2}\cdot\frac{-\|x-y\|^2}{2} = x.
\]

Thus, in this case, if ⟨x + y, x − y⟩ ≠ 0 then we will not find a w such that U_w y = x.

For example, taking x = \begin{bmatrix} 1\\ 1 \end{bmatrix} and y = \begin{bmatrix} i\\ -1 \end{bmatrix}, we have ⟨x + y, x − y⟩ ≠ 0.
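The real case of Example 8.2.14 is easy to verify numerically. A small sketch (Python/NumPy; the vectors below are ours, chosen only so that ∥x∥ = ∥y∥):

```python
import numpy as np

def householder_reflector(x, y):
    """U_w with w = (x - y)/||x - y||; for real x, y with ||x|| = ||y|| and
    x != y, it maps y to x (Example 8.2.14)."""
    w = (x - y) / np.linalg.norm(x - y)
    return np.eye(len(x)) - 2.0 * np.outer(w, w)

x = np.array([3.0, 4.0, 0.0])
y = np.array([0.0, 0.0, 5.0])        # same norm as x
Uw = householder_reflector(x, y)
print(np.allclose(Uw @ y, x), np.allclose(Uw @ Uw.T, np.eye(3)))   # True True
```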

As an application, we now prove that any real symmetric matrix can be transformed into a tri-diagonal matrix.

Proposition 8.2.15. [Householder's Tri-Diagonalization] Let v ∈ ℝ^{n−1} and let A = \begin{bmatrix} a & v^T\\ v & B \end{bmatrix} ∈ Mn(ℝ) be a real symmetric matrix. Then, there exists a real orthogonal matrix Q, a product of Householder matrices, such that Q^TAQ is tri-diagonal.

Proof. If v is a multiple of e₁ then the first row and column of A are already in tri-diagonal form, and we proceed to apply our technique to the matrix B, a matrix of lower order. So, without loss of generality, we assume that v is not a multiple of e₁.

As we want Q^TAQ to be tri-diagonal, we need to find a vector w ∈ ℝ^{n−1} such that U_w v = re₁ ∈ ℝ^{n−1}, where r = ∥v∥ = ∥U_w v∥. Thus, using Example 8.2.14, choose the required vector w ∈ ℝ^{n−1}. Then,

\[
\begin{bmatrix} 1 & 0\\ 0 & U_w \end{bmatrix}\begin{bmatrix} a & v^T\\ v & B \end{bmatrix}\begin{bmatrix} 1 & 0\\ 0 & U_w^T \end{bmatrix} = \begin{bmatrix} a & v^TU_w^T\\ U_wv & U_wBU_w^T \end{bmatrix} = \begin{bmatrix} a & re_1^T\\ re_1 & S \end{bmatrix},
\]
where S ∈ M_{n−1}(ℝ) is a symmetric matrix. Now, use induction on the matrix S to get the required result. _
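The following sketch (Python/NumPy; the function name householder_tridiag and the sign choice for the target vector are ours) carries out this reduction numerically, one Householder reflector per column, as in the proof above.

```python
import numpy as np

def householder_tridiag(A):
    """Reduce a real symmetric matrix to tri-diagonal form by Householder
    reflectors, in the spirit of Proposition 8.2.15."""
    T = A.astype(float).copy()
    n = T.shape[0]
    Q = np.eye(n)
    for k in range(n - 2):
        v = T[k + 1:, k]                     # part of column k below the diagonal
        r = np.linalg.norm(v)
        if r < 1e-14:
            continue
        x = np.zeros_like(v)
        x[0] = -r if v[0] >= 0 else r        # target (+/-)r*e1; sign avoids cancellation
        w = (x - v) / np.linalg.norm(x - v)  # as in Example 8.2.14
        H = np.eye(n)
        H[k + 1:, k + 1:] -= 2.0 * np.outer(w, w)   # block Householder matrix U_w
        T = H @ T @ H                        # H is symmetric and orthogonal
        Q = Q @ H
    return Q, T                              # Q^T A Q = T, T tri-diagonal

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
A = (A + A.T) / 2
Q, T = householder_tridiag(A)
print(np.allclose(Q.T @ A @ Q, T), np.allclose(np.triu(T, 2), 0))   # True True
```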

8.2.3 Schur’s Upper Triangularization Revisited

Definition 8.2.16. Let s and t be two symbols. Then, an expression of the form

\[
W(s,t) = s^{m_1}t^{n_1}\cdots s^{m_k}t^{n_k}, \quad\text{where the } m_i, n_i \text{ are positive integers,}
\]
is called a word in the symbols s and t of degree ∑_{i=1}^{k}(m_i + n_i).
Remark 8.2.17. [More on Unitary Equivalence] Let s and t be two symbols and W(s,t) be a word in symbols s and t.

1.
Suppose U is a unitary matrix such that B = U^*AU. Then, W(B, B^*) = U^*W(A, A^*)U. Thus, tr[W(A, A^*)] = tr[W(B, B^*)].
2.
Let A and B be two matrices such that tr[W(A,A*)] = tr[W(B,B*)], for each word W. Then, does it imply that A and B are unitarily equivalent? The answer is ‘yes’ as provided by the following result. The proof is outside the scope of this book.

Theorem 8.2.18. [Specht-Pearcy] Let A, B ∈ Mn(ℂ) and suppose that tr[W(A, A^*)] = tr[W(B, B^*)] holds for all words of degree less than or equal to 2n². Then B = U^*AU, for some unitary matrix U.

Exercise 8.2.19. [Triangularization via a Complex Orthogonal Matrix need not be Possible] Let A ∈ Mn(ℂ) and suppose A = QTQ^T, where Q is a complex orthogonal matrix and T is upper triangular. Then, prove that

1.
A has an eigenvector x such that x^Tx ≠ 0.
2.
there is no complex orthogonal matrix Q such that Q^T\begin{bmatrix} 1 & i\\ i & -1 \end{bmatrix}Q is upper triangular.

Proposition 8.2.20. [Matrices with Distinct Eigenvalues are Dense in Mn(ℂ)] Let A ∈ Mn(ℂ). Then, for each ε > 0, there exists a matrix A(ε) = [a(ε)_ij] ∈ Mn(ℂ) such that A(ε) has distinct eigenvalues and ∑_{i,j}|a_ij − a(ε)_ij|² < ε.

Proof. By Schur Upper Triangularization (see Lemma 6.2.12), there exists a unitary matrix U such that U^*AU = T, an upper triangular matrix. Now, choose α_i's such that the t_ii + α_i are distinct and ∑_i|α_i|² < ε. Now, consider the matrix A(ε) = U(T + diag(α_1, …, α_n))U^*. Then, B = A(ε) − A = U diag(α_1, …, α_n)U^* with

\[
\sum_{i,j}|b_{ij}|^2 = \operatorname{tr}(B^*B) = \operatorname{tr}\bigl(U\operatorname{diag}(|\alpha_1|^2,\ldots,|\alpha_n|^2)U^*\bigr) = \sum_i |\alpha_i|^2 < \varepsilon.
\]
Thus, the required result follows. _

Before proceeding with our next result on almost diagonalizability, we look at the following example.

Example 8.2.21. Let A = \begin{bmatrix} 1 & 2\\ 0 & 3 \end{bmatrix} and let ε > 0 be given. Then, determine a diagonal matrix D such that the non-diagonal entry of D^{-1}AD is less than ε.

Solution: Choose α < ε/2 and define D = diag(1, α). Then,

\[
D^{-1}AD = \begin{bmatrix} 1 & 0\\ 0 & \frac{1}{\alpha} \end{bmatrix}\begin{bmatrix} 1 & 2\\ 0 & 3 \end{bmatrix}\begin{bmatrix} 1 & 0\\ 0 & \alpha \end{bmatrix} = \begin{bmatrix} 1 & 2\alpha\\ 0 & 3 \end{bmatrix}.
\]

As α < ε/2, the required result follows.

Proposition 8.2.22. [A Matrix is Almost Diagonalizable] Let A ∈ Mn(ℂ) and let ε > 0 be given. Then, there exists an invertible matrix S_ε such that S_ε^{-1}AS_ε = T, an upper triangular matrix with |t_ij| < ε, for all i ≠ j.

Proof. By Schur Upper Triangularization (see Lemma 6.2.12), there exists a unitary matrix U such that U^*AU = T, an upper triangular matrix. Now, take t = 2 + max_{i<j}|t_ij| and choose α such that 0 < α < min{1, ε/t}. Then, if we take D_α = diag(1, α, α², …, α^{n−1}) and S = UD_α, we have S^{-1}AS = D_α^{-1}TD_α = F (say), an upper triangular matrix. Furthermore, note that for i < j, we have |f_ij| = |t_ij|α^{j−i} ≤ |t_ij|α < ε. Thus, the required result follows. _
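The scaling trick in the proof is easy to see numerically. In the sketch below (Python/NumPy; the test matrix and the value of α are our choices), conjugating an upper triangular T by D_α = diag(1, α, …, α^{n−1}) multiplies the entry t_ij (i < j) by α^{j−i}, so every off-diagonal entry becomes as small as we wish.

```python
import numpy as np

rng = np.random.default_rng(3)
T = np.triu(rng.standard_normal((4, 4)) + 5)     # an upper triangular matrix
alpha = 1e-3
D = np.diag(alpha ** np.arange(4))               # D_alpha = diag(1, a, a^2, a^3)
F = np.linalg.inv(D) @ T @ D                     # f_ij = t_ij * alpha^(j-i)

off_diag = F - np.diag(np.diag(F))
print(np.max(np.abs(off_diag)))                  # at most alpha * max |t_ij|
```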

8.3 Commuting Matrices and Simultaneous Diagonalization


Definition 8.3.1. [Simultaneously Diagonalizable] Let A,B Mn(). Then, they are said to be simultaneously diagonalizable if there exists an invertible matrix S such that S-1AS and S-1BS are both diagonal matrices.

Since diagonal matrices commute, we have our next result.

Proposition 8.3.2. Let A,B Mn(). If A and B are simultaneously diagonalizable then AB = BA.

Proof. By definition, there exists an invertible matrix S such that S-1AS = Λ1 and S-1BS = Λ2. Hence,

\[
AB = (S\Lambda_1 S^{-1})(S\Lambda_2 S^{-1}) = S\Lambda_1\Lambda_2 S^{-1} = S\Lambda_2\Lambda_1 S^{-1} = (S\Lambda_2 S^{-1})(S\Lambda_1 S^{-1}) = BA.
\]
Thus, we have proved the required result. _

Theorem 8.3.3. Let A,B Mn() be diagonalizable matrices. Then they are simultaneously diagonalizable if and only if they commute.

Proof. One part of this theorem has already been proved in Proposition 8.3.2. For the other part, let us assume that AB = BA. Since A is diagonalizable, there exists an invertible matrix S such that

\[
S^{-1}AS = \Lambda = \lambda_1 I \oplus \cdots \oplus \lambda_k I, \tag{8.3.1}
\]

where λ_1, …, λ_k are the distinct eigenvalues of A. We now use the block structure of S^{-1}AS to decompose C = S^{-1}BS as C = \begin{bmatrix} C_{11} & \cdots & C_{1k}\\ \vdots & \ddots & \vdots\\ C_{k1} & \cdots & C_{kk} \end{bmatrix}. Since AB = BA and S is invertible, we have ΛC = CΛ. Thus,

\[
\begin{bmatrix} \lambda_1 C_{11} & \cdots & \lambda_1 C_{1k}\\ \vdots & \ddots & \vdots\\ \lambda_k C_{k1} & \cdots & \lambda_k C_{kk} \end{bmatrix} = \begin{bmatrix} \lambda_1 C_{11} & \cdots & \lambda_k C_{1k}\\ \vdots & \ddots & \vdots\\ \lambda_1 C_{k1} & \cdots & \lambda_k C_{kk} \end{bmatrix}.
\]
Since λ_i ≠ λ_j for 1 ≤ i ≠ j ≤ k, we have C_ij = 0 whenever i ≠ j. Thus, the matrix C = C_{11} ⊕ ⋯ ⊕ C_{kk}.

Since B is diagonalizable, the matrix C is also diagonalizable and hence the matrices C_ii, for 1 ≤ i ≤ k, are diagonalizable. So, for 1 ≤ i ≤ k, there exist invertible matrices T_i such that T_i^{-1}C_{ii}T_i = Λ_i. Put T = T_1 ⊕ ⋯ ⊕ T_k. Then,

\[
T^{-1}S^{-1}AST = \begin{bmatrix} T_1^{-1} & & \\ & \ddots & \\ & & T_k^{-1} \end{bmatrix}\begin{bmatrix} \lambda_1 I & & \\ & \ddots & \\ & & \lambda_k I \end{bmatrix}\begin{bmatrix} T_1 & & \\ & \ddots & \\ & & T_k \end{bmatrix} = \begin{bmatrix} \lambda_1 I & & \\ & \ddots & \\ & & \lambda_k I \end{bmatrix}
\]
and
\[
T^{-1}S^{-1}BST = \begin{bmatrix} T_1^{-1} & & \\ & \ddots & \\ & & T_k^{-1} \end{bmatrix}\begin{bmatrix} C_{11} & & \\ & \ddots & \\ & & C_{kk} \end{bmatrix}\begin{bmatrix} T_1 & & \\ & \ddots & \\ & & T_k \end{bmatrix} = \begin{bmatrix} \Lambda_1 & & \\ & \ddots & \\ & & \Lambda_k \end{bmatrix}.
\]
Thus A and B are simultaneously diagonalizable and the required result follows. _

Definition 8.3.4. [Commuting Family of Matrices]

1.
Let F ⊆ Mn(ℂ). Then F is said to be a commuting family if each pair of matrices in F commutes.
2.
Let B ∈ Mn(ℂ) and let W be a subspace of ℂⁿ. Then, W is said to be a B-invariant subspace if Bw ∈ W, for all w ∈ W (or equivalently, BW ⊆ W).
3.
A subspace W of ℂⁿ is said to be F-invariant if W is B-invariant for each B ∈ F.

Example 8.3.5. Let A ∈ Mn(ℂ) with (λ, x) as an eigenpair. Then, W = {cx : c ∈ ℂ} is an A-invariant subspace. Furthermore, if W is an A-invariant subspace with dim(W) = 1 then verify that any non-zero vector in W is an eigenvector of A.

Theorem 8.3.6. [An A-invariant Subspace Contains an Eigenvector of A] Let A ∈ Mn(ℂ) and let W ⊆ ℂⁿ be an A-invariant subspace of dimension at least 1. Then W contains an eigenvector of A.

Proof. Let B = {f_1, …, f_k} ⊆ ℂⁿ be an ordered basis for W. Define T : W → W by Tv = Av. Then T[B, B] = [[Tf_1]_B ⋯ [Tf_k]_B] is a k × k matrix which satisfies [Tw]_B = T[B, B][w]_B, for all w ∈ W. As T[B, B] ∈ Mk(ℂ), it has an eigenpair, say (λ, x̂) with x̂ ∈ ℂᵏ. That is,

\[
T[B, B]\,\hat{x} = \lambda\hat{x}. \tag{8.3.2}
\]

Now, put x = ∑_{i=1}^{k}(x̂)_i f_i ∈ ℂⁿ. Then, verify that x ∈ W and [x]_B = x̂. Thus, Tx ∈ W and, using Equation (8.3.2), we get

\[
Tx = \sum_{i=1}^{k}([Tx]_B)_i f_i = \sum_{i=1}^{k}(T[B,B][x]_B)_i f_i = \sum_{i=1}^{k}(T[B,B]\hat{x})_i f_i = \sum_{i=1}^{k}(\lambda\hat{x})_i f_i = \lambda\sum_{i=1}^{k}(\hat{x})_i f_i = \lambda x.
\]
So, A has an eigenvector x ∈ W corresponding to the eigenvalue λ. _

Theorem 8.3.7. Let F ⊆ Mn(ℂ) be a commuting family of matrices. Then, all the matrices in F have a common eigenvector.

Proof. Note that ℂⁿ is F-invariant. Let W ⊆ ℂⁿ be F-invariant with minimum positive dimension. Let y ∈ W with y ≠ 0. We claim that y is an eigenvector for each A ∈ F.

So, on the contrary, assume that y is not an eigenvector for some A ∈ F. Then, by Theorem 8.3.6, W contains an eigenvector x of A for some eigenvalue, say λ. Define W₀ = {z ∈ W : Az = λz}. So W₀ is a proper subspace of W as y ∈ W \ W₀. Also, for z ∈ W₀ and C ∈ F, we note that A(Cz) = CAz = λ(Cz), so that Cz ∈ W₀. So W₀ is F-invariant and 1 ≤ dim W₀ < dim W, a contradiction. _

Theorem 8.3.8. Let F ⊆ Mn(ℂ) be a family of diagonalizable matrices. Then F is commuting if and only if F is simultaneously diagonalizable.

Proof. We prove the result by induction on n. The result is clearly true for n = 1. So, let us assume the result to be valid for all n < m. Now, let us assume that F ⊆ Mm(ℂ) is a family of diagonalizable matrices.

If F is simultaneously diagonalizable, then by Proposition 8.3.2, the family F is commuting. Conversely, let F be a commuting family. If each A ∈ F is a scalar matrix then they are simultaneously diagonalizable via I. So, let A ∈ F be a non-scalar matrix. As A is diagonalizable, there exists an invertible matrix S such that

\[
S^{-1}AS = \lambda_1 I \oplus \cdots \oplus \lambda_k I, \quad k \ge 2,
\]
where the λ_i's are distinct. Now, consider the family G = {X̂ = S^{-1}XS : X ∈ F}. As F is a commuting family, the set G is also a commuting family. So, each X̂ ∈ G has the form X̂ = X_1 ⊕ ⋯ ⊕ X_k. Note that H_i = {X_i : X̂ ∈ G} is a commuting family of diagonalizable matrices of size < m. Thus, by the induction hypothesis, the H_i's are simultaneously diagonalizable, say by the invertible matrices T_i. That is, T_i^{-1}X_iT_i = Λ_i, a diagonal matrix, for 1 ≤ i ≤ k. Thus, if T = T_1 ⊕ ⋯ ⊕ T_k then

\[
T^{-1}S^{-1}XST = T^{-1}(X_1 \oplus \cdots \oplus X_k)T = T_1^{-1}X_1T_1 \oplus \cdots \oplus T_k^{-1}X_kT_k = \Lambda_1 \oplus \cdots \oplus \Lambda_k,
\]
a diagonal matrix, for all X ∈ F. Thus the result holds by induction. _

We now give a proof of some parts of Exercise 6.1.24.

Remark 8.3.9. [σ(AB) and σ(BA)] Let m ≤ n, A ∈ M_{m×n}(ℂ), and B ∈ M_{n×m}(ℂ). Then σ(BA) = σ(AB) together with n − m extra 0's. In particular, if A, B ∈ Mn(ℂ), then P_{AB}(t) = P_{BA}(t).

Proof. Note that

\[
\begin{bmatrix} AB & 0\\ B & 0 \end{bmatrix}\begin{bmatrix} I_m & A\\ 0 & I_n \end{bmatrix} = \begin{bmatrix} AB & ABA\\ B & BA \end{bmatrix} = \begin{bmatrix} I_m & A\\ 0 & I_n \end{bmatrix}\begin{bmatrix} 0 & 0\\ B & BA \end{bmatrix}.
\]

Thus, the matrices \begin{bmatrix} AB & 0\\ B & 0 \end{bmatrix} and \begin{bmatrix} 0 & 0\\ B & BA \end{bmatrix} are similar. Comparing their characteristic polynomials gives t^n P_{AB}(t) = t^m P_{BA}(t). Hence, AB and BA have precisely the same non-zero eigenvalues, and σ(BA) has n − m extra 0's. Therefore, if A and B have the same size, they must have the same characteristic polynomial. _
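Remark 8.3.9 is easy to check numerically. In the sketch below (Python/NumPy; the sizes 2 × 5 and 5 × 2 are our choices), the nonzero eigenvalues of AB and BA agree, and BA has three extra (numerically tiny) eigenvalues at 0.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((2, 5))      # m x n with m <= n
B = rng.standard_normal((5, 2))      # n x m

eig_AB = sorted(np.linalg.eigvals(A @ B), key=abs, reverse=True)
eig_BA = sorted(np.linalg.eigvals(B @ A), key=abs, reverse=True)
print(np.round(eig_AB, 6))           # the nonzero eigenvalues
print(np.round(eig_BA, 6))           # the same values, plus n - m extra zeros
```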

Exercise 8.3.10. [Miscellaneous Exercises]

1.
Let A be nonsingular. Then, verify that A-1(AB)A = BA. Hence, AB and BA are similar. Thus, PAB(t) = PBA(t).
2.
Fix an integer k, 0 ≤ k ≤ n. Now, define the function f_k : Mn(ℂ) → ℂ by f_k(A) = coefficient of t^k in P_A(t). Prove that f_k is a continuous function.
3.
For any matrix A, prove that there exists an ε > 0 such that A_α = A + αI is invertible, for all α ∈ (0, ε). Thus, use the first part to conclude that for any given B, we have P_{A_αB}(t) = P_{BA_α}(t), for all α ∈ (0, ε).
4.
Now, use continuity to argue that P_{AB}(t) = lim_{α→0⁺} P_{A_αB}(t) = lim_{α→0⁺} P_{BA_α}(t) = P_{BA}(t).
5.
Let σ(A) = {λ₁, …, λₙ}, σ(B) = {μ₁, …, μₙ} and suppose that AB = BA. Then,
(a)
prove that there is a permutation π such that σ(A + B) = {λ₁ + μ_{π(1)}, …, λₙ + μ_{π(n)}}. In particular, σ(A + B) ⊆ σ(A) + σ(B).
(b)
if we further assume that σ(A) ∩ σ(−B) = ∅ then the matrix A + B is nonsingular.
6.
Let A and B be two non-commuting matrices. Then, give an example to show that it is difficult to relate σ(A + B) with σ(A) and σ(B).
7.
Are the matrices A = \begin{bmatrix} 0 & 1 & 0\\ 0 & 0 & -1\\ 0 & 0 & 0 \end{bmatrix} and B = \begin{bmatrix} 0 & 0 & 0\\ 1 & 0 & 0\\ 0 & 1 & 0 \end{bmatrix} simultaneously triangularizable?
8.
Let F ⊆ Mn(ℂ) be a family of commuting normal matrices. Then, prove that the matrices in F are simultaneously unitarily diagonalizable.
9.
Let A ∈ Mn(ℂ) with A* = A and x*Ax ≥ 0, for all x ∈ ℂⁿ. Then prove that σ(A) ⊆ [0, ∞) and that if tr(A) = 0, then A = 0.

8.3.1 Diagonalization and Real Orthogonal Matrix

Proposition 8.3.11. [Triangularization: Real Matrix] Let A ∈ Mn(ℝ). Then, there exists a real orthogonal matrix Q such that Q^TAQ is block upper triangular, where each diagonal block is of size either 1 or 2.

Proof. If all the eigenvalues of A are real then the corresponding eigenvectors have real entries and hence, one can use induction to get the result in this case (see Lemma 6.2.12).

So, now let us assume that A has a complex eigenvalue, say λ = α + iβ with β ≠ 0, and let x = u + iv be an eigenvector for λ. Thus, Ax = λx and hence A\bar{x} = \bar{λ}\bar{x}. But λ ≠ \bar{λ} as β ≠ 0. Thus, the eigenvectors x, \bar{x} are linearly independent and therefore {u, v} is a linearly independent set. By the Gram-Schmidt Orthonormalization process, we get an ordered basis, say {w₁, w₂, …, wₙ} of ℝⁿ, where LS(w₁, w₂) = LS(u, v). Also, using the eigen-condition Ax = λx gives

\[
Aw_1 = aw_1 + bw_2, \qquad Aw_2 = cw_1 + dw_2,
\]
for some real numbers a, b, c and d.

Now, form the matrix X = [w₁, w₂, …, wₙ]. Then, X is a real orthogonal matrix and

\[
X^*AX = X^*[Aw_1, Aw_2, \ldots, Aw_n] = \begin{bmatrix} w_1^*\\ w_2^*\\ \vdots\\ w_n^* \end{bmatrix}[aw_1 + bw_2,\; cw_1 + dw_2,\; \ldots,\; Aw_n] = \begin{bmatrix} a & c & *\\ b & d & *\\ 0 & 0 & B \end{bmatrix}, \tag{8.3.3}
\]
where B ∈ M_{n−2}(ℝ). Now, by the induction hypothesis the required result follows. _

The next result is a direct application of Proposition 8.3.11 and hence the proof is omitted.

Corollary 8.3.12. [Simultaneous Triangularization: Real Matrices] Let F ⊆ Mn(ℝ) be a commuting family. Then, there exists a real orthogonal matrix Q such that Q^TAQ is a block upper triangular matrix, where each diagonal block is of size either 1 or 2, for all A ∈ F.

Proposition 8.3.13. Let A ∈ Mn(ℝ). Then the following statements are equivalent.

1.
A is normal.
2.
There exists a real orthogonal matrix Q such that Q^TAQ = ⊕_i A_i, where the A_i's are real normal matrices of size either 1 or 2.

Proof. 2 ⇒ 1 is trivial. To prove 1 ⇒ 2, recall that Proposition 8.3.11 gives the existence of a real orthogonal matrix Q such that Q^TAQ is block upper triangular with diagonal blocks of size either 1 or 2. So, we can write

\[
Q^TAQ = \begin{bmatrix}
\lambda_1 & \cdots & * & * & \cdots & *\\
 & \ddots & \vdots & \vdots & & \vdots\\
 & & \lambda_p & * & \cdots & *\\
 & & & A_{11} & \cdots & A_{1k}\\
 & & & & \ddots & \vdots\\
 & & & & & A_{kk}
\end{bmatrix} = \begin{bmatrix} R & C\\ 0 & B \end{bmatrix} \ \text{(say)},
\]
where λ₁, …, λ_p are the 1 × 1 diagonal blocks and A₁₁, …, A_{kk} are the 2 × 2 diagonal blocks.
As A is normal, \begin{bmatrix} R & C\\ 0 & B \end{bmatrix}\begin{bmatrix} R^T & 0\\ C^T & B^T \end{bmatrix} = \begin{bmatrix} R^T & 0\\ C^T & B^T \end{bmatrix}\begin{bmatrix} R & C\\ 0 & B \end{bmatrix}. Comparing the (1,1) blocks gives RR^T + CC^T = R^TR, so tr(CC^T) = tr(R^TR − RR^T) = 0. Now, using Exercise 8.3.10.9, we get C = 0. Hence, RR^T = R^TR and therefore, R is a diagonal matrix.

As B^TB = BB^T, comparing the (1,1) blocks of B gives A_{11}^TA_{11} = A_{11}A_{11}^T + ∑_{i=2}^{k}A_{1i}A_{1i}^T. So tr(∑_{i=2}^{k}A_{1i}A_{1i}^T) = 0. Now, using Exercise 8.3.10.9 again, we have ∑_{i=2}^{k}A_{1i}A_{1i}^T = 0 and so A_{1i}A_{1i}^T = 0, for all i = 2, …, k. Thus, A_{1i} = 0, for all i = 2, …, k, and A_{11} is normal. Proceeding in the same way with the remaining rows of blocks, the required result follows. _

Exercise 8.3.14. Let A ∈ Mn(ℝ). Then the following are true.

1.
A = −A^T if and only if A is real orthogonally similar to [⊕_j 0] ⊕ [⊕_i A_i], where A_i = \begin{bmatrix} 0 & a_i\\ -a_i & 0 \end{bmatrix}, for some real numbers a_i.
2.
AA^T = I if and only if A is real orthogonally similar to [⊕_i λ_i] ⊕ [⊕_j A_j], where λ_i = ±1 and A_j = \begin{bmatrix} \cos\theta_j & \sin\theta_j\\ -\sin\theta_j & \cos\theta_j \end{bmatrix}, for some real numbers θ_j.

8.3.2 Convergent and nilpotent matrices

Definition 8.3.15. [Convergent Matrices] A matrix A is called a convergent matrix if A^m → 0 as m → ∞.

Remark 8.3.16.

1.
Let A be a diagonalizable matrix with ρ(A) < 1. Then, A is a convergent matrix.

Proof. Write A = S diag(λ₁, …, λₙ)S^{-1} for some invertible matrix S. As ρ(A) < 1, for each i, 1 ≤ i ≤ n, λ_i^m → 0 as m → ∞. Thus, A^m = S diag(λ₁^m, …, λₙ^m)S^{-1} → 0. _

2.
Even if the matrix A is not diagonalizable, the above result holds. That is, whenever ρ(A) < 1, the matrix A is convergent. The converse is also true.

Proof. Let J_k(λ) = λI_k + N_k be a Jordan block of J, the Jordan canonical form of A. Then, as N_k^k = 0, for each fixed k we have

\[
J_k(\lambda)^m = \lambda^m I_k + C(m,1)\lambda^{m-1}N_k + \cdots + C(m, k-1)\lambda^{m-k+1}N_k^{k-1} \to 0, \quad\text{as } m \to \infty,
\]
since |λ| < 1 implies C(m, j)λ^{m−j} → 0 for each fixed j. Hence each block J_k(λ)^m → 0, so J is convergent. Thus, A is a convergent matrix.

Conversely, if A is convergent, then J must be convergent. Thus each Jordan block Jk(λ) must be convergent. Hence |λ| < 1. _
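The remark can be checked numerically even in the non-diagonalizable case. A small sketch (Python/NumPy; the 2 × 2 Jordan-type matrix below is our choice):

```python
import numpy as np

# A non-diagonalizable matrix with spectral radius 0.9 < 1: a single Jordan block.
A = np.array([[0.9, 1.0],
              [0.0, 0.9]])
print(max(abs(np.linalg.eigvals(A))))                     # 0.9
print(np.linalg.norm(np.linalg.matrix_power(A, 500)))     # ~ 1e-20, i.e. A^m -> 0
```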

Theorem 8.3.17. [Decomposition into Diagonalizable and Nilpotent Parts] Let A ∈ Mn(ℂ). Then A = B + C, where B is a diagonalizable matrix and C is nilpotent, such that BC = CB.

Proof. Let J be the Jordan canonical form of A. Then, J = D + N, where D = diag(J) and N is clearly a nilpotent matrix.

Now, note that DN = ND since for each Jordan block J_k(λ) = D_k + N_k, we have D_k = λI and N_k = J_k(0), so that D_kN_k = N_kD_k. As J is the Jordan canonical form of A, there exists an invertible matrix S such that S^{-1}AS = J. Hence, A = SJS^{-1} = SDS^{-1} + SNS^{-1} = B + C, which satisfies the required conditions. _