Chapter 8
Advanced Topics on Diagonalizability and Triangularization*

8.1 More on the Spectrum of a Matrix

We start this section with a few definitions and examples. It will be helpful to recall the notation used in Section 1.3.1 and a few results from Appendix 9.2.

Definition 8.1.1. [Principal Minor] Let A ∈ Mn(ℂ).

1.
Let S ⊆ [n]. Then, det(A[S, S]) is called the principal minor of A corresponding to S.
2.
By EMk(A), we denote the sum of all k × k principal minors of A.

Definition 8.1.2. [Elementary Symmetric Functions] Let k be a positive integer. Then, the kth elementary symmetric function of the numbers r1, …, rn, denoted Sk(r1, …, rn), is defined as

\[
S_k(r_1,\ldots,r_n) = \sum_{i_1 < \cdots < i_k} r_{i_1} \cdots r_{i_k}.
\]

Example 8.1.3. Let A = \begin{bmatrix} 1 & 2 & 3 & 4\\ 5 & 6 & 7 & 8\\ 9 & 8 & 7 & 6\\ 5 & 4 & 3 & 2 \end{bmatrix}. Then, note that

1.
EM1(A) = 1 + 6 + 7 + 2 = 16 and EM2(A) = det A[{1,2},{1,2}] + det A[{1,3},{1,3}] + det A[{1,4},{1,4}] + det A[{2,3},{2,3}] + det A[{2,4},{2,4}] + det A[{3,4},{3,4}] = -80.
2.
S1(1,2,3,4) = 10 and S2(1,2,3,4) = 1·(2 + 3 + 4) + 2·(3 + 4) + 3·4 = 9 + 14 + 12 = 35. (These values are verified numerically in the sketch below.)
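The computations above are easy to check on a machine. The following sketch (Python/NumPy; the helper names EM and elem_sym are ours, introduced only for this illustration) evaluates the sums of principal minors and the elementary symmetric functions directly from their definitions.

```python
import itertools
import numpy as np

A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 8, 7, 6],
              [5, 4, 3, 2]], dtype=float)

def EM(A, k):
    """Sum of all k x k principal minors of A (Definition 8.1.1)."""
    n = A.shape[0]
    return sum(np.linalg.det(A[np.ix_(S, S)])
               for S in itertools.combinations(range(n), k))

def elem_sym(k, r):
    """k-th elementary symmetric function of the numbers in r (Definition 8.1.2)."""
    return sum(np.prod(c) for c in itertools.combinations(r, k))

print(round(EM(A, 1), 6), round(EM(A, 2), 6))                # 16.0 -80.0
print(elem_sym(1, [1, 2, 3, 4]), elem_sym(2, [1, 2, 3, 4]))  # 10 35
```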

Theorem 8.1.4. Let A ∈ Mn(ℂ) and let σ(A) = {λ1, …, λn}. Then,

1.
the coefficient of t^{n-k} in P_A(t) = ∏_{i=1}^{n}(t − λ_i), the characteristic polynomial of A, is
\[
(-1)^k \sum_{i_1 < \cdots < i_k} \lambda_{i_1} \cdots \lambda_{i_k} = (-1)^k S_k(\lambda_1,\ldots,\lambda_n). \tag{8.1.1}
\]

2.
EMk(A) = Sk(λ1, …, λn).

Proof. Note that by definition,

\[
\begin{aligned}
P_A(t) = \prod_{i=1}^{n}(t-\lambda_i) &= t^n - S_1(\lambda_1,\ldots,\lambda_n)\,t^{n-1} + S_2(\lambda_1,\ldots,\lambda_n)\,t^{n-2} - \cdots + (-1)^n S_n(\lambda_1,\ldots,\lambda_n) && (8.1.2)\\
&= t^n - EM_1(A)\,t^{n-1} + EM_2(A)\,t^{n-2} - \cdots + (-1)^n EM_n(A). && (8.1.3)
\end{aligned}
\]
As the second part is just a re-writing of the first, we prove only the first part. To do so, let
\[
B = tI - A = \begin{bmatrix} t-a_{11} & \cdots & -a_{1n}\\ \vdots & \ddots & \vdots\\ -a_{n1} & \cdots & t-a_{nn} \end{bmatrix}.
\]
Then, using Definition 9.2.2 in the Appendix, note that det B = ∑_σ sgn(σ) ∏_{i=1}^{n} b_{iσ(i)}, and hence each S ⊆ [n] with |S| = n − k contributes to the coefficient of t^{n-k} in the following way:

For all i ∈ S, consider all permutations σ such that σ(i) = i. The idea is to select a 't' from each of the diagonal entries b_{ii} = t − a_{ii} with i ∈ S. Since we do not want any more factors of 't', we set t = 0 at every other diagonal position. So the contribution from S to the coefficient of t^{n-k} is det[−A(S|S)] = (−1)^k det A(S|S). Hence the coefficient of t^{n-k} in P_A(t) is

\[
(-1)^k \sum_{S\subseteq [n],\,|S| = n-k} \det A(S|S) = (-1)^k \sum_{T\subseteq [n],\,|T| = k} \det A[T,T] = (-1)^k EM_k(A).
\]
The proof is complete in view of Equation (8.1.2). _

As a direct application, we obtain Theorem 6.1.16 which we state again.

Corollary 8.1.5. Let A ∈ Mn(ℂ) and let σ(A) = {λ1, …, λn}. Then Tr(A) = ∑_{i=1}^{n} λ_i and det A = ∏_{i=1}^{n} λ_i.


Let A and B be similar matrices. Then, by Theorem 6.1.20, we know that σ(A) = σ(B). Thus, Part 2 of Theorem 8.1.4 directly gives the following result.

Corollary 8.1.6. Let A and B be two similar matrices of order n. Then, EMk(A) = EMk(B) for 1 ≤ k ≤ n.

So, for each k, the sums of the k × k principal minors of similar matrices are equal. In other words, the sums of principal minors are invariant under similarity.

Corollary 8.1.7. [Derivative of Characteristic Polynomial] Let A ∈ Mn(ℂ). Then
\[
\frac{d}{dt}P_A(t) = P_A'(t) = \sum_{i=1}^{n} P_{A(i|i)}(t).
\]

Proof. For 1 ≤ i ≤ n, let us denote A(i|i) by A_i. Then, using Equation (8.1.3), we have

\[
\begin{aligned}
\sum_{i=1}^{n} P_{A_i}(t) &= \sum_{i} t^{n-1} - \sum_{i} EM_1(A_i)\,t^{n-2} + \cdots + (-1)^{n-1}\sum_{i} EM_{n-1}(A_i)\\
&= n\,t^{n-1} - (n-1)\,EM_1(A)\,t^{n-2} + (n-2)\,EM_2(A)\,t^{n-3} - \cdots + (-1)^{n-1} EM_{n-1}(A)\\
&= P_A'(t),
\end{aligned}
\]
where the second equality uses the fact that each k × k principal minor of A is a principal minor of A_i for exactly the n − k indices i outside its index set, so that ∑_i EM_k(A_i) = (n − k) EM_k(A). This gives the desired result. _
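Corollary 8.1.7 is easy to test numerically. The following sketch (Python/NumPy; it relies only on np.poly, which returns the coefficients of the characteristic polynomial, and np.polyder) checks the identity for a random matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
n = A.shape[0]

# Coefficients of P_A(t) = det(tI - A), highest degree first, and of its derivative.
pA = np.poly(A)
dpA = np.polyder(pA)

# Sum of the characteristic polynomials of the principal sub-matrices A(i|i).
total = np.zeros(n)                      # a degree n-1 polynomial has n coefficients
for i in range(n):
    idx = [j for j in range(n) if j != i]
    total += np.poly(A[np.ix_(idx, idx)])

print(np.allclose(dpA, total))           # True: P_A'(t) = sum_i P_{A(i|i)}(t)
```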

Corollary 8.1.8. Let A ∈ Mn(ℂ). If Alg.Mul_λ(A) = 1 then Rank[A − λI] = n − 1.

Proof. As Alg.Mul_λ(A) = 1, P_A(t) = (t − λ)q(t), where q(t) is a polynomial with q(λ) ≠ 0. Thus P_A'(t) = q(t) + (t − λ)q'(t). Hence, P_A'(λ) = q(λ) ≠ 0. Thus, by Corollary 8.1.7, ∑_i P_{A(i|i)}(λ) = P_A'(λ) ≠ 0. Hence, there exists i, 1 ≤ i ≤ n, such that P_{A(i|i)}(λ) ≠ 0. That is, det[A(i|i) − λI] ≠ 0, so that Rank[A − λI] ≥ n − 1. As λ ∈ σ(A), we also have Rank[A − λI] < n, and hence Rank[A − λI] = n − 1. _

Remark 8.1.9. The converse of Corollary 8.1.8 is false. Note that for the matrix A = \begin{bmatrix} 0 & 1\\ 0 & 0 \end{bmatrix}, Rank[A − 0·I] = 1 = 2 − 1 = n − 1, but 0 has multiplicity 2 as a root of P_A(t) = 0.

As another application of Corollary 8.1.7, we now relate the multiplicity of an eigenvalue to the spectrum of a principal sub-matrix.

Theorem 8.1.10. [Multiplicity and Spectrum of a Principal Sub-Matrix] Let A ∈ Mn(ℂ), λ ∈ ℂ and let k be a positive integer. Then 1 ⇒ 2 ⇒ 3, where

1.
Geo.Mul_λ(A) ≥ k.
2.
If B is a principal sub-matrix of A of size m > n − k then λ ∈ σ(B).
3.
Alg.Mul_λ(A) ≥ k.

Proof. Part 1 ⇒ Part 2. Let {x1, …, xk} be linearly independent eigenvectors for λ and let B be a principal sub-matrix of A of size m > n − k. Without loss of generality, we may write A = \begin{bmatrix} B & *\\ * & * \end{bmatrix}. Let us partition the xi's accordingly, say x_i = \begin{bmatrix} x_{i1}\\ x_{i2} \end{bmatrix}, such that

\[
\begin{bmatrix} B & *\\ * & * \end{bmatrix}\begin{bmatrix} x_{i1}\\ x_{i2} \end{bmatrix} = \lambda \begin{bmatrix} x_{i1}\\ x_{i2} \end{bmatrix}, \quad \text{for } 1 \le i \le k.
\]
As m > n − k, the size of x_{i2} is less than k. Thus, the set {x_{12}, …, x_{k2}} is linearly dependent (see Corollary 3.3.6). So, there is a nonzero linear combination y = \begin{bmatrix} y_1\\ y_2 \end{bmatrix} of x_1, …, x_k such that y_2 = 0. Notice that y_1 ≠ 0 and B y_1 = λ y_1.

Part 2 ⇒ Part 3. By Corollary 8.1.7, we know that P_A'(t) = ∑_{i=1}^{n} P_{A(i|i)}(t). As each A(i|i) is a principal sub-matrix of size n − 1, we get P_{A(i|i)}(λ) = 0, for all i = 1, 2, …, n. Thus, P_A'(λ) = 0. A similar argument applied to each of the A(i|i)'s gives P_A^{(2)}(λ) = 0, where P_A^{(2)}(t) = \frac{d}{dt}P_A'(t). Proceeding along the above lines, we finally get P_A^{(i)}(λ) = 0, for i = 0, 1, …, k − 1. This implies that Alg.Mul_λ(A) ≥ k. _

Definition 8.1.11. [Moments] Fix a positive integer n and let α1, …, αn be n complex numbers. Then, for a positive integer k, the sum ∑_{i=1}^{n} α_i^k is called the k-th moment of the numbers α1, …, αn.

Theorem 8.1.12. [Newton's identities] Let P(t) = t^n + a_{n-1}t^{n-1} + ⋯ + a_0 have zeros λ1, …, λn, counted with multiplicities. Put μ_k = ∑_{i=1}^{n} λ_i^k. Then, for 1 ≤ k ≤ n,

\[
k\,a_{n-k} + \mu_1 a_{n-k+1} + \cdots + \mu_{k-1} a_{n-1} + \mu_k = 0. \tag{8.1.4}
\]

That is, the first n moments of the zeros determine the coefficients of P(t).

Proof. For simplicity of expression, let a_n = 1. Then, using Equation (8.1.4), we see that k = 1 gives us a_{n-1} = −μ_1. To compute a_{n-2}, put k = 2 in Equation (8.1.4) to verify that a_{n-2} = (μ_1² − μ_2)/2. This process can be continued to get all the coefficients of P(t). Now, let us prove the n given equations.

Define f(t) = ∑_{i=1}^{n} \frac{1}{t − λ_i} = \frac{P'(t)}{P(t)} and take |t| > max_i |λ_i|. Then, the left hand side can be re-written as

\[
f(t) = \sum_{i=1}^{n} \frac{1}{t-\lambda_i} = \sum_{i=1}^{n} \frac{1}{t\bigl(1 - \frac{\lambda_i}{t}\bigr)} = \sum_{i=1}^{n}\Bigl[\frac{1}{t} + \frac{\lambda_i}{t^2} + \cdots\Bigr] = \frac{n}{t} + \frac{\mu_1}{t^2} + \cdots. \tag{8.1.5}
\]

Thus, using P'(t) = f(t)P(t), we get

\[
n a_n t^{n-1} + (n-1)a_{n-1}t^{n-2} + \cdots + a_1 = P'(t) = \Bigl[\frac{n}{t} + \frac{\mu_1}{t^2} + \cdots\Bigr]\bigl[a_n t^n + \cdots + a_0\bigr].
\]

Now, equating the coefficient of t^{n-k-1} on both sides, we get

\[
(n-k)a_{n-k} = n\,a_{n-k} + \mu_1 a_{n-k+1} + \cdots + \mu_k a_n, \quad \text{for } 1 \le k \le n,
\]

which, since a_n = 1, is the required Newton's identity. _

Remark 8.1.13. Let P(t) = a_n t^n + ⋯ + a_1 t + a_0 with a_n = 1. Thus, we see that we need not find the zeros of P(t) in order to obtain the k-th moments of its zeros: the moments can be computed recursively using Newton's identities, as in the sketch below.
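The following sketch (Python; the function name moments_from_coeffs is ours, introduced for this illustration) implements the recursion of Equation (8.1.4) and checks it against a cubic with known zeros.

```python
def moments_from_coeffs(a):
    """Given P(t) = t^n + a[n-1] t^(n-1) + ... + a[0], return mu_1, ..., mu_n,
    the power sums of the zeros, using Newton's identities (8.1.4)."""
    n = len(a)
    mu = []
    for k in range(1, n + 1):
        s = k * a[n - k] + sum(mu[j - 1] * a[n - k + j] for j in range(1, k))
        mu.append(-s)
    return mu

# Check against P(t) = (t-1)(t-2)(t-3) = t^3 - 6t^2 + 11t - 6.
a = [-6.0, 11.0, -6.0]            # a[0], a[1], a[2]
print(moments_from_coeffs(a))      # [6.0, 14.0, 36.0] = 1+2+3, 1+4+9, 1+8+27
```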

Exercise 8.1.14. Let A, B ∈ Mn(ℂ). Then, prove that A and B have the same eigenvalues if and only if tr(A^k) = tr(B^k), for k = 1, …, n. (Use Exercise 6.1.8.1a.)

8.2 Methods for Tridiagonalization and Diagonalization

Let G = {A ∈ Mn(ℂ) : A*A = I}. Then, using Exercise 5.4.8, we see that

1.
for every A, B ∈ G, AB ∈ G (see Exercise 5.4.8.10).
2.
for every A, B, C ∈ G, (AB)C = A(BC).
3.
I_n is the identity element of G. That is, for any A ∈ G, AI_n = A = I_nA.
4.
for every A ∈ G, A^{-1} ∈ G.

Thus, the set G forms a group with respect to multiplication. We now define this group.

Definition 8.2.1. [Unitary Group] Let G = {A ∈ Mn(ℂ) : A*A = I}. Then, G forms a multiplicative group. This group is called the unitary group.

Proposition 8.2.2. [Selection Principle of Unitary Matrices] Let {Uk : k ≥ 1} be a sequence of unitary matrices. Viewing them as elements of ℂ^{n²}, let us assume that “for any ε > 0, there exists a positive integer N such that ∥Uk − U∥ < ε, for all k ≥ N”. That is, the matrices Uk converge to U as elements of ℂ^{n²}. Then, U is also a unitary matrix.

Proof. Let A = [a_ij] ∈ Mn(ℂ) be a unitary matrix. Then ∑_{i,j=1}^{n} |a_ij|² = tr(A*A) = n. Thus, the set of unitary matrices is a closed and bounded, hence compact, subset of ℂ^{n²}. Hence, any sequence of unitary matrices has a convergent subsequence (Bolzano-Weierstrass Theorem), whose limit is again unitary. Thus, the required result follows. _

For a unitary matrix U, we know that U-1 = U*. Our next result gives a necessary and sufficient condition on an invertible matrix A so that the matrix A-1 is similar to A*.

Theorem 8.2.3. [Generalizing a Unitary Matrix] Let A be an invertible matrix. Then A-1 is similar to A* if and only if there exists an invertible matrix B such that A = B-1B*.

Proof. Suppose A = B^{-1}B^*, for some invertible matrix B. Then

\[
A^* = B(B^{-1})^* = B(B^{-1})^* B B^{-1} = B(B^{-1}B^*)^{-1}B^{-1} = B A^{-1} B^{-1}.
\]
Conversely, let A* = SA^{-1}S^{-1}, for some invertible matrix S. We need to produce an invertible matrix B such that A = B^{-1}B^*.

We first show that there exists a nonsingular Hermitian matrix H_θ such that A^{-1} = H_θ^{-1}A^*H_θ, for some θ ∈ ℝ.

Note that for any θ ∈ ℝ, if we put S_θ = e^{iθ}S then

\[
S_\theta A^{-1} S_\theta^{-1} = A^* \quad\text{and}\quad S_\theta = A^* S_\theta A.
\]

Now, define H_θ = S_θ + S_θ^*. Then, H_θ is a Hermitian matrix and H_θ = A^*H_θA. Furthermore, there are infinitely many choices of θ such that det H_θ ≠ 0. To see this, suppose that H_θ is singular for some θ. Then, there exists x ≠ 0 such that H_θx = 0. So,

\[
S_\theta x = -S_\theta^* x = -e^{-i\theta}S^*x, \quad\text{or equivalently,}\quad -e^{2i\theta}x = S^{-1}S^*x.
\]

That is, −e^{2iθ} ∈ σ(S^{-1}S^*). Thus, if we choose θ_0 such that −e^{2iθ_0} ∉ σ(S^{-1}S^*) then H_{θ_0} is nonsingular.

To get our result, we finally choose B = β(αI − A^*)H_{θ_0}, where β ≠ 0 and α = e^{iγ} ∉ σ(A^*).

Note that with α and β chosen as above, B is invertible. Furthermore,

\[
BA = \alpha\beta H_{\theta_0}A - \beta A^*H_{\theta_0}A = \alpha\beta H_{\theta_0}A - \beta H_{\theta_0} = \beta H_{\theta_0}(\alpha A - I).
\]
As we need BA = B^*, and B^* = \bar{\beta}H_{\theta_0}(\bar{\alpha}I − A), we require βH_{θ_0}(αA − I) = \bar{β}H_{θ_0}(\bar{α}I − A), that is, \bar{β} = −αβ, which holds true if β = e^{i(π−γ)/2}. Thus, the required result follows. _

Exercise 8.2.4. Suppose that A is similar to a unitary matrix. Then, prove that A^{-1} is similar to A^*.

8.2.1 Plane Rotations

Definition 8.2.5. [Plane Rotations] For a fixed positive integer n, consider the vector space ℝⁿ with standard basis {e1, …, en}. Also, for 1 ≤ i, j ≤ n, let E_{i,j} = e_i e_j^T. Then, for θ ∈ ℝ and 1 ≤ i < j ≤ n, a plane rotation, denoted U(θ; i, j), is defined as

U (θ;i,j) = I - Ei,i - Ej,j + [Ei,i + Ej,j]cosθ - Ei,j sinθ + Ej,isinθ.
That is, U(θ; i, j) agrees with the identity matrix except in the rows and columns i and j: its (i, i) and (j, j) entries equal cos θ, its (i, j) entry equals −sin θ and its (j, i) entry equals sin θ; the unmentioned diagonal entries are 1 and the unmentioned off-diagonal entries are 0.

Remark 8.2.6. Note the following about the matrix U(θ; i, j), where θ ∈ ℝ and 1 ≤ i < j ≤ n.

1.
U(θ; i, j) is orthogonal.
2.
Geometrically, U(θ; i, j)x rotates x by the angle θ in the ij-plane.
3.
Geometrically (U (θ;i,j))T x rotates x by the angle -θ in the ij-plane.
4.
If y = U(θ;i,j)x then the coordinates of y are given by
(a)
yi = xi cosθ - xj sinθ,
(b)
yj = xi sinθ + xj cosθ, and
(c)
for l ≠ i, j, y_l = x_l.
5.
Thus, for x ∈ ℝⁿ, the choice of θ for which y_j = 0, where y = U(θ; i, j)x, equals
(a)
θ = 0, whenever x_j = 0. That is, U(0; i, j) = I.
(b)
θ = cot^{-1}(−x_i/x_j), whenever x_j ≠ 0.
6.
[Geometry] Imagine standing at 1 = (1,1,1)T 3. We want to apply a plane rotation U, so that v = UT 1 with v2 = 0. That is, the final point is on the xz-plane.

Then, we can either apply a plane rotation along the xy-plane or the yz-plane. For the xy-plane, we need the plane z = 1 (xy plane lifted by 1). This plane contains the vector 1. Imagine moving the tip of ⃗1 on this plane. Then this locus corresponds to a circle that lies on the plane z = 1, has radius √2- and is centred at (0,0,1). That is, we draw the circle x2 + y2 = 1 on the xy-plane and then lifted it up by so that it lies on the plane z = 1. Thus, note that the xz-plane cuts this circle at two points. These two points of intersections give us the two choices for the vector v (see Figure 8.1). A similar calculation can be done for the yz-plane.


[Figure 8.1: Geometry of plane rotations in ℝ³.]


7.
In general, in ℝⁿ, suppose that we want to apply a plane rotation to a vector a in the x₁x₂-plane so that the resulting vector has 0 in the 2-nd coordinate. In that case, our circle in the x₁x₂-plane has radius r = √(a₁² + a₂²) and it gets translated by [0, 0, a₃, …, aₙ]^T. So, there are two points x on this circle with x₂ = 0, namely x = [±r, 0, a₃, …, aₙ]^T.
8.
Consider three mutually orthogonal unit vectors, say x,y,z. Then, x can be brought to e1 by two plane rotations, namely by an appropriate U(θ1;1,3) and U(θ2;1,2). Thus,
U(θ2;1,2)U (θ1;1,3)x = e1.
In this process, the unit vectors y and z, get shifted to say,
ˆy = U (θ2;1,2)U (θ1;1,3)y  and ˆz = U (θ2;1,2)U(θ1;1,3)z.
As unitary transformations preserve angles, note that ŷ(1) = ẑ(1) = 0. Now, we can apply an appropriate plane rotation U(θ₃; 2, 3) so that U(θ₃; 2, 3)ŷ = e₂. Since e₃ is the only unit vector in ℝ³ orthogonal to both e₁ and e₂, it follows that U(θ₃; 2, 3)ẑ = e₃. Thus,
\[
I = \begin{bmatrix} e_1 & e_2 & e_3 \end{bmatrix} = U(\theta_3; 2,3)\,U(\theta_2; 1,2)\,U(\theta_1; 1,3)\begin{bmatrix} x & y & z \end{bmatrix}.
\]
Hence, any real orthogonal matrix A ∈ M₃(ℝ) is a product of three plane rotations.

We are now ready to give another method to get the QR-decomposition of a square matrix (see Theorem 5.2.1 that uses the Gram-Schmidt Orthonormalization Process).

Proposition 8.2.7. [QR Factorization Revisited: Square Matrix] Let A ∈ Mn(ℝ). Then there exists a real orthogonal matrix Q and an upper triangular matrix R such that A = QR.

Proof. We start by applying plane rotations to A so that the positions (2,1), (3,1), …, (n,1) of A become zero. This means that if a_{21} = 0, we multiply by I; otherwise, we use the plane rotation U(θ; 1, 2), where θ = cot^{-1}(−a_{11}/a_{21}). Then, we apply a similar technique so that the (3,1) entry becomes 0. Note that this plane rotation does not change the (2,1) entry. We continue this process till all the entries in the first column, except possibly the (1,1) entry, are zero.

We then apply plane rotations to make the positions (3,2), (4,2), …, (n,2) zero. Observe that this does not disturb the zeros in the first column. Thus, continuing the above process a finite number of times gives us the required result. _
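A minimal sketch of this Givens-rotation QR factorization, written in Python/NumPy (the function name givens_qr is ours; the loop order follows the proof, column by column, zeroing the entries below the diagonal):

```python
import numpy as np

def givens_qr(A):
    """QR factorization of a real square matrix via plane (Givens) rotations,
    in the spirit of Proposition 8.2.7."""
    R = A.astype(float).copy()
    n = R.shape[0]
    Q = np.eye(n)
    for j in range(n - 1):                  # zero out column j below the diagonal
        for i in range(j + 1, n):
            if R[i, j] != 0:
                r = np.hypot(R[j, j], R[i, j])
                c, s = R[j, j] / r, R[i, j] / r
                U = np.eye(n)
                U[[j, i], [j, i]] = c       # rotation acting in the (j, i)-plane
                U[j, i], U[i, j] = s, -s
                R = U @ R                   # the new (i, j) entry is exactly 0
                Q = Q @ U.T
    return Q, R                             # A = Q R, Q orthogonal, R upper triangular

A = np.random.default_rng(1).standard_normal((4, 4))
Q, R = givens_qr(A)
print(np.allclose(Q @ R, A), np.allclose(np.tril(R, -1), 0))   # True True
```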

Lemma 8.2.8. [QR Factorization Revisited: Rectangular Matrix] Let A ∈ M_{m,n}(ℝ). Then there exists a real orthogonal matrix Q and a matrix R ∈ M_{m,n}(ℝ) in upper triangular form such that A = QR.

Proof. If Rank A < m, add some columns to A to get a matrix, say Ã, such that Rank Ã = m. Now suppose that Ã has k columns. For 1 ≤ i ≤ k, let v_i = Ã[:, i]. Now, apply the Gram-Schmidt Orthonormalization Process to {v_1, …, v_k}. For example, suppose the result is a sequence of k vectors w_1, 0, w_2, 0, 0, …, 0, w_m, 0, …, 0, where Q = [w_1 ⋯ w_m] is real orthogonal. Then Ã[:, 1] is a linear combination of w_1, Ã[:, 2] is also a linear combination of w_1, Ã[:, 3] is a linear combination of w_1, w_2, and so on. In general, for 1 ≤ s ≤ k, the column Ã[:, s] is a linear combination of the w_i's that appear in the list up to the s-th position. Thus, Ã[:, s] = ∑_{i=1}^{m} w_i r_{is}, where r_{is} = 0 for all i > s. That is, Ã = QR, where R = [r_ij]. Now, remove the extra columns of Ã and the corresponding columns of R to get the required result. _

Note that Proposition 8.2.7 is also valid for any complex matrix. In this case the matrix Q will be unitary. This can also be seen from Theorem 5.2.1, as we then need to apply the Gram-Schmidt Orthonormalization Process to vectors in ℂⁿ.

To proceed further, recall that a matrix A = [a_ij] ∈ Mn(ℂ) is called a tri-diagonal matrix if a_ij = 0 whenever |i − j| > 1, 1 ≤ i, j ≤ n.

Proposition 8.2.9. [Tridiagonalization of a Real Symmetric Matrix: Givens' Method] Let A be a real symmetric matrix. Then, there exists a real orthogonal matrix Q such that QAQ^T is a tri-diagonal matrix.

Proof. If a_{31} ≠ 0, then put U_1 = U(θ_1; 2, 3), where θ_1 = cot^{-1}(−a_{21}/a_{31}). Notice that U_1^T[:, 1] = e_1 and so

\[
(U_1 A U_1^T)[:, 1] = (U_1 A)[:, 1].
\]

We already know that (U_1A)[3, 1] = 0. Hence, U_1AU_1^T is a real symmetric matrix with (3,1)-th entry 0. Now, proceed to make the (4,1)-th entry of U_1AU_1^T equal to 0. To do so, take U_2 = U(θ_2; 2, 4). Notice that U_2^T[:, 1] = e_1 and so

\[
\bigl(U_2 (U_1AU_1^T) U_2^T\bigr)[:, 1] = (U_2U_1AU_1^T)[:, 1].
\]

But by our choice of the plane rotation U_2, we have (U_2U_1AU_1^T)[4, 1] = 0. Furthermore, as U_2[3, :] = e_3^T, we have

\[
(U_2U_1AU_1^T)[3, 1] = U_2[3, :]\,(U_1AU_1^T)[:, 1] = (U_1AU_1^T)[3, 1] = 0.
\]
That is, the previous zeros are preserved.

Continuing this way, we can find a real orthogonal matrix Q such that QAQT is tri-diagonal. _

Proposition 8.2.10. [Almost Diagonalization of a Real Symmetric Matrix: Jacobi Method] Let A ∈ Mn(ℝ) be real symmetric. Then there exists a real orthogonal matrix S, a product of plane rotations, such that SAS^T is almost a diagonal matrix.

Proof. The idea is to reduce the off-diagonal entries of A to 0 as much as possible. So, we start by choosing (i, j) such that i < j and |a_ij| is maximum. Now, put

\[
\theta = \frac{1}{2}\cot^{-1}\Bigl(\frac{a_{ii} - a_{jj}}{2a_{ij}}\Bigr), \quad U = U(\theta; i, j), \quad\text{and}\quad B = U^TAU.
\]

Then, for all l, k ≠ i, j, we see that

\[
\begin{aligned}
b_{lk} &= U^T[l,:]\,A\,U[:,k] = e_l^T A e_k = a_{lk},\\
b_{ik} &= U^T[i,:]\,A\,U[:,k] = (\cos\theta\, e_i^T + \sin\theta\, e_j^T)A e_k = a_{ik}\cos\theta + a_{jk}\sin\theta,\\
b_{lj} &= U^T[l,:]\,A\,U[:,j] = e_l^T A(-\sin\theta\, e_i + \cos\theta\, e_j) = -a_{li}\sin\theta + a_{lj}\cos\theta,\\
b_{ij} &= U^T[i,:]\,A\,U[:,j] = (\cos\theta\, e_i^T + \sin\theta\, e_j^T)A(-\sin\theta\, e_i + \cos\theta\, e_j)\\
       &= \sin(2\theta)\,\frac{a_{jj} - a_{ii}}{2} + a_{ij}\cos(2\theta) = 0.
\end{aligned}
\]

Thus, using the above, we see that a_{lk}² = b_{lk}² whenever l, k ≠ i, j, and for l ≠ i, j we have

\[
b_{il}^2 + b_{lj}^2 = a_{il}^2 + a_{lj}^2.
\]

As U is orthogonal and B = U^TAU, we get ∑_{k,l}|a_{kl}|² = ∑_{k,l}|b_{kl}|². Further, b_{ij} = 0 implies that

\[
a_{ii}^2 + 2a_{ij}^2 + a_{jj}^2 = b_{ii}^2 + 2b_{ij}^2 + b_{jj}^2 = b_{ii}^2 + b_{jj}^2.
\]

As the rest of the diagonal entries have not changed, we observe that the sum of the squares of the off-diagonal entries has reduced by 2a_{ij}². Thus, a repeated application of the above process makes the matrix “close to diagonal”. _
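A minimal sketch of this Jacobi iteration, written in Python/NumPy (the function name jacobi_sweep, the tolerance and the iteration cap are our choices for illustration; the angle is computed with arctan2, which is equivalent to the cot⁻¹ formula above but also handles a_ii = a_jj):

```python
import numpy as np

def jacobi_sweep(A, tol=1e-12, max_iter=100):
    """Repeatedly annihilate the largest off-diagonal entry of a real symmetric
    matrix by a plane rotation, as in Proposition 8.2.10."""
    B = A.astype(float).copy()
    n = B.shape[0]
    S = np.eye(n)
    for _ in range(max_iter):
        off = np.abs(B - np.diag(np.diag(B)))
        i, j = np.unravel_index(np.argmax(off), off.shape)
        if off[i, j] < tol:
            break
        theta = 0.5 * np.arctan2(2 * B[i, j], B[i, i] - B[j, j])
        U = np.eye(n)
        U[i, i] = U[j, j] = np.cos(theta)
        U[i, j], U[j, i] = -np.sin(theta), np.sin(theta)   # U(theta; i, j)
        B = U.T @ B @ U                     # the (i, j) entry of B becomes 0
        S = S @ U
    return B, S                             # B = S^T A S is nearly diagonal

A = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 0.5],
              [2.0, 0.5, 1.0]])
B, S = jacobi_sweep(A)
print(np.round(np.sort(np.diag(B)), 6))                 # ~ eigenvalues of A
print(np.round(np.sort(np.linalg.eigvalsh(A)), 6))
```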

8.2.2 Householder Matrices

We will now look at another class of unitary matrices, commonly called the Householder matrices (see Exercise 1.3.7.11).

Definition 8.2.11. [Householder Matrix] Let w ∈ ℂⁿ be a unit vector. Then, the matrix U_w = I − 2ww^* is called a Householder matrix.

Remark 8.2.12. We observe the following about the Householder matrix Uw.

1.
U_w = I − 2ww^* is the sum of two Hermitian matrices and hence is also Hermitian.
2.
U_wU_w^* = (I − 2ww^*)(I − 2ww^*)^* = I − 2ww^* − 2ww^* + 4ww^*ww^* = I. Or equivalently, verify that ∥U_w x∥ = ∥x∥, for all x ∈ ℂⁿ. So U_w is unitary.
3.
If x ⊥ w then U_w x = x.
4.
If x = cw, for some c ∈ ℂ, then U_w x = −x.
5.
Thus, if v ∈ ℂⁿ then we can write v = x + y, where x ⊥ w and y = cw, for some c ∈ ℂ. In this case, U_w v = U_w(x + y) = x − y.
6.
Geometrically, U_w v reflects the vector v along the vector w. Thus, U_w is a reflection matrix along w (see Exercise 1.3.7.??).

Example 8.2.13. In ℝ², let w = e₂. Then w^⊥ is the x-axis. Consider the vector v = \begin{bmatrix} 1\\ 2 \end{bmatrix} = e₁ + 2e₂, where e₁ ⊥ w and 2e₂ ∈ LS(w). So

\[
U_w(e_1 + 2e_2) = U_w v = U_w(x + y) = x - y = e_1 - 2e_2.
\]

That is, U_w v is the reflection of v in the x-axis (= w^⊥).

Recall that if x, y ∈ ℝⁿ with x ≠ y and ∥x∥ = ∥y∥ then (x + y) ⊥ (x − y). This is not true in ℂⁿ, as can be seen from the following example. Take x = \begin{bmatrix} 1\\ 1 \end{bmatrix} and y = \begin{bmatrix} i\\ -1 \end{bmatrix}. Then
\[
\Bigl\langle \begin{bmatrix} 1+i\\ 0 \end{bmatrix}, \begin{bmatrix} 1-i\\ 2 \end{bmatrix} \Bigr\rangle = (1+i)^2 \ne 0.
\]
Thus, to pick the right choice for the matrix U_w, we need to be observant of the choice of the inner product space.

Example 8.2.14. Let x, y ∈ ℂⁿ with x ≠ y and ∥x∥ = ∥y∥. Then, which U_w should be used to reflect y to x?

Solution in the case of ℝⁿ: Imagine the line segment joining x and y. Now, place a mirror at its midpoint, perpendicular to the segment. Then, the reflection of y in that mirror is x. So, take w = (x − y)/∥x − y∥. Then,

\[
U_w y = (I - 2ww^T)y = y - 2ww^Ty = y - 2\,\frac{x-y}{\|x-y\|^2}\,(x-y)^Ty = y - 2\,\frac{x-y}{\|x-y\|^2}\cdot\frac{-\|x-y\|^2}{2} = x.
\]
Solution in the case of ℂⁿ: Suppose there is a unit vector w ∈ ℂⁿ such that (I − 2ww^*)y = x. Then y − x = 2ww^*y and hence w^*(y − x) = 2w^*ww^*y = 2w^*y. Thus,

\[
w^*(y + x) = 0, \quad\text{that is,}\quad w \perp (y + x). \tag{8.2.1}
\]

Furthermore, again using w^*(y + x) = 0, we get y − x = 2ww^*y = −2ww^*x. So,

\[
2(y - x) = 2ww^*(y - x) \quad\text{or}\quad y - x = ww^*(y - x).
\]

On the other hand, using Equation (8.2.1), we get ww^*(y + x) = 0. So,

\[
0 = [(y + x)^*ww^*](y - x) = (y + x)^*[ww^*(y - x)] = (y + x)^*(y - x).
\]

Therefore, if such a w exists, then (y − x) ⊥ (y + x).

But, in that case, w = (x − y)/∥x − y∥ will work since, as above, ∥x − y∥² = 2(y − x)^*y and

\[
U_w y = (I - 2ww^*)y = y - 2ww^*y = y - 2\,\frac{x-y}{\|x-y\|^2}\,(x-y)^*y = y - 2\,\frac{x-y}{\|x-y\|^2}\cdot\frac{-\|x-y\|^2}{2} = x.
\]

Thus, in this case, if ⟨x + y, x − y⟩ ≠ 0 then we will not find a w such that U_w y = x.

For example, taking x = \begin{bmatrix} 1\\ 1 \end{bmatrix} and y = \begin{bmatrix} i\\ -1 \end{bmatrix}, we have ⟨x + y, x − y⟩ ≠ 0.
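The real case of Example 8.2.14 is easy to verify numerically. A small sketch (Python/NumPy; the vectors below are ours, chosen only so that ∥x∥ = ∥y∥):

```python
import numpy as np

def householder_reflector(x, y):
    """U_w with w = (x - y)/||x - y||; for real x, y with ||x|| = ||y|| and
    x != y, it maps y to x (Example 8.2.14)."""
    w = (x - y) / np.linalg.norm(x - y)
    return np.eye(len(x)) - 2.0 * np.outer(w, w)

x = np.array([3.0, 4.0, 0.0])
y = np.array([0.0, 0.0, 5.0])        # same norm as x
Uw = householder_reflector(x, y)
print(np.allclose(Uw @ y, x), np.allclose(Uw @ Uw.T, np.eye(3)))   # True True
```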

As an application, we now prove that any real symmetric matrix can be transformed into a tri-diagonal matrix.

Proposition 8.2.15. [Householder's Tri-Diagonalization] Let v ∈ ℝ^{n−1} and let A = \begin{bmatrix} a & v^T\\ v & B \end{bmatrix} ∈ Mn(ℝ) be a real symmetric matrix. Then, there exists a real orthogonal matrix Q, a product of Householder matrices, such that Q^TAQ is tri-diagonal.

Proof. If v is a multiple of e₁ then the first row and column of A are already in tri-diagonal form, and we proceed to apply our technique to the matrix B, a matrix of lower order. So, without loss of generality, we assume that v is not a multiple of e₁.

As we want Q^TAQ to be tri-diagonal, we need to find a vector w ∈ ℝ^{n−1} such that U_w v = re₁ ∈ ℝ^{n−1}, where r = ∥v∥ = ∥U_w v∥. Thus, using Example 8.2.14, choose the required vector w ∈ ℝ^{n−1}. Then,

\[
\begin{bmatrix} 1 & 0\\ 0 & U_w \end{bmatrix}\begin{bmatrix} a & v^T\\ v & B \end{bmatrix}\begin{bmatrix} 1 & 0\\ 0 & U_w^T \end{bmatrix} = \begin{bmatrix} a & v^TU_w^T\\ U_wv & U_wBU_w^T \end{bmatrix} = \begin{bmatrix} a & re_1^T\\ re_1 & S \end{bmatrix},
\]
where S ∈ M_{n−1}(ℝ) is a symmetric matrix. Now, use induction on the matrix S to get the required result. _
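The following sketch (Python/NumPy; the function name householder_tridiag and the sign choice for the target vector are ours) carries out this reduction numerically, one Householder reflector per column, as in the proof above.

```python
import numpy as np

def householder_tridiag(A):
    """Reduce a real symmetric matrix to tri-diagonal form by Householder
    reflectors, in the spirit of Proposition 8.2.15."""
    T = A.astype(float).copy()
    n = T.shape[0]
    Q = np.eye(n)
    for k in range(n - 2):
        v = T[k + 1:, k]                     # part of column k below the diagonal
        r = np.linalg.norm(v)
        if r < 1e-14:
            continue
        x = np.zeros_like(v)
        x[0] = -r if v[0] >= 0 else r        # target (+/-)r*e1; sign avoids cancellation
        w = (x - v) / np.linalg.norm(x - v)  # as in Example 8.2.14
        H = np.eye(n)
        H[k + 1:, k + 1:] -= 2.0 * np.outer(w, w)   # block Householder matrix U_w
        T = H @ T @ H                        # H is symmetric and orthogonal
        Q = Q @ H
    return Q, T                              # Q^T A Q = T, T tri-diagonal

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
A = (A + A.T) / 2
Q, T = householder_tridiag(A)
print(np.allclose(Q.T @ A @ Q, T), np.allclose(np.triu(T, 2), 0))   # True True
```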

8.2.3 Schur’s Upper Triangularization Revisited

Definition 8.2.16. Let s and t be two symbols. Then, an expression of the form

\[
W(s,t) = s^{m_1}t^{n_1}\cdots s^{m_k}t^{n_k}, \quad\text{where the } m_i, n_i \text{ are positive integers,}
\]
is called a word in the symbols s and t of degree ∑_{i=1}^{k}(m_i + n_i).
Remark 8.2.17. [More on Unitary Equivalence] Let s and t be two symbols and W(s,t) be a word in symbols s and t.

1.
Suppose U is a unitary matrix such that B = U^*AU. Then, W(B, B^*) = U^*W(A, A^*)U. Thus, tr[W(A, A^*)] = tr[W(B, B^*)].
2.
Let A and B be two matrices such that tr[W(A,A*)] = tr[W(B,B*)], for each word W. Then, does it imply that A and B are unitarily equivalent? The answer is ‘yes’ as provided by the following result. The proof is outside the scope of this book.

Theorem 8.2.18. [Specht-Pearcy] Let A, B ∈ Mn(ℂ) and suppose that tr[W(A, A^*)] = tr[W(B, B^*)] holds for all words of degree less than or equal to 2n². Then B = U^*AU, for some unitary matrix U.

Exercise 8.2.19. [Triangularization via a Complex Orthogonal Matrix need not be Possible] Let A ∈ Mn(ℂ) and suppose A = QTQ^T, where Q is a complex orthogonal matrix and T is upper triangular. Then, prove that

1.
A has an eigenvector x such that x^Tx ≠ 0.
2.
there is no complex orthogonal matrix Q such that Q^T\begin{bmatrix} 1 & i\\ i & -1 \end{bmatrix}Q is upper triangular.

Proposition 8.2.20. [Matrices with Distinct Eigenvalues are Dense in Mn(ℂ)] Let A ∈ Mn(ℂ). Then, for each ε > 0, there exists a matrix A(ε) = [a(ε)_ij] ∈ Mn(ℂ) such that A(ε) has distinct eigenvalues and ∑_{i,j}|a_ij − a(ε)_ij|² < ε.

Proof. By Schur Upper Triangularization (see Lemma 6.2.12), there exists a unitary matrix U such that U^*AU = T, an upper triangular matrix. Now, choose α_i's such that the t_ii + α_i are distinct and ∑_i|α_i|² < ε. Now, consider the matrix A(ε) = U(T + diag(α_1, …, α_n))U^*. Then, B = A(ε) − A = U diag(α_1, …, α_n)U^* with

\[
\sum_{i,j}|b_{ij}|^2 = \operatorname{tr}(B^*B) = \operatorname{tr}\bigl(U\operatorname{diag}(|\alpha_1|^2,\ldots,|\alpha_n|^2)U^*\bigr) = \sum_i |\alpha_i|^2 < \varepsilon.
\]
Thus, the required result follows. _

Before proceeding with our next result on almost diagonalizability, we look at the following example.

Example 8.2.21. Let A = \begin{bmatrix} 1 & 2\\ 0 & 3 \end{bmatrix} and let ε > 0 be given. Then, determine a diagonal matrix D such that the non-diagonal entry of D^{-1}AD is less than ε.

Solution: Choose α < ε/2 and define D = diag(1, α). Then,

\[
D^{-1}AD = \begin{bmatrix} 1 & 0\\ 0 & \frac{1}{\alpha} \end{bmatrix}\begin{bmatrix} 1 & 2\\ 0 & 3 \end{bmatrix}\begin{bmatrix} 1 & 0\\ 0 & \alpha \end{bmatrix} = \begin{bmatrix} 1 & 2\alpha\\ 0 & 3 \end{bmatrix}.
\]

As α < ε/2, the required result follows.

Proposition 8.2.22. [A Matrix is Almost Diagonalizable] Let A ∈ Mn(ℂ) and let ε > 0 be given. Then, there exists an invertible matrix S_ε such that S_ε^{-1}AS_ε = T, an upper triangular matrix with |t_ij| < ε, for all i ≠ j.

Proof. By Schur Upper Triangularization (see Lemma 6.2.12), there exists a unitary matrix U such that U^*AU = T, an upper triangular matrix. Now, take t = 2 + max_{i<j}|t_ij| and choose α such that 0 < α < min{1, ε/t}. Then, if we take D_α = diag(1, α, α², …, α^{n−1}) and S = UD_α, we have S^{-1}AS = D_α^{-1}TD_α = F (say), an upper triangular matrix. Furthermore, note that for i < j, we have |f_ij| = |t_ij|α^{j−i} ≤ |t_ij|α < ε. Thus, the required result follows. _
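The scaling trick in the proof is easy to see numerically. In the sketch below (Python/NumPy; the test matrix and the value of α are our choices), conjugating an upper triangular T by D_α = diag(1, α, …, α^{n−1}) multiplies the entry t_ij (i < j) by α^{j−i}, so every off-diagonal entry becomes as small as we wish.

```python
import numpy as np

rng = np.random.default_rng(3)
T = np.triu(rng.standard_normal((4, 4)) + 5)     # an upper triangular matrix
alpha = 1e-3
D = np.diag(alpha ** np.arange(4))               # D_alpha = diag(1, a, a^2, a^3)
F = np.linalg.inv(D) @ T @ D                     # f_ij = t_ij * alpha^(j-i)

off_diag = F - np.diag(np.diag(F))
print(np.max(np.abs(off_diag)))                  # at most alpha * max |t_ij|
```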

8.3 Commuting Matrices and Simultaneous Diagonalization


Definition 8.3.1. [Simultaneously Diagonalizable] Let A,B Mn(). Then, they are said to be simultaneously diagonalizable if there exists an invertible matrix S such that S-1AS and S-1BS are both diagonal matrices.

Since diagonal matrices commute, we have our next result.

Proposition 8.3.2. Let A,B Mn(). If A and B are simultaneously diagonalizable then AB = BA.

Proof. By definition, there exists an invertible matrix S such that S-1AS = Λ1 and S-1BS = Λ2. Hence,

\[
AB = (S\Lambda_1 S^{-1})(S\Lambda_2 S^{-1}) = S\Lambda_1\Lambda_2 S^{-1} = S\Lambda_2\Lambda_1 S^{-1} = (S\Lambda_2 S^{-1})(S\Lambda_1 S^{-1}) = BA.
\]
Thus, we have proved the required result. _

Theorem 8.3.3. Let A,B Mn() be diagonalizable matrices. Then they are simultaneously diagonalizable if and only if they commute.

Proof. One part of this theorem has already been proved in Proposition 8.3.2. For the other part, let us assume that AB = BA. Since A is diagonalizable, there exists an invertible matrix S such that

\[
S^{-1}AS = \Lambda = \lambda_1 I \oplus \cdots \oplus \lambda_k I, \tag{8.3.1}
\]

where λ_1, …, λ_k are the distinct eigenvalues of A. We now use the block structure of S^{-1}AS to decompose C = S^{-1}BS as C = \begin{bmatrix} C_{11} & \cdots & C_{1k}\\ \vdots & \ddots & \vdots\\ C_{k1} & \cdots & C_{kk} \end{bmatrix}. Since AB = BA and S is invertible, we have ΛC = CΛ. Thus,

\[
\begin{bmatrix} \lambda_1 C_{11} & \cdots & \lambda_1 C_{1k}\\ \vdots & \ddots & \vdots\\ \lambda_k C_{k1} & \cdots & \lambda_k C_{kk} \end{bmatrix} = \begin{bmatrix} \lambda_1 C_{11} & \cdots & \lambda_k C_{1k}\\ \vdots & \ddots & \vdots\\ \lambda_1 C_{k1} & \cdots & \lambda_k C_{kk} \end{bmatrix}.
\]
Since λ_i ≠ λ_j for 1 ≤ i ≠ j ≤ k, we have C_ij = 0 whenever i ≠ j. Thus, the matrix C = C_{11} ⊕ ⋯ ⊕ C_{kk}.

Since B is diagonalizable, the matrix C is also diagonalizable and hence the matrices C_ii, for 1 ≤ i ≤ k, are diagonalizable. So, for 1 ≤ i ≤ k, there exist invertible matrices T_i such that T_i^{-1}C_{ii}T_i = Λ_i. Put T = T_1 ⊕ ⋯ ⊕ T_k. Then,

\[
T^{-1}S^{-1}AST = \begin{bmatrix} T_1^{-1} & & \\ & \ddots & \\ & & T_k^{-1} \end{bmatrix}\begin{bmatrix} \lambda_1 I & & \\ & \ddots & \\ & & \lambda_k I \end{bmatrix}\begin{bmatrix} T_1 & & \\ & \ddots & \\ & & T_k \end{bmatrix} = \begin{bmatrix} \lambda_1 I & & \\ & \ddots & \\ & & \lambda_k I \end{bmatrix}
\]
and
\[
T^{-1}S^{-1}BST = \begin{bmatrix} T_1^{-1} & & \\ & \ddots & \\ & & T_k^{-1} \end{bmatrix}\begin{bmatrix} C_{11} & & \\ & \ddots & \\ & & C_{kk} \end{bmatrix}\begin{bmatrix} T_1 & & \\ & \ddots & \\ & & T_k \end{bmatrix} = \begin{bmatrix} \Lambda_1 & & \\ & \ddots & \\ & & \Lambda_k \end{bmatrix}.
\]
Thus A and B are simultaneously diagonalizable and the required result follows. _

Definition 8.3.4. [Commuting Family of Matrices]

1.
Let F ⊆ Mn(ℂ). Then F is said to be a commuting family if each pair of matrices in F commutes.
2.
Let B ∈ Mn(ℂ) and let W be a subspace of ℂⁿ. Then, W is said to be a B-invariant subspace if Bw ∈ W, for all w ∈ W (or equivalently, BW ⊆ W).
3.
A subspace W of ℂⁿ is said to be F-invariant if W is B-invariant for each B ∈ F.

Example 8.3.5. Let A ∈ Mn(ℂ) with (λ, x) as an eigenpair. Then, W = {cx : c ∈ ℂ} is an A-invariant subspace. Furthermore, if W is an A-invariant subspace with dim(W) = 1 then verify that any non-zero vector in W is an eigenvector of A.

Theorem 8.3.6. [An A-invariant Subspace Contains an Eigenvector of A] Let A ∈ Mn(ℂ) and let W ⊆ ℂⁿ be an A-invariant subspace of dimension at least 1. Then W contains an eigenvector of A.

Proof. Let B = {f_1, …, f_k} ⊆ ℂⁿ be an ordered basis for W. Define T : W → W by Tv = Av. Then T[B, B] = [[Tf_1]_B ⋯ [Tf_k]_B] is a k × k matrix which satisfies [Tw]_B = T[B, B][w]_B, for all w ∈ W. As T[B, B] ∈ Mk(ℂ), it has an eigenpair, say (λ, x̂) with x̂ ∈ ℂᵏ. That is,

\[
T[B, B]\,\hat{x} = \lambda\hat{x}. \tag{8.3.2}
\]

Now, put x = ∑_{i=1}^{k}(x̂)_i f_i ∈ ℂⁿ. Then, verify that x ∈ W and [x]_B = x̂. Thus, Tx ∈ W and, using Equation (8.3.2), we get

\[
Tx = \sum_{i=1}^{k}([Tx]_B)_i f_i = \sum_{i=1}^{k}(T[B,B][x]_B)_i f_i = \sum_{i=1}^{k}(T[B,B]\hat{x})_i f_i = \sum_{i=1}^{k}(\lambda\hat{x})_i f_i = \lambda\sum_{i=1}^{k}(\hat{x})_i f_i = \lambda x.
\]
So, A has an eigenvector x ∈ W corresponding to the eigenvalue λ. _

Theorem 8.3.7. Let F ⊆ Mn(ℂ) be a commuting family of matrices. Then, all the matrices in F have a common eigenvector.

Proof. Note that ℂⁿ is F-invariant. Let W ⊆ ℂⁿ be F-invariant with minimum positive dimension. Let y ∈ W with y ≠ 0. We claim that y is an eigenvector for each A ∈ F.

So, on the contrary, assume that y is not an eigenvector for some A ∈ F. Then, by Theorem 8.3.6, W contains an eigenvector x of A for some eigenvalue, say λ. Define W₀ = {z ∈ W : Az = λz}. So W₀ is a proper subspace of W as y ∈ W \ W₀. Also, for z ∈ W₀ and C ∈ F, we note that A(Cz) = CAz = λ(Cz), so that Cz ∈ W₀. So W₀ is F-invariant and 1 ≤ dim W₀ < dim W, a contradiction. _

Theorem 8.3.8. Let F ⊆ Mn(ℂ) be a family of diagonalizable matrices. Then F is commuting if and only if F is simultaneously diagonalizable.

Proof. We prove the result by induction on n. The result is clearly true for n = 1. So, let us assume the result to be valid for all n < m. Now, let us assume that F ⊆ Mm(ℂ) is a family of diagonalizable matrices.

If F is simultaneously diagonalizable, then by Proposition 8.3.2, the family F is commuting. Conversely, let F be a commuting family. If each A ∈ F is a scalar matrix then they are simultaneously diagonalizable via I. So, let A ∈ F be a non-scalar matrix. As A is diagonalizable, there exists an invertible matrix S such that

\[
S^{-1}AS = \lambda_1 I \oplus \cdots \oplus \lambda_k I, \quad k \ge 2,
\]
where the λ_i's are distinct. Now, consider the family G = {X̂ = S^{-1}XS : X ∈ F}. As F is a commuting family, the set G is also a commuting family. So, each X̂ ∈ G has the form X̂ = X_1 ⊕ ⋯ ⊕ X_k. Note that H_i = {X_i : X̂ ∈ G} is a commuting family of diagonalizable matrices of size < m. Thus, by the induction hypothesis, the H_i's are simultaneously diagonalizable, say by the invertible matrices T_i. That is, T_i^{-1}X_iT_i = Λ_i, a diagonal matrix, for 1 ≤ i ≤ k. Thus, if T = T_1 ⊕ ⋯ ⊕ T_k then

\[
T^{-1}S^{-1}XST = T^{-1}(X_1 \oplus \cdots \oplus X_k)T = T_1^{-1}X_1T_1 \oplus \cdots \oplus T_k^{-1}X_kT_k = \Lambda_1 \oplus \cdots \oplus \Lambda_k,
\]
a diagonal matrix, for all X ∈ F. Thus the result holds by induction. _

We now give a proof of some parts of Exercise 6.1.24.

Remark 8.3.9. [σ(AB) and σ(BA)] Let m ≤ n, A ∈ M_{m×n}(ℂ), and B ∈ M_{n×m}(ℂ). Then σ(BA) = σ(AB) together with n − m extra 0's. In particular, if A, B ∈ Mn(ℂ), then P_{AB}(t) = P_{BA}(t).

Proof. Note that

\[
\begin{bmatrix} AB & 0\\ B & 0 \end{bmatrix}\begin{bmatrix} I_m & A\\ 0 & I_n \end{bmatrix} = \begin{bmatrix} AB & ABA\\ B & BA \end{bmatrix} = \begin{bmatrix} I_m & A\\ 0 & I_n \end{bmatrix}\begin{bmatrix} 0 & 0\\ B & BA \end{bmatrix}.
\]

Thus, the matrices \begin{bmatrix} AB & 0\\ B & 0 \end{bmatrix} and \begin{bmatrix} 0 & 0\\ B & BA \end{bmatrix} are similar. Comparing their characteristic polynomials gives t^n P_{AB}(t) = t^m P_{BA}(t). Hence, AB and BA have precisely the same non-zero eigenvalues, and σ(BA) has n − m extra 0's. Therefore, if A and B have the same size, they must have the same characteristic polynomial. _
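Remark 8.3.9 is easy to check numerically. In the sketch below (Python/NumPy; the sizes 2 × 5 and 5 × 2 are our choices), the nonzero eigenvalues of AB and BA agree, and BA has three extra (numerically tiny) eigenvalues at 0.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((2, 5))      # m x n with m <= n
B = rng.standard_normal((5, 2))      # n x m

eig_AB = sorted(np.linalg.eigvals(A @ B), key=abs, reverse=True)
eig_BA = sorted(np.linalg.eigvals(B @ A), key=abs, reverse=True)
print(np.round(eig_AB, 6))           # the nonzero eigenvalues
print(np.round(eig_BA, 6))           # the same values, plus n - m extra zeros
```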

Exercise 8.3.10. [Miscellaneous Exercises]

1.
Let A be nonsingular. Then, verify that A-1(AB)A = BA. Hence, AB and BA are similar. Thus, PAB(t) = PBA(t).
2.
Fix an integer k, 0 ≤ k ≤ n. Now, define the function f_k : Mn(ℂ) → ℂ by f_k(A) = coefficient of t^k in P_A(t). Prove that f_k is a continuous function.
3.
For any matrix A, prove that there exists an ε > 0 such that A_α = A + αI is invertible, for all α ∈ (0, ε). Thus, use the first part to conclude that for any given B, we have P_{A_αB}(t) = P_{BA_α}(t), for all α ∈ (0, ε).
4.
Now, use continuity to argue that P_{AB}(t) = lim_{α→0⁺} P_{A_αB}(t) = lim_{α→0⁺} P_{BA_α}(t) = P_{BA}(t).
5.
Let σ(A) = {λ₁, …, λₙ}, σ(B) = {μ₁, …, μₙ} and suppose that AB = BA. Then,
(a)
prove that there is a permutation π such that σ(A + B) = {λ₁ + μ_{π(1)}, …, λₙ + μ_{π(n)}}. In particular, σ(A + B) ⊆ σ(A) + σ(B).
(b)
if we further assume that σ(A) ∩ σ(−B) = ∅ then the matrix A + B is nonsingular.
6.
Let A and B be two non-commuting matrices. Then, give an example to show that it is difficult to relate σ(A + B) with σ(A) and σ(B).
7.
Are the matrices A = \begin{bmatrix} 0 & 1 & 0\\ 0 & 0 & -1\\ 0 & 0 & 0 \end{bmatrix} and B = \begin{bmatrix} 0 & 0 & 0\\ 1 & 0 & 0\\ 0 & 1 & 0 \end{bmatrix} simultaneously triangularizable?
8.
Let F ⊆ Mn(ℂ) be a family of commuting normal matrices. Then, prove that the matrices in F are simultaneously unitarily diagonalizable.
9.
Let A ∈ Mn(ℂ) with A* = A and x*Ax ≥ 0, for all x ∈ ℂⁿ. Then prove that σ(A) ⊆ [0, ∞) and that if tr(A) = 0, then A = 0.

8.3.1 Diagonalization and Real Orthogonal Matrix

Proposition 8.3.11. [Triangularization: Real Matrix] Let A ∈ Mn(ℝ). Then, there exists a real orthogonal matrix Q such that Q^TAQ is block upper triangular, where each diagonal block is of size either 1 or 2.

Proof. If all the eigenvalues of A are real then the corresponding eigenvectors have real entries and hence, one can use induction to get the result in this case (see Lemma 6.2.12).

So, now let us assume that A has a complex eigenvalue, say λ = α + iβ with β ≠ 0, and let x = u + iv be an eigenvector for λ. Thus, Ax = λx and hence A\bar{x} = \bar{λ}\bar{x}. But λ ≠ \bar{λ} as β ≠ 0. Thus, the eigenvectors x, \bar{x} are linearly independent and therefore {u, v} is a linearly independent set. By the Gram-Schmidt Orthonormalization process, we get an ordered basis, say {w₁, w₂, …, wₙ} of ℝⁿ, where LS(w₁, w₂) = LS(u, v). Also, using the eigen-condition Ax = λx gives

\[
Aw_1 = aw_1 + bw_2, \qquad Aw_2 = cw_1 + dw_2,
\]
for some real numbers a, b, c and d.

Now, form the matrix X = [w₁, w₂, …, wₙ]. Then, X is a real orthogonal matrix and

\[
X^*AX = X^*[Aw_1, Aw_2, \ldots, Aw_n] = \begin{bmatrix} w_1^*\\ w_2^*\\ \vdots\\ w_n^* \end{bmatrix}[aw_1 + bw_2,\; cw_1 + dw_2,\; \ldots,\; Aw_n] = \begin{bmatrix} a & c & *\\ b & d & *\\ 0 & 0 & B \end{bmatrix}, \tag{8.3.3}
\]
where B ∈ M_{n−2}(ℝ). Now, by the induction hypothesis the required result follows. _

The next result is a direct application of Proposition 8.3.11 and hence the proof is omitted.

Corollary 8.3.12. [Simultaneous Triangularization: Real Matrices] Let F ⊆ Mn(ℝ) be a commuting family. Then, there exists a real orthogonal matrix Q such that Q^TAQ is a block upper triangular matrix, where each diagonal block is of size either 1 or 2, for all A ∈ F.

Proposition 8.3.13. Let A ∈ Mn(ℝ). Then the following statements are equivalent.

1.
A is normal.
2.
There exists a real orthogonal matrix Q such that Q^TAQ = ⊕_i A_i, where the A_i's are real normal matrices of size either 1 or 2.

Proof. 2 ⇒ 1 is trivial. To prove 1 ⇒ 2, recall that Proposition 8.3.11 gives the existence of a real orthogonal matrix Q such that Q^TAQ is block upper triangular with diagonal blocks of size either 1 or 2. So, we can write

\[
Q^TAQ = \begin{bmatrix}
\lambda_1 & \cdots & * & * & \cdots & *\\
 & \ddots & \vdots & \vdots & & \vdots\\
 & & \lambda_p & * & \cdots & *\\
 & & & A_{11} & \cdots & A_{1k}\\
 & & & & \ddots & \vdots\\
 & & & & & A_{kk}
\end{bmatrix} = \begin{bmatrix} R & C\\ 0 & B \end{bmatrix} \ \text{(say)},
\]
where λ₁, …, λ_p are the 1 × 1 diagonal blocks and A₁₁, …, A_{kk} are the 2 × 2 diagonal blocks.
As A is normal, \begin{bmatrix} R & C\\ 0 & B \end{bmatrix}\begin{bmatrix} R^T & 0\\ C^T & B^T \end{bmatrix} = \begin{bmatrix} R^T & 0\\ C^T & B^T \end{bmatrix}\begin{bmatrix} R & C\\ 0 & B \end{bmatrix}. Comparing the (1,1) blocks gives RR^T + CC^T = R^TR, so tr(CC^T) = tr(R^TR − RR^T) = 0. Now, using Exercise 8.3.10.9, we get C = 0. Hence, RR^T = R^TR and therefore, R is a diagonal matrix.

As B^TB = BB^T, comparing the (1,1) blocks of B gives A_{11}^TA_{11} = A_{11}A_{11}^T + ∑_{i=2}^{k}A_{1i}A_{1i}^T. So tr(∑_{i=2}^{k}A_{1i}A_{1i}^T) = 0. Now, using Exercise 8.3.10.9 again, we have ∑_{i=2}^{k}A_{1i}A_{1i}^T = 0 and so A_{1i}A_{1i}^T = 0, for all i = 2, …, k. Thus, A_{1i} = 0, for all i = 2, …, k, and A_{11} is normal. Proceeding in the same way with the remaining rows of blocks, the required result follows. _

Exercise 8.3.14. Let A ∈ Mn(ℝ). Then the following are true.

1.
A = −A^T if and only if A is real orthogonally similar to [⊕_j 0] ⊕ [⊕_i A_i], where A_i = \begin{bmatrix} 0 & a_i\\ -a_i & 0 \end{bmatrix}, for some real numbers a_i.
2.
AA^T = I if and only if A is real orthogonally similar to [⊕_i λ_i] ⊕ [⊕_j A_j], where λ_i = ±1 and A_j = \begin{bmatrix} \cos\theta_j & \sin\theta_j\\ -\sin\theta_j & \cos\theta_j \end{bmatrix}, for some real numbers θ_j.

8.3.2 Convergent and nilpotent matrices

Definition 8.3.15. [Convergent Matrices] A matrix A is called a convergent matrix if A^m → 0 as m → ∞.

Remark 8.3.16.

1.
Let A be a diagonalizable matrix with ρ(A) < 1. Then, A is a convergent matrix.

Proof. Write A = S diag(λ₁, …, λₙ)S^{-1} for some invertible matrix S. As ρ(A) < 1, for each i, 1 ≤ i ≤ n, λ_i^m → 0 as m → ∞. Thus, A^m = S diag(λ₁^m, …, λₙ^m)S^{-1} → 0. _

2.
Even if the matrix A is not diagonalizable, the above result holds. That is, whenever ρ(A) < 1, the matrix A is convergent. The converse is also true.

Proof. Let J_k(λ) = λI_k + N_k be a Jordan block of J, the Jordan canonical form of A. Then, as N_k^k = 0, for each fixed k we have

\[
J_k(\lambda)^m = \lambda^m I_k + C(m,1)\lambda^{m-1}N_k + \cdots + C(m, k-1)\lambda^{m-k+1}N_k^{k-1} \to 0, \quad\text{as } m \to \infty,
\]
since |λ| < 1 implies C(m, j)λ^{m−j} → 0 for each fixed j. Hence each block J_k(λ)^m → 0, so J is convergent. Thus, A is a convergent matrix.

Conversely, if A is convergent, then J must be convergent. Thus each Jordan block Jk(λ) must be convergent. Hence |λ| < 1. _
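The remark can be checked numerically even in the non-diagonalizable case. A small sketch (Python/NumPy; the 2 × 2 Jordan-type matrix below is our choice):

```python
import numpy as np

# A non-diagonalizable matrix with spectral radius 0.9 < 1: a single Jordan block.
A = np.array([[0.9, 1.0],
              [0.0, 0.9]])
print(max(abs(np.linalg.eigvals(A))))                     # 0.9
print(np.linalg.norm(np.linalg.matrix_power(A, 500)))     # ~ 1e-20, i.e. A^m -> 0
```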

Theorem 8.3.17. [Decomposition into Diagonalizable and Nilpotent Parts] Let A ∈ Mn(ℂ). Then A = B + C, where B is a diagonalizable matrix and C is nilpotent, such that BC = CB.

Proof. Let J be the Jordan canonical form of A. Then, J = D + N, where D = diag(J) and N is clearly a nilpotent matrix.

Now, note that DN = ND since for each Jordan block J_k(λ) = D_k + N_k, we have D_k = λI and N_k = J_k(0), so that D_kN_k = N_kD_k. As J is the Jordan canonical form of A, there exists an invertible matrix S such that S^{-1}AS = J. Hence, A = SJS^{-1} = SDS^{-1} + SNS^{-1} = B + C, which satisfies the required conditions. _