In this chapter, every matrix is an element of Mn(ℂ) and x = (x1,…,xn)T ∈ ℂn, for some n ∈ ℕ. We
start with a few examples to motivate this chapter.
Example 6.1.1.
-
1.
- Let A = , B = and x = .
-
(a)
- Then A magnifies the nonzero vector three times as A = 3 and behaves
by changing the direction of as A = -1. Further, the vectors
and are orthogonal.
-
(b)
- B magnifies both the vectors and as B = 5 and B =
10. Here again, the vectors and are orthogonal.
-
(c)
- xT Ax = 3 -. Here, the displacements occur along perpendicular
lines x + y = 0 and x - y = 0, where x + y = (x,y) and x - y = (x,y).
Whereas, xT Bx = 5 + 10. Here also the maximum/minimum
displacements occur along the orthogonal lines x + 2y = 0 and 2x - y = 0, where
x + 2y = (x,y) and 2x - y = (x,y).
-
(d)
- the curve xT Ax = 10 represents a hyperbola, whereas the curve xT Bx = 10
represents an ellipse (see Figure 6.1 drawn using the package “Sagemath”).
-
2.
- Let C = , a non-symmetric matrix. Then, does there exist a nonzero x ∈ ℂ2 which gets
magnified by C?
So, we need x≠0 and α ∈ ℂ such that Cx = αx ⇔ [αI2 - C]x = 0. As x≠0, the system [αI2 - C]x = 0 has a
non-trivial solution if and only if det[αI2 - C] = 0. But,
So, α = 2 ± √3. For α = 2 + √3, verify that the x≠0 that satisfies [αI2 - C]x = 0
equals x = . Similarly, for α = 2 - √3, the vector x = satisfies
[αI2 - C]x = 0. In this example,
-
(a)
- we still have magnifications in the directions and .
-
(b)
- the maximum/minimum displacements do not occur along the lines (√3 - 1)x + y = 0
and (√3 + 1)x - y = 0 (see the third curve in Figure 6.1).
-
(c)
- the lines (√3 - 1)x + y = 0 and (√3 + 1)x - y = 0 are not orthogonal.
-
3.
- Let A be a real symmetric matrix. Consider the problem of finding the extreme values of xT Ax subject to the constraint xT x = 1.
To solve this, consider the Lagrangian L(x,λ) = xT Ax - λ(xT x - 1).
Partially differentiating L(x,λ) with respect to xi for 1 ≤ i ≤ n, we get
∂L∕∂x1 = 2a11x1 + 2a12x2 + ⋯ + 2a1nxn - 2λx1,
⋮
∂L∕∂xn = 2an1x1 + 2an2x2 + ⋯ + 2annxn - 2λxn.
Therefore, to get the points of extremum, we solve ∂L∕∂xi = 0, for 1 ≤ i ≤ n; in matrix form, 2Ax - 2λx = 0.
Thus, to solve the extremal problem, we need λ ∈ ℝ, x ∈ ℝn such that x≠0 and
Ax = λx.
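The extremal problem above can also be explored numerically. The following sketch (in Python with NumPy; the symmetric matrix is an illustrative choice, not necessarily one of the matrices of Example 6.1.1) maximizes xT Ax over unit vectors by brute force and observes that the maximizer satisfies Ax = λx.

import numpy as np

# An illustrative real symmetric matrix.
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

# Maximize x^T A x over unit vectors x by scanning a fine grid of directions.
thetas = np.linspace(0, 2 * np.pi, 10001)
X = np.vstack([np.cos(thetas), np.sin(thetas)])     # each column is a unit vector
values = np.einsum('ij,ij->j', X, A @ X)            # x^T A x for every column x
x_best = X[:, np.argmax(values)]

# The maximizer (approximately) satisfies A x = lambda x with lambda = x^T A x.
lam = x_best @ A @ x_best
print(np.allclose(A @ x_best, lam * x_best, atol=1e-2))   # True (up to grid error)
print(lam, np.max(np.linalg.eigvalsh(A)))                 # both are close to 3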
We observe the following about the matrices A,B and C that appear in Example 6.1.1.
-
1.
- det(A) = -3 = 3 × (-1), det(B) = 50 = 5 × 10 and det(C) = 1 = (2 + √3) × (2 - √3).
-
2.
- tr(A) = 2 = 3 - 1, tr(B) = 15 = 5 + 10 and tr(C) = 4 = (2 + √3) + (2 - √3).
-
3.
- The sets , and are linearly
independent.
-
4.
- If v1 = and v2 = and S = then
-
(a)
- AS = = = S⇔ S-1AS = = diag(3,-1).
-
(b)
- Let u1 = v1 and u2 = v2. Then, u1 and u2 are orthonormal unit vectors, i.e.,
if U = then I = UU* = u1u1* + u2u2* and A = 3u1u1*- u2u2*.
-
5.
- If v1 = and v2 = and S = then
-
(a)
- BS = = = S ⇔ S-1BS = = diag(5,10).
-
(b)
- Let u1 = v1 and u2 = v2. Then, u1 and u2 are orthonormal unit vectors, i.e.,
if U = then I = UU* = u1u1* + u2u2* and B = 5u1u1* + 10u2u2*.
-
6.
- If v1 = and v2 = and S = then
Thus, we see that given A ∈ Mn(ℂ), the scalars λ ∈ ℂ and vectors x ∈ ℂn, x ≠ 0, satisfying Ax = λx have
certain nice properties. For example, there exists a basis of ℂ2 in which the matrices A, B and C
behave like diagonal matrices. To understand the ideas better, we start with the following
definitions.
Definition 6.1.2. [Eigenvalues, Eigenvectors and Eigenspace] Let A ∈ Mn(ℂ). Then,
-
1.
- the equation
| (6.1.1) |
is called the eigen-condition.
-
2.
- an α ∈ ℂ is called a characteristic value/root or eigenvalue or latent root of A if there
exists x≠0 satisfying Ax = αx.
-
3.
- an x≠0 satisfying Equation (6.1.1) is called a characteristic vector or eigenvector or
invariant/latent vector of A corresponding to λ.
-
4.
- the tuple (α,x) with x≠0 and Ax = αx is called an eigen-pair or characteristic-pair.
-
5.
- for an eigenvalue α ∈ ℂ, Null(A - αI) = {x ∈ ℂn | Ax = αx} is called the eigenspace or
characteristic vector space of A corresponding to α.
Theorem 6.1.3. Let A ∈ Mn(ℂ) and α ∈ ℂ. Then, the following statements are equivalent.
-
1.
- α is an eigenvalue of A.
-
2.
- det(A - αIn) = 0.
Proof. We know that α is an eigenvalue of A if and only if the system (A-αIn)x = 0 has a non-trivial
solution. By Theorem 2.2.40 this holds if and only if det(A - αI) = 0. _
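Numerically, Theorem 6.1.3 can be checked by comparing the roots of det(A - αIn) = 0 with the eigenvalues computed directly; a small sketch (the matrix is an illustrative choice, NumPy assumed):

import numpy as np

# Illustrative matrix; any square complex matrix works.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 4.0]])

coeffs = np.poly(A)                    # coefficients of det(x I - A)
roots = np.roots(coeffs)               # the alpha's with det(A - alpha I) = 0
eigvals = np.linalg.eigvals(A)         # eigenvalues computed directly
print(np.allclose(np.sort_complex(roots), np.sort_complex(eigvals)))   # True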
Definition 6.1.4. [Characteristic Polynomial / Equation, Spectrum and Spectral Radius] Let
A ∈ Mn(ℂ). Then,
-
1.
- det(A-λI) is a polynomial of degree n in λ and is called the characteristic polynomial
of A, denoted PA(λ), or in short P(λ).
-
2.
- the equation PA(λ) = 0 is called the characteristic equation of A.
-
3.
- The multi-set (collection with multiplicities) {α ∈ ℂ : PA(α) = 0} is called the spectrum
of A, denoted σ(A). Hence, σ(A) contains all the eigenvalues of A.
-
4.
- The Spectral Radius, denoted ρ(A) of A ∈ Mn(ℂ), equals max{|α| : α ∈ σ(A)}.
We thus observe the following.
Remark 6.1.5. Let A ∈ Mn(ℂ).
-
1.
- Then, A is singular if and only if 0 ∈ σ(A).
-
2.
- Further, if α ∈ σ(A) then the following statements hold.
-
(a)
- {0} ⊊ Null(A - αI). Therefore, if Rank(A - αI) = r then r < n. Hence, by
Theorem 2.2.40, the system (A-αI)x = 0 has n-r linearly independent solutions.
-
(b)
- x ∈ Null(A - αI) if and only if cx ∈ Null(A - αI), for c≠0.
-
(c)
- If x1,…,xr ∈ Null(A - αI) are linearly independent then c1x1 + ⋯ + crxr ∈ Null(A - αI), for all ci ∈ ℂ. Hence, if S is a collection of eigenvectors then we necessarily
want the set S to be linearly independent.
-
(d)
- Thus, an eigenvector v of A is in some sense a line ℓ = Span({v}) that passes
through 0 and v and has the property that the image of ℓ is either ℓ itself or 0.
-
3.
- Since the eigenvalues of A are roots of the characteristic equation, A has exactly n eigenvalues,
counting multiplicities.
-
4.
- If the entries of A are real and α ∈ σ(A) is also real then a corresponding eigenvector can be chosen to have real
entries.
-
5.
- Further, if (α,x) is an eigenpair for A and f(A) = b0I + b1A + ⋯ + bkA^k is a polynomial in A
then (f(α),x) is an eigenpair for f(A).
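For instance, the last observation can be verified numerically; in the sketch below the matrix and the polynomial f(t) = 2 + 3t + t^2 are arbitrary illustrative choices (NumPy assumed):

import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
fA = 2 * np.eye(2) + 3 * A + A @ A          # f(A) as a matrix polynomial

eigvals, eigvecs = np.linalg.eig(A)
for alpha, x in zip(eigvals, eigvecs.T):
    # (f(alpha), x) should be an eigenpair of f(A).
    print(np.allclose(fA @ x, (2 + 3 * alpha + alpha ** 2) * x))   # True, True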
Almost all books in mathematics differentiate between characteristic value and eigenvalue as the
ideas change when one moves from complex numbers to any other scalar field. We give the following
example for clarity.
Remark 6.1.6. Let A ∈ M2(F). Then, A induces a map T ∈(F2) defined by T(x) = Ax, for all
x ∈ F2. We use this idea to understand the difference.
-
1.
- Let A = . Then, pA(λ) = λ2 + 1. So, ±i are the roots of P(λ) = 0 in ℂ.
Hence,
-
(a)
- A has (i,(1,i)T ) and (-i,(i,1)T ) as eigen-pairs or characteristic-pairs.
-
(b)
- A has no characteristic value over ℝ.
-
2.
- Let A = . Then, 2 ± are the roots of the characteristic equation. Hence,
-
(a)
- A has characteristic values or eigenvalues over ℝ.
-
(b)
- A has no characteristic value over ℚ.
Let us look at some more examples.
Example 6.1.7.
-
1.
- Let A = diag(d1,…,dn) with di ∈ ℂ, 1 ≤ i ≤ n. Then, p(λ) = ∏_{i=1}^n (λ - di) and thus
verify that (d1,e1),…,(dn,en) are the eigen-pairs.
-
2.
- Let A = (aij) be an n×n triangular matrix. Then, p(λ) = ∏_{i=1}^n (λ - aii) and thus verify
that σ(A) = {a11,a22,…,ann}. What can you say about the eigen-vectors of an upper
triangular matrix if the diagonal entries are all distinct?
-
3.
- Let A = . Then, p(λ) = (1-λ)2. Hence, σ(A) = {1,1}. But the complete solution
of the system (A-I2)x = 0 equals x = ce1, for c ∈ ℂ. Hence using Remark 6.1.5.2, e1 is
an eigenvector. Therefore, 1 is a repeated eigenvalue whereas there is only one
eigenvector.
-
4.
- Let A = . Then, 1 is a repeated eigenvalue of A. In this case, (A-I2)x = 0 has a
solution for every x ∈ ℂ2. Hence, any two linearly independent vectors xT ,yT ∈ ℂ2
gives (1,x) and (1,y) as the two eigen-pairs for A. In general, if S = {x1,…,xn} is a basis
of ℂn then (1,x1),…,(1,xn) are eigen-pairs of In, the identity matrix.
-
5.
- Let A = . Then, and are the eigen-pairs of A.
-
6.
- Let A = . Then, σ(A) = {0,0,0} with e1 as the only eigenvector.
-
7.
- Let A = . Then, σ(A) = {0,0,0,0,0}. Note that Ax = 0 implies
x2 = x3 = x5 = 0. Thus, e1 and e4 are the only eigenvectors. Note that the diagonal
blocks of A are nilpotent matrices.
Exercise 6.1.8.
-
1.
- Let A ∈ Mn(ℝ). Then, prove that
-
(a)
- if α ∈ σ(A) then αk ∈ σ(Ak), for all k ∈ ℕ.
-
(b)
- if A is invertible and α ∈ σ(A) then αk ∈ σ(Ak), for all k ∈ ℤ.
-
2.
- Find eigen-pairs over ℂ, for each of the following matrices:
, , and .
-
3.
- Let A = [aij] ∈ Mn(ℂ) with ∑_{j=1}^n aij = a, for all 1 ≤ i ≤ n. Then, prove that a is an
eigenvalue of A with corresponding eigenvector 1 = [1,1,…,1]T.
-
4.
- Prove that the matrices A and AT have the same set of eigenvalues. Construct a 2 × 2 matrix A
such that the eigenvectors of A and AT are different.
-
5.
- Prove that λ ∈ ℂ is an eigenvalue of A if and only if λ ∈ ℂ is an eigenvalue of A*.
-
6.
- Let A be an idempotent matrix. Then, prove that each of its eigenvalues is either 0
or 1.
-
7.
- Let A be a nilpotent matrix. Then, prove that its eigenvalues are all 0.
-
8.
- Let J = 11T ∈ Mn(ℂ). Then, J is a matrix with each entry 1. Show that
-
(a)
- (n,1) is an eigenpair for J.
-
(b)
- 0 ∈ σ(J) with multiplicity n-1. Find a set of n-1 linearly independent eigenvectors
for 0 ∈ σ(J).
-
9.
- Let B ∈ Mn(ℂ) and C ∈ Mm(ℂ). Now, define the Direct Sum B ⊕ C = [B 0; 0 C] (the block diagonal matrix with blocks B and C). Then, prove
that
-
(a)
- if (α,x) is an eigen-pair for B then (α, (xT, 0T)T) is an eigen-pair for B ⊕ C.
-
(b)
- if (β,y) is an eigen-pair for C then (β, (0T, yT)T) is an eigen-pair for B ⊕ C.
Definition 6.1.9. Let A ∈(ℂn). Then, a vector y ∈ ℂn\{0} satisfying y*A = λy* is called
a left eigenvector of A for λ.
Theorem 6.1.11. [Principle of bi-orthogonality] Let (λ,x) be a (right) eigenpair and (μ,y)
be a left eigenpair of A, where λ≠μ. Then, y is orthogonal to x.
Proof. Verify that μy*x = (y*A)x = y*(Ax) = y*(λx) = λy*x. As λ ≠ μ, this forces y*x = 0. _
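A quick numerical check of the principle of bi-orthogonality (the matrix is an illustrative choice; NumPy assumed):

import numpy as np

# Illustrative non-symmetric matrix with two distinct (real) eigenvalues.
A = np.array([[2.0, 1.0],
              [0.5, 3.0]])

lam, X = np.linalg.eig(A)             # right eigen-pairs of A
mu, Y = np.linalg.eig(A.conj().T)     # eigenvectors of A*, i.e. left eigenvectors of A

x = X[:, 0]                                        # right eigenvector for lam[0]
j = int(np.argmax(np.abs(mu - np.conj(lam[0]))))   # left eigenvector for a different eigenvalue
y = Y[:, j]
print(np.isclose(np.vdot(y, x), 0))                # True: y*x = 0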
Exercise 6.1.12. Let x ≠ 0 with Ax = λx and x*A = μx*. Then, prove that μ = λ.
Definition 6.1.13. [Eigenvalues of a linear Operator] Let T ∈ 𝓛(ℂn). Then, α ∈ ℂ is called
an eigenvalue of T if there exists v ∈ ℂn with v≠0 such that T(v) = αv.
Theorem 6.1.14. Let T ∈ 𝓛(ℂn), let B be the standard ordered basis of ℂn and let A = T[B,B]. Then, α ∈ σ(T) if and only if α ∈ σ(A).
Proof. Note that, by definition, T(v) = αv if and only if [Tv] = [αv]. Or equivalently, α ∈ σ(T) if
and only if A[v] = α[v]. Thus, the required result follows. _
Remark 6.1.15. [A linear operator on an infinite dimensional space may not have any
eigenvalue] Let V be the space of all real sequences (see Example 3.1.4.8a). Now, define a
linear operator T ∈ 𝓛(V) by T(x1, x2, x3, …) = (0, x1, x2, …), the right-shift operator.
We now show that T doesn’t have any eigenvalue.
Solution: Let, if possible, α be an eigenvalue of T with corresponding eigenvector x =
(x1,x2,…). Then, the eigen-condition T(x) = αx implies that (0, x1, x2, …) = (αx1, αx2, …).
So, if α ≠ 0 then x1 = 0, and this in turn implies that x = 0, a contradiction. If α = 0
then (0, x1, x2, …) = (0, 0, …) and we again get x = 0, a contradiction. Hence, the required result
follows.
Theorem 6.1.16. Let λ1,…,λn, not necessarily distinct, be the eigenvalues of A = [aij] ∈ Mn(ℂ). Then,
det(A) = ∏_{i=1}^n λi and tr(A) = ∑_{i=1}^n aii = ∑_{i=1}^n λi.
Proof. Since λ1,…,λn are the eigenvalues of A, by definition,
| (6.1.2) |
is an identity in x as polynomials. Therefore, by substituting x = 0 in Equation (6.1.2), we get
det(A) = (-1)^n(-1)^n ∏_{i=1}^n λi = ∏_{i=1}^n λi. Also,
for some a0,a1,…,an-1 ∈ ℂ. Then, an-1, the coefficient of (-1)^{n-1}x^{n-1}, comes from the
term
So, an-1 = ∑_{i=1}^n aii = tr(A), the trace of A. Also, from Equations (6.1.2) and (6.1.4), we have
Therefore, comparing the coefficient of (-1)^{n-1}x^{n-1}, we have
Hence, we get the required result. _
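A numerical illustration of Theorem 6.1.16 (the matrix is an arbitrary example; NumPy assumed):

import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [3.0, 1.0, 1.0],
              [0.0, 1.0, 2.0]])

lam = np.linalg.eigvals(A)
print(np.isclose(np.prod(lam), np.linalg.det(A)))   # True: det(A) = product of eigenvalues
print(np.isclose(np.sum(lam), np.trace(A)))         # True: tr(A)  = sum of eigenvalues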
Exercise 6.1.17.
-
1.
- Let A be a 3 × 3 orthogonal matrix (AAT = I). If det(A) = 1, then prove that there exists
v ∈ ℝ3 \{0} such that Av = v.
-
2.
- Let A ∈ M2n+1(ℝ) with AT = -A. Then, prove that 0 is an eigenvalue of A.
-
3.
- Let A ∈ Mn(ℂ). Then, A is invertible if and only if 0 is not an eigenvalue of A.
-
4.
- Let A ∈ Mn(ℂ) satisfy ∥Ax∥≤∥x∥ for all x ∈ ℂn. Then, prove that if α ∈ ℂ with |α| > 1
then A - αI is invertible.
Definition 6.1.18. [Algebraic, Geometric Multiplicity] Let A ∈ Mn(ℂ). Then,
-
1.
- the multiplicity of α ∈ σ(A) is called the algebraic multiplicity of α, denoted
Alg.Mulα(A).
-
2.
- for α ∈ σ(A), dim(Null(A - αI)) is called the geometric multiplicity of α, denoted
Geo.Mulα(A).
We now state the following observations.
Remark 6.1.19. Let A ∈ Mn(ℂ).
-
1.
- Then, for each α ∈ σ(A), using Theorem 2.2.40 dim(Null(A - αI)) ≥ 1. So, we have at
least one eigenvector.
-
2.
- If the algebraic multiplicity of α ∈ σ(A) is r ≥ 2 then Example 6.1.7.7 implies that
we need not have r linearly independent eigenvectors (see the numerical check below).
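The numerical check referred to above: assuming the matrix of Example 6.1.7.3 is the 2 × 2 upper triangular matrix with 1’s on the diagonal and a 1 above it (an assumption consistent with the eigenvector e1 described there), the two multiplicities can be computed as follows (NumPy assumed):

import numpy as np

# Assumed matrix for Example 6.1.7.3 (upper triangular, eigenvalue 1 repeated).
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
alpha = 1.0

alg_mul = int(np.sum(np.isclose(np.linalg.eigvals(A), alpha)))          # 2
geo_mul = A.shape[0] - np.linalg.matrix_rank(A - alpha * np.eye(2))     # dim Null(A - alpha I)
print(alg_mul, geo_mul)                                                 # 2 1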
Theorem 6.1.20. Let A and B be two similar matrices. Then,
-
1.
- α ∈ σ(A) if and only if α ∈ σ(B).
-
2.
- for each α ∈ σ(A), Alg.Mulα(A) = Alg.Mulα(B) and Geo.Mulα(A) =
Geo.Mulα(B).
Proof. Since A and B are similar, there exists an invertible matrix S such that A = SBS-1. So,
α ∈ σ(A) if and only if α ∈ σ(B) as
Note that Equation (6.1.5) also implies that Alg.Mulα(A) = Alg.Mulα(B). We will now show that
Geo.Mulα(A) = Geo.Mulα(B).
So, let Q1 = {v1,…,vk} be a basis of Null(A - αI). Then, B = S-1AS implies that
Q2 = {S-1v1,…,S-1vk} ⊆ Null(B - αI). Since Q1 is linearly independent and S is invertible, we get Q2
is linearly independent. So, Geo.Mulα(A) ≤ Geo.Mulα(B). Now, we can start with eigenvectors of
B and use similar arguments to get Geo.Mulα(B) ≤ Geo.Mulα(A) and hence the required result
follows. _
We will now give a relation between the geometric multiplicity and the algebraic multiplicity.
Theorem 6.1.22. Let A ∈ Mn(ℂ). Then, for α ∈ σ(A), Geo.Mulα(A) ≤ Alg.Mulα(A).
Proof. Let Geo.Mulα(A) = k. Suppose Q1 = {v1,…,vk} is an orthonormal basis of
Null(A - αI). Extend Q1 to get {v1,…,vk,vk+1,…,vn} as an orthonormal basis of ℂn. Put
P = . Then, P* = P-1 and
Now, if we denote the lower right (n - k) × (n - k) diagonal block by D then
So, Alg.Mulα(A) = Alg.Mulα(P*AP) ≥ k = Geo.Mulα(A). _
Remark 6.1.23. Note that in the proof of Theorem 6.1.22, the remaining eigenvalues of A
are the eigenvalues of D (see Equation (6.1.6)). This technique is called deflation.
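A minimal numerical sketch of the deflation idea (the matrix is a random illustrative choice; NumPy assumed):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))                    # illustrative matrix

lam, V = np.linalg.eig(A)
alpha, x = lam[0], V[:, 0]                         # one eigen-pair of A

# Build a unitary P whose first column is the (unit) eigenvector x.
P, _ = np.linalg.qr(np.column_stack([x, rng.standard_normal((4, 3)) + 0j]))

B = P.conj().T @ A @ P
print(np.isclose(B[0, 0], alpha))                  # True: alpha appears in the (1,1) entry
print(np.allclose(B[1:, 0], 0))                    # True: zeros below it
D = B[1:, 1:]                                      # the deflated block
rest = np.linalg.eigvals(D)
print(all(np.min(np.abs(rest - mu)) < 1e-8 for mu in lam[1:]))   # True: D carries the remaining eigenvalues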
Exercise 6.1.24.
-
1.
- Let A = . Notice that x1 = 1 is an eigenvector for A. Find an ordered basis
{x1,x2,x3} of ℂ3. Put X = . Compute X-1AX to get a block-triangular
matrix. Can you now find the remaining eigenvalues of A?
-
2.
- Let A ∈ Mm×n(ℝ) and B ∈ Mn×m(ℝ).
-
(a)
- If α ∈ σ(AB) and α≠0 then
-
i.
- α ∈ σ(BA).
-
ii.
- Alg.Mulα(AB) = Alg.Mulα(BA).
-
iii.
- Geo.Mulα(AB) = Geo.Mulα(BA).
-
(b)
- If 0 ∈ σ(AB) and n = m then Alg.Mul0(AB) = Alg.Mul0(BA) as there are n
eigenvalues, counted with multiplicity.
-
(c)
- Give an example to show that Geo.Mul0(AB) need not equal Geo.Mul0(BA) even when
n = m.
-
3.
- Let A ∈ Mn(ℝ) be an invertible matrix and let x,y ∈ ℝn with x≠0 and yT A-1x≠0. Define
B = xyT A-1. Then, prove that
-
(a)
- λ0 = yT A-1x is an eigenvalue of B of multiplicity 1.
-
(b)
- 0 is an eigenvalue of B of multiplicity n - 1 [Hint: Use Exercise 6.1.24.2a].
-
(c)
- 1 + αλ0 is an eigenvalue of I + αB of multiplicity 1, for any α ∈ ℝ.
-
(d)
- 1 is an eigenvalue of I + αB of multiplicity n - 1, for any α ∈ ℝ.
-
(e)
- det(A + αxyT) equals (1 + αλ0)det(A), for any α ∈ ℝ. This result is known as the
Sherman-Morrison formula for determinants.
-
4.
- Let A,B ∈ M2(ℝ) such that det(A) = det(B) and tr(A) = tr(B).
-
(a)
- Do A and B have the same set of eigenvalues?
-
(b)
- Give examples to show that the matrices A and B need not be similar.
-
5.
- Let A,B ∈ Mn(ℝ). Also, let (λ1,u) and (λ2,v) be eigen-pairs of A and B, respectively.
-
(a)
- If u = αv for some α ∈ ℝ then (λ1 + λ2,u) is an eigen-pair for A + B.
-
(b)
- Give an example to show that if u and v are linearly independent then λ1 + λ2 need
not be an eigenvalue of A + B.
-
6.
- Let A ∈ Mn(ℝ) be an invertible matrix with eigen-pairs (λ1,u1),…,(λn,un). Then, prove that
{u1,…,un} forms a basis of ℝn. If b = c1u1 + ⋯ + cnun then the system Ax = b has the unique
solution x = (c1∕λ1)u1 + ⋯ + (cn∕λn)un.
Let A ∈ Mn(ℂ) and let T ∈ 𝓛(ℂn) be defined by T(x) = Ax, for all x ∈ ℂn. In this section, we first
find conditions under which one can obtain a basis B of ℂn such that T[B,B] (see Theorem 4.4.4) is a
diagonal matrix. Then, it is shown that normal matrices satisfy these conditions. To start
with, we have the following definition.
Definition 6.2.1. [Matrix Diagonalizability] A matrix A is said to be diagonalizable if A
is similar to a diagonal matrix. Or equivalently, P-1AP = D ⇔ AP = PD, for some diagonal
matrix D and invertible matrix P.
Example 6.2.2.
-
1.
- Let A be an n × n diagonalizable matrix. Then, by definition, A is similar to a diagonal
matrix, say D = diag(d1,…,dn). Thus, by Remark 6.1.21, σ(A) = σ(D) = {d1,…,dn}.
-
2.
- Let A = . Then, A cannot be diagonalized.
Solution: Suppose A is diagonalizable. Then, A is similar to D = diag(d1,d2). Thus,
by Theorem 6.1.20, {d1,d2} = σ(D) = σ(A) = {0,0}. Hence, D = 0 and therefore,
A = SDS-1 = 0, a contradiction.
-
3.
- Let A = . Then, A cannot be diagonalized.
Solution: Suppose A is diagonalizable. Then, A is similar to D = diag(d1,d2,d3). Thus,
by Theorem 6.1.20, {d1,d2,d3} = σ(D) = σ(A) = {2,2,2}. Hence, D = 2I3 and therefore,
A = SDS-1 = 2I3, a contradiction.
-
4.
- Let A = . Then, and are two eigen-pairs of A. Define
U = . Then, U*U = I2 = UU* and U*AU = .
Theorem 6.2.3. Let A ∈ Mn(ℝ).
-
1.
- Let S be an invertible matrix such that S-1AS = diag(d1,…,dn). Then, for 1 ≤ i ≤ n,
the i-th column of S is an eigenvector of A corresponding to di.
-
2.
- Then, A is diagonalizable if and only if A has n linearly independent eigenvectors.
Proof. Let S = . Then, AS = SD gives
Or
equivalently, Aui = diui, for 1 ≤ i ≤ n. As S is invertible, {u1,…,un} are linearly independent. Hence,
(di,ui), for 1 ≤ i ≤ n, are eigen-pairs of A. This proves Part 1 and “only if” part of Part
2.
Conversely, let {u1,…,un} be n linearly independent eigenvectors of A corresponding
to eigenvalues α1,…,αn. Then, by Corollary 3.3.10, S = is non-singular and
where D = diag(α1,…,αn). Therefore, S-1AS = D and hence A is diagonalizable. _
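Theorem 6.2.3 can be verified numerically; in the sketch below the matrix is an illustrative choice (NumPy assumed):

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

d, S = np.linalg.eig(A)                           # columns of S are eigenvectors
D = np.diag(d)
print(np.allclose(A @ S, S @ D))                  # True: A S = S D
print(np.allclose(np.linalg.inv(S) @ A @ S, D))   # True: S^{-1} A S = diag(d_1, ..., d_n)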
Definition 6.2.4.
-
1.
- A matrix A ∈ Mn(ℂ) is called defective if for some α ∈ σ(A), Geo.Mulα(A) <
Alg.Mulα(A).
-
2.
- A matrix A ∈ Mn(ℂ) is called non-derogatory if Geo.Mulα(A) = 1, for each α ∈ σ(A).
As a direct consequence of Theorem 6.2.3, we obtain the following result.
Corollary 6.2.5. Let A ∈ Mn(ℂ). Then,
-
1.
- A is non-defective if and only if A is diagonalizable.
-
2.
- A has distinct eigenvalues if and only if A is non-derogatory and non-defective.
Theorem 6.2.6. Let (α1,v1),…,(αk,vk) be k eigen-pairs of A ∈ Mn(ℂ) with αi’s distinct.
Then, {v1,…,vk} is linearly independent.
Proof. Suppose {v1,…,vk} is linearly dependent. Then, there exists a smallest ℓ ∈ {1,…,k - 1} and
scalars β1,…,βℓ, not all zero, such that vℓ+1 = β1v1 + ⋯ + βℓvℓ. So,
| (6.2.1) |
and
Now, subtracting Equation (6.2.2) from Equation (6.2.1), we get
So, vℓ ∈ LS(v1,…,vℓ-1), a contradiction to the choice of ℓ. Thus, the required result
follows. _
An immediate corollary of Theorem 6.2.3 and Theorem 6.2.6 is stated next without
proof.
Corollary 6.2.7. Let A ∈ Mn(ℂ) have n distinct eigenvalues. Then, A is diagonalizable.
The converse of Theorem 6.2.6 is not true as In has n linearly independent eigenvectors
corresponding to the eigenvalue 1, repeated n times.
Corollary 6.2.8. Let α1,…,αk be k distinct eigenvalues of A ∈ Mn(ℂ). Also, for 1 ≤ i ≤ k, let
dim(Null(A - αiIn)) = ni. Then, A has ∑_{i=1}^k ni linearly independent eigenvectors.
Proof. For 1 ≤ i ≤ k, let Si = {ui1,…,uini} be a basis of Null(A - αiIn). Then, we need to prove that
⋃_{i=1}^k Si is linearly independent. To do so, denote pj(A) = ∏_{i≠j}(A - αiIn), for
1 ≤ j ≤ k. Then, note that pj(A) is a polynomial in A of degree k - 1 and
| (6.2.3) |
So, to prove that ⋃_{i=1}^k Si is linearly independent, consider the linear system
in the variables cij's. Now, applying the matrix pj(A) and using Equation (6.2.3), we
get
But ∏_{i≠j}(αj - αi) ≠ 0 as the αi's are distinct. Hence, cj1uj1 + ⋯ + cjnjujnj = 0. As Sj is a basis of
Null(A - αjIn), we get cjt = 0, for 1 ≤ t ≤ nj. Thus, the required result follows. _
Corollary 6.2.9. Let A ∈ Mn(ℂ) with distinct eigenvalues α1,…,αk. Then, A is diagonalizable
if and only if Geo.Mulαi(A) = Alg.Mulαi(A), for each 1 ≤ i ≤ k.
Proof. Let Alg.Mulαi(A) = mi. Then, ∑_{i=1}^k mi = n. Let Geo.Mulαi(A) = ni, for 1 ≤ i ≤ k. Then,
by Corollary 6.2.8, A has ∑_{i=1}^k ni linearly independent eigenvectors. Also, by Theorem 6.1.22,
ni ≤ mi, for 1 ≤ i ≤ k.
Now, let A be diagonalizable. Then, by Theorem 6.2.3, A has n linearly independent eigenvectors.
So, n = ∑_{i=1}^k ni. As ni ≤ mi and ∑_{i=1}^k mi = n, we get ni = mi.
Now, assume that Geo.Mulαi(A) = Alg.Mulαi(A), for 1 ≤ i ≤ k. Then, for each i, 1 ≤ i ≤ k, A
has ni = mi linearly independent eigenvectors. Thus, A has ∑_{i=1}^k ni = ∑_{i=1}^k mi = n linearly
independent eigenvectors. Hence, by Theorem 6.2.3, A is diagonalizable. _
Exercise 6.2.11.
-
1.
- Let A be diagonalizable. Then, prove that A + αI is diagonalizable for every α ∈ ℂ.
-
2.
- Let A be a strictly upper triangular matrix. Then, prove that A is not diagonalizable.
-
3.
- Let A be an n×n matrix with λ ∈ σ(A) with Alg.Mulλ(A) = m. If Rank[A - λI] ≠ n - m
then prove that A is not diagonalizable.
-
4.
- If σ(A) = σ(B) and both A and B are diagonalizable then prove that A is similar to B.
That is, they are two basis representations of the same linear transformation.
-
5.
- Let A and B be two similar matrices such that A is diagonalizable. Prove that B is
diagonalizable.
-
6.
- Let A ∈ Mn(ℝ) and B ∈ Mm(ℝ). Suppose C = . Then, prove that C is
diagonalizable if and only if both A and B are diagonalizable.
-
7.
- Is the matrix A = diagonalizable?
-
8.
- Let Jn be an n×n matrix with all entries 1. Then, Geo.Muln(Jn) = Alg.Muln(Jn) = 1
and Geo.Mul0(Jn) = Alg.Mul0(Jn) = n - 1.
-
9.
- Let A = [aij] ∈ Mn(ℝ), where aij = a, if i = j, and aij = b, otherwise. Then, verify that
A = (a - b)In + bJn. Hence, or otherwise, determine the eigenvalues and eigenvectors of
A. Is A diagonalizable?
-
10.
- Let T : ℝ5-→ℝ5 be a linear operator with Rank(T - I) = 3 and
-
(a)
- Determine the eigenvalues of T?
-
(b)
- For each distinct eigenvalue α of T, determine Geo.Mulα(T).
-
(c)
- Is T diagonalizable? Justify your answer.
-
11.
- Let A ∈ Mn(ℝ) with A≠0 but A2 = 0. Prove that A cannot be diagonalized.
-
12.
- Are the following matrices diagonalizable?
i), ii), iii) and iv).
-
13.
- Let A ∈ Mn(ℂ).
-
(a)
- Then, prove that Rank(A) = 1 if and only if A = xy*, for some non-zero vectors
x,y ∈ ℂn.
-
(b)
- If Rank(A) = 1 then
-
i.
- A has at most one nonzero eigenvalue of algebraic multiplicity 1.
-
ii.
- find this eigenvalue and its geometric multiplicity.
-
iii.
- when is A diagonalizable?
-
14.
- Let A ∈ Mn(ℂ). If Rank(A) = k then there exist xi, yi ∈ ℂn such that A = ∑_{i=1}^k xiyi*. Is the
converse true?
We now prove one of the most important results in diagonalization, called the Schur’s Lemma or the
Schur’s unitary triangularization.
Lemma 6.2.12 (Schur’s unitary triangularization (SUT)). Let A ∈ Mn(ℂ). Then, there exists
a unitary matrix U such that A is similar to an upper triangular matrix. Further, if A ∈ Mn(ℝ)
and σ(A) have real entries then U is a real orthogonal matrix.
Proof. We prove the result by induction on n. The result is clearly true for n = 1. So, let n > 1 and
assume the result to be true for k < n and prove it for n.
Let (λ1,x1) be an eigen-pair of A with ∥x1∥ = 1. Now, extend it to form an orthonormal basis
{x1,x2,…,xn} of ℂn and define X = . Then, X is a unitary matrix and
where B ∈ Mn-1(ℂ). Now, by induction hypothesis there exists a unitary matrix U ∈ Mn-1(ℂ) such
that U*BU = T is an upper triangular matrix. Define Û = X(1 ⊕ U). Then, using Exercise 5.4.8.10,
the matrix Û is unitary and Û*AÛ = [λ1 *; 0 T]. Since T is upper triangular, Û*AÛ is upper triangular.
Further, if A ∈ Mn(ℝ) and σ(A) has real entries then x1 can be chosen in ℝn with Ax1 = λ1x1. Now, one uses
induction once again to get the required result. _
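Numerically, a Schur form can be obtained with SciPy's schur routine; the following sketch (the matrix is an illustrative choice) checks the conclusions of Lemma 6.2.12:

import numpy as np
from scipy.linalg import schur

A = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 3.0],
              [0.0, 1.0, 1.0]])

T, U = schur(A, output='complex')                  # A = U T U*, with U unitary
print(np.allclose(U @ T @ U.conj().T, A))          # True
print(np.allclose(np.tril(T, -1), 0))              # True: T is upper triangular
print(np.allclose(U.conj().T @ U, np.eye(3)))      # True: U is unitary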
Remark 6.2.13. Let A ∈ Mn(ℂ). Then, by Schur’s Lemma there exists a unitary matrix U such that
U*AU = T = [tij], a triangular matrix. Thus,
| (6.2.5) |
Furthermore, we can get the αi’s in the diagonal of T in any prescribed order.
Definition 6.2.14. [Unitary Equivalence] Let A,B ∈ Mn(ℂ). Then, A and B are said to be
unitarily equivalent/similar if there exists a unitary matrix U such that A = U*BU.
Remark 6.2.15. We know that if two matrices are unitarily equivalent then they are necessarily
similar as U* = U-1, for every unitary matrix U. But, similarity doesn’t imply unitary equivalence
(see Exercise 6.2.17.6). In numerical calculations, unitary transformations are preferred as compared
to similarity transformations due to the following main reasons:
-
1.
- Exercise 5.4.8.5g implies that ∥Ux∥ = ∥x∥, whenever U is a unitary matrix. This need
not be true under a similarity change of basis.
-
2.
- As U-1 = U*, for a unitary matrix, unitary equivalence is computationally simpler.
-
3.
- Also, computation of “conjugate transpose” doesn’t create round-off error in calculation.
Example 6.2.16. Consider the two matrices A = and B = . Then, we show
that they are similar but not unitarily similar.
Solution: Note that σ(A) = σ(B) = {1,2}. As the eigenvalues are distinct, by
Corollary 6.2.7, the matrices A and B are diagonalizable and hence there exist invertible
matrices S and T such that A = SΛS-1, B = TΛT-1, where Λ = diag(1,2). Thus,
A = ST-1B(ST-1)-1. That is, A and B are similar. But, ∑|aij|2 ≠ ∑|bij|2 and hence, by
Exercise 5.4.8.11, they cannot be unitarily similar.
We now use Lemma 6.2.12 to give another proof of Theorem 6.1.16.
Corollary 6.2.18. Let A ∈ Mn(ℂ). If σ(A) = {α1,…,αn} then det(A) = ∏_{i=1}^n αi and
tr(A) = ∑_{i=1}^n αi.
Proof. By Schur’s Lemma there exists a unitary matrix U such that U*AU = T = [tij], a triangular
matrix. By Remark 6.2.13, σ(A) = σ(T). Hence, det(A) = det(T) = ∏_{i=1}^n tii = ∏_{i=1}^n αi and
tr(A) = tr(A(UU*)) = tr(U*(AU)) = tr(T) = ∑_{i=1}^n tii = ∑_{i=1}^n αi. _
We now use Schur’s unitary triangularization Lemma to state the main theorem of this subsection.
Also, recall that A is said to be a normal matrix if AA* = A*A.
Theorem 6.2.19 (Spectral Theorem for Normal Matrices). Let A ∈ Mn(ℂ). If A is a normal
matrix then there exists a unitary matrix U such that U*AU = diag(α1,…,αn).
Proof. By Schur’s Lemma there exists a unitary matrix U such that U*AU = T = [tij], a triangular
matrix. Since A is normal, T*T = (U*AU)*(U*AU) = U*A*AU = U*AA*U = (U*AU)(U*AU)* = TT*.
Thus, we see that T is an upper triangular matrix with T*T = TT*. Hence, by Exercise 1.2.11.4, T is a
diagonal matrix and this completes the proof. _
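A small numerical illustration of Theorem 6.2.19 (the skew-symmetric matrix below is normal but not Hermitian; since its eigenvalues are distinct, the eigenvectors returned by eig are already orthonormal):

import numpy as np

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
print(np.allclose(A @ A.conj().T, A.conj().T @ A))   # True: A is normal

lam, U = np.linalg.eig(A)
print(np.allclose(U.conj().T @ U, np.eye(2)))        # True: U is unitary
print(np.allclose(U.conj().T @ A @ U, np.diag(lam))) # True: U* A U = diag(i, -i)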
Exercise 6.2.20. Let A ∈ Mn(ℂ). If A is a Hermitian, skew-Hermitian or unitary
matrix then prove that A is a normal matrix.
We re-write Theorem 6.2.19 in another form to indicate that A can be decomposed into a linear
combination of orthogonal projectors onto eigen-spaces. Thus, it is independent of the choice of
eigenvectors.
Remark 6.2.21. Let A ∈ Mn(ℂ) be a normal matrix with eigenvalues α1,…,αn.
-
1.
- Then, there exists a unitary matrix U = such that
-
(a)
- In = u1u1* + ⋯ + unun*.
-
(b)
- the columns of U form a set of orthonormal eigenvectors for A (use Theorem 6.2.3).
-
(c)
- A = A ⋅ In = A(u1u1* + ⋯ + unun*) = α1u1u1* + ⋯ + αnunun*.
-
2.
- Let α1,…,αk be the distinct eigenvalues of A. Also, let Wi = Null(A-αiIn), for 1 ≤ i ≤ k, be
the corresponding eigen-spaces.
-
(a)
- Then, we can group the ui’s such that they form an orthonormal basis of Wi, for
1 ≤ i ≤ k. Hence, ℂn = W1 ⊕⊕ Wk.
-
(b)
- If Pαi is the orthogonal projector onto Wi, for 1 ≤ i ≤ k, then A = α1Pα1 + ⋯ + αkPαk.
Thus, A depends only on eigen-spaces and not on the computed eigenvectors.
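The decomposition of Remark 6.2.21 can be checked numerically; the Hermitian matrix below is an illustrative choice (NumPy assumed):

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

lam, U = np.linalg.eigh(A)                 # orthonormal eigenvectors for Hermitian A
# Rank-one orthogonal projectors u_i u_i* onto the eigen-directions.
P = [np.outer(U[:, i], U[:, i].conj()) for i in range(2)]

print(np.allclose(sum(P), np.eye(2)))                  # True: I = P_1 + P_2
print(np.allclose(lam[0] * P[0] + lam[1] * P[1], A))   # True: A = a_1 P_1 + a_2 P_2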
We now give the spectral theorem for Hermitian matrices.
Theorem 6.2.22. [Spectral Theorem for Hermitian Matrices] Let A ∈ Mn(ℂ) be a Hermitian
matrix. Then,
-
1.
- the eigenvalues αi, for 1 ≤ i ≤ n, of A are real.
-
2.
- there exists a unitary matrix U, say U = such that
-
(a)
- In = u1u1* + ⋯ + unun*.
-
(b)
- {u1,…,un} forms a set of orthonormal eigenvectors for A.
-
(c)
- A = α1u1u1* + ⋯ + αnunun*, or equivalently, U*AU = D, where D =
diag(α1,…,αn).
Proof. The second part is immediate from Theorem 6.2.19 as Hermitian matrices are also normal
matrices. For Part 1, let (α,x) be an eigen-pair. Then, Ax = αx. As A is Hermitian, A* = A. Thus,
x*A = x*A* = (Ax)* = (αx)* = ᾱx*. Hence, using x*A = ᾱx*, we get ᾱ x*x = x*Ax = α x*x. As x
is an eigenvector, x ≠ 0. Hence, ∥x∥2 = x*x ≠ 0. Thus ᾱ = α, i.e., α ∈ ℝ. _
As an immediate corollary of Theorem 6.2.22 and the second part of Lemma 6.2.12, we give the
following result without proof.
Corollary 6.2.23. Let A ∈ Mn(ℝ) be symmetric. Then, A = U diag(α1,…,αn) U*, where
-
1.
- the αi’s are all real,
-
2.
- the columns of U can be chosen to have real entries,
-
3.
- the eigenvectors that correspond to the columns of U form an orthonormal basis of ℝn.
Exercise 6.2.24.
-
1.
- Let A be a skew-symmetric matrix. Then, the eigenvalues of A are either zero or purely
imaginary and A is unitarily diagonalizable.
-
2.
- Let A be a skew-Hermitian matrix. Then, A is unitarily diagonalizable.
-
3.
- Characterize all normal matrices in M2(ℝ).
-
4.
- Let σ(A) = {λ1,…,λn}. Then, prove that the following statements are equivalent.
-
(a)
- A is normal.
-
(b)
- A is unitarily diagonalizable.
-
(c)
- ∑
i,j|aij|2 = ∑
i|λi|2.
-
(d)
- A has n orthonormal eigenvectors.
-
5.
- Let A be a normal matrix with (λ,x) as an eigen-pair. Then,
-
(a)
- (A*)kx for k ∈ ℤ+ is also an eigenvector corresponding to λ.
-
(b)
- (λ,x) is an eigen-pair for A*. [Hint: Verify ∥A*x -λx∥2 = ∥Ax - λx∥2.]
-
6.
- Let A be an n × n unitary matrix. Then,
-
(a)
- |λ| = 1 for any eigenvalue λ of A.
-
(b)
- the eigenvectors x,y corresponding to distinct eigenvalues are orthogonal.
-
7.
- Let A be a 2 × 2 orthogonal matrix. Then, prove the following:
-
(a)
- if det(A) = 1 then A = , for some θ,0 ≤ θ < 2π. That is, A
counterclockwise rotates every point in ℝ2 by an angle θ.
-
(b)
- if detA = -1 then A = , for some θ,0 ≤ θ < 2π. That is, A
reflects every point in ℝ2 about a line passing through origin. Determine this line.
Or equivalently, there exists a non-singular matrix P such that P-1AP = .
-
8.
- Let A be a 3 × 3 orthogonal matrix. Then, prove the following:
-
(a)
- if det(A) = 1 then A is a rotation about a fixed axis, in the sense that A has an
eigen-pair (1,x) such that the restriction of A to the plane x⊥ is a two dimensional
rotation in x⊥.
-
(b)
- if detA = -1 then A corresponds to a reflection through a plane P, followed by a
rotation about the line through origin that is orthogonal to P.
-
9.
- Let A be a normal matrix. Then, prove that Rank(A) equals the number of nonzero eigenvalues
of A.
-
10.
- [Equivalent characterizations of Hermitian matrices] Let A ∈ Mn(ℂ). Then, the following
statements are equivalent.
-
(a)
- The matrix A is Hermitian.
-
(b)
- The number x*Ax is real for each x ∈ ℂn.
-
(c)
- The matrix A is normal and has real eigenvalues.
-
(d)
- The matrix S*AS is Hermitian for each S ∈ Mn(ℂ).
Let A ∈ Mn(ℂ). Then, in Theorem 6.1.16, we saw that
| (6.2.6) |
for certain ai ∈ ℂ, 0 ≤ i ≤ n - 1. Also, if α is an eigenvalue of A then PA(α) = 0. So, the equation
x^n - a_{n-1}x^{n-1} + a_{n-2}x^{n-2} + ⋯ + (-1)^{n-1}a1x + (-1)^n a0 = 0 is satisfied by n complex numbers. It
turns out that the corresponding matrix expression
A^n - a_{n-1}A^{n-1} + a_{n-2}A^{n-2} + ⋯ + (-1)^{n-1}a1A + (-1)^n a0 In = 0
holds true as a matrix identity. This is a celebrated theorem called the Cayley Hamilton Theorem. We
give a proof using Schur’s unitary triangularization. To do so, we look at multiplication of certain
upper triangular matrices.
Lemma 6.2.25. Let A1,…,An ∈ Mn(ℂ) be upper triangular matrices such that the (i,i)-th
entry of Ai equals 0, for 1 ≤ i ≤ n. Then, A1A2⋯An = 0.
Proof. We use induction to prove that the first k columns of A1A2⋯Ak are 0, for 1 ≤ k ≤ n. The result
is clearly true for k = 1 as the first column of A1 is 0. For clarity, we show that the first two columns
of A1A2 are 0. Let B = A1A2. Then, by matrix multiplication
as A1[:,1] = 0 and (A2)ji = 0, for i = 1,2 and j ≥ 2. So, assume that the first n - 1 columns of
C = A1⋯An-1 are 0 and let B = CAn. Then, for 1 ≤ i ≤ n, we see that
as C[:,j] = 0, for 1 ≤ j ≤ n - 1 and (An)ni = 0, for i = n - 1, n. Thus, by induction, the
required result follows. _
Exercise 6.2.26. Let A,B ∈ Mn(ℂ) be upper triangular matrices with the top leading principal
submatrix of A of size k being 0. If B[k + 1,k + 1] = 0 then prove that the leading principal
submatrix of size k + 1 of AB is 0.
We now prove the Cayley Hamilton Theorem using Schur’s unitary triangularization.
Theorem 6.2.27 (Cayley Hamilton Theorem). Let A ∈ Mn(ℂ). Then, A satisfies its
characteristic equation. That is, if PA(x) = det(A - xIn) = a0 - a1x + ⋯ + (-1)^{n-1}a_{n-1}x^{n-1} + (-1)^n x^n then
PA(A) = a0I - a1A + ⋯ + (-1)^{n-1}a_{n-1}A^{n-1} + (-1)^n A^n = 0
holds true as a matrix identity.
Proof. Let σ(A) = {α1,…,αn}. Then PA(x) = ∏_{i=1}^n (αi - x). And, by Schur’s unitary triangularization
there exists a unitary matrix U such that U*AU = T, an upper triangular matrix with tii = αi, for
1 ≤ i ≤ n. Now, observe that if Ai = T - αiI then the Ai’s satisfy the conditions of Lemma 6.2.25.
Hence,
Therefore,
Thus, the required result follows. _
We now give some examples and then implications of the Cayley Hamilton Theorem.
Remark 6.2.28.
-
1.
- Let A = . Then, PA(x) = x2 + 2x - 5. Hence, verify that
Further, verify that A-1 = = . Furthermore, A2 = -2A + 5I
implies that
We can keep using the above technique to get Am as a linear combination of A and I, for all m ≥ 1.
-
2.
- Let A = . Then, PA(t) = t(t- 3) - 2 = t2 - 3t- 2. So, using PA(A) = 0, we have
A-1 = . Further, A2 = 3A + 2I implies that A3 = 3A2 + 2A = 3(3A + 2I) + 2A =
11A + 6I. So, as above, Am is a combination of A and I, for all m ≥ 1.
-
3.
- Let A = . Then, PA(x) = x2. So, even though A≠0, A2 = 0.
-
4.
- For A = , PA(x) = x3. Thus, by the Cayley Hamilton Theorem A3 = 0. But,
it turns out that A2 = 0.
-
5.
- For A = , note that PA(t) = (t - 1)3. So PA(A) = 0. But, observe that if
q(t) = (t - 1)2 then q(A) is also 0.
-
6.
- Let A ∈ Mn(ℂ) with PA(x) = a0 - a1x + ⋯ + (-1)^{n-1}a_{n-1}x^{n-1} + (-1)^n x^n.
-
(a)
- Then, for any ℓ ∈ ℕ, the division algorithm gives α0,α1,…,αn-1 ∈ ℂ and a polynomial
f(x) with coefficients from ℂ such that x^ℓ = f(x)PA(x) + α0 + α1x + ⋯ + α_{n-1}x^{n-1}.
Hence, by the Cayley Hamilton Theorem, A^ℓ = α0I + α1A + ⋯ + α_{n-1}A^{n-1}.
-
i.
- Thus, to compute any power of A, one needs to apply the division algorithm to
get αi’s and know Ai, for 1 ≤ i ≤ n - 1. This is quite helpful in numerical
computation as computing powers takes much more time than division.
-
ii.
- Note that LS(I, A, A2, …) is a subspace of Mn(ℂ). Also, dim(Mn(ℂ)) = n2.
But, the above argument implies that dim(LS(I, A, A2, …)) ≤ n.
-
iii.
- In the language of graph theory, it says the following: “Let G be a graph on n
vertices and A its adjacency matrix. Suppose there is no path of length n - 1 or
less from a vertex v to a vertex u in G. Then, G doesn’t have a path from v to u
of any length. That is, the graph G is disconnected and v and u are in different
components of G.”
-
(b)
- Suppose A is non-singular. Then, by definition a0 = det(A)≠0. Hence,
This matrix identity can be used to calculate the inverse.
-
(c)
- The above also implies that if A is invertible then A-1 ∈ LS(I, A, …, A^{n-1}). That is, A-1 is
a linear combination of the vectors I, A, …, A^{n-1}.
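For concreteness, the 2 × 2 consequences of the Cayley Hamilton Theorem can be checked numerically (the matrix is an illustrative choice; np.poly returns the coefficients of det(tI - A)):

import numpy as np

# For a 2x2 matrix, p_A(t) = t^2 - tr(A) t + det(A), so
# A^{-1} = (tr(A) I - A) / det(A) whenever det(A) != 0.
A = np.array([[1.0, 2.0],
              [1.0, 3.0]])

A_inv = (np.trace(A) * np.eye(2) - A) / np.linalg.det(A)
print(np.allclose(A_inv, np.linalg.inv(A)))    # True

# Cayley-Hamilton itself: the characteristic polynomial annihilates A.
c = np.poly(A)                                 # coefficients of det(tI - A)
pA = c[0] * A @ A + c[1] * A + c[2] * np.eye(2)
print(np.allclose(pA, 0))                      # True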
The next section deals with quadratic forms, which help us better understand conic
sections in analytic geometry.
Definition 6.3.1. [Positive, Semi-positive and Negative definite matrices] Let A ∈ Mn(ℂ). Then,
A is said to be
-
1.
- positive semi-definite (psd) if x*Ax ∈ ℝ and x*Ax ≥ 0, for all x ∈ ℂn.
-
2.
- positive definite (pd) if x*Ax ∈ ℝ and x*Ax > 0, for all x ∈ ℂn \{0}.
-
3.
- negative semi-definite (nsd) if x*Ax ∈ ℝ and x*Ax ≤ 0, for all x ∈ ℂn.
-
4.
- negative definite (nd) if x*Ax ∈ ℝ and x*Ax < 0, for all x ∈ ℂn \{0}.
-
5.
- indefinite if x*Ax ∈ ℝ and there exist x,y ∈ ℂn such that x*Ax < 0 < y*Ay.
Lemma 6.3.2. Let A ∈ Mn(ℂ). Then, A is Hermitian if and only if at least one of the following
statements holds:
-
1.
- S*AS is Hermitian for all S ∈ Mn.
-
2.
- A is normal and has real eigenvalues.
-
3.
- x*Ax ∈ ℝ for all x ∈ ℂn.
Proof. Suppose A = A*. Then, for any S ∈ Mn, (S*AS)* = S*A*S = S*AS. Thus S*AS is Hermitian.
Also, A is clearly normal as AA* = A2 = A*A. Further, if (λ,x) is an eigenpair
then λx*x = x*Ax ∈ ℝ implies λ ∈ ℝ.
For the last part, note that x*Ax is a complex number. Thus, since (x*Ax)* = x*A*x = x*Ax, the scalar
x*Ax equals its own conjugate, i.e., Im(x*Ax) = 0 and x*Ax ∈ ℝ.
If S*AS is Hermitian for all S ∈ Mn then taking S = In gives A is Hermitian.
If A is normal then A = U* diag(λ1,…,λn)U for some unitary matrix U. Since λi ∈ ℝ,
A* = (U* diag(λ1,…,λn)U)* = U* diag(λ̄1,…,λ̄n)U = U* diag(λ1,…,λn)U = A. So, A is
Hermitian.
If x*Ax ∈ ℝ for all x ∈ ℂn then aii = ei*Aei ∈ ℝ. Also, aii + ajj + aij + aji = (ei + ej)*A(ei + ej) ∈ ℝ.
So, Im(aij) = -Im(aji). Similarly, aii + ajj + iaij - iaji = (ei + iej)*A(ei + iej) ∈ ℝ implies that
Re(aij) = Re(aji). Thus, A = A*. _
Remark 6.3.3. Let A ∈ Mn(ℝ). Then the condition x*Ax ∈ ℝ in Definition 6.3.9 is always
true and hence doesn’t put any restriction on the matrix A. So, in Definition 6.3.9, we assume
that AT = A, i.e., A is a symmetric matrix.
Theorem 6.3.5. Let A ∈ Mn(ℂ). Then, the following statements are equivalent.
-
1.
- A is positive semi-definite.
-
2.
- A* = A and each eigenvalue of A is non-negative.
-
3.
- A = B*B for some B ∈ Mn(ℂ).
Proof. 1 ⇒2: Let A be positive semi-definite. Then, by Lemma 6.3.2 A is Hermitian. If (α,v) is an
eigen-pair of A then α∥v∥2 = v*Av ≥ 0. So, α ≥ 0.
2 ⇒ 3: Let σ(A) = {α1,…,αn}. Then, by the spectral theorem, there exists a unitary matrix U such
that U*AU = D with D = diag(α1,…,αn). As αi ≥ 0, for 1 ≤ i ≤ n, define D^{1∕2} = diag(√α1,…,√αn)
and B = D^{1∕2}U*. Then, A = UD^{1∕2}[D^{1∕2}U*] = B*B.
3 ⇒1: Let A = B*B. Then, for x ∈ ℂn, x*Ax = x*B*Bx = ∥Bx∥2 ≥ 0. Thus, the required
result follows. _
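A numerical illustration of Theorem 6.3.5 (the matrix B below is an arbitrary non-singular choice, so A = B*B is in fact positive definite):

import numpy as np

B = np.array([[1.0, 2.0],
              [0.0, 1.0]])
A = B.conj().T @ B

print(np.allclose(A, A.conj().T))                  # True: A is Hermitian
print(bool(np.all(np.linalg.eigvalsh(A) > 0)))     # True: eigenvalues are positive

rng = np.random.default_rng(1)
x = rng.standard_normal(2)
print(bool(x @ A @ x >= 0))                        # True: x^T A x = ||Bx||^2 >= 0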
A similar argument gives the next result and hence the proof is omitted.
Theorem 6.3.6. Let A ∈ Mn(ℂ). Then, the following statements are equivalent.
-
1.
- A is positive definite.
-
2.
- A* = A and each eigenvalue of A is positive.
-
3.
- A = B*B for a non-singular matrix B ∈ Mn(ℂ).
Remark 6.3.7. Let A ∈ Mn(ℂ) be a Hermitian matrix with eigenvalues λ1 ≥ λ2 ≥ ⋯ ≥ λn. Then,
there exists a unitary matrix U = [u1,u2,…,un] and a diagonal matrix D = diag(λ1,λ2,…,λn) such
that A = UDU*. Now, for 1 ≤ i ≤ n, define αi = max{λi,0} and βi = min{λi,0}. Then
-
1.
- for D1 = diag(α1,α2,…,αn), the matrix A1 = UD1U* is positive semi-definite.
-
2.
- for D2 = diag(β1,β2,…,βn), the matrix A2 = -UD2U* is positive semi-definite.
-
3.
- A = A1 - A2. The matrix A1 is generally called the positive semi-definite part of A.
Definition 6.3.8. [Multilinear Function] Let V be a vector space over F. Then,
-
1.
- for a fixed m ∈ ℕ, a function f : Vm → F is called an m-multilinear function if f is linear in
each component. That is, for α ∈ F, u ∈ V and vi ∈ V, for 1 ≤ i ≤ m.
-
2.
- An m-multilinear form is also called an m-form.
-
3.
- A 2-form is called a bilinear form.
Definition 6.3.9. [Sesquilinear, Hermitian and Quadratic Forms] Let A = [aij] ∈ Mn(ℂ)
be a Hermitian matrix and let x,y ∈ ℂn. Then, a sesquilinear form in x,y ∈ ℂn is defined as
H(x,y) = y*Ax. In particular, H(x,x), denoted H(x), is called a Hermitian form. In case
A ∈ Mn(ℝ), H(x) is called a quadratic form.
Remark 6.3.10. Observe that
-
1.
- if A = In then the bilinear/sesquilinear form reduces to the standard inner product.
-
2.
- H(x,y) is ‘linear’ in the first component and ‘conjugate linear’ in the second component.
-
3.
- the quadratic form H(x) is a real number. Hence, for α ∈ ℝ, the equation H(x) = α,
represents a conic in ℝn.
Example 6.3.11.
-
1.
- Let vi ∈ ℂn, for 1 ≤ i ≤ n. Then, f = det is an n-form on ℂn.
-
2.
- Let A ∈ Mn(ℝ). Then, f(x,y) = yT Ax, for x,y ∈ ℝn, is a bilinear form on ℝn.
-
3.
- Let A = . Then, A* = A and for x = , verify that
where ‘Re’ denotes the real part of a complex number, is a sesquilinear form.
The main idea of this section is to express H(x) as sum or difference of squares. Since H(x) is a
quadratic in x, replacing x by cx, for c ∈ ℂ, just gives a multiplication factor by |c|2. Hence, one needs
to study only the normalized vectors. Let us consider Example 6.1.1 again. There we see that
Note that both the expressions in Equation (6.3.1) are differences of two non-negative terms,
whereas both the expressions in Equation (6.3.2) are sums of two non-negative terms. Is this
just a coincidence?
In general, let A ∈ Mn(ℂ) be a Hermitian matrix. Then, by Theorem 6.2.22, σ(A) = {α1,…,αn}⊆ ℝ
and there exists a unitary matrix U such that U*AU = D = diag(α1,…,αn). Let x = Uz. Then,
∥x∥ = 1 and U is unitary implies that ∥z∥ = 1. If z = (z1,…,zn)* then
| (6.3.3) |
where α1,…,αp > 0, αp+1,…,αr < 0 and αr+1,…,αn = 0. Thus, we see that the possible values of
H(x) seem to depend only on the eigenvalues of A. Since U is an invertible matrix, the
components zi’s of z = U-1x = U*x are commonly known as the linearly independent
linear forms. Note that each zi is a linear expression in the components of x. Also, note
that in Equation (6.3.3), p corresponds to the number of positive eigenvalues and r - p to
the number of negative eigenvalues. For a better understanding, we define the following
numbers.
Definition 6.3.12. [Inertia and Signature of a Matrix] Let A ∈ Mn(ℂ) be a Hermitian
matrix. The inertia of A, denoted i(A), is the triplet (i+(A),i-(A),i0(A)), where i+(A) is the
number of positive eigenvalues of A, i-(A) is the number of negative eigenvalues of A and i0(A)
is the nullity of A. The difference i+(A) - i-(A) is called the signature of A.
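The inertia can be computed directly from the eigenvalues; a small sketch (the function name inertia and the tolerance are illustrative choices, NumPy assumed):

import numpy as np

def inertia(A, tol=1e-10):
    # Counts of positive, negative and (numerically) zero eigenvalues of a Hermitian matrix.
    lam = np.linalg.eigvalsh(A)
    return (int(np.sum(lam > tol)), int(np.sum(lam < -tol)), int(np.sum(np.abs(lam) <= tol)))

A = np.diag([3.0, 1.0, 0.0, -2.0])        # illustrative Hermitian matrix
i_plus, i_minus, i_zero = inertia(A)
print((i_plus, i_minus, i_zero))          # (2, 1, 1)
print("signature:", i_plus - i_minus)     # 1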
Exercise 6.3.13. Let A ∈ Mn(ℂ) be a Hermitian matrix. If the signature and the rank of A
are known then prove that one can determine the inertia of A.
As a next result, we show that in any expression of H(x) as a sum or difference of n absolute
squares of linearly independent linear forms, the number p (respectively, r - p) gives the number of
positive (respectively, negative) eigenvalues of A. This is popularly known as the ‘Sylvester’s law of
inertia’.
Lemma 6.3.14. [Sylvester’s Law of Inertia] Let A ∈ Mn(ℂ) be a Hermitian matrix and let
x ∈ ℂn. Then, every Hermitian form H(x) = x*Ax, in n variables, can be written as
H(x) = |y1|² + ⋯ + |yp|² - |y_{p+1}|² - ⋯ - |yr|²,
where y1,…,yr are linearly independent linear forms in the components of x, and the integers
p and r, satisfying 0 ≤ p ≤ r ≤ n, depend only on A.
Proof. Equation (6.3.3) implies that H(x) has the required form. We only need to show that p and r
are uniquely determined by A. Hence, let us assume on the contrary that there exist p, q, r, s ∈ ℕ with
p > q such that
H(x) = |y1|² + ⋯ + |yp|² - |y_{p+1}|² - ⋯ - |yr|²  | (6.3.4) |
     = |z1|² + ⋯ + |zq|² - |z_{q+1}|² - ⋯ - |zs|²,  | (6.3.5) |
where y = Mx and z = Nx, with Y1 = (y1,…,yp)^T and Z1 = (z1,…,zq)^T, for some invertible
matrices M and N. Now, the invertibility of M and N implies z = By, for some invertible matrix B.
Decompose B = [B1 B2; B3 B4], where B1 is a q × p matrix. Then Z1 = B1Y1 + B2Y2, where
Y2 = (y_{p+1},…,y_n)^T. As p > q, the homogeneous linear system B1Y1 = 0 has a nontrivial solution.
Choose y with Y1 such a nontrivial solution and Y2 = 0. Then, for this choice of y, Z1 = 0 and thus,
using Equations (6.3.4) and (6.3.5), we have
|y1|² + ⋯ + |yp|² = -(|z_{q+1}|² + ⋯ + |zs|²) ≤ 0.
Now, this can hold only if Y1 = 0, a contradiction to Y1 being a non-trivial solution.
Hence p = q. Similarly, the case r > s can be resolved. This completes the proof of the
lemma. _
Remark 6.3.15. Since A is Hermitian, Rank(A) equals the number of nonzero eigenvalues.
Hence, Rank(A) = r. The number r is called the rank and the number r - 2p is called the
inertial degree of the Hermitian form H(x).
We now look at another form of the Sylvester’s law of inertia. We start with the following
definition.
Definition 6.3.16. [Star Congruence] Let A,B ∈ Mn(ℂ). Then, A is said to be *-congruent
(read star-congruent) to B if there exists an invertible matrix S such that A = S*BS.
Theorem 6.3.17. [Second Version: Sylvester’s Law of Inertia] Let A,B ∈ Mn(ℂ) be
Hermitian. Then, A is *-congruent to B if and only if i(A) = i(B).
Proof. By the spectral theorem, U*AU = ΛA and V*BV = ΛB, for some unitary matrices U, V and
diagonal matrices ΛA, ΛB of the form diag(+,…,+,-,…,-,0,…,0). Thus, there exist invertible
matrices S, T such that S*AS = DA and T*BT = DB, where DA, DB are diagonal matrices of the
form diag(1,…,1,-1,…,-1,0,…,0).
If i(A) = i(B), then it follows that DA = DB, i.e., S*AS = T*BT and hence A = (TS-1)*B(TS-1).
Conversely, suppose that A = P*BP, for some invertible matrix P, and i(B) = (k,l,m). As
T*BT = DB, we have, A = P*(T*)-1DBT-1P = (T-1P)*DB(T-1P). Now, let X = (T-1P)-1. Then,
A = (X-1)*DBX-1 and we have the following observations.
-
1.
- As rank and nullity do not change under similarity transformation, i0(A) = i0(DB) = m
as i(B) = (k,l,m).
-
2.
- Using i(B) = (k,l,m), we also have
Similarly, X[:,k + 2]*AX[:,k + 2] = ⋯ = X[:,k + l]*AX[:,k + l] = -1. As the vectors
X[:,k + 1],…,X[:,k + l] are linearly independent, using 9.7.10, we see that A has at least
l negative eigenvalues.
-
3.
- Similarly, X[:,1]*AX[:,1] = ⋯ = X[:,k]*AX[:,k] = 1. As X[:,1],…,X[:,k] are linearly
independent, using 9.7.10 again, we see that A has at least k positive eigenvalues.
Thus, it now follows that i(A) = (k,l,m). _
We now obtain conditions on the eigenvalues of A, corresponding to the associated quadratic form, to
characterize conic sections in ℝ2, with respect to the standard inner product.
Definition 6.3.18. [Associated Quadratic Form] Let f(x,y) = ax2 + 2hxy + by2 + 2gx + 2fy + c
be a general quadratic in x and y, with coefficients from ℝ. Then,
H(x,y) = ax2 + 2hxy + by2 = xT Ax, where A = [a h; h b] and x = (x,y)T,
is called the associated quadratic form of the conic f(x,y) = 0.
Proposition 6.3.19. Consider the general quadratic f(x,y), for a,b,c,g,f,h ∈ ℝ. Then, f(x,y) = 0
represents
-
1.
- an ellipse or a circle if ab - h2 > 0,
-
2.
- a parabola or a pair of parallel lines if ab - h2 = 0,
-
3.
- a hyperbola or a pair of intersecting lines if ab - h2 < 0.
Proof. As A is symmetric, by Corollary 6.2.23, A = U diag(α1,α2)UT , where U = is an
orthogonal matrix, with (α1,u1) and (α2,u2) as eigen-pairs of A. Let [u, v] = xT U. As u1 and u2 are
orthogonal, u and v represent orthogonal lines passing through origin in the (x,y)-plane. In most
cases, these lines form the principal axes of the conic.
We also have xT Ax = α1u2 + α2v2 and hence f(x,y) = 0 reduces to
| (6.3.6) |
for some g1,f1 ∈ ℝ. Now, we consider different cases depending of the values of α1,α2:
-
1.
- If α1 = 0 = α2 then A = 0 and Equation (6.3.6) gives the straight line 2gx+2fy +c = 0.
-
2.
- if α1 = 0 and α2≠0 then ab-h2 = det(A) = α1α2 = 0. So, after dividing by α2, Equation (6.3.6)
reduces to (v + d1)2 = d2u + d3, for some d1,d2,d3 ∈ ℝ. Hence, let us look at the possible
subcases:
-
(a)
- Let d2 = d3 = 0. Then, v + d1 = 0 is a pair of coincident lines.
-
(b)
- Let d2 = 0,d3≠0.
-
i.
- If d3 > 0, then we get a pair of parallel lines given by v = -d1 ±.
-
ii.
- If d3 < 0, the solution set of the corresponding conic is an empty set.
-
(c)
- If d2≠0. Then, the given equation is of the form Y 2 = 4aX for some translates X = x + α
and Y = y + β and thus represents a parabola.
Let H(x) = x2 + 4y2 + 4xy be the associated quadratic form for a class of curves. Then,
A = , α1 = 0,α2 = 5 and v = x + 2y. Now, let d1 = -3 and vary d2 and d3 to get
different curves (see Figure 6.2 drawn using the package “MATHEMATICA”).
-
3.
- α1 > 0 and α2 < 0. Then, ab - h2 = det(A) = α1α2 < 0. If α2 = -β2, for β2 > 0, then
Equation (6.3.6) reduces to
| (6.3.7) |
whose understanding requires the following subcases:
-
(a)
- If d3 = 0 then Equation (6.3.7) equals
or equivalently, a pair of intersecting straight lines u + d1 = 0 and v + d2 = 0 in the
(u,v)-plane.
-
(b)
- Without loss of generality, let d3 > 0. Then, Equation (6.3.7) equals
or equivalently, a hyperbola with orthogonal principal axes u+d1 = 0 and v+d2 = 0.
Let H(x) = 10x2 - 5y2 + 20xy be the associated quadratic form for a class of curves. Then,
A = , α1 = 15,α2 = -10 and u = 2x + y,v = x - 2y. Now, let
d1 = ,d2 = - to get 3(2x + y + 1)2 - 2(x - 2y - 1)2 = d3. Now vary d3 to get
different curves (see Figure 6.3 drawn using the package ”MATHEMATICA”).
-
4.
- α1,α2 > 0. Then, ab - h2 = det(A) = α1α2 > 0 and Equation (6.3.6) reduces to
| (6.3.8) |
We consider the following three subcases to understand this.
-
(a)
- If d3 = 0 then we get a pair of orthogonal lines u + d1 = 0 and v + d2 = 0.
-
(b)
- If d3 < 0 then the solution set of Equation (6.3.8) is an empty set.
-
(c)
- If d3 > 0 then Equation (6.3.8) reduces to + = 1, an ellipse
or circle with u + d1 = 0 and v + d2 = 0 as the orthogonal principal axes.
Let H(x) = 6x2 + 9y2 + 4xy be the associated quadratic form for a class of curves. Then,
A = , α1 = 10,α2 = 5 and u = x + 2y,v = 2x-y. Now, let d1 = ,d2 = - to
get 2(x + 2y + 1)2 + (2x - y - 1)2 = d3. Now vary d3 to get different curves (see Figure 6.4
drawn using the package “MATHEMATICA”).
Thus, we have considered all the possible cases and the required result follows. _
Exercise 6.3.21. Sketch the graph of the following surfaces:
-
1.
- x2 + 2xy + y2 - 6x - 10y = 3.
-
2.
- 2x2 + 6xy + 3y2 - 12x - 6y = 5.
-
3.
- 4x2 - 4xy + 2y2 + 12x - 8y = 10.
-
4.
- 2x2 - 6xy + 5y2 - 10x + 4y = 7.
As a last application, we consider a quadratic in 3 variables, namely x,y and z. To do so, let
A = , x = , b = and y = with
Then, we observe the following:
-
1.
- As A is symmetric, PT AP = diag(α1,α2,α3), where P = is an orthogonal
matrix and (αi,ui), for i = 1,2,3 are eigen-pairs of A.
-
2.
- Let y = PT x. Then, f(x,y,z) reduces to
| (6.3.11) |
-
3.
- Depending on the values of αi’s, rewrite g(y1,y2,y3) to determine the center and the planes of
symmetry of f(x,y,z) = 0.
Example 6.3.22. Determine the following quadrics f(x,y,z) = 0, where
-
1.
- f(x,y,z) = 2x2 + 2y2 + 2z2 + 2xy + 2xz + 2yz + 4x + 2y + 4z + 2.
-
2.
- f(x,y,z) = 3x2 - y2 + z2 + 10.
-
3.
- f(x,y,z) = 3x2 - y2 + z2 - 10.
-
4.
- f(x,y,z) = 3x2 - y2 + z - 10.
Solution: Part 1 Here, A = , b = and q = 2. So, the orthogonal matrices
P = and PT AP = . Hence, f(x,y,z) = 0 reduces to
So, the standard form of the quadric is 4z1² + z2² + z3² = , where = P = is the
center and x + y + z = 0, x - y = 0 and x + y - 2z = 0 as the principal axes.
Part 2 Here f(x,y,z) = 0 reduces to - - = 1 which is the equation of a
hyperboloid consisting of two sheets with center 0 and the axes x, y and z as the principal
axes.
Part 3 Here f(x,y,z) = 0 reduces to - + = 1 which is the equation of a
hyperboloid consisting of one sheet with center 0 and the axes x, y and z as the principal
axes.
Part 4 Here f(x,y,z) = 0 reduces to z = y2 - 3x2 + 10 which is the equation of a hyperbolic
paraboloid.
The different curves are given in Figure 6.5. These curves have been drawn using the package
“MATHEMATICA”.