Chapter 6
Eigenvalues, Eigenvectors and Diagonalizability

6.1 Introduction and Definitions

In this chapter, every matrix is an element of $M_n(\mathbb{C})$ and $x = (x_1, \ldots, x_n)^T \in \mathbb{C}^n$, for some $n \in \mathbb{N}$. We start with a few examples to motivate this chapter.

Example 6.1.1.

1.
Let $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$, $B = \begin{bmatrix} 9 & -2 \\ -2 & 6 \end{bmatrix}$ and $x = \begin{bmatrix} x \\ y \end{bmatrix}$.
(a)
Then $A$ magnifies the nonzero vector $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ three times, as $A\begin{bmatrix} 1 \\ 1 \end{bmatrix} = 3\begin{bmatrix} 1 \\ 1 \end{bmatrix}$, and reverses the direction of $\begin{bmatrix} 1 \\ -1 \end{bmatrix}$, as $A\begin{bmatrix} 1 \\ -1 \end{bmatrix} = -1\begin{bmatrix} 1 \\ -1 \end{bmatrix}$. Further, the vectors $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ -1 \end{bmatrix}$ are orthogonal.
(b)
$B$ magnifies both the vectors $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ -1 \end{bmatrix}$, as $B\begin{bmatrix} 1 \\ 2 \end{bmatrix} = 5\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $B\begin{bmatrix} 2 \\ -1 \end{bmatrix} = 10\begin{bmatrix} 2 \\ -1 \end{bmatrix}$. Here again, the vectors $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ -1 \end{bmatrix}$ are orthogonal.
(c)
$x^TAx = 3\frac{(x+y)^2}{2} - \frac{(x-y)^2}{2}$. Here, the displacements occur along the perpendicular lines $x + y = 0$ and $x - y = 0$, where $x + y = (x,y)\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $x - y = (x,y)\begin{bmatrix} 1 \\ -1 \end{bmatrix}$.

Whereas, $x^TBx = 5\frac{(x+2y)^2}{5} + 10\frac{(2x-y)^2}{5}$. Here also the maximum/minimum displacements occur along the orthogonal lines $x + 2y = 0$ and $2x - y = 0$, where $x + 2y = (x,y)\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $2x - y = (x,y)\begin{bmatrix} 2 \\ -1 \end{bmatrix}$.

(d)
the curve $x^TAx = 10$ represents a hyperbola, whereas the curve $x^TBx = 10$ represents an ellipse (see Figure 6.1, drawn using the package ``Sagemath'').


Figure 6.1: A Hyperbola and two Ellipses (first one has orthogonal axes)



2.
Let $C = \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix}$, a non-symmetric matrix. Then, does there exist a nonzero $x \in \mathbb{R}^2$ which gets magnified by $C$?
So, we need $x \neq 0$ and $\alpha$ such that $Cx = \alpha x$, i.e., $[\alpha I_2 - C]x = 0$. As $x \neq 0$, the system $[\alpha I_2 - C]x = 0$ has a non-trivial solution if and only if $\det[\alpha I_2 - C] = 0$. But,
\[
\det[\alpha I_2 - C] = \det\begin{bmatrix} \alpha - 1 & -2 \\ -1 & \alpha - 3 \end{bmatrix} = \alpha^2 - 4\alpha + 1.
\]
So, $\alpha = 2 \pm \sqrt{3}$. For $\alpha = 2 + \sqrt{3}$, verify that the $x \neq 0$ that satisfies $\begin{bmatrix} 1 + \sqrt{3} & -2 \\ -1 & \sqrt{3} - 1 \end{bmatrix}x = 0$ equals $x = \begin{bmatrix} \sqrt{3} - 1 \\ 1 \end{bmatrix}$. Similarly, for $\alpha = 2 - \sqrt{3}$, the vector $x = \begin{bmatrix} \sqrt{3} + 1 \\ -1 \end{bmatrix}$ satisfies $\begin{bmatrix} 1 - \sqrt{3} & -2 \\ -1 & -\sqrt{3} - 1 \end{bmatrix}x = 0$. In this example,
(a)
we still have magnifications in the directions $\begin{bmatrix} \sqrt{3} - 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} \sqrt{3} + 1 \\ -1 \end{bmatrix}$.
(b)
the maximum/minimum displacements do not occur along the lines $(\sqrt{3} - 1)x + y = 0$ and $(\sqrt{3} + 1)x - y = 0$ (see the third curve in Figure 6.1).
(c)
the lines $(\sqrt{3} - 1)x + y = 0$ and $(\sqrt{3} + 1)x - y = 0$ are not orthogonal.
3.
Let A be a real symmetric matrix. Consider the following problem:
\[
\text{Maximize (Minimize) } x^TAx \ \text{ such that } \ x \in \mathbb{R}^n \ \text{ and } \ x^Tx = 1.
\]
To solve this, consider the Lagrangian
\[
L(x, \lambda) = x^TAx - \lambda(x^Tx - 1) = \sum_{i=1}^n \sum_{j=1}^n a_{ij}x_ix_j - \lambda\left(\sum_{i=1}^n x_i^2 - 1\right).
\]
Partially differentiating $L(x, \lambda)$ with respect to $x_i$ for $1 \leq i \leq n$, we get
\[
\frac{\partial L}{\partial x_1} = 2a_{11}x_1 + 2a_{12}x_2 + \cdots + 2a_{1n}x_n - 2\lambda x_1, \quad \ldots \quad, \frac{\partial L}{\partial x_n} = 2a_{n1}x_1 + 2a_{n2}x_2 + \cdots + 2a_{nn}x_n - 2\lambda x_n.
\]
Therefore, to get the points of extremum, we solve for
\[
0^T = \left(\frac{\partial L}{\partial x_1}, \frac{\partial L}{\partial x_2}, \ldots, \frac{\partial L}{\partial x_n}\right)^T = \frac{\partial L}{\partial x} = 2(Ax - \lambda x).
\]
Thus, to solve the extremal problem, we need $\lambda \in \mathbb{R}$ and $x \in \mathbb{R}^n$ such that $x \neq 0$ and $Ax = \lambda x$.
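The connection between the extremal problem and eigenvalues can also be observed numerically. The following sketch (it assumes NumPy is available and is not part of the text) samples unit vectors and compares the extreme values of $x^TAx$ with the eigenvalues of the symmetric matrix $B$ of Example 6.1.1; the maximum and minimum should approach the largest and smallest eigenvalues.

```python
import numpy as np

# Numerical check (a sketch): over the unit sphere, the extreme values of
# x^T A x for a symmetric A are its largest and smallest eigenvalues.
rng = np.random.default_rng(1)
A = np.array([[9., -2.], [-2., 6.]])          # the matrix B of Example 6.1.1
xs = rng.standard_normal((10000, 2))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)
vals = np.einsum('ij,jk,ik->i', xs, A, xs)    # x^T A x for each sample x
print(vals.max(), vals.min())                 # close to 10 and 5
print(np.linalg.eigvalsh(A))                  # [5., 10.]
```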

We observe the following about the matrices A,B and C that appear in Example 6.1.1.

1.
$\det(A) = -3 = 3 \times (-1)$, $\det(B) = 50 = 5 \times 10$ and $\det(C) = 1 = (2 + \sqrt{3}) \times (2 - \sqrt{3})$.
2.
$\operatorname{tr}(A) = 2 = 3 - 1$, $\operatorname{tr}(B) = 15 = 5 + 10$ and $\operatorname{tr}(C) = 4 = (2 + \sqrt{3}) + (2 - \sqrt{3})$.
3.
The sets $\left\{\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \end{bmatrix}\right\}$, $\left\{\begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 2 \\ -1 \end{bmatrix}\right\}$ and $\left\{\begin{bmatrix} \sqrt{3} - 1 \\ 1 \end{bmatrix}, \begin{bmatrix} \sqrt{3} + 1 \\ -1 \end{bmatrix}\right\}$ are linearly independent.
4.
If $v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, $v_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$ and $S = [v_1, v_2]$ then
(a)
$AS = [Av_1, Av_2] = [3v_1, -v_2] = S\begin{bmatrix} 3 & 0 \\ 0 & -1 \end{bmatrix}$, so that $S^{-1}AS = \begin{bmatrix} 3 & 0 \\ 0 & -1 \end{bmatrix} = \operatorname{diag}(3, -1)$.
(b)
Let $u_1 = \frac{1}{\sqrt{2}}v_1$ and $u_2 = \frac{1}{\sqrt{2}}v_2$. Then, $u_1$ and $u_2$ are orthonormal unit vectors, i.e., if $U = [u_1, u_2]$ then $I = UU^* = u_1u_1^* + u_2u_2^*$ and $A = 3u_1u_1^* - u_2u_2^*$.
5.
If $v_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$, $v_2 = \begin{bmatrix} 2 \\ -1 \end{bmatrix}$ and $S = [v_1, v_2]$ then
(a)
$BS = [Bv_1, Bv_2] = [5v_1, 10v_2] = S\begin{bmatrix} 5 & 0 \\ 0 & 10 \end{bmatrix}$, so that $S^{-1}BS = \begin{bmatrix} 5 & 0 \\ 0 & 10 \end{bmatrix} = \operatorname{diag}(5, 10)$.
(b)
Let $u_1 = \frac{1}{\sqrt{5}}v_1$ and $u_2 = \frac{1}{\sqrt{5}}v_2$. Then, $u_1$ and $u_2$ are orthonormal unit vectors, i.e., if $U = [u_1, u_2]$ then $I = UU^* = u_1u_1^* + u_2u_2^*$ and $B = 5u_1u_1^* + 10u_2u_2^*$.
6.
If $v_1 = \begin{bmatrix} \sqrt{3} - 1 \\ 1 \end{bmatrix}$, $v_2 = \begin{bmatrix} \sqrt{3} + 1 \\ -1 \end{bmatrix}$ and $S = [v_1, v_2]$ then
\[
S^{-1}CS = \begin{bmatrix} 2 + \sqrt{3} & 0 \\ 0 & 2 - \sqrt{3} \end{bmatrix} = \operatorname{diag}(2 + \sqrt{3}, 2 - \sqrt{3}).
\]

Thus, we see that for a given $A \in M_n(\mathbb{C})$, the scalars $\lambda \in \mathbb{C}$ and vectors $x \in \mathbb{C}^n$, $x \neq 0$, satisfying $Ax = \lambda x$ have certain nice properties. For example, there exists a basis of $\mathbb{R}^2$ in which the matrices $A$, $B$ and $C$ behave like diagonal matrices. To understand the ideas better, we start with the following definitions.
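The observations above are easy to reproduce numerically. The following sketch (assuming NumPy; not part of the text) computes eigen-pairs of the matrices $A$, $B$ and $C$ of Example 6.1.1 and checks that $S^{-1}MS$ is diagonal when the columns of $S$ are eigenvectors.

```python
import numpy as np

A = np.array([[1., 2.], [2., 1.]])
B = np.array([[9., -2.], [-2., 6.]])
C = np.array([[1., 2.], [1., 3.]])

for name, M in [("A", A), ("B", B), ("C", C)]:
    vals, vecs = np.linalg.eig(M)          # columns of vecs are eigenvectors
    S, D = vecs, np.diag(vals)
    # S^{-1} M S should be (numerically) the diagonal matrix of eigenvalues
    print(name, vals, np.allclose(np.linalg.inv(S) @ M @ S, D))
```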

Definition 6.1.2. [Eigenvalues, Eigenvectors and Eigenspace] Let $A \in M_n(\mathbb{C})$. Then,

1.
the equation
\[
Ax = \lambda x \Leftrightarrow (A - \lambda I_n)x = 0
\tag{6.1.1}
\]
is called the eigen-condition.

2.
an $\alpha \in \mathbb{C}$ is called a characteristic value/root or eigenvalue or latent root of $A$ if there exists $x \neq 0$ satisfying $Ax = \alpha x$.
3.
an $x \neq 0$ satisfying Equation (6.1.1) is called a characteristic vector or eigenvector or invariant/latent vector of $A$ corresponding to $\lambda$.
4.
the tuple $(\alpha, x)$ with $x \neq 0$ and $Ax = \alpha x$ is called an eigen-pair or characteristic-pair.
5.
for an eigenvalue $\alpha \in \mathbb{C}$, $\operatorname{Null}(A - \alpha I) = \{x \in \mathbb{C}^n \mid Ax = \alpha x\}$ is called the eigenspace or characteristic vector space of $A$ corresponding to $\alpha$.

Theorem 6.1.3. Let $A \in M_n(\mathbb{C})$ and $\alpha \in \mathbb{C}$. Then, the following statements are equivalent.

1.
$\alpha$ is an eigenvalue of $A$.
2.
$\det(A - \alpha I_n) = 0$.

Proof. We know that $\alpha$ is an eigenvalue of $A$ if and only if the system $(A - \alpha I_n)x = 0$ has a non-trivial solution. By Theorem 2.2.40, this holds if and only if $\det(A - \alpha I) = 0$. _

Definition 6.1.4. [Characteristic Polynomial / Equation, Spectrum and Spectral Radius] Let $A \in M_n(\mathbb{C})$. Then,

1.
$\det(A - \lambda I)$ is a polynomial of degree $n$ in $\lambda$ and is called the characteristic polynomial of $A$, denoted $P_A(\lambda)$, or in short, $P(\lambda)$.
2.
the equation $P_A(\lambda) = 0$ is called the characteristic equation of $A$.
3.
the multi-set (collection with multiplicities) $\{\alpha \in \mathbb{C} : P_A(\alpha) = 0\}$ is called the spectrum of $A$, denoted $\sigma(A)$. Hence, $\sigma(A)$ contains all the eigenvalues of $A$.
4.
the spectral radius of $A \in M_n(\mathbb{C})$, denoted $\rho(A)$, equals $\max\{|\alpha| : \alpha \in \sigma(A)\}$.
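For concreteness, a small numerical sketch of the spectrum and the spectral radius (assuming NumPy; not part of the text):

```python
import numpy as np

def spectral_radius(A):
    """Return rho(A) = max |alpha| over the spectrum sigma(A)."""
    return max(abs(np.linalg.eigvals(A)))

A = np.array([[0., 1.], [-1., 0.]])   # eigenvalues +/- i
print(np.linalg.eigvals(A))           # the spectrum, with multiplicities
print(spectral_radius(A))             # 1.0
```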

We thus observe the following.

Remark 6.1.5. Let $A \in M_n(\mathbb{C})$.

1.
Then, $A$ is singular if and only if $0 \in \sigma(A)$.
2.
Further, if $\alpha \in \sigma(A)$ then the following statements hold.
(a)
$\{0\} \subsetneq \operatorname{Null}(A - \alpha I)$. Therefore, if $\operatorname{Rank}(A - \alpha I) = r$ then $r < n$. Hence, by Theorem 2.2.40, the system $(A - \alpha I)x = 0$ has $n - r$ linearly independent solutions.
(b)
$x \in \operatorname{Null}(A - \alpha I)$ if and only if $cx \in \operatorname{Null}(A - \alpha I)$, for $c \neq 0$.
(c)
If $x_1, \ldots, x_r \in \operatorname{Null}(A - \alpha I)$ are linearly independent then $\sum_{i=1}^r c_ix_i \in \operatorname{Null}(A - \alpha I)$, for all $c_i \in \mathbb{C}$. Hence, if $S$ is a collection of eigenvectors then we necessarily want the set $S$ to be linearly independent.
(d)
Thus, an eigenvector $v$ of $A$ is in some sense a line $\ell = \operatorname{Span}(\{v\})$ that passes through $0$ and $v$ and has the property that the image of $\ell$ is either $\ell$ itself or $0$.
3.
Since the eigenvalues of $A$ are roots of the characteristic equation, $A$ has exactly $n$ eigenvalues, including multiplicities.
4.
If the entries of $A$ are real and $\alpha \in \sigma(A)$ is also real then the corresponding eigenvector can be chosen to have real entries.
5.
Further, if $(\alpha, x)$ is an eigen-pair for $A$ and $f(A) = b_0I + b_1A + \cdots + b_kA^k$ is a polynomial in $A$ then $(f(\alpha), x)$ is an eigen-pair for $f(A)$.

Almost all books in mathematics differentiate between characteristic value and eigenvalue as the ideas change when one moves from complex numbers to any other scalar field. We give the following example for clarity.

Remark 6.1.6. Let $A \in M_2(\mathbb{F})$. Then, $A$ induces a map $T \in \mathcal{L}(\mathbb{F}^2)$ defined by $T(x) = Ax$, for all $x \in \mathbb{F}^2$. We use this idea to understand the difference.

1.
Let $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$. Then, $P_A(\lambda) = \lambda^2 + 1$. So, $\pm i$ are the roots of $P(\lambda) = 0$ in $\mathbb{C}$. Hence,
(a)
$A$ has $(i, (1, i)^T)$ and $(-i, (i, 1)^T)$ as eigen-pairs or characteristic-pairs over $\mathbb{C}$.
(b)
$A$ has no characteristic value over $\mathbb{R}$.
2.
Let $A = \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix}$. Then, $2 \pm \sqrt{3}$ are the roots of the characteristic equation. Hence,
(a)
$A$ has characteristic values or eigenvalues over $\mathbb{R}$.
(b)
$A$ has no characteristic value over $\mathbb{Q}$.

Let us look at some more examples.

Example 6.1.7.

1.
Let $A = \operatorname{diag}(d_1, \ldots, d_n)$ with $d_i \in \mathbb{C}$, $1 \leq i \leq n$. Then, $p(\lambda) = \prod_{i=1}^n (\lambda - d_i)$ and thus verify that $(d_1, e_1), \ldots, (d_n, e_n)$ are the eigen-pairs.
2.
Let $A = (a_{ij})$ be an $n \times n$ triangular matrix. Then, $p(\lambda) = \prod_{i=1}^n (\lambda - a_{ii})$ and thus verify that $\sigma(A) = \{a_{11}, a_{22}, \ldots, a_{nn}\}$. What can you say about the eigenvectors of an upper triangular matrix if the diagonal entries are all distinct?
3.
Let $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$. Then, $p(\lambda) = (1 - \lambda)^2$. Hence, $\sigma(A) = \{1, 1\}$. But the complete solution of the system $(A - I_2)x = 0$ equals $x = ce_1$, for $c \in \mathbb{C}$. Hence, using Remark 6.1.5.2, $e_1$ is an eigenvector. Therefore, $1$ is a repeated eigenvalue whereas there is only one linearly independent eigenvector.
4.
Let $A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$. Then, $1$ is a repeated eigenvalue of $A$. In this case, $(A - I_2)x = 0$ has a solution for every $x \in \mathbb{C}^2$. Hence, any two linearly independent vectors $x, y \in \mathbb{C}^2$ give $(1, x)$ and $(1, y)$ as two eigen-pairs for $A$. In general, if $S = \{x_1, \ldots, x_n\}$ is a basis of $\mathbb{C}^n$ then $(1, x_1), \ldots, (1, x_n)$ are eigen-pairs of $I_n$, the identity matrix.
5.
Let $A = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$. Then, $\left(1 + i, \begin{bmatrix} i \\ 1 \end{bmatrix}\right)$ and $\left(1 - i, \begin{bmatrix} 1 \\ i \end{bmatrix}\right)$ are the eigen-pairs of $A$.
6.
Let $A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$. Then, $\sigma(A) = \{0, 0, 0\}$ with $e_1$ as the only eigenvector (up to scalar multiples).
7.
Let $A = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$. Then, $\sigma(A) = \{0, 0, 0, 0, 0\}$. Note that $A\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = 0$ implies $x_2 = 0 = x_3 = x_5$. Thus, $e_1$ and $e_4$ are the only linearly independent eigenvectors. Note that the diagonal blocks of $A$ are nilpotent matrices.

Exercise 6.1.8.

1.
Let $A \in M_n(\mathbb{C})$. Then, prove that
(a)
if $\alpha \in \sigma(A)$ then $\alpha^k \in \sigma(A^k)$, for all $k \in \mathbb{N}$.
(b)
if $A$ is invertible and $\alpha \in \sigma(A)$ then $\alpha^k \in \sigma(A^k)$, for all $k \in \mathbb{Z}$.
2.
Find eigen-pairs over $\mathbb{C}$, for each of the following matrices:
\[
\begin{bmatrix} 1 & 1+i \\ 1-i & 1 \end{bmatrix}, \quad \begin{bmatrix} i & 1+i \\ -1+i & i \end{bmatrix}, \quad \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}.
\]
3.
Let $A = [a_{ij}] \in M_n(\mathbb{C})$ with $\sum_{j=1}^n a_{ij} = a$, for all $1 \leq i \leq n$. Then, prove that $a$ is an eigenvalue of $A$ with corresponding eigenvector $\mathbf{1} = [1, 1, \ldots, 1]^T$.
4.
Prove that the matrices A and AT have the same set of eigenvalues. Construct a 2 × 2 matrix A such that the eigenvectors of A and AT are different.
5.
Prove that $\lambda$ is an eigenvalue of $A$ if and only if $\overline{\lambda}$ is an eigenvalue of $A^*$.
6.
Let A be an idempotent matrix. Then, prove that its eigenvalues are either 0 or 1 or both.
7.
Let A be a nilpotent matrix. Then, prove that its eigenvalues are all 0.
8.
Let $J = \mathbf{1}\mathbf{1}^T \in M_n(\mathbb{R})$. Then, $J$ is a matrix with each entry $1$. Show that
(a)
$(n, \mathbf{1})$ is an eigen-pair for $J$.
(b)
$0 \in \sigma(J)$ with multiplicity $n-1$. Find a set of $n-1$ linearly independent eigenvectors for $0 \in \sigma(J)$.
9.
Let $B \in M_n(\mathbb{C})$ and $C \in M_m(\mathbb{C})$. Now, define the direct sum $B \oplus C = \begin{bmatrix} B & 0 \\ 0 & C \end{bmatrix}$. Then, prove that
(a)
if $(\alpha, x)$ is an eigen-pair for $B$ then $\left(\alpha, \begin{bmatrix} x \\ 0 \end{bmatrix}\right)$ is an eigen-pair for $B \oplus C$.
(b)
if $(\beta, y)$ is an eigen-pair for $C$ then $\left(\beta, \begin{bmatrix} 0 \\ y \end{bmatrix}\right)$ is an eigen-pair for $B \oplus C$.

Definition 6.1.9. Let $A \in M_n(\mathbb{C})$. Then, a vector $y \in \mathbb{C}^n \setminus \{0\}$ satisfying $y^*A = \lambda y^*$ is called a left eigenvector of $A$ for $\lambda$.

Example 6.1.10.

1.
Let $A = \begin{bmatrix} 1 & 1 \\ -1 & -1 \end{bmatrix}$. Then, $x = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ is a left eigenvector of $A$ corresponding to the eigenvalue $0$, and $\left(0, y = \begin{bmatrix} 1 \\ -1 \end{bmatrix}\right)$ is a (right) eigen-pair of $A$.
2.
Let $A = \begin{bmatrix} 1 & 1 \\ 2 & 2 \end{bmatrix}$. Then, $\left(0, x = \begin{bmatrix} 1 \\ -1 \end{bmatrix}\right)$ and $\left(3, y = \begin{bmatrix} 1 \\ 2 \end{bmatrix}\right)$ are (right) eigen-pairs of $A$. Also, $\left(3, u = \begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)$ and $\left(0, v = \begin{bmatrix} 2 \\ -1 \end{bmatrix}\right)$ are left eigen-pairs of $A$. Note that $x$ is orthogonal to $u$ and $y$ is orthogonal to $v$. This is true in general and is proved next.
3.
Let $S$ be a nonsingular matrix such that its columns are left eigenvectors of $A$. Then, prove that the columns of $(S^*)^{-1}$ are right eigenvectors of $A$.

Theorem 6.1.11. [Principle of bi-orthogonality] Let $(\lambda, x)$ be a (right) eigen-pair and $(\mu, y)$ be a left eigen-pair of $A$, where $\lambda \neq \mu$. Then, $y$ is orthogonal to $x$.

Proof. Verify that $\mu y^*x = (y^*A)x = y^*(\lambda x) = \lambda y^*x$. So, $(\mu - \lambda)y^*x = 0$ and, as $\lambda \neq \mu$, we get $y^*x = 0$. _
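A quick numerical check of bi-orthogonality for the matrix of Example 6.1.10.2 (a NumPy sketch; not part of the text):

```python
import numpy as np

A = np.array([[1., 1.], [2., 2.]])     # matrix from Example 6.1.10.2
x = np.array([1., -1.])                # right eigenvector for 0
u = np.array([1., 1.])                 # left eigenvector for 3 (u* A = 3 u*)
print(np.allclose(A @ x, 0 * x))       # right eigen-condition
print(np.allclose(u @ A, 3 * u))       # left eigen-condition
print(u @ x)                           # 0: bi-orthogonality, since 3 != 0
```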

Exercise 6.1.12. Let $Ax = \lambda x$ and $x^*A = \mu x^*$, for some $x \neq 0$. Then, prove that $\mu = \lambda$.

Definition 6.1.13. [Eigenvalues of a Linear Operator] Let $T \in \mathcal{L}(\mathbb{C}^n)$. Then, $\alpha \in \mathbb{C}$ is called an eigenvalue of $T$ if there exists $v \in \mathbb{C}^n$ with $v \neq 0$ such that $T(v) = \alpha v$.

Proposition 6.1.14. Let $T \in \mathcal{L}(\mathbb{C}^n)$ and let $\mathcal{B}$ be an ordered basis of $\mathbb{C}^n$. Then, $(\alpha, v)$ is an eigen-pair for $T$ if and only if $(\alpha, [v]_{\mathcal{B}})$ is an eigen-pair of $A = T[\mathcal{B}, \mathcal{B}]$.

Proof. Note that, by definition, $T(v) = \alpha v$ if and only if $[Tv]_{\mathcal{B}} = [\alpha v]_{\mathcal{B}}$, or equivalently, $A[v]_{\mathcal{B}} = \alpha[v]_{\mathcal{B}}$. Thus, the required result follows. _

Remark 6.1.15. [A linear operator on an infinite dimensional space may not have any eigenvalue] Let $V$ be the space of all real sequences (see Example 3.1.4.8a). Now, define a linear operator $T \in \mathcal{L}(V)$ by
\[
T(a_1, a_2, \ldots) = (0, a_1, a_2, \ldots).
\]
We now show that $T$ doesn't have any eigenvalue.

Solution: Suppose, if possible, $\alpha$ is an eigenvalue of $T$ with corresponding eigenvector $x = (x_1, x_2, \ldots)$. Then, the eigen-condition $T(x) = \alpha x$ implies that
\[
(0, x_1, x_2, \ldots) = \alpha(x_1, x_2, \ldots) = (\alpha x_1, \alpha x_2, \ldots).
\]
So, if $\alpha \neq 0$ then $x_1 = 0$ and this in turn implies that $x = 0$, a contradiction. If $\alpha = 0$ then $(0, x_1, x_2, \ldots) = (0, 0, \ldots)$ and we again get $x = 0$, a contradiction. Hence, the required result follows.

Theorem 6.1.16. Let $\lambda_1, \ldots, \lambda_n$, not necessarily distinct, be the eigenvalues of $A = [a_{ij}] \in M_n(\mathbb{C})$. Then, $\det(A) = \prod_{i=1}^n \lambda_i$ and $\operatorname{tr}(A) = \sum_{i=1}^n a_{ii} = \sum_{i=1}^n \lambda_i$.

Proof. Since $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$, by definition,
\[
\det(A - xI_n) = (-1)^n \prod_{i=1}^n (x - \lambda_i)
\tag{6.1.2}
\]
is an identity in $x$ as polynomials. Therefore, by substituting $x = 0$ in Equation (6.1.2), we get $\det(A) = (-1)^n(-1)^n \prod_{i=1}^n \lambda_i = \prod_{i=1}^n \lambda_i$. Also,
\begin{align}
\det(A - xI_n) &= \begin{vmatrix} a_{11} - x & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} - x & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} - x \end{vmatrix} \tag{6.1.3}\\
&= a_0 - xa_1 + \cdots + (-1)^{n-1}x^{n-1}a_{n-1} + (-1)^n x^n \tag{6.1.4}
\end{align}
for some $a_0, a_1, \ldots, a_{n-1} \in \mathbb{C}$. Then, $a_{n-1}$, the coefficient of $(-1)^{n-1}x^{n-1}$, comes from the term
\[
(a_{11} - x)(a_{22} - x)\cdots(a_{nn} - x).
\]
So, $a_{n-1} = \sum_{i=1}^n a_{ii} = \operatorname{tr}(A)$, the trace of $A$. Also, from Equations (6.1.2) and (6.1.4), we have
\[
a_0 - xa_1 + \cdots + (-1)^{n-1}x^{n-1}a_{n-1} + (-1)^n x^n = (-1)^n \prod_{i=1}^n (x - \lambda_i).
\]
Therefore, comparing the coefficient of $(-1)^{n-1}x^{n-1}$ on both sides, we have
\[
\operatorname{tr}(A) = a_{n-1} = (-1)\left\{(-1)\sum_{i=1}^n \lambda_i\right\} = \sum_{i=1}^n \lambda_i.
\]
Hence, we get the required result. _
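Theorem 6.1.16 is easy to verify numerically; a minimal sketch with a random complex matrix (assuming NumPy; not part of the text):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
lam = np.linalg.eigvals(A)
print(np.isclose(np.prod(lam), np.linalg.det(A)))   # det(A) = product of eigenvalues
print(np.isclose(np.sum(lam), np.trace(A)))         # tr(A)  = sum of eigenvalues
```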

Exercise 6.1.17.

1.
Let $A$ be a $3 \times 3$ orthogonal matrix ($AA^T = I$). If $\det(A) = 1$, then prove that there exists $v \in \mathbb{R}^3 \setminus \{0\}$ such that $Av = v$.
2.
Let $A \in M_{2n+1}(\mathbb{R})$ with $A^T = -A$. Then, prove that $0$ is an eigenvalue of $A$.
3.
Let $A \in M_n(\mathbb{C})$. Then, prove that $A$ is invertible if and only if $0$ is not an eigenvalue of $A$.
4.
Let $A \in M_n(\mathbb{C})$ satisfy $\|Ax\| \leq \|x\|$ for all $x \in \mathbb{C}^n$. Then, prove that if $\alpha \in \mathbb{C}$ with $|\alpha| > 1$ then $A - \alpha I$ is invertible.

6.1.1 Spectrum of a Matrix

Definition 6.1.18. [Algebraic and Geometric Multiplicity] Let $A \in M_n(\mathbb{C})$. Then,

1.
the multiplicity of $\alpha \in \sigma(A)$ as a root of the characteristic polynomial is called the algebraic multiplicity of $\alpha$, denoted $\operatorname{Alg.Mul}_{\alpha}(A)$.
2.
for $\alpha \in \sigma(A)$, $\dim(\operatorname{Null}(A - \alpha I))$ is called the geometric multiplicity of $\alpha$, denoted $\operatorname{Geo.Mul}_{\alpha}(A)$.

We now state the following observations.

Remark 6.1.19. Let $A \in M_n(\mathbb{C})$.

1.
Then, for each $\alpha \in \sigma(A)$, using Theorem 2.2.40, $\dim(\operatorname{Null}(A - \alpha I)) \geq 1$. So, we have at least one eigenvector for each eigenvalue.
2.
If the algebraic multiplicity of $\alpha \in \sigma(A)$ is $r \geq 2$ then Example 6.1.7.7 implies that we need not have $r$ linearly independent eigenvectors.

Theorem 6.1.20. Let A and B be two similar matrices. Then,

1.
$\alpha \in \sigma(A)$ if and only if $\alpha \in \sigma(B)$.
2.
for each $\alpha \in \sigma(A)$, $\operatorname{Alg.Mul}_{\alpha}(A) = \operatorname{Alg.Mul}_{\alpha}(B)$ and $\operatorname{Geo.Mul}_{\alpha}(A) = \operatorname{Geo.Mul}_{\alpha}(B)$.

Proof. Since $A$ and $B$ are similar, there exists an invertible matrix $S$ such that $A = SBS^{-1}$. So, $\alpha \in \sigma(A)$ if and only if $\alpha \in \sigma(B)$ as
\begin{align}
\det(A - xI) &= \det(SBS^{-1} - xI) = \det\left(S(B - xI)S^{-1}\right) \notag\\
&= \det(S)\det(B - xI)\det(S^{-1}) = \det(B - xI). \tag{6.1.5}
\end{align}
Note that Equation (6.1.5) also implies that $\operatorname{Alg.Mul}_{\alpha}(A) = \operatorname{Alg.Mul}_{\alpha}(B)$. We will now show that $\operatorname{Geo.Mul}_{\alpha}(A) = \operatorname{Geo.Mul}_{\alpha}(B)$.

So, let $Q_1 = \{v_1, \ldots, v_k\}$ be a basis of $\operatorname{Null}(A - \alpha I)$. Then, $B = S^{-1}AS$ implies that $Q_2 = \{S^{-1}v_1, \ldots, S^{-1}v_k\} \subseteq \operatorname{Null}(B - \alpha I)$. Since $Q_1$ is linearly independent and $S$ is invertible, we get that $Q_2$ is linearly independent. So, $\operatorname{Geo.Mul}_{\alpha}(A) \leq \operatorname{Geo.Mul}_{\alpha}(B)$. Now, we can start with eigenvectors of $B$ and use similar arguments to get $\operatorname{Geo.Mul}_{\alpha}(B) \leq \operatorname{Geo.Mul}_{\alpha}(A)$ and hence the required result follows. _

Remark 6.1.21.

1.
Let $A = S^{-1}BS$. Then, from the proof of Theorem 6.1.20, we see that $x$ is an eigenvector of $A$ for $\lambda$ if and only if $Sx$ is an eigenvector of $B$ for $\lambda$.
2.
Let $A$ and $B$ be two similar matrices. Then, $\sigma(A) = \sigma(B)$. But, the converse is not true. For example, take $A = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$.
3.
Let $A \in M_n(\mathbb{C})$. Then, for any invertible matrix $B$, the matrices $AB$ and $BA = B(AB)B^{-1}$ are similar. Hence, in this case the matrices $AB$ and $BA$ have
(a)
the same set of eigenvalues.
(b)
$\operatorname{Alg.Mul}_{\alpha}(AB) = \operatorname{Alg.Mul}_{\alpha}(BA)$, for each $\alpha \in \sigma(AB)$.
(c)
$\operatorname{Geo.Mul}_{\alpha}(AB) = \operatorname{Geo.Mul}_{\alpha}(BA)$, for each $\alpha \in \sigma(AB)$.

We will now give a relation between the geometric multiplicity and the algebraic multiplicity.

Theorem 6.1.22. Let $A \in M_n(\mathbb{C})$. Then, for $\alpha \in \sigma(A)$, $\operatorname{Geo.Mul}_{\alpha}(A) \leq \operatorname{Alg.Mul}_{\alpha}(A)$.

Proof. Let $\operatorname{Geo.Mul}_{\alpha}(A) = k$. Suppose $Q_1 = \{v_1, \ldots, v_k\}$ is an orthonormal basis of $\operatorname{Null}(A - \alpha I)$. Extend $Q_1$ to get an orthonormal basis $\{v_1, \ldots, v_k, v_{k+1}, \ldots, v_n\}$ of $\mathbb{C}^n$. Put $P = [v_1, \ldots, v_k, v_{k+1}, \ldots, v_n]$. Then, $P^* = P^{-1}$ and
\[
P^*AP = P^*[Av_1, \ldots, Av_k, Av_{k+1}, \ldots, Av_n] = \begin{bmatrix} v_1^* \\ \vdots \\ v_k^* \\ v_{k+1}^* \\ \vdots \\ v_n^* \end{bmatrix}[\alpha v_1, \ldots, \alpha v_k, *, \ldots, *] = \begin{bmatrix} \alpha I_k & * \\ 0 & D \end{bmatrix}.
\]
Now, if we denote the lower right submatrix as $D$ then
\[
P_A(x) = \det(A - xI) = \det(P^*AP - xI) = (\alpha - x)^k \det(D - xI).
\tag{6.1.6}
\]
So, $\operatorname{Alg.Mul}_{\alpha}(A) = \operatorname{Alg.Mul}_{\alpha}(P^*AP) \geq k = \operatorname{Geo.Mul}_{\alpha}(A)$. _

Remark 6.1.23. Note that in the proof of Theorem 6.1.22, the remaining eigenvalues of A are the eigenvalues of D (see Equation (6.1.6)). This technique is called deflation.
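A minimal numerical sketch of one deflation step (assuming NumPy; not part of the text), applied to the matrix of Exercise 6.1.24.1 below, whose row sums are all $6$ so that $(6, \frac{1}{\sqrt{3}}\mathbf{1})$ is an eigen-pair:

```python
import numpy as np

def deflate(A, alpha, x):
    """One deflation step (a sketch): given an eigen-pair (alpha, x) of A,
    return the block D of Equation (6.1.6), whose eigenvalues are the
    remaining eigenvalues of A."""
    n = len(x)
    x = x / np.linalg.norm(x)
    assert np.allclose(A @ x, alpha * x)           # (alpha, x) is an eigen-pair
    # complete x to an orthonormal basis via QR of [x | e_1 ... e_{n-1}]
    P, _ = np.linalg.qr(np.column_stack([x, np.eye(n)[:, : n - 1]]))
    T = P.conj().T @ A @ P                         # [[alpha, *], [0, D]] up to rounding
    return T[1:, 1:]

A = np.array([[1., 2., 3.], [3., 2., 1.], [2., 3., 1.]])
D = deflate(A, 6.0, np.ones(3))
print(np.linalg.eigvals(A))   # contains 6 together with the eigenvalues of D
print(np.linalg.eigvals(D))
```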

Exercise 6.1.24.

1.
Let $A = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \\ 2 & 3 & 1 \end{bmatrix}$. Notice that $x_1 = \frac{1}{\sqrt{3}}\mathbf{1}$ is an eigenvector of $A$. Find an ordered basis $\{x_1, x_2, x_3\}$ of $\mathbb{R}^3$. Put $X = [x_1 \ x_2 \ x_3]$. Compute $X^{-1}AX$ to get a block-triangular matrix. Can you now find the remaining eigenvalues of $A$?
2.
Let $A \in M_{m \times n}(\mathbb{C})$ and $B \in M_{n \times m}(\mathbb{C})$.
(a)
If $\alpha \in \sigma(AB)$ and $\alpha \neq 0$ then
i.
$\alpha \in \sigma(BA)$.
ii.
$\operatorname{Alg.Mul}_{\alpha}(AB) = \operatorname{Alg.Mul}_{\alpha}(BA)$.
iii.
$\operatorname{Geo.Mul}_{\alpha}(AB) = \operatorname{Geo.Mul}_{\alpha}(BA)$.
(b)
If $0 \in \sigma(AB)$ and $n = m$ then $\operatorname{Alg.Mul}_0(AB) = \operatorname{Alg.Mul}_0(BA)$ as there are $n$ eigenvalues, counted with multiplicity.
(c)
Give an example to show that $\operatorname{Geo.Mul}_0(AB)$ need not equal $\operatorname{Geo.Mul}_0(BA)$ even when $n = m$.
3.
Let $A \in M_n(\mathbb{C})$ be an invertible matrix and let $x, y \in \mathbb{C}^n$ with $x \neq 0$ and $y^TA^{-1}x \neq 0$. Define $B = xy^TA^{-1}$. Then, prove that
(a)
$\lambda_0 = y^TA^{-1}x$ is an eigenvalue of $B$ of multiplicity $1$.
(b)
$0$ is an eigenvalue of $B$ of multiplicity $n-1$ [Hint: Use Exercise 6.1.24.2a].
(c)
$1 + \alpha\lambda_0$ is an eigenvalue of $I + \alpha B$ of multiplicity $1$, for any $\alpha \in \mathbb{C}$.
(d)
$1$ is an eigenvalue of $I + \alpha B$ of multiplicity $n-1$, for any $\alpha \in \mathbb{C}$.
(e)
$\det(A + \alpha xy^T)$ equals $(1 + \alpha\lambda_0)\det(A)$, for any $\alpha \in \mathbb{C}$. This result is known as the Sherman-Morrison formula for determinants.
4.
Let $A, B \in M_2(\mathbb{R})$ such that $\det(A) = \det(B)$ and $\operatorname{tr}(A) = \operatorname{tr}(B)$.
(a)
Do $A$ and $B$ have the same set of eigenvalues?
(b)
Give examples to show that the matrices $A$ and $B$ need not be similar.
5.
Let $A, B \in M_n(\mathbb{C})$. Also, let $(\lambda_1, u)$ and $(\lambda_2, v)$ be eigen-pairs of $A$ and $B$, respectively.
(a)
If $u = \alpha v$ for some $\alpha \in \mathbb{C}$ then $(\lambda_1 + \lambda_2, u)$ is an eigen-pair for $A + B$.
(b)
Give an example to show that if $u$ and $v$ are linearly independent then $\lambda_1 + \lambda_2$ need not be an eigenvalue of $A + B$.
6.
Let $A \in M_n(\mathbb{C})$ be an invertible matrix with eigen-pairs $(\lambda_1, u_1), \ldots, (\lambda_n, u_n)$. Then, prove that $\mathcal{B} = \{u_1, \ldots, u_n\}$ forms a basis of $\mathbb{C}^n$. If $[b]_{\mathcal{B}} = (c_1, \ldots, c_n)^T$ then the system $Ax = b$ has the unique solution
\[
x = \frac{c_1}{\lambda_1}u_1 + \frac{c_2}{\lambda_2}u_2 + \cdots + \frac{c_n}{\lambda_n}u_n.
\]

6.2 Diagonalization

Let $A \in M_n(\mathbb{C})$ and let $T \in \mathcal{L}(\mathbb{C}^n)$ be defined by $T(x) = Ax$, for all $x \in \mathbb{C}^n$. In this section, we first find conditions under which one can obtain a basis $\mathcal{B}$ of $\mathbb{C}^n$ such that $T[\mathcal{B}, \mathcal{B}]$ (see Theorem 4.4.4) is a diagonal matrix. Then, it is shown that normal matrices satisfy the above conditions. To start with, we have the following definition.

Definition 6.2.1. [Matrix Diagonalizability] A matrix $A$ is said to be diagonalizable if $A$ is similar to a diagonal matrix. Equivalently, $P^{-1}AP = D$, i.e., $AP = PD$, for some diagonal matrix $D$ and invertible matrix $P$.

Example 6.2.2.

1.
Let $A$ be an $n \times n$ diagonalizable matrix. Then, by definition, $A$ is similar to a diagonal matrix, say $D = \operatorname{diag}(d_1, \ldots, d_n)$. Thus, by Remark 6.1.21, $\sigma(A) = \sigma(D) = \{d_1, \ldots, d_n\}$.
2.
Let $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$. Then, $A$ cannot be diagonalized.
Solution: Suppose $A$ is diagonalizable. Then, $A$ is similar to $D = \operatorname{diag}(d_1, d_2)$. Thus, by Theorem 6.1.20, $\{d_1, d_2\} = \sigma(D) = \sigma(A) = \{0, 0\}$. Hence, $D = 0$ and therefore, $A = SDS^{-1} = 0$, a contradiction.
3.
Let $A = \begin{bmatrix} 2 & 1 & 1 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{bmatrix}$. Then, $A$ cannot be diagonalized.
Solution: Suppose $A$ is diagonalizable. Then, $A$ is similar to $D = \operatorname{diag}(d_1, d_2, d_3)$. Thus, by Theorem 6.1.20, $\{d_1, d_2, d_3\} = \sigma(D) = \sigma(A) = \{2, 2, 2\}$. Hence, $D = 2I_3$ and therefore, $A = SDS^{-1} = 2I_3$, a contradiction.
4.
Let $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$. Then, $\left(-i, \begin{bmatrix} i \\ 1 \end{bmatrix}\right)$ and $\left(i, \begin{bmatrix} -i \\ 1 \end{bmatrix}\right)$ are two eigen-pairs of $A$. Define $U = \frac{1}{\sqrt{2}}\begin{bmatrix} i & -i \\ 1 & 1 \end{bmatrix}$. Then, $U^*U = I_2 = UU^*$ and $U^*AU = \begin{bmatrix} -i & 0 \\ 0 & i \end{bmatrix}$.

Theorem 6.2.3. Let $A \in M_n(\mathbb{C})$.

1.
Let $S$ be an invertible matrix such that $S^{-1}AS = \operatorname{diag}(d_1, \ldots, d_n)$. Then, for $1 \leq i \leq n$, the $i$-th column of $S$ is an eigenvector of $A$ corresponding to $d_i$.
2.
$A$ is diagonalizable if and only if $A$ has $n$ linearly independent eigenvectors.

Proof. Let $S = [u_1, \ldots, u_n]$ and $D = \operatorname{diag}(d_1, \ldots, d_n)$. Then, $AS = SD$ gives
\[
[Au_1, \ldots, Au_n] = A[u_1, \ldots, u_n] = AS = SD = S\operatorname{diag}(d_1, \ldots, d_n) = [d_1u_1, \ldots, d_nu_n].
\]
Or equivalently, $Au_i = d_iu_i$, for $1 \leq i \leq n$. As $S$ is invertible, $\{u_1, \ldots, u_n\}$ is linearly independent. Hence, $(d_i, u_i)$, for $1 \leq i \leq n$, are eigen-pairs of $A$. This proves Part 1 and the ``only if'' part of Part 2.

Conversely, let $\{u_1, \ldots, u_n\}$ be $n$ linearly independent eigenvectors of $A$ corresponding to eigenvalues $\alpha_1, \ldots, \alpha_n$. Then, by Corollary 3.3.10, $S = [u_1, \ldots, u_n]$ is non-singular and
\[
AS = [Au_1, \ldots, Au_n] = [\alpha_1u_1, \ldots, \alpha_nu_n] = [u_1, \ldots, u_n]\begin{bmatrix} \alpha_1 & & 0 \\ & \ddots & \\ 0 & & \alpha_n \end{bmatrix} = SD,
\]
where $D = \operatorname{diag}(\alpha_1, \ldots, \alpha_n)$. Therefore, $S^{-1}AS = D$ and hence $A$ is diagonalizable. _
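Theorem 6.2.3 translates directly into a computation: stack $n$ linearly independent eigenvectors as the columns of $S$ and check $S^{-1}AS = D$. A minimal NumPy sketch (not part of the text), using the symmetric matrix of Exercise 6.2.11.7:

```python
import numpy as np

A = np.array([[2., 1., 1.],
              [1., 2., 1.],
              [1., 1., 2.]])                 # symmetric, hence diagonalizable
vals, vecs = np.linalg.eig(A)
S = vecs                                      # columns = n independent eigenvectors
D = np.linalg.inv(S) @ A @ S
print(np.allclose(D, np.diag(vals)))          # True: S^{-1} A S = diag(eigenvalues)
```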

Definition 6.2.4.

1.
A matrix $A \in M_n(\mathbb{C})$ is called defective if for some $\alpha \in \sigma(A)$, $\operatorname{Geo.Mul}_{\alpha}(A) < \operatorname{Alg.Mul}_{\alpha}(A)$.
2.
A matrix $A \in M_n(\mathbb{C})$ is called non-derogatory if $\operatorname{Geo.Mul}_{\alpha}(A) = 1$, for each $\alpha \in \sigma(A)$.

As a direct consequence of Theorem 6.2.3, we obtain the following result.

Corollary 6.2.5. Let $A \in M_n(\mathbb{C})$. Then,

1.
A is non-defective if and only if A is diagonalizable.
2.
A has distinct eigenvalues if and only if A is non-derogatory and non-defective.

Theorem 6.2.6. Let $(\alpha_1, v_1), \ldots, (\alpha_k, v_k)$ be $k$ eigen-pairs of $A \in M_n(\mathbb{C})$ with $\alpha_i$'s distinct. Then, $\{v_1, \ldots, v_k\}$ is linearly independent.

Proof. Suppose $\{v_1, \ldots, v_k\}$ is linearly dependent. Then, there exists a smallest $\ell \in \{1, \ldots, k-1\}$ such that $\{v_1, \ldots, v_\ell\}$ is linearly independent and $v_{\ell+1} = \beta_1v_1 + \cdots + \beta_\ell v_\ell$, for some scalars $\beta_1, \ldots, \beta_\ell$. So,
\[
\alpha_{\ell+1}v_{\ell+1} = \alpha_{\ell+1}\beta_1v_1 + \cdots + \alpha_{\ell+1}\beta_\ell v_\ell
\tag{6.2.1}
\]
and
\[
\alpha_{\ell+1}v_{\ell+1} = Av_{\ell+1} = A(\beta_1v_1 + \cdots + \beta_\ell v_\ell) = \alpha_1\beta_1v_1 + \cdots + \alpha_\ell\beta_\ell v_\ell.
\tag{6.2.2}
\]
Now, subtracting Equation (6.2.2) from Equation (6.2.1), we get
\[
0 = (\alpha_{\ell+1} - \alpha_1)\beta_1v_1 + \cdots + (\alpha_{\ell+1} - \alpha_\ell)\beta_\ell v_\ell.
\]
Since $\{v_1, \ldots, v_\ell\}$ is linearly independent and the $\alpha_i$'s are distinct, each $\beta_j = 0$. But then $v_{\ell+1} = 0$, a contradiction to $v_{\ell+1}$ being an eigenvector. Thus, the required result follows. _

An immediate corollary of Theorem 6.2.3 and Theorem 6.2.6 is stated next without proof.

Corollary 6.2.7. Let $A \in M_n(\mathbb{C})$ have $n$ distinct eigenvalues. Then, $A$ is diagonalizable.

The converse of Theorem 6.2.6 is not true as In has n linearly independent eigenvectors corresponding to the eigenvalue 1, repeated n times.

Corollary 6.2.8. Let $\alpha_1, \ldots, \alpha_k$ be $k$ distinct eigenvalues of $A \in M_n(\mathbb{C})$. Also, for $1 \leq i \leq k$, let $\dim(\operatorname{Null}(A - \alpha_iI_n)) = n_i$. Then, $A$ has $\sum_{i=1}^k n_i$ linearly independent eigenvectors.

Proof. For $1 \leq i \leq k$, let $S_i = \{u_{i1}, \ldots, u_{in_i}\}$ be a basis of $\operatorname{Null}(A - \alpha_iI_n)$. Then, we need to prove that $\cup_{i=1}^k S_i$ is linearly independent. To do so, denote
\[
p_j(A) = \frac{\prod_{i=1}^k (A - \alpha_iI_n)}{(A - \alpha_jI_n)} = \prod_{i \neq j}(A - \alpha_iI_n), \quad \text{for } 1 \leq j \leq k.
\]
Then, note that $p_j(A)$ is a polynomial in $A$ of degree $k-1$ and
\[
p_j(A)u = \begin{cases} 0, & \text{if } u \in \operatorname{Null}(A - \alpha_iI_n), \text{ for some } i \neq j,\\[1mm] \prod\limits_{i \neq j}(\alpha_j - \alpha_i)\,u, & \text{if } u \in \operatorname{Null}(A - \alpha_jI_n). \end{cases}
\tag{6.2.3}
\]
So, to prove that $\cup_{i=1}^k S_i$ is linearly independent, consider the linear system
\[
c_{11}u_{11} + \cdots + c_{1n_1}u_{1n_1} + \cdots + c_{k1}u_{k1} + \cdots + c_{kn_k}u_{kn_k} = 0
\]
in the variables $c_{ij}$'s. Now, applying the matrix $p_j(A)$ and using Equation (6.2.3), we get
\[
\prod_{i \neq j}(\alpha_j - \alpha_i)\left(c_{j1}u_{j1} + \cdots + c_{jn_j}u_{jn_j}\right) = 0.
\]
But $\prod_{i \neq j}(\alpha_j - \alpha_i) \neq 0$ as the $\alpha_i$'s are distinct. Hence, $c_{j1}u_{j1} + \cdots + c_{jn_j}u_{jn_j} = 0$. As $S_j$ is a basis of $\operatorname{Null}(A - \alpha_jI_n)$, we get $c_{jt} = 0$, for $1 \leq t \leq n_j$. Thus, the required result follows. _

Corollary 6.2.9. Let $A \in M_n(\mathbb{C})$ with distinct eigenvalues $\alpha_1, \ldots, \alpha_k$. Then, $A$ is diagonalizable if and only if $\operatorname{Geo.Mul}_{\alpha_i}(A) = \operatorname{Alg.Mul}_{\alpha_i}(A)$, for each $1 \leq i \leq k$.

Proof. Let $\operatorname{Alg.Mul}_{\alpha_i}(A) = m_i$. Then, $\sum_{i=1}^k m_i = n$. Let $\operatorname{Geo.Mul}_{\alpha_i}(A) = n_i$, for $1 \leq i \leq k$. Then, by Corollary 6.2.8, $A$ has $\sum_{i=1}^k n_i$ linearly independent eigenvectors. Also, by Theorem 6.1.22, $n_i \leq m_i$, for $1 \leq i \leq k$.

Now, let $A$ be diagonalizable. Then, by Theorem 6.2.3, $A$ has $n$ linearly independent eigenvectors. So, $n = \sum_{i=1}^k n_i$. As $n_i \leq m_i$ and $\sum_{i=1}^k m_i = n$, we get $n_i = m_i$.

Now, assume that $\operatorname{Geo.Mul}_{\alpha_i}(A) = \operatorname{Alg.Mul}_{\alpha_i}(A)$, for $1 \leq i \leq k$. Then, for each $i$, $1 \leq i \leq k$, $A$ has $n_i = m_i$ linearly independent eigenvectors. Thus, $A$ has $\sum_{i=1}^k n_i = \sum_{i=1}^k m_i = n$ linearly independent eigenvectors. Hence, by Theorem 6.2.3, $A$ is diagonalizable. _

Example 6.2.10. Let $A = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 0 & -1 & 1 \end{bmatrix}$. Then, $\left(1, \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}\right)$ and $\left(2, \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}\right)$ are the only eigen-pairs (up to scalar multiples). Hence, by Theorem 6.2.3, $A$ is not diagonalizable.

Exercise 6.2.11.

1.
Let $A$ be diagonalizable. Then, prove that $A + \alpha I$ is diagonalizable for every $\alpha \in \mathbb{C}$.
2.
Let $A$ be a nonzero strictly upper triangular matrix. Then, prove that $A$ is not diagonalizable.
3.
Let $A$ be an $n \times n$ matrix with $\lambda \in \sigma(A)$ and $\operatorname{Alg.Mul}_{\lambda}(A) = m$. If $\operatorname{Rank}(A - \lambda I) \neq n - m$ then prove that $A$ is not diagonalizable.
4.
If $\sigma(A) = \sigma(B)$ and both $A$ and $B$ are diagonalizable then prove that $A$ is similar to $B$. That is, they are two basis representations of the same linear transformation.
5.
Let $A$ and $B$ be two similar matrices such that $A$ is diagonalizable. Prove that $B$ is diagonalizable.
6.
Let $A \in M_n(\mathbb{C})$ and $B \in M_m(\mathbb{C})$. Suppose $C = \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}$. Then, prove that $C$ is diagonalizable if and only if both $A$ and $B$ are diagonalizable.
7.
Is the matrix $A = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix}$ diagonalizable?
8.
Let $J_n$ be an $n \times n$ matrix with all entries $1$. Then, $\operatorname{Geo.Mul}_n(J_n) = \operatorname{Alg.Mul}_n(J_n) = 1$ and $\operatorname{Geo.Mul}_0(J_n) = \operatorname{Alg.Mul}_0(J_n) = n - 1$.
9.
Let $A = [a_{ij}] \in M_n(\mathbb{R})$, where $a_{ij} = a$ if $i = j$, and $a_{ij} = b$ otherwise. Then, verify that $A = (a - b)I_n + bJ_n$. Hence, or otherwise, determine the eigenvalues and eigenvectors of $A$. Is $A$ diagonalizable?
10.
Let $T : \mathbb{R}^5 \rightarrow \mathbb{R}^5$ be a linear operator with $\operatorname{Rank}(T - I) = 3$ and
\[
\operatorname{Null}(T) = \{(x_1, x_2, x_3, x_4, x_5) \in \mathbb{R}^5 \mid x_1 + x_4 + x_5 = 0, \ x_2 + x_3 = 0\}.
\]
(a)
Determine the eigenvalues of T?
(b)
For each distinct eigenvalue α of T, determine Geo.Mulα(T).
(c)
Is T diagonalizable? Justify your answer.
11.
Let $A \in M_n(\mathbb{C})$ with $A \neq 0$ but $A^2 = 0$. Prove that $A$ cannot be diagonalized.
12.
Are the following matrices diagonalizable?
\[
\text{i)} \begin{bmatrix} 1 & 3 & 2 & 1 \\ 0 & 2 & 3 & 1 \\ 0 & 0 & -1 & 1 \\ 0 & 0 & 0 & 4 \end{bmatrix}, \quad \text{ii)} \begin{bmatrix} 1 & 0 & -1 \\ 0 & 0 & 1 \\ 0 & 2 & 0 \end{bmatrix}, \quad \text{iii)} \begin{bmatrix} 1 & -3 & 3 \\ 0 & -5 & 6 \\ 0 & -3 & 4 \end{bmatrix} \quad \text{and iv)} \begin{bmatrix} 2 & i \\ i & 0 \end{bmatrix}.
\]
13.
Let $A \in M_n(\mathbb{C})$.
(a)
Then, prove that $\operatorname{Rank}(A) = 1$ if and only if $A = xy^*$, for some non-zero vectors $x, y \in \mathbb{C}^n$.
(b)
If $\operatorname{Rank}(A) = 1$ then
i.
$A$ has at most one nonzero eigenvalue of algebraic multiplicity $1$.
ii.
find this eigenvalue and its geometric multiplicity.
iii.
when is $A$ diagonalizable?
14.
Let $A \in M_n(\mathbb{C})$. If $\operatorname{Rank}(A) = k$ then there exist $x_i, y_i \in \mathbb{C}^n$ such that $A = \sum_{i=1}^k x_iy_i^*$. Is the converse true?

6.2.1 Schur’s Unitary Triangularization

We now prove one of the most important results in diagonalization, called Schur's Lemma or Schur's unitary triangularization.

Lemma 6.2.12 (Schur's unitary triangularization (SUT)). Let $A \in M_n(\mathbb{C})$. Then, there exists a unitary matrix $U$ such that $U^*AU$ is an upper triangular matrix. Further, if $A \in M_n(\mathbb{R})$ and $\sigma(A)$ has real entries then $U$ can be chosen to be a real orthogonal matrix.

Proof. We prove the result by induction on $n$. The result is clearly true for $n = 1$. So, let $n > 1$ and assume the result to be true for matrices of size $k < n$ and prove it for $n$.

Let $(\lambda_1, x_1)$ be an eigen-pair of $A$ with $\|x_1\| = 1$. Now, extend it to form an orthonormal basis $\{x_1, x_2, \ldots, x_n\}$ of $\mathbb{C}^n$ and define $X = [x_1, x_2, \ldots, x_n]$. Then, $X$ is a unitary matrix and
\[
X^*AX = X^*[Ax_1, Ax_2, \ldots, Ax_n] = \begin{bmatrix} x_1^* \\ x_2^* \\ \vdots \\ x_n^* \end{bmatrix}[\lambda_1x_1, Ax_2, \ldots, Ax_n] = \begin{bmatrix} \lambda_1 & * \\ 0 & B \end{bmatrix},
\tag{6.2.4}
\]
where $B \in M_{n-1}(\mathbb{C})$. Now, by the induction hypothesis there exists a unitary matrix $U \in M_{n-1}(\mathbb{C})$ such that $U^*BU = T$ is an upper triangular matrix. Define $\widehat{U} = X\begin{bmatrix} 1 & 0 \\ 0 & U \end{bmatrix}$. Then, using Exercise 5.4.8.10, the matrix $\widehat{U}$ is unitary and
\[
\left(\widehat{U}\right)^*A\widehat{U} = \begin{bmatrix} 1 & 0 \\ 0 & U^* \end{bmatrix}X^*AX\begin{bmatrix} 1 & 0 \\ 0 & U \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & U^* \end{bmatrix}\begin{bmatrix} \lambda_1 & * \\ 0 & B \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & U \end{bmatrix} = \begin{bmatrix} \lambda_1 & * \\ 0 & U^*BU \end{bmatrix} = \begin{bmatrix} \lambda_1 & * \\ 0 & T \end{bmatrix}.
\]
Since $T$ is upper triangular, $\begin{bmatrix} \lambda_1 & * \\ 0 & T \end{bmatrix}$ is upper triangular.

Further, if $A \in M_n(\mathbb{R})$ and $\sigma(A)$ has real entries then $x_1 \in \mathbb{R}^n$ with $Ax_1 = \lambda_1x_1$ can be chosen, and one uses induction once again to get the required result. _

Remark 6.2.13. Let $A \in M_n(\mathbb{C})$. Then, by Schur's Lemma there exists a unitary matrix $U$ such that $U^*AU = T = [t_{ij}]$, an upper triangular matrix. Thus,
\[
\{\alpha_1, \ldots, \alpha_n\} = \sigma(A) = \sigma(U^*AU) = \{t_{11}, \ldots, t_{nn}\}.
\tag{6.2.5}
\]
Furthermore, we can get the $\alpha_i$'s in the diagonal of $T$ in any prescribed order.
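A Schur decomposition can be computed numerically; the following sketch (assuming NumPy and SciPy, not part of the text) triangularizes the matrix $A$ of Example 6.2.16 and shows the eigenvalues on the diagonal of $T$:

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[3., 2.], [-1., 0.]])           # matrix from Example 6.2.16
T, U = schur(A, output='complex')             # A = U T U*, T upper triangular
print(np.allclose(U @ T @ U.conj().T, A))     # True
print(np.diag(T))                             # eigenvalues of A on the diagonal
print(np.allclose(U.conj().T @ U, np.eye(2))) # U is unitary
```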

Definition 6.2.14. [Unitary Equivalence] Let $A, B \in M_n(\mathbb{C})$. Then, $A$ and $B$ are said to be unitarily equivalent/similar if there exists a unitary matrix $U$ such that $A = U^*BU$.

Remark 6.2.15. We know that if two matrices are unitarily equivalent then they are necessarily similar as $U^* = U^{-1}$, for every unitary matrix $U$. But, similarity doesn't imply unitary equivalence (see Exercise 6.2.17.6). In numerical calculations, unitary transformations are preferred over similarity transformations due to the following main reasons:

1.
Exercise 5.4.8.5g implies that $\|Ux\| = \|x\|$, whenever $U$ is a unitary matrix. This need not be true under a general similarity change of basis.
2.
As $U^{-1} = U^*$, for a unitary matrix, unitary equivalence is computationally simpler.
3.
Also, computation of the conjugate transpose doesn't introduce round-off errors in the calculation.

Example 6.2.16. Consider the two matrices $A = \begin{bmatrix} 3 & 2 \\ -1 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix}$. Then, we show that they are similar but not unitarily similar.

Solution: Note that $\sigma(A) = \sigma(B) = \{1, 2\}$. As the eigenvalues are distinct, by Corollary 6.2.7, the matrices $A$ and $B$ are diagonalizable and hence there exist invertible matrices $S$ and $T$ such that $A = S\Lambda S^{-1}$, $B = T\Lambda T^{-1}$, where $\Lambda = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$. Thus, $A = ST^{-1}B(ST^{-1})^{-1}$. That is, $A$ and $B$ are similar. But, $\sum|a_{ij}|^2 \neq \sum|b_{ij}|^2$ and hence, by Exercise 5.4.8.11, they cannot be unitarily similar.

Exercise 6.2.17.

1.
If $A$ is unitarily similar to an upper triangular matrix $T = [t_{ij}]$ then prove that $\sum_{i<j}|t_{ij}|^2 = \operatorname{tr}(A^*A) - \sum_i|\lambda_i|^2$.
2.
Use the exercises given below to conclude that the upper triangular matrix obtained in Schur's Lemma need not be unique.
(a)
Prove that $B = \begin{bmatrix} 2 & -1 & 3\sqrt{2} \\ 0 & 1 & \sqrt{2} \\ 0 & 0 & 3 \end{bmatrix}$ and $C = \begin{bmatrix} 2 & 1 & 3\sqrt{2} \\ 0 & 1 & -\sqrt{2} \\ 0 & 0 & 3 \end{bmatrix}$ are unitarily equivalent.
(b)
Prove that $D = \begin{bmatrix} 2 & 0 & 3\sqrt{2} \\ 1 & 1 & \sqrt{2} \\ 0 & 0 & 1 \end{bmatrix}$ and $E = \begin{bmatrix} 2 & 0 & 3\sqrt{2} \\ -1 & 1 & -\sqrt{2} \\ 0 & 0 & 1 \end{bmatrix}$ are unitarily equivalent.
(c)
Let $A_1 = \begin{bmatrix} 2 & 1 & 4 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}$ and $A_2 = \begin{bmatrix} 1 & 1 & 4 \\ 0 & 2 & 2 \\ 0 & 0 & 3 \end{bmatrix}$. Then, prove that
i.
$A_1$ and $D$ are unitarily equivalent.
ii.
$A_2$ and $B$ are unitarily equivalent.
iii.
Do the above results contradict Exercise 5.4.8.5c? Give reasons for your answer.
3.
Prove that $A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 2 & 1 \\ 0 & 0 & 3 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & -1 & \sqrt{2} \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{bmatrix}$ are unitarily equivalent.
4.
Let $A$ be a normal matrix. If all the eigenvalues of $A$ are $0$ then prove that $A = 0$. What happens if all the eigenvalues of $A$ are $1$?
5.
Let $A \in M_n(\mathbb{C})$. Then, prove that if $x^*Ax = 0$, for all $x \in \mathbb{C}^n$, then $A = 0$. Do these results hold for arbitrary matrices?
6.
Show that the matrices $A = \begin{bmatrix} 4 & 4 \\ 0 & 4 \end{bmatrix}$ and $B = \begin{bmatrix} 10 & 9 \\ -4 & -2 \end{bmatrix}$ are similar. Is it possible to find a unitary matrix $U$ such that $A = U^*BU$?

We now use Lemma 6.2.12 to give another proof of Theorem 6.1.16.

Corollary 6.2.18. Let $A \in M_n(\mathbb{C})$. If $\sigma(A) = \{\alpha_1, \ldots, \alpha_n\}$ then $\det(A) = \prod_{i=1}^n \alpha_i$ and $\operatorname{tr}(A) = \sum_{i=1}^n \alpha_i$.

Proof. By Schur's Lemma there exists a unitary matrix $U$ such that $U^*AU = T = [t_{ij}]$, an upper triangular matrix. By Remark 6.2.13, $\sigma(A) = \sigma(T)$. Hence, $\det(A) = \det(T) = \prod_{i=1}^n t_{ii} = \prod_{i=1}^n \alpha_i$ and $\operatorname{tr}(A) = \operatorname{tr}(A(UU^*)) = \operatorname{tr}(U^*(AU)) = \operatorname{tr}(T) = \sum_{i=1}^n t_{ii} = \sum_{i=1}^n \alpha_i$. _

6.2.2 Diagonalizability of some Special Matrices

We now use Schur’s unitary triangularization Lemma to state the main theorem of this subsection. Also, recall that A is said to be a normal matrix if AA* = A*A.


Theorem 6.2.19 (Spectral Theorem for Normal Matrices). Let $A \in M_n(\mathbb{C})$. If $A$ is a normal matrix then there exists a unitary matrix $U$ such that $U^*AU = \operatorname{diag}(\alpha_1, \ldots, \alpha_n)$.

Proof. By Schur's Lemma there exists a unitary matrix $U$ such that $U^*AU = T = [t_{ij}]$, an upper triangular matrix. Since $A$ is normal,
\[
T^*T = (U^*AU)^*(U^*AU) = U^*A^*AU = U^*AA^*U = (U^*AU)(U^*AU)^* = TT^*.
\]
Thus, we see that $T$ is an upper triangular matrix with $T^*T = TT^*$. Hence, by Exercise 1.2.11.4, $T$ is a diagonal matrix and this completes the proof. _

Exercise 6.2.20. Let $A \in M_n(\mathbb{C})$. Prove that if $A$ is either a Hermitian, skew-Hermitian or unitary matrix then $A$ is a normal matrix.

We re-write Theorem 6.2.19 in another form to indicate that $A$ can be decomposed into a linear combination of orthogonal projectors onto the eigen-spaces. Thus, the decomposition is independent of the choice of eigenvectors.

Remark 6.2.21. Let $A \in M_n(\mathbb{C})$ be a normal matrix with eigenvalues $\alpha_1, \ldots, \alpha_n$.

1.
Then, there exists a unitary matrix $U = [u_1, \ldots, u_n]$ such that
(a)
$I_n = u_1u_1^* + \cdots + u_nu_n^*$.
(b)
the columns of $U$ form a set of orthonormal eigenvectors for $A$ (use Theorem 6.2.3).
(c)
$A = AI_n = A(u_1u_1^* + \cdots + u_nu_n^*) = \alpha_1u_1u_1^* + \cdots + \alpha_nu_nu_n^*$.
2.
Let $\alpha_1, \ldots, \alpha_k$ be the distinct eigenvalues of $A$. Also, let $W_i = \operatorname{Null}(A - \alpha_iI_n)$, for $1 \leq i \leq k$, be the corresponding eigen-spaces.
(a)
Then, we can group the $u_i$'s such that they form an orthonormal basis of $W_i$, for $1 \leq i \leq k$. Hence, $\mathbb{C}^n = W_1 \oplus \cdots \oplus W_k$.
(b)
If $P_{\alpha_i}$ is the orthogonal projector onto $W_i$, for $1 \leq i \leq k$, then $A = \alpha_1P_{\alpha_1} + \cdots + \alpha_kP_{\alpha_k}$. Thus, $A$ depends only on the eigen-spaces and not on the computed eigenvectors.
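The decomposition $A = \alpha_1u_1u_1^* + \cdots + \alpha_nu_nu_n^*$ of Remark 6.2.21 can be checked numerically; a minimal sketch (assuming NumPy; not part of the text) for a real symmetric, hence normal, matrix:

```python
import numpy as np

A = np.array([[2., 1.], [1., 2.]])            # real symmetric, hence normal
vals, U = np.linalg.eigh(A)                   # orthonormal eigenvectors in columns of U
# A as a combination of (rank-one) orthogonal projectors onto eigen-directions
recon = sum(vals[i] * np.outer(U[:, i], U[:, i].conj()) for i in range(2))
print(np.allclose(recon, A))                  # True: A = sum_i alpha_i u_i u_i*
```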

We now give the spectral theorem for Hermitian matrices.

Theorem 6.2.22. [Spectral Theorem for Hermitian Matrices] Let $A \in M_n(\mathbb{C})$ be a Hermitian matrix. Then,

1.
the eigenvalues $\alpha_i$, for $1 \leq i \leq n$, of $A$ are real.
2.
there exists a unitary matrix $U$, say $U = [u_1, \ldots, u_n]$, such that
(a)
$I_n = u_1u_1^* + \cdots + u_nu_n^*$.
(b)
$\{u_1, \ldots, u_n\}$ forms a set of orthonormal eigenvectors for $A$.
(c)
$A = \alpha_1u_1u_1^* + \cdots + \alpha_nu_nu_n^*$, or equivalently, $U^*AU = D$, where $D = \operatorname{diag}(\alpha_1, \ldots, \alpha_n)$.

Proof. The second part is immediate from Theorem 6.2.19 as Hermitian matrices are also normal matrices. For Part 1, let $(\alpha, x)$ be an eigen-pair. Then, $Ax = \alpha x$. As $A$ is Hermitian, $A^* = A$. Thus, $x^*A = x^*A^* = (Ax)^* = (\alpha x)^* = \overline{\alpha}x^*$. Hence, using $x^*A = \overline{\alpha}x^*$, we get
\[
\alpha x^*x = x^*(\alpha x) = x^*(Ax) = (x^*A)x = (\overline{\alpha}x^*)x = \overline{\alpha}x^*x.
\]
As $x$ is an eigenvector, $x \neq 0$. Hence, $\|x\|^2 = x^*x \neq 0$. Thus, $\alpha = \overline{\alpha}$, i.e., $\alpha \in \mathbb{R}$. _

As an immediate corollary of Theorem 6.2.22 and the second part of Lemma 6.2.12, we give the following result without proof.

Corollary 6.2.23. Let $A \in M_n(\mathbb{R})$ be symmetric. Then, $A = U\operatorname{diag}(\alpha_1, \ldots, \alpha_n)U^*$, where

1.
the $\alpha_i$'s are all real,
2.
the columns of $U$ can be chosen to have real entries,
3.
the eigenvectors that correspond to the columns of $U$ form an orthonormal basis of $\mathbb{R}^n$.

Exercise 6.2.24.

1.
Let $A$ be a real skew-symmetric matrix. Then, the eigenvalues of $A$ are either zero or purely imaginary and $A$ is unitarily diagonalizable.
2.
Let $A$ be a skew-Hermitian matrix. Then, $A$ is unitarily diagonalizable.
3.
Characterize all normal matrices in $M_2(\mathbb{R})$.
4.
Let $\sigma(A) = \{\lambda_1, \ldots, \lambda_n\}$. Then, prove that the following statements are equivalent.
(a)
$A$ is normal.
(b)
$A$ is unitarily diagonalizable.
(c)
$\sum_{i,j}|a_{ij}|^2 = \sum_i|\lambda_i|^2$.
(d)
$A$ has $n$ orthonormal eigenvectors.
5.
Let $A$ be a normal matrix with $(\lambda, x)$ as an eigen-pair. Then,
(a)
$(A^*)^kx$, for $k \in \mathbb{Z}^+$, is also an eigenvector corresponding to $\lambda$.
(b)
$(\overline{\lambda}, x)$ is an eigen-pair for $A^*$. [Hint: Verify $\|A^*x - \overline{\lambda}x\|^2 = \|Ax - \lambda x\|^2$.]
6.
Let $A$ be an $n \times n$ unitary matrix. Then,
(a)
$|\lambda| = 1$ for any eigenvalue $\lambda$ of $A$.
(b)
the eigenvectors $x, y$ corresponding to distinct eigenvalues are orthogonal.
7.
Let $A$ be a $2 \times 2$ orthogonal matrix. Then, prove the following:
(a)
if $\det(A) = 1$ then $A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$, for some $\theta$, $0 \leq \theta < 2\pi$. That is, $A$ counterclockwise rotates every point in $\mathbb{R}^2$ by an angle $\theta$.
(b)
if $\det(A) = -1$ then $A = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}$, for some $\theta$, $0 \leq \theta < 2\pi$. That is, $A$ reflects every point in $\mathbb{R}^2$ about a line passing through the origin. Determine this line. Or equivalently, there exists a non-singular matrix $P$ such that $P^{-1}AP = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$.
8.
Let $A$ be a $3 \times 3$ orthogonal matrix. Then, prove the following:
(a)
if $\det(A) = 1$ then $A$ is a rotation about a fixed axis, in the sense that $A$ has an eigen-pair $(1, x)$ such that the restriction of $A$ to the plane $x^{\perp}$ is a two dimensional rotation in $x^{\perp}$.
(b)
if $\det(A) = -1$ then $A$ corresponds to a reflection through a plane $P$, followed by a rotation about the line through the origin that is orthogonal to $P$.
9.
Let $A$ be a normal matrix. Then, prove that $\operatorname{Rank}(A)$ equals the number of nonzero eigenvalues of $A$.
10.
[Equivalent characterizations of Hermitian matrices] Let $A \in M_n(\mathbb{C})$. Then, the following statements are equivalent.
(a)
The matrix $A$ is Hermitian.
(b)
The number $x^*Ax$ is real for each $x \in \mathbb{C}^n$.
(c)
The matrix $A$ is normal and has real eigenvalues.
(d)
The matrix $S^*AS$ is Hermitian for each $S \in M_n(\mathbb{C})$.

6.2.3 Cayley Hamilton Theorem

Let $A \in M_n(\mathbb{C})$. Then, in Theorem 6.1.16, we saw that
\[
P_A(x) = \det(A - xI) = (-1)^n\left(x^n - a_{n-1}x^{n-1} + a_{n-2}x^{n-2} + \cdots + (-1)^{n-1}a_1x + (-1)^na_0\right)
\tag{6.2.6}
\]
for certain $a_0, a_1, \ldots, a_{n-1} \in \mathbb{C}$. Also, if $\alpha$ is an eigenvalue of $A$ then $P_A(\alpha) = 0$. So, $x^n - a_{n-1}x^{n-1} + a_{n-2}x^{n-2} + \cdots + (-1)^{n-1}a_1x + (-1)^na_0 = 0$ is satisfied by $n$ complex numbers. It turns out that the expression
\[
A^n - a_{n-1}A^{n-1} + a_{n-2}A^{n-2} + \cdots + (-1)^{n-1}a_1A + (-1)^na_0I = 0
\]
holds true as a matrix identity. This is a celebrated theorem called the Cayley Hamilton Theorem. We give a proof using Schur's unitary triangularization. To do so, we look at multiplication of certain upper triangular matrices.

Lemma 6.2.25. Let $A_1, \ldots, A_n \in M_n(\mathbb{C})$ be upper triangular matrices such that the $(i,i)$-th entry of $A_i$ equals $0$, for $1 \leq i \leq n$. Then, $A_1A_2\cdots A_n = 0$.

Proof. We use induction to prove that the first $k$ columns of $A_1A_2\cdots A_k$ are $0$, for $1 \leq k \leq n$. The result is clearly true for $k = 1$ as the first column of $A_1$ is $0$. For clarity, we show that the first two columns of $A_1A_2$ are $0$. Let $B = A_1A_2$. Then, by matrix multiplication, for $i = 1, 2$,
\[
B[:,i] = A_1[:,1](A_2)_{1i} + A_1[:,2](A_2)_{2i} + \cdots + A_1[:,n](A_2)_{ni} = 0 + \cdots + 0 = 0
\]
as $A_1[:,1] = 0$ and $(A_2)_{ji} = 0$, for $i = 1, 2$ and $j \geq 2$. So, assume that the first $n-1$ columns of $C = A_1\cdots A_{n-1}$ are $0$ and let $B = CA_n$. Then, for $1 \leq i \leq n$, we see that
\[
B[:,i] = C[:,1](A_n)_{1i} + C[:,2](A_n)_{2i} + \cdots + C[:,n](A_n)_{ni} = 0 + \cdots + 0 = 0
\]
as $C[:,j] = 0$, for $1 \leq j \leq n-1$, and $(A_n)_{ni} = 0$, for all $i$ ($A_n$ is upper triangular with $(A_n)_{nn} = 0$). Thus, by the induction hypothesis the required result follows. _

Exercise 6.2.26. Let $A, B \in M_n(\mathbb{C})$ be upper triangular matrices with the top leading principal submatrix of $A$ of size $k$ being $0$. If $B[k+1, k+1] = 0$ then prove that the leading principal submatrix of size $k+1$ of $AB$ is $0$.

We now prove the Cayley Hamilton Theorem using Schur’s unitary triangularization.

Theorem 6.2.27 (Cayley Hamilton Theorem). Let $A \in M_n(\mathbb{C})$. Then, $A$ satisfies its characteristic equation. That is, if $P_A(x) = \det(A - xI_n) = a_0 - xa_1 + \cdots + (-1)^{n-1}a_{n-1}x^{n-1} + (-1)^nx^n$ then
\[
A^n - a_{n-1}A^{n-1} + a_{n-2}A^{n-2} + \cdots + (-1)^{n-1}a_1A + (-1)^na_0I = 0
\]
holds true as a matrix identity.

Proof. Let $\sigma(A) = \{\alpha_1, \ldots, \alpha_n\}$. Then, $P_A(x) = (-1)^n\prod_{i=1}^n(x - \alpha_i)$, and by Schur's unitary triangularization there exists a unitary matrix $U$ such that $U^*AU = T$, an upper triangular matrix with $t_{ii} = \alpha_i$, for $1 \leq i \leq n$. Now, observe that if $A_i = T - \alpha_iI$ then the $A_i$'s satisfy the conditions of Lemma 6.2.25. Hence,
\[
(T - \alpha_1I)\cdots(T - \alpha_nI) = 0.
\]
Therefore,
\[
P_A(A) = (-1)^n\prod_{i=1}^n(A - \alpha_iI) = (-1)^n\prod_{i=1}^n(UTU^* - \alpha_iUIU^*) = (-1)^nU\left[(T - \alpha_1I)\cdots(T - \alpha_nI)\right]U^* = U\,0\,U^* = 0.
\]
Thus, the required result follows. _

We now give some examples and then implications of the Cayley Hamilton Theorem.

Remark 6.2.28.

1.
Let $A = \begin{bmatrix} 1 & 2 \\ 1 & -3 \end{bmatrix}$. Then, $P_A(x) = x^2 + 2x - 5$. Hence, verify that
\[
A^2 + 2A - 5I_2 = \begin{bmatrix} 3 & -4 \\ -2 & 11 \end{bmatrix} + 2\begin{bmatrix} 1 & 2 \\ 1 & -3 \end{bmatrix} - 5\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = 0.
\]
Further, verify that $A^{-1} = \frac{1}{5}(A + 2I_2) = \frac{1}{5}\begin{bmatrix} 3 & 2 \\ 1 & -1 \end{bmatrix}$. Furthermore, $A^2 = -2A + 5I$ implies that
\[
A^3 = A(A^2) = A(-2A + 5I) = -2A^2 + 5A = -2(-2A + 5I) + 5A = 4A - 10I + 5A = 9A - 10I.
\]
We can keep using the above technique to get $A^m$ as a linear combination of $A$ and $I$, for all $m \geq 1$.
2.
Let $A = \begin{bmatrix} 3 & 1 \\ 2 & 0 \end{bmatrix}$. Then, $P_A(t) = t(t-3) - 2 = t^2 - 3t - 2$. So, using $P_A(A) = 0$, we have $A^{-1} = \frac{A - 3I}{2}$. Further, $A^2 = 3A + 2I$ implies that $A^3 = 3A^2 + 2A = 3(3A + 2I) + 2A = 11A + 6I$. So, as above, $A^m$ is a linear combination of $A$ and $I$, for all $m \geq 1$.
3.
Let $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$. Then, $P_A(x) = x^2$. So, even though $A \neq 0$, $A^2 = 0$.
4.
For $A = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$, $P_A(x) = x^3$. Thus, by the Cayley Hamilton Theorem $A^3 = 0$. But, it turns out that already $A^2 = 0$.
5.
For $A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$, note that $P_A(t) = (t-1)^3$. So $P_A(A) = 0$. But, observe that if $q(t) = (t-1)^2$ then $q(A)$ is also $0$.
6.
Let $A \in M_n(\mathbb{C})$ with $P_A(x) = a_0 - xa_1 + \cdots + (-1)^{n-1}a_{n-1}x^{n-1} + (-1)^nx^n$.
(a)
Then, for any $\ell \in \mathbb{N}$, the division algorithm gives $\alpha_0, \alpha_1, \ldots, \alpha_{n-1} \in \mathbb{C}$ and a polynomial $f(x)$ with coefficients from $\mathbb{C}$ such that
\[
x^{\ell} = f(x)P_A(x) + \alpha_0 + x\alpha_1 + \cdots + x^{n-1}\alpha_{n-1}.
\]
Hence, by the Cayley Hamilton Theorem, $A^{\ell} = \alpha_0I + \alpha_1A + \cdots + \alpha_{n-1}A^{n-1}$.
i.
Thus, to compute any power of $A$, one needs to apply the division algorithm to get the $\alpha_i$'s and know $A^i$, for $1 \leq i \leq n-1$. This is quite helpful in numerical computation as computing powers takes much more time than division.
ii.
Note that $\operatorname{LS}\{I, A, A^2, \ldots\}$ is a subspace of $M_n(\mathbb{C})$. Also, $\dim(M_n(\mathbb{C})) = n^2$. But, the above argument implies that $\dim\left(\operatorname{LS}\{I, A, A^2, \ldots\}\right) \leq n$.
iii.
In the language of graph theory, it says the following: ``Let $G$ be a graph on $n$ vertices and $A$ its adjacency matrix. Suppose there is no path of length $n-1$ or less from a vertex $v$ to a vertex $u$ in $G$. Then, $G$ doesn't have a path from $v$ to $u$ of any length. That is, the graph $G$ is disconnected and $v$ and $u$ are in different components of $G$.''
(b)
Suppose $A$ is non-singular. Then, by definition, $a_0 = \det(A) \neq 0$. Hence,
\[
A^{-1} = \frac{1}{a_0}\left[a_1I - a_2A + \cdots + (-1)^{n-2}a_{n-1}A^{n-2} + (-1)^{n-1}A^{n-1}\right].
\]
This matrix identity can be used to calculate the inverse.
(c)
The above also implies that if $A$ is invertible then $A^{-1} \in \operatorname{LS}\{I, A, A^2, \ldots\}$. That is, $A^{-1}$ is a linear combination of the matrices $I, A, \ldots, A^{n-1}$.
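The inverse-from-characteristic-polynomial identity of Remark 6.2.28.6b can be tried numerically. A minimal sketch (assuming NumPy; `np.poly` and the monic convention $\det(xI - A) = x^n + c_1x^{n-1} + \cdots + c_n$ are the only ingredients), applied to the first matrix of Exercise 6.2.29 below:

```python
import numpy as np

def inverse_via_cayley_hamilton(A):
    """Compute A^{-1} from the characteristic polynomial of A (a sketch).
    np.poly(A) returns the coefficients of det(xI - A) = x^n + c1 x^{n-1} + ... + cn."""
    n = A.shape[0]
    c = np.poly(A)                       # c = [1, c1, ..., cn]
    # Cayley-Hamilton: A^n + c1 A^{n-1} + ... + c_{n-1} A + cn I = 0,
    # hence A^{-1} = -(A^{n-1} + c1 A^{n-2} + ... + c_{n-1} I) / cn.
    B = np.zeros_like(A, dtype=float)
    for k in range(n):                   # Horner-style accumulation of the bracket
        B = B @ A + c[k] * np.eye(n)
    return -B / c[n]

A = np.array([[2., 3., 4.], [5., 6., 7.], [1., 1., 2.]])
print(np.allclose(inverse_via_cayley_hamilton(A) @ A, np.eye(3)))   # True
```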

Exercise 6.2.29. Find the inverses of $\begin{bmatrix} 2 & 3 & 4 \\ 5 & 6 & 7 \\ 1 & 1 & 2 \end{bmatrix}$, $\begin{bmatrix} -1 & -1 & 1 \\ 1 & -1 & 1 \\ 0 & 1 & 1 \end{bmatrix}$ and $\begin{bmatrix} 1 & -2 & -1 \\ -2 & 1 & -1 \\ 0 & -1 & 2 \end{bmatrix}$ using the Cayley Hamilton Theorem.

Exercise 6.2.30. Miscellaneous Exercises:

1.
Let $A, B \in M_2(\mathbb{C})$ such that $A = AB - BA$. Then, prove that $A^2 = 0$.
2.
Let $B$ be an $m \times n$ matrix and $A = \begin{bmatrix} 0 & B \\ B^T & 0 \end{bmatrix}$. Then, prove that $\left(\lambda, \begin{bmatrix} x \\ y \end{bmatrix}\right)$ is an eigen-pair if and only if $\left(-\lambda, \begin{bmatrix} x \\ -y \end{bmatrix}\right)$ is an eigen-pair.
3.
Let $B, C \in M_n(\mathbb{R})$. Define $A = \begin{bmatrix} B & C \\ -C & B \end{bmatrix}$. Then, prove the following:
(a)
if $s$ is a real eigenvalue of $A$ with corresponding eigenvector $\begin{bmatrix} x \\ y \end{bmatrix}$ then $s$ is also an eigenvalue corresponding to the eigenvector $\begin{bmatrix} -y \\ x \end{bmatrix}$.
(b)
if $s + it$ is a complex eigenvalue of $A$ with corresponding eigenvector $\begin{bmatrix} x + iy \\ -y + ix \end{bmatrix}$ then $s - it$ is also an eigenvalue of $A$ with corresponding eigenvector $\begin{bmatrix} x - iy \\ -y - ix \end{bmatrix}$.
(c)
$(s + it, x + iy)$ is an eigen-pair of $B + iC$ if and only if $(s - it, x - iy)$ is an eigen-pair of $B - iC$.
(d)
$\left(s + it, \begin{bmatrix} x + iy \\ -y + ix \end{bmatrix}\right)$ is an eigen-pair of $A$ if and only if $(s + it, x + iy)$ is an eigen-pair of $B + iC$.
(e)
$\det(A) = |\det(B + iC)|^2$.

The next section deals with quadratic forms, which help us better understand conic sections in analytic geometry.

6.3 Quadratic Forms

Definition 6.3.1. [Positive, Semi-positive and Negative definite matrices] Let $A \in M_n(\mathbb{C})$. Then, $A$ is said to be

1.
positive semi-definite (psd) if $x^*Ax \in \mathbb{R}$ and $x^*Ax \geq 0$, for all $x \in \mathbb{C}^n$.
2.
positive definite (pd) if $x^*Ax \in \mathbb{R}$ and $x^*Ax > 0$, for all $x \in \mathbb{C}^n \setminus \{0\}$.
3.
negative semi-definite (nsd) if $x^*Ax \in \mathbb{R}$ and $x^*Ax \leq 0$, for all $x \in \mathbb{C}^n$.
4.
negative definite (nd) if $x^*Ax \in \mathbb{R}$ and $x^*Ax < 0$, for all $x \in \mathbb{C}^n \setminus \{0\}$.
5.
indefinite if $x^*Ax \in \mathbb{R}$ for all $x \in \mathbb{C}^n$ and there exist $x, y \in \mathbb{C}^n$ such that $x^*Ax < 0 < y^*Ay$.

Lemma 6.3.2. Let $A \in M_n(\mathbb{C})$. Then, $A$ is Hermitian if and only if at least one of the following statements holds:

1.
$S^*AS$ is Hermitian for all $S \in M_n(\mathbb{C})$.
2.
$A$ is normal and has real eigenvalues.
3.
$x^*Ax \in \mathbb{R}$ for all $x \in \mathbb{C}^n$.

Proof. Suppose $A = A^*$. Then, for any $S \in M_n(\mathbb{C})$, $(S^*AS)^* = S^*A^*S = S^*AS$, so $S^*AS$ is Hermitian. Also, $A$ is clearly normal as $AA^* = A^2 = A^*A$, and if $(\lambda, x)$ is an eigen-pair then $\lambda x^*x = x^*Ax = (x^*Ax)^* = \overline{\lambda}x^*x$ implies $\lambda \in \mathbb{R}$. For the last part, note that $(x^*Ax)^* = x^*A^*x = x^*Ax$, so $\operatorname{Im}(x^*Ax) = 0$, i.e., $x^*Ax \in \mathbb{R}$. Thus, if $A$ is Hermitian then all three statements hold.

Conversely, if $S^*AS$ is Hermitian for all $S \in M_n(\mathbb{C})$ then taking $S = I_n$ gives that $A$ is Hermitian.

If $A$ is normal with real eigenvalues then $A = U^*\operatorname{diag}(\lambda_1, \ldots, \lambda_n)U$ for some unitary matrix $U$. Since each $\lambda_i \in \mathbb{R}$, $A^* = (U^*\operatorname{diag}(\lambda_1, \ldots, \lambda_n)U)^* = U^*\operatorname{diag}(\overline{\lambda_1}, \ldots, \overline{\lambda_n})U = U^*\operatorname{diag}(\lambda_1, \ldots, \lambda_n)U = A$. So, $A$ is Hermitian.

If $x^*Ax \in \mathbb{R}$ for all $x \in \mathbb{C}^n$ then $a_{ii} = e_i^*Ae_i \in \mathbb{R}$. Also, $a_{ii} + a_{jj} + a_{ij} + a_{ji} = (e_i + e_j)^*A(e_i + e_j) \in \mathbb{R}$, so $\operatorname{Im}(a_{ij}) = -\operatorname{Im}(a_{ji})$. Similarly, $a_{ii} + a_{jj} + ia_{ij} - ia_{ji} = (e_i + ie_j)^*A(e_i + ie_j) \in \mathbb{R}$ implies that $\operatorname{Re}(a_{ij}) = \operatorname{Re}(a_{ji})$. Thus, $a_{ji} = \overline{a_{ij}}$, i.e., $A = A^*$. _

Remark 6.3.3. Let $A \in M_n(\mathbb{R})$. Then, the condition $x^TAx \in \mathbb{R}$ in Definition 6.3.9 is always true and hence doesn't put any restriction on the matrix $A$. So, in Definition 6.3.9, we assume that $A^T = A$, i.e., $A$ is a symmetric matrix.

Example 6.3.4.

1.
Let A = [    ]
 2  1

 1  2 or A = [           ]
   3    1+ i

 1 - i   4. Then, A is positive definite.
2.
Let A = [    ]
 1  1
 1  1 or A = [ √ --      ]
    2  1√+-i
 1 - i    2. Then, A is positive semi-definite but not positive definite.
3.
Let A = [       ]
 - 2  1
  1   - 2 or A = [          ]
  - 2  1-  i
 1+ i   - 2. Then, A is negative definite.
4.
Let A = [- 1  1 ]

  1   - 1 or A = [ - 2  1-  i]

 1+ i   - 1. Then, A is negative semi-definite.
5.
Let A = [      ]
 0   1
 1  - 1 or A = [           ]
   1   1 + i
 1 - i   1. Then, A is indefinite.

Theorem 6.3.5. Let $A \in M_n(\mathbb{C})$. Then, the following statements are equivalent.

1.
$A$ is positive semi-definite.
2.
$A^* = A$ and each eigenvalue of $A$ is non-negative.
3.
$A = B^*B$ for some $B \in M_n(\mathbb{C})$.

Proof. 1 $\Rightarrow$ 2: Let $A$ be positive semi-definite. Then, by Lemma 6.3.2, $A$ is Hermitian. If $(\alpha, v)$ is an eigen-pair of $A$ then $\alpha\|v\|^2 = v^*Av \geq 0$. So, $\alpha \geq 0$.

2 $\Rightarrow$ 3: Let $\sigma(A) = \{\alpha_1, \ldots, \alpha_n\}$. Then, by the spectral theorem, there exists a unitary matrix $U$ such that $U^*AU = D$ with $D = \operatorname{diag}(\alpha_1, \ldots, \alpha_n)$. As $\alpha_i \geq 0$, for $1 \leq i \leq n$, define $D^{\frac{1}{2}} = \operatorname{diag}(\sqrt{\alpha_1}, \ldots, \sqrt{\alpha_n})$. Then, $A = UD^{\frac{1}{2}}\left[D^{\frac{1}{2}}U^*\right] = B^*B$ for $B = D^{\frac{1}{2}}U^*$.

3 $\Rightarrow$ 1: Let $A = B^*B$. Then, for $x \in \mathbb{C}^n$, $x^*Ax = x^*B^*Bx = \|Bx\|^2 \geq 0$. Thus, the required result follows. _

A similar argument gives the next result and hence the proof is omitted.

Theorem 6.3.6. Let $A \in M_n(\mathbb{C})$. Then, the following statements are equivalent.

1.
$A$ is positive definite.
2.
$A^* = A$ and each eigenvalue of $A$ is positive.
3.
$A = B^*B$ for a non-singular matrix $B \in M_n(\mathbb{C})$.
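Both characterizations are easy to test numerically. The sketch below (assuming NumPy; not part of the text) checks semi-definiteness via the eigenvalues of a Hermitian matrix and builds the factor $B = D^{1/2}U^*$ used in the proof of Theorem 6.3.5:

```python
import numpy as np

def is_psd(A, tol=1e-12):
    """Check positive semi-definiteness of a Hermitian A via its (real) eigenvalues."""
    return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

A = np.array([[1., 1.], [1., 1.]])            # psd but not pd (eigenvalues 0 and 2)
vals, U = np.linalg.eigh(A)
B = np.diag(np.sqrt(np.clip(vals, 0, None))) @ U.conj().T
print(is_psd(A), np.allclose(B.conj().T @ B, A))   # True True: A = B* B
```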

Remark 6.3.7. Let $A \in M_n(\mathbb{C})$ be a Hermitian matrix with eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$. Then, there exists a unitary matrix $U = [u_1, u_2, \ldots, u_n]$ and a diagonal matrix $D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ such that $A = UDU^*$. Now, for $1 \leq i \leq n$, define $\alpha_i = \max\{\lambda_i, 0\}$ and $\beta_i = \min\{\lambda_i, 0\}$. Then,

1.
for $D_1 = \operatorname{diag}(\alpha_1, \alpha_2, \ldots, \alpha_n)$, the matrix $A_1 = UD_1U^*$ is positive semi-definite.
2.
for $D_2 = \operatorname{diag}(-\beta_1, -\beta_2, \ldots, -\beta_n)$, the matrix $A_2 = UD_2U^*$ is positive semi-definite.
3.
$A = A_1 - A_2$. The matrix $A_1$ is generally called the positive semi-definite part of $A$.

Definition 6.3.8. [Multilinear Function] Let $V$ be a vector space over $\mathbb{F}$. Then,

1.
for a fixed $m \in \mathbb{N}$, a function $f : V^m \rightarrow \mathbb{F}$ is called an $m$-multilinear function if $f$ is linear in each component. That is,
\[
f(v_1, \ldots, v_{i-1}, (v_i + \alpha u), v_{i+1}, \ldots, v_m) = f(v_1, \ldots, v_{i-1}, v_i, v_{i+1}, \ldots, v_m) + \alpha f(v_1, \ldots, v_{i-1}, u, v_{i+1}, \ldots, v_m)
\]
for $\alpha \in \mathbb{F}$, $u \in V$ and $v_i \in V$, for $1 \leq i \leq m$.
2.
An $m$-multilinear form is also called an $m$-form.
3.
A $2$-form is called a bilinear form.

Definition 6.3.9. [Sesquilinear, Hermitian and Quadratic Forms] Let $A = [a_{ij}] \in M_n(\mathbb{C})$ be a Hermitian matrix and let $x, y \in \mathbb{C}^n$. Then, a sesquilinear form in $x, y \in \mathbb{C}^n$ is defined as $H(x, y) = y^*Ax$. In particular, $H(x, x)$, denoted $H(x)$, is called a Hermitian form. In case $A \in M_n(\mathbb{R})$, $H(x)$ is called a quadratic form.

Remark 6.3.10. Observe that

1.
if $A = I_n$ then the bilinear/sesquilinear form reduces to the standard inner product.
2.
$H(x, y)$ is `linear' in the first component and `conjugate linear' in the second component.
3.
the quadratic form $H(x)$ is a real number. Hence, for $\alpha \in \mathbb{R}$, the equation $H(x) = \alpha$ represents a conic in $\mathbb{R}^n$.

Example 6.3.11.

1.
Let $v_i \in \mathbb{C}^n$, for $1 \leq i \leq n$. Then, $f(v_1, \ldots, v_n) = \det([v_1, \ldots, v_n])$ is an $n$-form on $\mathbb{C}^n$.
2.
Let $A \in M_n(\mathbb{R})$. Then, $f(x, y) = y^TAx$, for $x, y \in \mathbb{R}^n$, is a bilinear form on $\mathbb{R}^n$.
3.
Let $A = \begin{bmatrix} 1 & 2-i \\ 2+i & 2 \end{bmatrix}$. Then, $A^* = A$ and for $x = \begin{bmatrix} x \\ y \end{bmatrix}$, verify that
\[
H(x) = x^*Ax = |x|^2 + 2|y|^2 + 2\operatorname{Re}\left((2-i)\overline{x}y\right),
\]
where `Re' denotes the real part of a complex number, is a Hermitian form.

6.3.1 Sylvester’s law of inertia

The main idea of this section is to express $H(x)$ as a sum or difference of squares. Since $H(x)$ is a quadratic in $x$, replacing $x$ by $cx$, for $c \in \mathbb{C}$, just gives a multiplication factor of $|c|^2$. Hence, one needs to study only the normalized vectors. Let us consider Example 6.1.1 again. There we see that
\begin{align}
x^TAx &= 3\frac{(x+y)^2}{2} - \frac{(x-y)^2}{2} = (x + 2y)^2 - 3y^2, \quad \text{and} \tag{6.3.1}\\
x^TBx &= 5\frac{(x+2y)^2}{5} + 10\frac{(2x-y)^2}{5} = \left(3x - \frac{2y}{3}\right)^2 + \frac{50y^2}{9}. \tag{6.3.2}
\end{align}
Note that both the expressions in Equation (6.3.1) are differences of two non-negative terms, whereas both the expressions in Equation (6.3.2) consist of sums of two non-negative terms. Is this just a coincidence?

In general, let $A \in M_n(\mathbb{C})$ be a Hermitian matrix. Then, by Theorem 6.2.22, $\sigma(A) = \{\alpha_1, \ldots, \alpha_n\} \subseteq \mathbb{R}$ and there exists a unitary matrix $U$ such that $U^*AU = D = \operatorname{diag}(\alpha_1, \ldots, \alpha_n)$. Let $x = Uz$. Then, $\|x\| = 1$ and $U$ unitary implies that $\|z\| = 1$. If $z = (z_1, \ldots, z_n)^*$ then
\[
H(x) = z^*U^*AUz = z^*Dz = \sum_{i=1}^n \alpha_i|z_i|^2 = \sum_{i=1}^p \left|\sqrt{\alpha_i}z_i\right|^2 - \sum_{i=p+1}^r \left|\sqrt{|\alpha_i|}z_i\right|^2,
\tag{6.3.3}
\]
where $\alpha_1, \ldots, \alpha_p > 0$, $\alpha_{p+1}, \ldots, \alpha_r < 0$ and $\alpha_{r+1}, \ldots, \alpha_n = 0$. Thus, we see that the possible values of $H(x)$ depend only on the eigenvalues of $A$. Since $U$ is an invertible matrix, the components $z_i$ of $z = U^{-1}x = U^*x$ are commonly known as linearly independent linear forms. Note that each $z_i$ is a linear expression in the components of $x$. Also, note that in Equation (6.3.3), $p$ corresponds to the number of positive eigenvalues and $r - p$ to the number of negative eigenvalues. For a better understanding, we define the following numbers.

Definition 6.3.12. [Inertia and Signature of a Matrix] Let $A \in M_n(\mathbb{C})$ be a Hermitian matrix. The inertia of $A$, denoted $i(A)$, is the triplet $(i_+(A), i_-(A), i_0(A))$, where $i_+(A)$ is the number of positive eigenvalues of $A$, $i_-(A)$ is the number of negative eigenvalues of $A$ and $i_0(A)$ is the nullity of $A$. The difference $i_+(A) - i_-(A)$ is called the signature of $A$.
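A minimal sketch for computing the inertia of a Hermitian matrix from its eigenvalues (assuming NumPy; the tolerance `tol` is an implementation choice, not part of the text):

```python
import numpy as np

def inertia(A, tol=1e-10):
    """Return (i_+(A), i_-(A), i_0(A)) for a Hermitian matrix A."""
    lam = np.linalg.eigvalsh(A)
    return (int(np.sum(lam > tol)), int(np.sum(lam < -tol)),
            int(np.sum(np.abs(lam) <= tol)))

A = np.array([[1., 2.], [2., 1.]])    # eigenvalues 3 and -1
print(inertia(A))                     # (1, 1, 0); signature 1 - 1 = 0
```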


Exercise 6.3.13. Let $A \in M_n(\mathbb{C})$ be a Hermitian matrix. If the signature and the rank of $A$ are known then prove that one can find out the inertia of $A$.

As a next result, we show that in any expression of H(x) as a sum or difference of n absolute squares of linearly independent linear forms, the number p (respectively, r - p) gives the number of positive (respectively, negative) eigenvalues of A. This is popularly known as the ‘Sylvester’s law of inertia’.

Lemma 6.3.14. [Sylvester’s Law of Inertia] Let A Mn() be a Hermitian matrix and let x n. Then, every Hermitian form H(x) = x*Ax, in n variables can be written as

           2           2       2           2
H (x) = |y1| + ⋅⋅⋅+ |yp| - |yp+1| - ⋅⋅⋅- |yr |
where y1,,yr are linearly independent linear forms in the components of x and the integers p and r satisfying 0 p r n, depend only on A.

Proof. Equation (6.3.3) implies that H(x) has the required form. We only need to show that p and r are uniquely determined by A. Hence, let us assume on the contrary that there exist p,q,r,s with p > q such that

PICT PICT DRAFT
H(x) = |y1|^2 + \cdots + |yp|^2 - |y_{p+1}|^2 - \cdots - |yr|^2              (6.3.4)
     = |z1|^2 + \cdots + |zq|^2 - |z_{q+1}|^2 - \cdots - |zs|^2,             (6.3.5)

where y = Mx and z = Nx, for some invertible matrices M and N. Partition y = [Y1; Y2] and z = [Z1; Z2], where Y1 = (y1, …, yp)^T and Z1 = (z1, …, zq)^T. The invertibility of M and N implies z = By for the invertible matrix B = NM^{-1}. Decompose B = [B1, B2; B3, B4], where B1 is a q × p matrix, so that [Z1; Z2] = [B1, B2; B3, B4][Y1; Y2]. As p > q, the homogeneous linear system B1Y1 = 0 has a nontrivial solution, say Ŷ1 = (ỹ1, …, ỹp)^T. Consider ŷ = [Ŷ1; 0] and let x̂ = M^{-1}ŷ be the corresponding point. Then, for this choice of x̂, we have Z1 = B1Ŷ1 = 0 and thus, using Equations (6.3.4) and (6.3.5),

H(x̂) = |ỹ1|^2 + |ỹ2|^2 + \cdots + |ỹp|^2 - 0 = 0 - (|z_{q+1}|^2 + \cdots + |zs|^2).

Now, the left-hand side is non-negative and the right-hand side is non-positive, so both must be zero. This forces ỹ1 = ⋅⋅⋅ = ỹp = 0, i.e., Ŷ1 = 0, a contradiction to Ŷ1 being a nontrivial solution. Hence, p > q is impossible; interchanging the roles of the two representations shows q > p is also impossible, so p = q. A similar argument gives r = s. This completes the proof of the lemma. __

Remark 6.3.15. Since A is Hermitian, Rank(A) equals the number of nonzero eigenvalues. Hence, Rank(A) = r. The number r is called the rank and the number r - 2p is called the inertial degree of the Hermitian form H(x).

We now look at another form of Sylvester's law of inertia. We start with the following definition.

Definition 6.3.16. [Star Congruence] Let A, B ∈ Mn(ℂ). Then, A is said to be *-congruent (read star-congruent) to B if there exists an invertible matrix S such that A = S*BS.

Theorem 6.3.17. [Second Version: Sylvester's Law of Inertia] Let A, B ∈ Mn(ℂ) be Hermitian. Then, A is *-congruent to B if and only if i(A) = i(B).

Proof. By the spectral theorem, U*AU = ΛA and V*BV = ΛB, for some unitary matrices U, V and diagonal matrices ΛA, ΛB whose diagonal entries appear in the order (+, …, +, -, …, -, 0, …, 0). Scaling the columns of U and V by |αi|^{-1/2} for the nonzero eigenvalues αi (and by 1 for the zero ones), we obtain invertible matrices S, T such that S*AS = DA and T*BT = DB, where DA, DB are diagonal matrices of the form diag(1, …, 1, -1, …, -1, 0, …, 0).

If i(A) = i(B), then it follows that DA = DB, i.e., S*AS = T*BT and hence A = (TS^{-1})*B(TS^{-1}).

Conversely, suppose that A = P*BP, for some invertible matrix P, and i(B) = (k,l,m). As T*BT = DB, we have A = P*(T*)^{-1}DB T^{-1}P = (T^{-1}P)*DB(T^{-1}P). Now, let X = (T^{-1}P)^{-1}. Then, A = (X^{-1})*DB X^{-1} and we have the following observations.

1.
As rank and nullity do not change on multiplication by invertible matrices, i0(A) = i0(DB) = m, since i(B) = (k,l,m).
2.
Using i(B) = (k,l,m), we also have
         *                       *   - 1*      -1               *
X [:,k + 1]AX  [:,k + 1] = X [:,k + 1] ((X )DB (X   ))X [:,k + 1] = ek+1DBek+1 = - 1.
                                                                                      <img 
src= PICT DRAFT " class="math-display" >
Similarly, X[:, k+2]* A X[:, k+2] = \cdots = X[:, k+l]* A X[:, k+l] = -1. As the vectors X[:, k+1], …, X[:, k+l] are linearly independent, using 9.7.10, we see that A has at least l negative eigenvalues.
3.
Similarly, X[:, 1]* A X[:, 1] = \cdots = X[:, k]* A X[:, k] = 1. As X[:, 1], …, X[:, k] are linearly independent, using 9.7.10 again, we see that A has at least k positive eigenvalues.

Since i+(A) + i-(A) + i0(A) = n = k + l + m and i0(A) = m, the inequalities i+(A) ≥ k and i-(A) ≥ l must in fact be equalities. Thus, it now follows that i(A) = (k,l,m). _
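
The theorem can also be checked numerically: the eigenvalues of S*AS differ from those of A in general, but the inertia does not. A minimal numpy sketch, included only as an illustration (a random S is invertible with probability 1):

    import numpy as np

    def inertia(A, tol=1e-10):
        e = np.linalg.eigvalsh(A)          # real eigenvalues of a Hermitian matrix
        return int(np.sum(e > tol)), int(np.sum(e < -tol)), int(np.sum(np.abs(e) <= tol))

    rng = np.random.default_rng(0)
    n = 4
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    A = (M + M.conj().T) / 2               # a random Hermitian matrix
    S = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))  # invertible (almost surely)
    B = S.conj().T @ A @ S                 # B = S*AS is *-congruent to A

    print(inertia(A), inertia(B))          # the two inertia triples agree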

6.3.2 Applications in Euclidean Plane and Space

We now obtain conditions on the eigenvalues of the matrix A associated with a quadratic form in order to characterize conic sections in ℝ^2, with respect to the standard inner product.

Definition 6.3.18. [Associated Quadratic Form] Let f(x,y) = ax^2 + 2hxy + by^2 + 2gx + 2fy + c be a general quadratic in x and y, with coefficients from ℝ. Then,

H(x) = x^T A x = [x, y] [a, h; h, b] [x, y]^T = ax^2 + 2hxy + by^2
is called the associated quadratic form of the conic f(x,y) = 0.

Proposition 6.3.19. Consider the general quadratic f(x,y), for a, b, c, g, f, h ∈ ℝ. Then, f(x,y) = 0 represents

1.
an ellipse or a circle if ab - h^2 > 0,
2.
a parabola or a pair of parallel lines if ab - h^2 = 0,
3.
a hyperbola or a pair of intersecting lines if ab - h^2 < 0.

Proof. As A is symmetric, by Corollary 6.2.23, A = U diag(α1, α2) U^T, where U = [u1, u2] is an orthogonal matrix with (α1, u1) and (α2, u2) as eigen-pairs of A. Let [u, v] = x^T U, i.e., u = u1^T x and v = u2^T x. As u1 and u2 are orthogonal, the lines u = 0 and v = 0 are orthogonal lines passing through the origin in the (x,y)-plane. In most cases, these lines form the principal axes of the conic.

We also have x^T A x = α1 u^2 + α2 v^2 and hence f(x,y) = 0 reduces to

α1 u^2 + α2 v^2 + 2 g1 u + 2 f1 v + c = 0,        (6.3.6)

for some g1, f1 ∈ ℝ. Now, we consider different cases depending on the values of α1 and α2:

1.
If α1 = 0 = α2 then A = 0 and Equation (6.3.6) gives the straight line 2gx+2fy +c = 0.
2.
If α1 = 0 and α2 ≠ 0, then ab - h^2 = det(A) = α1α2 = 0. So, after dividing by α2, Equation (6.3.6) reduces to (v + d1)^2 = d2 u + d3, for some d1, d2, d3 ∈ ℝ. Hence, let us look at the possible subcases:
(a)
Let d2 = d3 = 0. Then, v + d1 = 0 is a pair of coincident lines.
(b)
Let d2 = 0, d3 ≠ 0.
i.
If d3 > 0, then we get a pair of parallel lines given by v = -d1 ± \sqrt{d3}.
ii.
If d3 < 0, the solution set of the corresponding conic is an empty set.
(c)
If d2 ≠ 0, then the given equation is of the form Y^2 = 4aX for suitable translates X = u + α and Y = v + β, and thus represents a parabola.

Let H(x) = x^2 + 4y^2 + 4xy be the associated quadratic form for a class of curves. Then, A = [1, 2; 2, 4], α1 = 0, α2 = 5 and v = x + 2y. Now, let d1 = -3 and vary d2 and d3 to get different curves (see Figure 6.2 drawn using the package “MATHEMATICA”).


Figure 6.2: Curves for d2 = 0 = d3; d2 = 0, d3 = 1; and d2 = 1, d3 = 1


3.
Let α1 > 0 and α2 < 0. Then, ab - h^2 = det(A) = α1α2 < 0. If α2 = -β2, for some β2 > 0, then Equation (6.3.6) reduces to

α1 (u + d1)^2 - β2 (v + d2)^2 = d3, for some d1, d2, d3 ∈ ℝ,        (6.3.7)

whose understanding requires the following subcases:

(a)
If d3 = 0, then Equation (6.3.7) factors as

(\sqrt{α1}(u + d1) + \sqrt{β2}(v + d2)) \cdot (\sqrt{α1}(u + d1) - \sqrt{β2}(v + d2)) = 0,

i.e., a pair of straight lines \sqrt{α1}(u + d1) ± \sqrt{β2}(v + d2) = 0 in the (u,v)-plane, intersecting at the point determined by u + d1 = 0 and v + d2 = 0.
(b)
Without loss of generality, let d3 > 0. Then, Equation (6.3.7) equals

\frac{α1 (u + d1)^2}{d3} - \frac{β2 (v + d2)^2}{d3} = 1,

or equivalently, a hyperbola with orthogonal principal axes u + d1 = 0 and v + d2 = 0.

Let H(x) = 10x^2 - 5y^2 + 20xy be the associated quadratic form for a class of curves. Then, A = [10, 10; 10, -5], α1 = 15, α2 = -10 and \sqrt{5} u = 2x + y, \sqrt{5} v = x - 2y. Now, let d1 = 1/\sqrt{5}, d2 = -1/\sqrt{5} to get 3(2x + y + 1)^2 - 2(x - 2y - 1)^2 = d3. Now vary d3 to get different curves (see Figure 6.3 drawn using the package “MATHEMATICA”).


Figure 6.3: Curves for d3 = 0, d3 = 1 and d3 = -1


4.
Let α1, α2 > 0. Then, ab - h^2 = det(A) = α1α2 > 0 and Equation (6.3.6) reduces to

α1 (u + d1)^2 + α2 (v + d2)^2 = d3, for some d1, d2, d3 ∈ ℝ.        (6.3.8)

We consider the following three subcases to understand this.

(a)
If d3 = 0, then the solution set consists of the single point determined by u + d1 = 0 and v + d2 = 0 (the intersection of these two orthogonal lines).
(b)
If d3 < 0 then the solution set of Equation (6.3.8) is an empty set.
(c)
If d3 > 0, then Equation (6.3.8) reduces to α1(u + d1)^2/d3 + α2(v + d2)^2/d3 = 1, an ellipse or circle with u + d1 = 0 and v + d2 = 0 as the orthogonal principal axes.

Let H(x) = 6x^2 + 9y^2 + 4xy be the associated quadratic form for a class of curves. Then, A = [6, 2; 2, 9], α1 = 10, α2 = 5 and \sqrt{5} u = x + 2y, \sqrt{5} v = 2x - y. Now, let d1 = 1/\sqrt{5}, d2 = -1/\sqrt{5} to get 2(x + 2y + 1)^2 + (2x - y - 1)^2 = d3. Now vary d3 to get different curves (see Figure 6.4 drawn using the package “MATHEMATICA”).


Figure 6.4: Curves for d3 = 0 and d3 = 5


Thus, we have considered all the possible cases and the required result follows. _


Remark 6.3.20. Observe that the condition [x, y]^T = [u1, u2][u, v]^T implies that the principal axes of the conic are functions of the eigenvectors u1 and u2.
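
As a computational companion to Proposition 6.3.19 and the remark above, the following numpy sketch classifies the quadratic part ax^2 + 2hxy + by^2 via the sign of ab - h^2 and returns the eigen-pairs whose eigenvectors span the principal axes. The function name conic_type and the tolerance tol are illustrative choices, not notation from the text:

    import numpy as np

    def conic_type(a, h, b, tol=1e-12):
        """Classify ax^2 + 2hxy + by^2 and return the eigen-data of its matrix."""
        A = np.array([[a, h], [h, b]], dtype=float)
        disc = a * b - h * h               # equals det(A) = alpha_1 * alpha_2
        evals, evecs = np.linalg.eigh(A)   # columns of evecs are the eigenvectors u_1, u_2
        if disc > tol:
            kind = "ellipse or circle (possibly degenerate)"
        elif disc < -tol:
            kind = "hyperbola or intersecting lines"
        else:
            kind = "parabola or parallel lines"
        return kind, evals, evecs

    # the quadratic part 6x^2 + 4xy + 9y^2 used in the last example of the proof above
    print(conic_type(6, 2, 9)[0])          # ellipse or circle (possibly degenerate)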

Exercise 6.3.21. Sketch the graphs of the following conics:

1.
x^2 + 2xy + y^2 - 6x - 10y = 3.
2.
2x^2 + 6xy + 3y^2 - 12x - 6y = 5.
3.
4x^2 - 4xy + 2y^2 + 12x - 8y = 10.
4.
2x^2 - 6xy + 5y^2 - 10x + 4y = 7.

As a last application, we consider a quadratic in 3 variables, namely x, y and z. To do so, let

A = [a, d, e; d, b, f; e, f, c], x = (x, y, z)^T, b = (l, m, n)^T and y = (y1, y2, y3)^T, with

f(x,y,z) = x^T A x + 2 b^T x + q                                              (6.3.9)
         = ax^2 + by^2 + cz^2 + 2dxy + 2exz + 2fyz + 2lx + 2my + 2nz + q.     (6.3.10)
Then, we observe the following:
1.
As A is symmetric, P^T A P = diag(α1, α2, α3), where P = [u1, u2, u3] is an orthogonal matrix and (αi, ui), for i = 1, 2, 3, are eigen-pairs of A.
2.
Let y = P^T x. Then, f(x,y,z) reduces to

g(y1, y2, y3) = α1 y1^2 + α2 y2^2 + α3 y3^2 + 2 l1 y1 + 2 l2 y2 + 2 l3 y3 + q.        (6.3.11)

3.
Depending on the values of the αi's, rewrite g(y1, y2, y3) to determine the center and the planes of symmetry of f(x,y,z) = 0 (a short computational sketch follows this list).
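
A minimal numerical sketch of this reduction (the name reduce_quadric is illustrative, not from the text): diagonalize A orthogonally and read off the coefficients appearing in Equation (6.3.11). The input below is the matrix of Part 1 of Example 6.3.22.

    import numpy as np

    def reduce_quadric(A, b, q):
        """Return the alphas and the linear coefficients l_i of Equation (6.3.11)."""
        alphas, P = np.linalg.eigh(A)      # P is orthogonal and P^T A P = diag(alphas)
        l = P.T @ b                        # new linear part: 2*l_1*y_1 + 2*l_2*y_2 + 2*l_3*y_3
        return alphas, l, q

    A = np.array([[2.0, 1.0, 1.0],
                  [1.0, 2.0, 1.0],
                  [1.0, 1.0, 2.0]])
    b = np.array([2.0, 1.0, 2.0])
    print(reduce_quadric(A, b, 2.0))       # alphas are 1, 1, 4 (numpy orders them ascending)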

Example 6.3.22. Determine the following quadrics f(x,y,z) = 0, where

1.
f(x,y,z) = 2x^2 + 2y^2 + 2z^2 + 2xy + 2xz + 2yz + 4x + 2y + 4z + 2.
2.
f(x,y,z) = 3x^2 - y^2 + z^2 + 10.
3.
f(x,y,z) = 3x^2 - y^2 + z^2 - 10.
4.
f(x,y,z) = 3x^2 - y^2 + z - 10.

Solution: Part 1 Here, A = [2, 1, 1; 1, 2, 1; 1, 1, 2], b = (2, 1, 2)^T and q = 2. So, the orthogonal matrix

P = [1/\sqrt{3}, 1/\sqrt{2}, 1/\sqrt{6}; 1/\sqrt{3}, -1/\sqrt{2}, 1/\sqrt{6}; 1/\sqrt{3}, 0, -2/\sqrt{6}]

satisfies P^T A P = diag(4, 1, 1). Hence, f(x,y,z) = 0 reduces to

4\left(y1 + \frac{5}{4\sqrt{3}}\right)^2 + \left(y2 + \frac{1}{\sqrt{2}}\right)^2 + \left(y3 - \frac{1}{\sqrt{6}}\right)^2 = \frac{9}{12}.

So, the standard form of the quadric is 4z1^2 + z2^2 + z3^2 = 9/12, where

(x, y, z)^T = P \left(-\frac{5}{4\sqrt{3}}, -\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{6}}\right)^T = \left(-\frac{3}{4}, \frac{1}{4}, -\frac{3}{4}\right)^T

is the center, and x + y + z = 0, x - y = 0 and x + y - 2z = 0 are the principal axes.
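
A quick numerical check of this reduction (a minimal numpy sketch; numpy may order the eigenvalues and choose the eigenvector signs differently from the matrix P written above):

    import numpy as np

    A = np.array([[2.0, 1.0, 1.0],
                  [1.0, 2.0, 1.0],
                  [1.0, 1.0, 2.0]])
    b = np.array([2.0, 1.0, 2.0])
    q = 2.0

    alphas, P = np.linalg.eigh(A)          # eigenvalues 1, 1, 4 in ascending order
    center = np.linalg.solve(A, -b)        # grad f = 2Ax + 2b = 0  =>  Ax = -b
    f_min = center @ A @ center + 2 * b @ center + q
    print(alphas)                          # [1. 1. 4.]
    print(center)                          # [-0.75  0.25 -0.75], i.e. (-3/4, 1/4, -3/4)
    print(-f_min)                          # 0.75 = 9/12, the right-hand side of the standard form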

Part 2 Here, f(x,y,z) = 0 reduces to \frac{y^2}{10} - \frac{3x^2}{10} - \frac{z^2}{10} = 1, which is the equation of a hyperboloid consisting of two sheets with center 0 and the axes x, y and z as the principal axes.

Part 3 Here, f(x,y,z) = 0 reduces to \frac{3x^2}{10} - \frac{y^2}{10} + \frac{z^2}{10} = 1, which is the equation of a hyperboloid consisting of one sheet with center 0 and the axes x, y and z as the principal axes.

Part 4 Here, f(x,y,z) = 0 reduces to z = y^2 - 3x^2 + 10, which is the equation of a hyperbolic paraboloid.

The different surfaces are shown in Figure 6.5; they have been drawn using the package “MATHEMATICA”.


Figure 6.5: Ellipsoid, Hyperboloid of two sheets and one sheet, Hyperbolic Paraboloid

