Definition 9.1.1. Fix n ∈ ℕ. Then, to each f ∈ 𝒮n, we associate an n × n matrix, denoted Pf = [pij], such that pij = 1 whenever f(j) = i, and pij = 0 otherwise. The matrix Pf is called the permutation matrix corresponding to the permutation f. For example, I2, corresponding to Id2, and E12, corresponding to the permutation (1,2), are the two permutation matrices of order 2 × 2.
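Definition 9.1.1 translates directly into code. A minimal sketch, with a permutation represented as a dict mapping j to f(j) (an assumed representation, chosen only for illustration):

```python
# Sketch: building the permutation matrix P_f of Definition 9.1.1.
# Entry p_ij = 1 whenever f(j) = i, and 0 otherwise; the permutation f
# is given as a dict on {1, ..., n}.
def permutation_matrix(f, n):
    return [[1 if f[j] == i else 0 for j in range(1, n + 1)]
            for i in range(1, n + 1)]

# The two 2 x 2 permutation matrices: I2 from Id2 and E12 from (1, 2).
I2 = permutation_matrix({1: 1, 2: 2}, 2)   # [[1, 0], [0, 1]]
E12 = permutation_matrix({1: 2, 2: 1}, 2)  # [[0, 1], [1, 0]]
```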
Remark 9.1.2. Recall that in Remark 9.2.16.1, it was observed that each permutation is a product of transpositions of the form (1,2),…,(1,n).
We are now ready to prove Theorem 2.2.17.
Proof. Note that the matrix A = 0 if and only if B = 0. So, let us assume that the matrices A,B ≠ 0. Also, the row-equivalence of A and B implies that there exists an invertible matrix C such that A = CB, where C is a product of elementary matrices.
Since B is in RREF, either B[:,1] = 0T or B[:,1] = (1,0,…,0)T . If B[:,1] = 0T then A[:,1] = CB[:,1] = C0 = 0. If B[:,1] = (1,0,…,0)T then A[:,1] = CB[:,1] = C[:,1]. As C is invertible, the first column of C cannot be the zero vector. So, A[:,1] cannot be the zero vector. Further, A is in RREF implies that A[:,1] = (1,0,…,0)T . So, we have shown that if A and B are row-equivalent then their first columns must be the same.
Now, let us assume that the first k − 1 columns of A and B are equal and that they contain r pivotal columns. We will now show that the k-th columns are also equal.
Define Ak = [A[:,1],…,A[:,k]] and Bk = [B[:,1],…,B[:,k]]. Then, our assumption implies that A[:,i] = B[:,i], for 1 ≤ i ≤ k − 1. Since the first k − 1 columns contain r pivotal columns, there exists a permutation matrix P such that
If the k-th columns of A and B are both pivotal then, by definition of RREF, A[:,k] = (0^T, e1^T)^T = B[:,k], where 0 is a vector of size r and e1 = (1,0,…,0)^T. So, we need to consider two cases depending on whether both are non-pivotal or one is pivotal and the other is not.
As A = CB, we get Ak = CBk and
Case 1: Neither A[:,k] nor B[:,k] is pivotal. Then
Case 2: A[:,k] is pivotal but B[:,k] is non-pivotal. Then
Therefore, combining both cases, we get the required result. □
Definition 9.2.1. For a positive integer n, denote [n] = {1,2,…,n}. A function f : A → B is called one-one (or injective) if f(a) = f(b) implies a = b; onto (or surjective) if for each b ∈ B there exists a ∈ A with f(a) = b; and a bijection if it is both one-one and onto.
Example 9.2.2. Let A = {1,2,3}, B = {a,b,c,d} and C = {α,β,γ}. Then, the function
Remark 9.2.3. Let f : A → B and g : B → C be functions. Then, the composition of functions, denoted g ∘ f, is a function from A to C defined by (g ∘ f)(a) = g(f(a)). Also, if f and g are one-one then so is g ∘ f, and if f and g are onto then so is g ∘ f.
Thus, if f and g are bijections then so is g ∘ f.
Definition 9.2.4. A function f : [n] → [n] is called a permutation on n elements if f is a bijection. For example, f,g : [2] → [2] defined by f(1) = 1,f(2) = 2 and g(1) = 2,g(2) = 1 are permutations.
Exercise 9.2.5. Let 𝒮3 be the set consisting of all permutations on 3 elements. Then, prove that 𝒮3 has 6 elements. Moreover, they are one of the 6 functions given below.
Remark 9.2.6. Let f : [n] → [n] be a bijection. Then, the inverse of f, denoted f-1, defined by f-1(m) = ℓ whenever f(ℓ) = m for m ∈ [n], is well defined and f-1 is a bijection. For example, in Exercise 9.2.5, note that fi-1 = fi, for i = 1,2,3,6 and f4-1 = f5.
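The inverse of Remark 9.2.6 amounts to reversing each assignment f(ℓ) = m into f-1(m) = ℓ. A minimal sketch (dict representation assumed; the names f4 and f5 merely echo the exercise, whose explicit functions are not reproduced here):

```python
# Sketch of Remark 9.2.6: invert a permutation by reversing each pair
# (l, f(l)) into (f(l), l).  Permutations are dicts on {1, ..., n}.
def inverse(f):
    return {m: l for l, m in f.items()}

f4 = {1: 2, 2: 3, 3: 1}   # a 3-cycle on [3], used here as an example
f5 = inverse(f4)          # the opposite 3-cycle
```

Since f is a bijection, the reversed pairs again form a function, and inverting twice returns the original permutation.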
Remark 9.2.7. Let 𝒮n = {f : [n] → [n] : f is a permutation}. Then, 𝒮n has n! elements and forms a group with respect to composition of functions, called product, due to the following.
Lemma 9.2.8. Fix a positive integer n. Then, the group 𝒮n satisfies the following:
Proof. Part 1: Note that for each α ∈ 𝒮n the functions f-1 ∘ α, α ∘ f-1 ∈ 𝒮n and α = f ∘ (f-1 ∘ α) as well as α = (α ∘ f-1) ∘ f.
Part 2: Note that for each f ∈ 𝒮n, by definition, (f-1)-1 = f. Hence the result holds. □
Definition 9.2.11. [Cycle Notation] Let f ∈ 𝒮n. Suppose there exist r, 2 ≤ r ≤ n, and i1,…,ir ∈ [n] such that f(i1) = i2, f(i2) = i3, …, f(ir) = i1 and f(j) = j for all j ≠ i1,…,ir. Then, we represent such a permutation by f = (i1,i2,…,ir) and call it an r-cycle. For example, the permutation in 𝒮5 with f(1) = 4, f(4) = 5, f(5) = 1 and f(2) = 2, f(3) = 3 is the 3-cycle (1,4,5), and the permutation interchanging only 2 and 3 is the 2-cycle (2,3).
Definition 9.2.13. A permutation f ∈ 𝒮n is called a transposition if there exist m,r ∈ [n] such that f = (m,r).
With the above definitions, we state and prove two important results.
Proof. Note that using Remark 9.2.14, we just need to show that f can be written as a product of disjoint cycles.
Consider the sequence 1, f(1), f(2)(1) = (f ∘ f)(1), f(3)(1) = (f ∘ (f ∘ f))(1), …. As this sequence is infinite and each f(i)(1) ∈ [n], there exist i,j with 0 ≤ i < j ≤ n such that f(i)(1) = f(j)(1). Now, let j1 be the least positive integer such that f(i)(1) = f(j1)(1), for some i with 0 ≤ i < j1. Then, we claim that i = 0.
For if i ≥ 1, then j1 − 1 ≥ 1 and the condition that f is one-one gives f(i−1)(1) = f(j1−1)(1), contradicting the minimality of j1.
Now, choose i1 ∈ [n] \ {1, f(1), f(2)(1), …, f(j1−1)(1)} and proceed as above to get another cycle. Let the new cycle be (i1, f(i1), …, f(j2−1)(i1)). Then, the fact that f is one-one implies that
Remark 9.2.16. Note that when one writes a permutation as product of disjoint cycles, cycles of length 1 are suppressed so as to match Definition 9.2.11. For example, the algorithm in the proof of Theorem 9.2.15 implies
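The orbit-chasing argument in the proof of Theorem 9.2.15 is exactly an algorithm. A sketch, with permutations as dicts on [n] (an assumed representation), suppressing 1-cycles as in Remark 9.2.16:

```python
# Sketch of the proof of Theorem 9.2.15: repeatedly pick an element not yet
# visited and follow f until the orbit closes, producing disjoint cycles.
# Cycles of length 1 (fixed points) are suppressed.
def disjoint_cycles(f):
    seen, cycles = set(), []
    for start in sorted(f):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = f[x]
        if len(cycle) >= 2:
            cycles.append(tuple(cycle))
    return cycles

f = {1: 4, 2: 3, 3: 2, 4: 5, 5: 1}
# yields [(1, 4, 5), (2, 3)], i.e. f = (1, 4, 5)(2, 3)
```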
Note that Id3 = (1,2)(1,2) = (1,2)(2,3)(1,2)(1,3), as well. The question arises: is it possible to write Idn as a product of an odd number of transpositions? The next lemma answers this question in the negative.
Proof. We will prove the result by mathematical induction. Observe that t≠1 as Idn is not a transposition. Hence, t ≥ 2. If t = 2, we are done. So, let us assume that the result holds for all expressions in which the number of transpositions t ≤ k. Now, let t = k + 1.
Suppose f1 = (m,r) and let ℓ,s ∈ [n] \ {m,r}. Then, the possible choices for the composition f1 ∘ f2 are (m,r)(m,r) = Idn, (m,r)(m,ℓ) = (r,ℓ)(r,m), (m,r)(r,ℓ) = (ℓ,r)(ℓ,m) and (m,r)(ℓ,s) = (ℓ,s)(m,r). In the first case, f1 and f2 can be removed to obtain Idn = f3 ∘ f4 ∘ ⋯ ∘ ft, where the number of transpositions is t − 2 = k − 1 < k. So, by mathematical induction, t − 2 is even and hence t is also even.
In the remaining cases, the expression for f1 ∘ f2 is replaced by its counterpart to obtain another expression for Idn. But in the new expression for Idn, m does not appear in the first transposition, but appears in the second transposition. The shifting of m to the right can continue until the number of transpositions reduces by 2 (which in turn gives the result by mathematical induction). If the shifting of m to the right does not reduce the number of transpositions, then m will get shifted to the right and will appear only in the right-most transposition. But then this expression for Idn does not fix m, whereas Idn(m) = m. So, the latter case leads us to a contradiction. Hence, the shifting of m to the right will surely lead to an expression in which the number of transpositions at some stage is t − 2 = k − 1. At this stage, one applies mathematical induction to get the required result. □
Theorem 9.2.18. Let f ∈ 𝒮n. If there exist transpositions g1,…,gk and h1,…,hℓ with f = g1 ∘ ⋯ ∘ gk = h1 ∘ ⋯ ∘ hℓ, then either both k and ℓ are even or both are odd.
Proof. As g1 ∘ ⋯ ∘ gk = h1 ∘ ⋯ ∘ hℓ and h-1 = h for any transposition h ∈ 𝒮n, we have
Definition 9.2.19. [Even and Odd Permutation] A permutation f ∈ 𝒮n is called an even permutation if f can be written as a product of an even number of transpositions, and an odd permutation if f can be written as a product of an odd number of transpositions.
Definition 9.2.20. Observe that if f and g are both even or both odd permutations, then f ∘ g and g ∘ f are both even. Whereas, if one of them is odd and the other even, then f ∘ g and g ∘ f are both odd. We use this to define a function sgn : 𝒮n → {1,−1}, called the signature of a permutation, by sgn(f) = 1 if f is even and sgn(f) = −1 if f is odd.
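A sketch of sgn computed from the cycle structure: an r-cycle factors into r − 1 transpositions, so a permutation with c cycles on [n] (counting fixed points) has signature (−1)^(n−c). The dict representation is an assumption for illustration.

```python
# Sketch of Definition 9.2.20: sgn(f) = (-1)^(n - c), where c is the number
# of cycles of f on [n], fixed points included.
def sgn(f):
    seen, cycles = set(), 0
    for start in f:
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:
            seen.add(x)
            x = f[x]
    return (-1) ** (len(f) - cycles)

assert sgn({1: 1, 2: 2}) == 1    # Id2 is even
assert sgn({1: 2, 2: 1}) == -1   # the transposition (1, 2) is odd
```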
We are now ready to define the determinant of a square matrix A.
Definition 9.2.22. Let A = [aij] be an n×n matrix with complex entries. Then, the determinant of A, denoted det(A), is defined as
det(A) = ∑σ∈𝒮n sgn(σ) a1σ(1)a2σ(2)⋯anσ(n).   (9.2.2)
For example, if 𝒮2 = {Id, f = (1,2)} then, for a 2 × 2 matrix A = [aij], det(A) = sgn(Id)⋅a1Id(1)a2Id(2) + sgn(f)⋅a1f(1)a2f(2) = 1⋅a11a22 + (−1)a12a21 = a11a22 − a12a21, which for the matrix of this example equals 1 − 4 = −3.
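A minimal Python sketch of Equation (9.2.2): det(A) as the signed sum over all permutations, with parity obtained by cycle counting. The 2 × 2 test matrix [[1, 2], [2, 1]] is an assumed instance, chosen only to be consistent with the stated value 1 − 4 = −3.

```python
# Sketch of Definition 9.2.22: det(A) = sum over S_n of sgn(p) * a_{1p(1)}...
# itertools.permutations enumerates S_n in 0-indexed one-line form.
from itertools import permutations
from math import prod

def parity(p):
    # sign of a 0-indexed permutation, via (-1)^(n - number of cycles)
    seen, cycles = set(), 0
    for i in range(len(p)):
        if i in seen:
            continue
        cycles += 1
        j = i
        while j not in seen:
            seen.add(j)
            j = p[j]
    return (-1) ** (len(p) - cycles)

def det(A):
    n = len(A)
    return sum(parity(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

assert det([[1, 2], [2, 1]]) == -3  # assumed 2 x 2 example
```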
Observe that det(A) is a scalar quantity. Even though the expression for det(A) seems complicated at first glance, it is very helpful in proving results related with “properties of determinant”. We will do so in the next section. As another example, we verify that this definition also matches for 3 × 3 matrices. So, let A = [aij] be a 3 × 3 matrix. Then, using Equation (9.2.2),
Theorem 9.3.1 (Properties of Determinant). Let A = [aij] be an n × n matrix.
Proof. Part 1: Note that each summand in det(A) contains one entry from each row. So, each summand has an entry from A[i,:] = 0T. Hence, each summand is itself zero. Thus, det(A) = 0.
Part 2: By assumption, B[k,:] = A[k,:] for k≠i and B[i,:] = cA[i,:]. So,
Part 3: Let τ = (i,j). Then, sgn(τ) = −1, by Lemma 9.2.8, 𝒮n = {σ ∘ τ : σ ∈ 𝒮n} and
Part 4: As A[i,:] = A[j,:], A = EijA. Hence, by Part 3, det(A) = -det(A). Thus, det(A) = 0.
Part 5: By assumption, C[i,:] = B[i,:] = A[i,:] for i≠m and C[m,:] = B[m,:] + A[m,:]. So,
Part 6: By assumption, B[k,:] = A[k,:] for k≠i and B[i,:] = A[i,:] + cA[j,:]. So,
Part 7: Observe that if σ ∈ 𝒮n and σ ≠ Idn, then there exist m, m′ ∈ [n] (depending on σ) such that m > σ(m) and m′ < σ(m′) (both exist since ∑i (σ(i) − i) = 0 and σ ≠ Idn). So, if A is upper triangular then amσ(m) = 0, and if A is lower triangular then am′σ(m′) = 0. So, for each σ ≠ Idn, ∏i=1n aiσ(i) = 0. Hence, det(A) = ∏i=1n aii and the result follows.
Part 8: Using Part 7, det(In) = 1. By definition, Eij = EijIn, Ei(c) = Ei(c)In and Eij(c) = Eij(c)In, for c ≠ 0. Thus, using Parts 2, 3 and 6, we get det(Ei(c)) = c, det(Eij) = −1 and det(Eij(c)) = 1. Also, again using Parts 2, 3 and 6, we get det(EA) = det(E) det(A).
Part 9: Suppose A is invertible. Then, by Theorem 2.3.1, A = E1⋯Ek, for some elementary matrices E1,…,Ek. So, a repeated application of Part 8 implies det(A) = det(E1)⋯det(Ek) ≠ 0 as det(Ei) ≠ 0 for 1 ≤ i ≤ k.
Now, suppose that det(A) ≠ 0. We need to show that A is invertible. On the contrary, assume that A is not invertible. Then, by Theorem 2.3.1, Rank(A) < n. So, by Proposition 2.2.21, there exist elementary matrices E1,…,Ek such that the last row of E1⋯EkA is the zero vector. Therefore, Part 1 and a repeated application of Part 8 give
Part 10: Let A be invertible. Then, by Theorem 2.3.1, A = E1⋯Ek, for some elementary matrices E1,…,Ek. So, applying Part 8 repeatedly gives det(A) = det(E1)⋯det(Ek) and
In case A is not invertible, by Part 9, det(A) = 0. Also, AB is not invertible (if AB were invertible then A would be invertible, by the rank argument). So, again by Part 9, det(AB) = 0. Thus, det(AB) = det(A) det(B).
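The multiplicativity det(AB) = det(A) det(B) of Part 10 can be spot-checked directly from the permutation-sum definition. A sketch on small integer matrices (the matrices are arbitrary examples, not from the text):

```python
# Spot-check of det(AB) = det(A) det(B), with det computed from the
# permutation-sum formula (Equation (9.2.2)).
from itertools import permutations
from math import prod

def parity(p):
    seen, cycles = set(), 0
    for i in range(len(p)):
        if i in seen:
            continue
        cycles += 1
        j = i
        while j not in seen:
            seen.add(j)
            j = p[j]
    return (-1) ** (len(p) - cycles)

def det(A):
    n = len(A)
    return sum(parity(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]
B = [[1, 2, 0], [0, 1, 1], [1, 0, 1]]
assert det(matmul(A, B)) == det(A) * det(B)
```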
Part 11: Let B = [bij] = AT. Then, bij = aji, for 1 ≤ i,j ≤ n. By Lemma 9.2.8, we know that 𝒮n = {σ-1 : σ ∈ 𝒮n}. As σ ∘ σ-1 = Idn, sgn(σ) = sgn(σ-1). Hence,
We now relate this definition of determinant with the one given in Definition 2.3.6.
Theorem 9.3.3. Let A be an n × n matrix. Then, det(A) = ∑j=1n (−1)1+j a1j det A(1|j), where recall that A(1|j) is the submatrix of A obtained by removing the 1st row and the jth column.
Proof. For 1 ≤ j ≤ n, define an n × n matrix Bj whose first row is a1j ejT (that is, Bj[1,:] has a1j in position j and zeros elsewhere) and whose remaining rows equal those of A. Also, for each matrix Bj, we define the n × n matrix Cj by
Also, observe that Bj’s have been defined to satisfy B1[1,:] + ⋯ + Bn[1,:] = A[1,:] and Bj[i,:] = A[i,:] for all i ≥ 2 and 1 ≤ j ≤ n. Thus, by Theorem 9.3.1.5,
det(A) = det(B1) + det(B2) + ⋯ + det(Bn).   (9.3.1)
Let us now compute det(Bj), for 1 ≤ j ≤ n. Note that Cj = E12E23⋯Ej−1,jBj, for 1 ≤ j ≤ n. Then, by Theorem 9.3.1.3, we get det(Bj) = (−1)j−1 det(Cj). So, using Remark 9.3.2.2 and Theorem 9.3.1.2 and Equation (9.3.1), we have
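Theorem 9.3.3 can also be checked computationally. A sketch of the first-row Laplace expansion, recursing on the submatrices A(1|j); it agrees with the permutation-sum values computed earlier:

```python
# Sketch of Theorem 9.3.3: det(A) = sum_j (-1)^(1+j) a_1j det A(1|j),
# implemented recursively (0-indexed, so the sign is (-1)^j).
def det_expand(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]  # A(1|j)
        total += (-1) ** j * A[0][j] * det_expand(minor)
    return total

assert det_expand([[1, 2], [2, 1]]) == -3
```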
Theorem 9.4.1. Let V be a finite dimensional vector space over F and let W1 and W2 be two subspaces of V. Then,
dim(W1 + W2) = dim(W1) + dim(W2) − dim(W1 ∩ W2).   (9.4.1)
Proof. Since W1 ∩ W2 is a vector subspace of V, let ℬ = {u1,…,ur} be a basis of W1 ∩ W2. As W1 ∩ W2 is a subspace of both W1 and W2, let us extend the basis ℬ to form a basis ℬ1 = {u1,…,ur,v1,…,vs} of W1 and a basis ℬ2 = {u1,…,ur,w1,…,wt} of W2.
We now prove that {u1,…,ur,v1,…,vs,w1,…,wt} is a basis of W1 + W2. To do this, we show that
The second part can be easily verified. For the first part, consider the linear system
α1u1 + ⋯ + αrur + β1v1 + ⋯ + βsvs + γ1w1 + ⋯ + γtwt = 0   (9.4.2)
in the variables αi’s, βj’s and γk’s. We re-write the system as
Substituting this representation of v in Equation (9.4.2), we get
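Equation (9.4.1) can be checked on a concrete pair of subspaces. A sketch using exact rational row reduction; the spanning sets are chosen so that the intersection is visibly span{e2} (dimension 1), which is an illustrative assumption, not an example from the text:

```python
# Numerical check of dim(W1 + W2) = dim W1 + dim W2 - dim(W1 n W2) for
# W1 = span{e1, e2} and W2 = span{e2, e3} in R^3, where W1 n W2 = span{e2}.
from fractions import Fraction

def rank(rows):
    # Gaussian elimination over the rationals (exact arithmetic).
    M = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for c in range(len(M[0]) if M else 0):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

W1 = [[1, 0, 0], [0, 1, 0]]
W2 = [[0, 1, 0], [0, 0, 1]]
# dim(W1 + W2) = rank of the combined spanning set; intersection has dim 1.
assert rank(W1 + W2) == rank(W1) + rank(W2) - 1   # 3 = 2 + 2 - 1
```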
In this section, we prove the following result. A generalization of this result to complex vector spaces is left as an exercise for the reader, as it requires similar ideas.
Theorem 9.5.1. Let V be a real vector space. A norm ∥⋅∥ is induced by an inner product if and only if, for all x,y ∈ V, the norm satisfies
∥x + y∥² + ∥x − y∥² = 2∥x∥² + 2∥y∥².   (9.5.1)
Proof. Suppose that ∥⋅∥ is indeed induced by an inner product. Then, by Exercise 5.1.7.3 the result follows.
So, let us assume that ∥⋅∥ satisfies the parallelogram law. So, we need to define an inner product. We claim that the function f : V × V → ℝ defined by
f(x,y) = (∥x + y∥² − ∥x − y∥²)/4   (9.5.2)

is an inner product on V.
Thus, for x,y,z ∈ V, we have
Now, substituting z = 0 in Equation (9.5.3) and using Equation (9.5.2), we get 2f(x,y) = f(x,2y) and hence 4f(x + z,y) = 2f(x + z,2y) = 4f(x,y) + 4f(z,y). Thus,
f(x + z, y) = f(x, y) + f(z, y).   (9.5.4)
Note that the second term of g(x) is a constant multiple of x and hence continuous. Using a similar reason, it is enough to show that g1(x) = ∥xu + v∥, for certain fixed vectors u,v ∈ V, is continuous. To do so, note that
Thus, we have proved the continuity of g, and hence the proof of the required result is complete. □
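Both directions of Theorem 9.5.1 can be illustrated numerically for the Euclidean norm on ℝ²: the parallelogram law (9.5.1) holds, and the candidate inner product f of (9.5.2) recovers the usual dot product. A sketch (the test vectors are arbitrary):

```python
# Check of (9.5.1) and the polarization-type formula (9.5.2) for the
# Euclidean norm on R^2.
import math

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def f(x, y):
    s = norm([a + b for a, b in zip(x, y)]) ** 2
    d = norm([a - b for a, b in zip(x, y)]) ** 2
    return (s - d) / 4

x, y = (3.0, 1.0), (2.0, -4.0)
lhs = norm([a + b for a, b in zip(x, y)]) ** 2 \
    + norm([a - b for a, b in zip(x, y)]) ** 2
assert math.isclose(lhs, 2 * norm(x) ** 2 + 2 * norm(y) ** 2)   # (9.5.1)
assert math.isclose(f(x, y), sum(a * b for a, b in zip(x, y)))  # f = <x, y>
```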
The main aim of this subsection is to prove the continuous dependence of the zeros of a polynomial on its coefficients and to recall Descartes’s rule of signs.
Definition 9.6.1. [Jordan Curves] A curve in ℂ is a continuous function f : [a,b] → ℂ, where [a,b] ⊆ ℝ.
We state the famous Rouché theorem of complex analysis without proof.
Theorem 9.6.2. [Rouché’s Theorem] Let C be a positively oriented simple closed contour. Also, let f and g be two analytic functions on RC, the union of the interior of C and the curve C itself. Assume also that |f(x)| > |g(x)|, for all x ∈ C. Then, f and f + g have the same number of zeros in the interior of C.
Corollary 9.6.3. [Alen Alexanderian, The University of Texas at Austin, USA.] Let P(t) = tn + an−1tn−1 + ⋯ + a0 have distinct roots λ1,…,λm with multiplicities α1,…,αm, respectively. Take any ϵ > 0 for which the balls Bϵ(λi) are disjoint. Then, there exists a δ > 0 such that the polynomial q(t) = tn + a′n−1tn−1 + ⋯ + a′0 has exactly αi roots (counted with multiplicity) in Bϵ(λi), whenever |aj − a′j| < δ.
Proof. For an ϵ > 0 and 1 ≤ i ≤ m, let Ci = {z ∈ ℂ : |z − λi| = ϵ}. Now, for each i, 1 ≤ i ≤ m, take νi = minz∈Ci |P(z)|, ρi = maxz∈Ci [1 + |z| + ⋯ + |z|n−1] and choose δ > 0 such that ρiδ < νi. Then, for a fixed j and z ∈ Cj, we have
As a direct application, we obtain the following corollary.
Proof. Follows from Corollary 9.6.3. □
Proof. Assume that a0, a1, …, an has k > 0 sign changes. Let b > 0. Then, the coefficients of (x − b)P(x) are
Now, assume that P(x) = 0 has k positive roots b1, b2, …, bk. Then,
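The sign-change count at the heart of Descartes’s rule is straightforward to compute. A sketch (zero coefficients are skipped, as usual; the test polynomial is an arbitrary example):

```python
# Number of sign changes in a coefficient sequence a0, a1, ..., an,
# ignoring zero coefficients.
def sign_changes(coeffs):
    signs = [c for c in coeffs if c != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a * b < 0)

# P(x) = (x - 1)(x - 2) = x^2 - 3x + 2 has coefficients (2, -3, 1):
# two sign changes, matching its two positive roots.
assert sign_changes((2, -3, 1)) == 2
```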
Let A ∈ Mn(ℂ) be a Hermitian matrix. Then, by Theorem 6.2.22, we know that all the eigenvalues of A are real. So, we write λi(A) to mean the i-th smallest eigenvalue of A. That is, the i-th from the left in the list λ1(A) ≤ λ2(A) ≤ ⋯ ≤ λn(A).
Proof. Proof of Part 1: By the spectral theorem (see Theorem 6.2.22), there exists a unitary matrix U such that A = UDU*, where D = diag(λ1(A),…,λn(A)) is a real diagonal matrix. Thus, the set {U[:,1],…,U[:,n]} is a basis of ℂn. Hence, for each x ∈ ℂn, there exist scalars αi such that x = ∑ αiU[:,i]. So, note that x*x = ∑ |αi|² and
As an immediate corollary, we state the following result.
Corollary 9.7.2. Let A ∈ Mn(ℂ) be a Hermitian matrix and α = . Then, A has an eigenvalue in the interval (-∞,α] and has an eigenvalue in the interval [α,∞).
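A small numerical illustration of the Rayleigh-quotient bounds behind Corollary 9.7.2, for a real symmetric 2 × 2 matrix with closed-form eigenvalues (the matrix [[2, 1], [1, 3]] is an arbitrary example, not one from the text): every value x*Ax/x*x lies in [λ1, λ2], so any attainable value α pins one eigenvalue in (−∞, α] and one in [α, ∞).

```python
# Rayleigh quotients of a symmetric 2 x 2 matrix lie between its eigenvalues.
import math

def eig2(a, b, d):
    # eigenvalues of [[a, b], [b, d]] via the quadratic formula
    m, r = (a + d) / 2, math.hypot((a - d) / 2, b)
    return m - r, m + r

def rayleigh(a, b, d, x):
    num = a * x[0] * x[0] + 2 * b * x[0] * x[1] + d * x[1] * x[1]
    return num / (x[0] * x[0] + x[1] * x[1])

lam1, lam2 = eig2(2.0, 1.0, 3.0)
for x in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -1.0)]:
    q = rayleigh(2.0, 1.0, 3.0, x)
    assert lam1 - 1e-12 <= q <= lam2 + 1e-12
```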
We now generalize the second and third parts of Theorem 9.7.2.
Proposition 9.7.3. Let A ∈ Mn(ℂ) be a Hermitian matrix with A = UDU*, where U is a unitary matrix and D is a diagonal matrix consisting of the eigenvalues λ1 ≤ λ2 ≤ ⋯ ≤ λn. Then, for any positive integer k, 1 ≤ k ≤ n,
Proof. Let x ∈ ℂn be such that x is orthogonal to U[:,1],…,U[:,k − 1]. Then, we can write x = ∑i=kn αiU[:,i], for some scalars αi. In that case,
Theorem 9.7.4. [Courant-Fischer] Let A ∈ Mn(ℂ) be a Hermitian matrix with eigenvalues λ1 ≤ λ2 ≤ ⋯ ≤ λn. Then,
Proof. Let A = UDU*, where U is a unitary matrix and D = diag(λ1,…,λn). Now, choose a set of k − 1 linearly independent vectors from ℂn, say S = {w1,…,wk−1}. Then, some of the eigenvectors U[:,1],…,U[:,k − 1] may be elements of S⊥. Thus, using Proposition 9.7.3, we see that
Theorem 9.7.5. [Weyl Interlacing Theorem] Let A,B ∈ Mn(ℂ) be Hermitian matrices. Then, λk(A) + λ1(B) ≤ λk(A + B) ≤ λk(A) + λn(B). In particular, if B = P*P, for some matrix P, then λk(A + B) ≥ λk(A). Moreover, for z ∈ ℂn, λk(A + zz*) ≤ λk+1(A).
Proof. As A and B are Hermitian matrices, the matrix A + B is also Hermitian. Hence, by Courant-Fischer theorem and Lemma 9.7.1.1,
If B = P*P, then λ1(B) = min∥x∥=1 x*(P*P)x = min∥x∥=1 ∥Px∥² ≥ 0. Thus,
In particular, for z ∈ ℂn, we have
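The inequalities of Theorem 9.7.5 can be spot-checked numerically on 2 × 2 real symmetric matrices, where the eigenvalues have a closed form. A sketch (the matrices are arbitrary examples):

```python
# Spot-check of Weyl's inequalities lam_k(A) + lam_1(B) <= lam_k(A + B)
# <= lam_k(A) + lam_n(B) for 2 x 2 real symmetric matrices.
import math

def eig2(M):
    (a, b), (_, d) = M
    m, r = (a + d) / 2, math.hypot((a - d) / 2, b)
    return (m - r, m + r)

A = ((2.0, 1.0), (1.0, 3.0))
B = ((0.0, 2.0), (2.0, 1.0))
S = tuple(tuple(x + y for x, y in zip(ra, rb)) for ra, rb in zip(A, B))
lamA, lamB, lamS = eig2(A), eig2(B), eig2(S)
for k in range(2):
    assert lamA[k] + lamB[0] - 1e-12 <= lamS[k] <= lamA[k] + lamB[1] + 1e-12
```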
Theorem 9.7.6. [Cauchy Interlacing Theorem] Let A ∈ Mn(ℂ) be a Hermitian matrix. Define the (n + 1) × (n + 1) Hermitian matrix Â = [[A, y], [y*, a]], for some a ∈ ℝ and y ∈ ℂn. Then, λk(Â) ≤ λk(A) ≤ λk+1(Â), for 1 ≤ k ≤ n.
Proof. Note that
As an immediate corollary, one has the following result.
Corollary 9.7.7. [Inclusion principle] Let A ∈ Mn(ℂ) be a Hermitian matrix and r be a positive integer with 1 ≤ r ≤ n. If B is an r × r principal submatrix of A, then λk(A) ≤ λk(B) ≤ λk+n−r(A).
Theorem 9.7.8. [Poincaré Separation Theorem] Let A ∈ Mn(ℂ) be a Hermitian matrix and {u1,…,ur} ⊆ ℂn be an orthonormal set for some positive integer r, 1 ≤ r ≤ n. If further B = [bij] is an r × r matrix with bij = ui*Auj, 1 ≤ i,j ≤ r, then λk(A) ≤ λk(B) ≤ λk+n−r(A).
Proof. Let us extend the orthonormal set {u1,…,ur} to an orthonormal basis, say {u1,…,un}, of ℂn and write U = [u1, …, un]. Then, B is an r × r principal submatrix of U*AU. Thus, by the inclusion principle, λk(U*AU) ≤ λk(B) ≤ λk+n−r(U*AU). But, we know that σ(U*AU) = σ(A) and hence the required result follows. □
The proof of the next result is left for the reader.
Corollary 9.7.9. Let A ∈ Mn(ℂ) be a Hermitian matrix and r be a positive integer with 1 ≤ r ≤ n. Then,
Corollary 9.7.10. Let A ∈ Mn(ℂ) be a Hermitian matrix and W be a k-dimensional subspace of ℂn. Suppose, there exists a real number c such that x*Ax ≥ cx*x, for each x ∈ W. Then, λn-k+1(A) ≥ c. In particular, if x*Ax > 0, for each nonzero x ∈ W, then λn-k+1 > 0. Note that, a k-dimensional subspace need not contain an eigenvector of A. For example, the line y = 2x does not contain an eigenvector of .
Proof. Let {x1,…,xn−k} be a basis of W⊥. Then,
Now assume that x*Ax > 0 holds for each nonzero x ∈ W and that λn−k+1 = 0. Then, it follows that min{x*Ax : ∥x∥ = 1, x ⊥ x1,…,xn−k} = 0. Now, define f : ℂn → ℂ by f(x) = x*Ax.
Then, f is a continuous function and min{f(x) : x ∈ W, ∥x∥ = 1} = 0. Thus, f must attain this minimum on the unit sphere of W. That is, there exists y ∈ W with ∥y∥ = 1 such that y*Ay = 0, a contradiction. Thus, the required result follows. □