Definition 9.1.1. Fix n ∈ ℕ. Then, to each f ∈ 𝒮n, we associate an n × n matrix, denoted Pf = [pij], such that pij = 1 whenever f(j) = i, and pij = 0 otherwise. The matrix Pf is called the permutation matrix corresponding to the permutation f. For example, I2, corresponding to Id2, and E12, corresponding to the permutation (1,2), are the two permutation matrices of order 2 × 2.
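Definition 9.1.1 translates directly into code. A minimal sketch, with a permutation represented as a dict mapping j to f(j) (an assumed representation, chosen only for illustration):

```python
# Sketch: building the permutation matrix P_f of Definition 9.1.1.
# Entry p_ij = 1 whenever f(j) = i, and 0 otherwise; the permutation f
# is given as a dict on {1, ..., n}.
def permutation_matrix(f, n):
    return [[1 if f[j] == i else 0 for j in range(1, n + 1)]
            for i in range(1, n + 1)]

# The two 2 x 2 permutation matrices: I2 from Id2 and E12 from (1, 2).
I2 = permutation_matrix({1: 1, 2: 2}, 2)   # [[1, 0], [0, 1]]
E12 = permutation_matrix({1: 2, 2: 1}, 2)  # [[0, 1], [1, 0]]
```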
Remark 9.1.2. Recall that in Remark 9.2.16.1, it was observed that each permutation is a product of transpositions of the form (1,2),…,(1,n).
We are now ready to prove Theorem 2.2.17.
Proof. Note that the matrix A = 0 if and only if B = 0. So, let us assume that the matrices A,B ≠ 0. Also, the row-equivalence of A and B implies that there exists an invertible matrix C such that A = CB, where C is a product of elementary matrices.
Since B is in RREF, either B[:,1] = 0T or B[:,1] = (1,0,…,0)T . If B[:,1] = 0T then A[:,1] = CB[:,1] = C0 = 0. If B[:,1] = (1,0,…,0)T then A[:,1] = CB[:,1] = C[:,1]. As C is invertible, the first column of C cannot be the zero vector. So, A[:,1] cannot be the zero vector. Further, A is in RREF implies that A[:,1] = (1,0,…,0)T . So, we have shown that if A and B are row-equivalent then their first columns must be the same.
Now, let us assume that the first k − 1 columns of A and B are equal and that they contain r pivotal columns. We will now show that the k-th columns are also equal.
Define Ak = [A[:,1],…,A[:,k]] and Bk = [B[:,1],…,B[:,k]]. Then, our assumption implies that A[:,i] = B[:,i], for 1 ≤ i ≤ k − 1. Since the first k − 1 columns contain r pivotal columns, there exists a permutation matrix P such that
If the k-th columns of A and B are both pivotal then, by definition of RREF, A[:,k] = (0^T, e1^T)^T = B[:,k], where 0 is a vector of size r and e1 = (1,0,…,0)^T. So, we need to consider two cases depending on whether both are non-pivotal or one is pivotal and the other is not.
As A = CB, we get Ak = CBk and
Case 1: Neither A[:,k] nor B[:,k] is pivotal. Then
Case 2: A[:,k] is pivotal but B[:,k] is non-pivotal. Then
Therefore, combining both cases, we get the required result. □
Definition 9.2.1. For a positive integer n, denote [n] = {1,2,…,n}. A function f : A → B is called one-one (or injective) if f(a) = f(b) implies a = b; onto (or surjective) if for each b ∈ B there exists a ∈ A with f(a) = b; and a bijection if it is both one-one and onto.
Example 9.2.2. Let A = {1,2,3}, B = {a,b,c,d} and C = {α,β,γ}. Then, the function
Remark 9.2.3. Let f : A → B and g : B → C be functions. Then, the composition of functions, denoted g ∘ f, is a function from A to C defined by (g ∘ f)(a) = g(f(a)). Also, if f and g are one-one then so is g ∘ f, and if f and g are onto then so is g ∘ f.
Thus, if f and g are bijections then so is g ∘ f.
Definition 9.2.4. A function f : [n] → [n] is called a permutation on n elements if f is a bijection. For example, f,g : [2] → [2] defined by f(1) = 1,f(2) = 2 and g(1) = 2,g(2) = 1 are permutations.
Exercise 9.2.5. Let 𝒮3 be the set consisting of all permutations on 3 elements. Then, prove that 𝒮3 has 6 elements. Moreover, they are one of the 6 functions given below.
Remark 9.2.6. Let f : [n] → [n] be a bijection. Then, the inverse of f, denoted f-1, defined by f-1(m) = ℓ whenever f(ℓ) = m for m ∈ [n], is well defined and f-1 is a bijection. For example, in Exercise 9.2.5, note that fi-1 = fi, for i = 1,2,3,6 and f4-1 = f5.
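The inverse of Remark 9.2.6 amounts to reversing each assignment f(ℓ) = m into f-1(m) = ℓ. A minimal sketch (dict representation assumed; the names f4 and f5 merely echo the exercise, whose explicit functions are not reproduced here):

```python
# Sketch of Remark 9.2.6: invert a permutation by reversing each pair
# (l, f(l)) into (f(l), l).  Permutations are dicts on {1, ..., n}.
def inverse(f):
    return {m: l for l, m in f.items()}

f4 = {1: 2, 2: 3, 3: 1}   # a 3-cycle on [3], used here as an example
f5 = inverse(f4)          # the opposite 3-cycle
```

Since f is a bijection, the reversed pairs again form a function, and inverting twice returns the original permutation.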
Remark 9.2.7. Let 𝒮n = {f : [n] → [n] : f is a permutation}. Then, 𝒮n has n! elements and forms a group with respect to composition of functions, called product, due to the following.
Lemma 9.2.8. Fix a positive integer n. Then, the group 𝒮n satisfies the following:
Proof. Part 1: Note that for each α ∈ 𝒮n the functions f-1 ∘ α, α ∘ f-1 ∈ 𝒮n and α = f ∘ (f-1 ∘ α) as well as α = (α ∘ f-1) ∘ f.
Part 2: Note that for each f ∈ 𝒮n, by definition, (f-1)-1 = f. Hence the result holds. □
Definition 9.2.11. [Cycle Notation] Let f ∈ 𝒮n. Suppose there exist r, 2 ≤ r ≤ n, and i1,…,ir ∈ [n] such that f(i1) = i2, f(i2) = i3, …, f(ir) = i1 and f(j) = j for all j ≠ i1,…,ir. Then, we represent such a permutation by f = (i1,i2,…,ir) and call it an r-cycle. For example, the permutation in 𝒮5 with f(1) = 4, f(4) = 5, f(5) = 1 and f(2) = 2, f(3) = 3 is the 3-cycle (1,4,5), and the permutation interchanging only 2 and 3 is the 2-cycle (2,3).
Definition 9.2.13. A permutation f ∈ 𝒮n is called a transposition if there exist m,r ∈ [n] such that f = (m,r).
With the above definitions, we state and prove two important results.
Proof. Note that using Remark 9.2.14, we just need to show that f can be written as a product of disjoint cycles.
Consider the sequence 1, f(1), f(2)(1) = (f ∘ f)(1), f(3)(1) = (f ∘ (f ∘ f))(1), …. As this sequence is infinite and each f(i)(1) ∈ [n], there exist i,j with 0 ≤ i < j ≤ n such that f(i)(1) = f(j)(1). Now, let j1 be the least positive integer such that f(i)(1) = f(j1)(1), for some i with 0 ≤ i < j1. Then, we claim that i = 0.
For if i ≥ 1, then j1 − 1 ≥ 1 and the condition that f is one-one gives f(i−1)(1) = f(j1−1)(1), contradicting the minimality of j1.
Now, choose i1 ∈ [n] \ {1, f(1), f(2)(1), …, f(j1−1)(1)} and proceed as above to get another cycle. Let the new cycle be (i1, f(i1), …, f(j2−1)(i1)). Then, the fact that f is one-one implies that
Remark 9.2.16. Note that when one writes a permutation as product of disjoint cycles, cycles of length 1 are suppressed so as to match Definition 9.2.11. For example, the algorithm in the proof of Theorem 9.2.15 implies
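The orbit-chasing argument in the proof of Theorem 9.2.15 is exactly an algorithm. A sketch, with permutations as dicts on [n] (an assumed representation), suppressing 1-cycles as in Remark 9.2.16:

```python
# Sketch of the proof of Theorem 9.2.15: repeatedly pick an element not yet
# visited and follow f until the orbit closes, producing disjoint cycles.
# Cycles of length 1 (fixed points) are suppressed.
def disjoint_cycles(f):
    seen, cycles = set(), []
    for start in sorted(f):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = f[x]
        if len(cycle) >= 2:
            cycles.append(tuple(cycle))
    return cycles

f = {1: 4, 2: 3, 3: 2, 4: 5, 5: 1}
# yields [(1, 4, 5), (2, 3)], i.e. f = (1, 4, 5)(2, 3)
```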
Note that Id3 = (1,2)(1,2) = (1,2)(2,3)(1,2)(1,3), as well. The question arises: is it possible to write Idn as a product of an odd number of transpositions? The next lemma answers this question in the negative.
Proof. We will prove the result by mathematical induction. Observe that t≠1 as Idn is not a transposition. Hence, t ≥ 2. If t = 2, we are done. So, let us assume that the result holds for all expressions in which the number of transpositions t ≤ k. Now, let t = k + 1.
Suppose f1 = (m,r) and let ℓ,s ∈ [n] \ {m,r}. Then, the possible choices for the composition f1 ∘ f2 are (m,r)(m,r) = Idn, (m,r)(m,ℓ) = (r,ℓ)(r,m), (m,r)(r,ℓ) = (ℓ,r)(ℓ,m) and (m,r)(ℓ,s) = (ℓ,s)(m,r). In the first case, f1 and f2 can be removed to obtain Idn = f3 ∘ f4 ∘ ⋯ ∘ ft, where the number of transpositions is t − 2 = k − 1 < k. So, by mathematical induction, t − 2 is even and hence t is also even.
In the remaining cases, the expression for f1 ∘ f2 is replaced by its counterpart to obtain another expression for Idn. But in the new expression for Idn, m does not appear in the first transposition, but appears in the second transposition. The shifting of m to the right can continue until the number of transpositions reduces by 2 (which in turn gives the result by mathematical induction). If the shifting of m to the right does not reduce the number of transpositions, then m will get shifted to the right and will appear only in the right-most transposition. But then this expression for Idn does not fix m, whereas Idn(m) = m. So, the latter case leads us to a contradiction. Hence, the shifting of m to the right will surely lead to an expression in which the number of transpositions at some stage is t − 2 = k − 1. At this stage, one applies mathematical induction to get the required result. □
Theorem 9.2.18. Let f ∈ 𝒮n. If there exist transpositions g1,…,gk and h1,…,hℓ with f = g1 ∘ ⋯ ∘ gk = h1 ∘ ⋯ ∘ hℓ, then either both k and ℓ are even or both are odd.
Proof. As g1 ∘ ⋯ ∘ gk = h1 ∘ ⋯ ∘ hℓ and h-1 = h for any transposition h ∈ 𝒮n, we have
Definition 9.2.19. [Even and Odd Permutation] A permutation f ∈ 𝒮n is called an even permutation if f can be written as a product of an even number of transpositions, and an odd permutation if f can be written as a product of an odd number of transpositions.
Definition 9.2.20. Observe that if f and g are both even or both odd permutations, then f ∘ g and g ∘ f are both even. Whereas, if one of them is odd and the other even, then f ∘ g and g ∘ f are both odd. We use this to define a function sgn : 𝒮n → {1,−1}, called the signature of a permutation, by sgn(f) = 1 if f is even and sgn(f) = −1 if f is odd.
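A sketch of sgn computed from the cycle structure: an r-cycle factors into r − 1 transpositions, so a permutation with c cycles on [n] (counting fixed points) has signature (−1)^(n−c). The dict representation is an assumption for illustration.

```python
# Sketch of Definition 9.2.20: sgn(f) = (-1)^(n - c), where c is the number
# of cycles of f on [n], fixed points included.
def sgn(f):
    seen, cycles = set(), 0
    for start in f:
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:
            seen.add(x)
            x = f[x]
    return (-1) ** (len(f) - cycles)

assert sgn({1: 1, 2: 2}) == 1    # Id2 is even
assert sgn({1: 2, 2: 1}) == -1   # the transposition (1, 2) is odd
```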
We are now ready to define the determinant of a square matrix A.
Definition 9.2.22. Let A = [aij] be an n×n matrix with complex entries. Then, the determinant of A, denoted det(A), is defined as
det(A) = ∑σ∈𝒮n sgn(σ) a1σ(1)a2σ(2)⋯anσ(n).   (9.2.2)
For example, if 𝒮2 = {Id, f = (1,2)} then, for a 2 × 2 matrix A = [aij], det(A) = sgn(Id)⋅a1Id(1)a2Id(2) + sgn(f)⋅a1f(1)a2f(2) = 1⋅a11a22 + (−1)a12a21 = a11a22 − a12a21, which for the matrix of this example equals 1 − 4 = −3.
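A minimal Python sketch of Equation (9.2.2): det(A) as the signed sum over all permutations, with parity obtained by cycle counting. The 2 × 2 test matrix [[1, 2], [2, 1]] is an assumed instance, chosen only to be consistent with the stated value 1 − 4 = −3.

```python
# Sketch of Definition 9.2.22: det(A) = sum over S_n of sgn(p) * a_{1p(1)}...
# itertools.permutations enumerates S_n in 0-indexed one-line form.
from itertools import permutations
from math import prod

def parity(p):
    # sign of a 0-indexed permutation, via (-1)^(n - number of cycles)
    seen, cycles = set(), 0
    for i in range(len(p)):
        if i in seen:
            continue
        cycles += 1
        j = i
        while j not in seen:
            seen.add(j)
            j = p[j]
    return (-1) ** (len(p) - cycles)

def det(A):
    n = len(A)
    return sum(parity(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

assert det([[1, 2], [2, 1]]) == -3  # assumed 2 x 2 example
```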
Observe that det(A) is a scalar quantity. Even though the expression for det(A) seems complicated at first glance, it is very helpful in proving results related with “properties of determinant”. We will do so in the next section. As another example, we verify that this definition also matches for 3 × 3 matrices. So, let A = [aij] be a 3 × 3 matrix. Then, using Equation (9.2.2),
Theorem 9.3.1 (Properties of Determinant). Let A = [aij] be an n × n matrix.
Proof. Part 1: Note that each summand in det(A) contains one entry from each row. So, each summand has an entry from A[i,:] = 0T. Hence, each summand is itself zero. Thus, det(A) = 0.
Part 2: By assumption, B[k,:] = A[k,:] for k≠i and B[i,:] = cA[i,:]. So,
Part 3: Let τ = (i,j). Then, sgn(τ) = −1, by Lemma 9.2.8, 𝒮n = {σ ∘ τ : σ ∈ 𝒮n} and
Part 4: As A[i,:] = A[j,:], A = EijA. Hence, by Part 3, det(A) = -det(A). Thus, det(A) = 0.
Part 5: By assumption, C[i,:] = B[i,:] = A[i,:] for i≠m and C[m,:] = B[m,:] + A[m,:]. So,
Part 6: By assumption, B[k,:] = A[k,:] for k≠i and B[i,:] = A[i,:] + cA[j,:]. So,
Part 7: Observe that if σ ∈ 𝒮n and σ ≠ Idn, then there exist m, m′ ∈ [n] (depending on σ) such that m > σ(m) and m′ < σ(m′) (both exist since ∑i (σ(i) − i) = 0 and σ ≠ Idn). So, if A is upper triangular then amσ(m) = 0, and if A is lower triangular then am′σ(m′) = 0. So, for each σ ≠ Idn, ∏i=1n aiσ(i) = 0. Hence, det(A) = ∏i=1n aii and the result follows.
Part 8: Using Part 7, det(In) = 1. By definition, Eij = EijIn, Ei(c) = Ei(c)In and Eij(c) = Eij(c)In, for c ≠ 0. Thus, using Parts 2, 3 and 6, we get det(Ei(c)) = c, det(Eij) = −1 and det(Eij(c)) = 1. Also, again using Parts 2, 3 and 6, we get det(EA) = det(E) det(A).
Part 9: Suppose A is invertible. Then, by Theorem 2.3.1, A = E1⋯Ek, for some elementary matrices E1,…,Ek. So, a repeated application of Part 8 implies det(A) = det(E1)⋯det(Ek) ≠ 0 as det(Ei) ≠ 0 for 1 ≤ i ≤ k.
Now, suppose that det(A) ≠ 0. We need to show that A is invertible. On the contrary, assume that A is not invertible. Then, by Theorem 2.3.1, Rank(A) < n. So, by Proposition 2.2.21, there exist elementary matrices E1,…,Ek such that the last row of E1⋯EkA is the zero vector. Therefore, Part 1 and a repeated application of Part 8 give
Part 10: Let A be invertible. Then, by Theorem 2.3.1, A = E1⋯Ek, for some elementary matrices E1,…,Ek. So, applying Part 8 repeatedly gives det(A) = det(E1)⋯det(Ek) and
In case A is not invertible, by Part 9, det(A) = 0. Also, AB is not invertible (if AB were invertible then A would be invertible, by the rank argument). So, again by Part 9, det(AB) = 0. Thus, det(AB) = det(A) det(B).
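The multiplicativity det(AB) = det(A) det(B) of Part 10 can be spot-checked directly from the permutation-sum definition. A sketch on small integer matrices (the matrices are arbitrary examples, not from the text):

```python
# Spot-check of det(AB) = det(A) det(B), with det computed from the
# permutation-sum formula (Equation (9.2.2)).
from itertools import permutations
from math import prod

def parity(p):
    seen, cycles = set(), 0
    for i in range(len(p)):
        if i in seen:
            continue
        cycles += 1
        j = i
        while j not in seen:
            seen.add(j)
            j = p[j]
    return (-1) ** (len(p) - cycles)

def det(A):
    n = len(A)
    return sum(parity(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]
B = [[1, 2, 0], [0, 1, 1], [1, 0, 1]]
assert det(matmul(A, B)) == det(A) * det(B)
```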
Part 11: Let B = [bij] = AT. Then, bij = aji, for 1 ≤ i,j ≤ n. By Lemma 9.2.8, we know that 𝒮n = {σ-1 : σ ∈ 𝒮n}. As σ ∘ σ-1 = Idn, sgn(σ) = sgn(σ-1). Hence,
We now relate this definition of determinant with the one given in Definition 2.3.6.
Theorem 9.3.3. Let A be an n × n matrix. Then, det(A) = ∑j=1n (−1)1+j a1j det A(1|j), where recall that A(1|j) is the submatrix of A obtained by removing the 1st row and the jth column.
Proof. For 1 ≤ j ≤ n, define an n × n matrix Bj whose first row is a1j ejT (that is, Bj[1,:] has a1j in position j and zeros elsewhere) and whose remaining rows equal those of A. Also, for each matrix Bj, we define the n × n matrix Cj by
Also, observe that Bj’s have been defined to satisfy B1[1,:] + ⋯ + Bn[1,:] = A[1,:] and Bj[i,:] = A[i,:] for all i ≥ 2 and 1 ≤ j ≤ n. Thus, by Theorem 9.3.1.5,
det(A) = det(B1) + det(B2) + ⋯ + det(Bn).   (9.3.1)
Let us now compute det(Bj), for 1 ≤ j ≤ n. Note that Cj = E12E23⋯Ej−1,jBj, for 1 ≤ j ≤ n. Then, by Theorem 9.3.1.3, we get det(Bj) = (−1)j−1 det(Cj). So, using Remark 9.3.2.2 and Theorem 9.3.1.2 and Equation (9.3.1), we have
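Theorem 9.3.3 can also be checked computationally. A sketch of the first-row Laplace expansion, recursing on the submatrices A(1|j); it agrees with the permutation-sum values computed earlier:

```python
# Sketch of Theorem 9.3.3: det(A) = sum_j (-1)^(1+j) a_1j det A(1|j),
# implemented recursively (0-indexed, so the sign is (-1)^j).
def det_expand(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]  # A(1|j)
        total += (-1) ** j * A[0][j] * det_expand(minor)
    return total

assert det_expand([[1, 2], [2, 1]]) == -3
```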
Theorem 9.4.1. Let V be a finite dimensional vector space over F and let W1 and W2 be two subspaces of V. Then,
dim(W1 + W2) = dim(W1) + dim(W2) − dim(W1 ∩ W2).   (9.4.1)
Proof. Since W1 ∩ W2 is a vector subspace of V, let ℬ = {u1,…,ur} be a basis of W1 ∩ W2. As W1 ∩ W2 is a subspace of both W1 and W2, let us extend the basis ℬ to form a basis ℬ1 = {u1,…,ur,v1,…,vs} of W1 and a basis ℬ2 = {u1,…,ur,w1,…,wt} of W2.
We now prove that {u1,…,ur,v1,…,vs,w1,…,wt} is a basis of W1 + W2. To do this, we show that
The second part can be easily verified. For the first part, consider the linear system
α1u1 + ⋯ + αrur + β1v1 + ⋯ + βsvs + γ1w1 + ⋯ + γtwt = 0   (9.4.2)
in the variables αi’s, βj’s and γk’s. We re-write the system as
Substituting this representation of v in Equation (9.4.2), we get
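Equation (9.4.1) can be checked on a concrete pair of subspaces. A sketch using exact rational row reduction; the spanning sets are chosen so that the intersection is visibly span{e2} (dimension 1), which is an illustrative assumption, not an example from the text:

```python
# Numerical check of dim(W1 + W2) = dim W1 + dim W2 - dim(W1 n W2) for
# W1 = span{e1, e2} and W2 = span{e2, e3} in R^3, where W1 n W2 = span{e2}.
from fractions import Fraction

def rank(rows):
    # Gaussian elimination over the rationals (exact arithmetic).
    M = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for c in range(len(M[0]) if M else 0):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

W1 = [[1, 0, 0], [0, 1, 0]]
W2 = [[0, 1, 0], [0, 0, 1]]
# dim(W1 + W2) = rank of the combined spanning set; intersection has dim 1.
assert rank(W1 + W2) == rank(W1) + rank(W2) - 1   # 3 = 2 + 2 - 1
```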
In this section, we prove the following result. A generalization of this result to complex vector spaces is left as an exercise for the reader, as it requires similar ideas.
Theorem 9.5.1. Let V be a real vector space. A norm ∥⋅∥ is induced by an inner product if and only if, for all x,y ∈ V, the norm satisfies
∥x + y∥² + ∥x − y∥² = 2∥x∥² + 2∥y∥².   (9.5.1)
Proof. Suppose that ∥⋅∥ is indeed induced by an inner product. Then, by Exercise 5.1.7.3 the result follows.
So, let us assume that ∥⋅∥ satisfies the parallelogram law. So, we need to define an inner product. We claim that the function f : V × V → ℝ defined by
f(x,y) = (∥x + y∥² − ∥x − y∥²)/4   (9.5.2)

is an inner product on V.
Thus, for x,y,z ∈ V, we have
Now, substituting z = 0 in Equation (9.5.3) and using Equation (9.5.2), we get 2f(x,y) = f(x,2y) and hence 4f(x + z,y) = 2f(x + z,2y) = 4f(x,y) + 4f(z,y). Thus,
f(x + z, y) = f(x, y) + f(z, y).   (9.5.4)
Note that the second term of g(x) is a constant multiple of x and hence continuous. Using a similar reason, it is enough to show that g1(x) = ∥xu + v∥, for certain fixed vectors u,v ∈ V, is continuous. To do so, note that
Thus, we have proved the continuity of g, and hence the proof of the required result is complete. □
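Both directions of Theorem 9.5.1 can be illustrated numerically for the Euclidean norm on ℝ²: the parallelogram law (9.5.1) holds, and the candidate inner product f of (9.5.2) recovers the usual dot product. A sketch (the test vectors are arbitrary):

```python
# Check of (9.5.1) and the polarization-type formula (9.5.2) for the
# Euclidean norm on R^2.
import math

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def f(x, y):
    s = norm([a + b for a, b in zip(x, y)]) ** 2
    d = norm([a - b for a, b in zip(x, y)]) ** 2
    return (s - d) / 4

x, y = (3.0, 1.0), (2.0, -4.0)
lhs = norm([a + b for a, b in zip(x, y)]) ** 2 \
    + norm([a - b for a, b in zip(x, y)]) ** 2
assert math.isclose(lhs, 2 * norm(x) ** 2 + 2 * norm(y) ** 2)   # (9.5.1)
assert math.isclose(f(x, y), sum(a * b for a, b in zip(x, y)))  # f = <x, y>
```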
The main aim of this subsection is to prove the continuous dependence of the zeros of a polynomial on its coefficients and to recall Descartes’s rule of signs.
Definition 9.6.1. [Jordan Curves] A curve in ℂ is a continuous function f : [a,b] → ℂ, where [a,b] ⊆ ℝ.
We state the famous Rouché theorem of complex analysis without proof.
Theorem 9.6.2. [Rouché’s Theorem] Let C be a positively oriented simple closed contour. Also, let f and g be two analytic functions on RC, the union of the interior of C and the curve C itself. Assume also that |f(x)| > |g(x)|, for all x ∈ C. Then, f and f + g have the same number of zeros in the interior of C.
Corollary 9.6.3. [Alen Alexanderian, The University of Texas at Austin, USA.] Let P(t) = tn + an−1tn−1 + ⋯ + a0 have distinct roots λ1,…,λm with multiplicities α1,…,αm, respectively. Take any ϵ > 0 for which the balls Bϵ(λi) are disjoint. Then, there exists a δ > 0 such that the polynomial q(t) = tn + a′n−1tn−1 + ⋯ + a′0 has exactly αi roots (counted with multiplicity) in Bϵ(λi), whenever |aj − a′j| < δ.
Proof. For an ϵ > 0 and 1 ≤ i ≤ m, let Ci = {z ∈ ℂ : |z − λi| = ϵ}. Now, for each i, 1 ≤ i ≤ m, take νi = minz∈Ci |P(z)|, ρi = maxz∈Ci [1 + |z| + ⋯ + |z|n−1] and choose δ > 0 such that ρiδ < νi. Then, for a fixed j and z ∈ Cj, we have
As a direct application, we obtain the following corollary.
Proof. Follows from Corollary 9.6.3. □
Proof. Assume that a0, a1, …, an has k > 0 sign changes. Let b > 0. Then, the coefficients of (x − b)P(x) are
Now, assume that P(x) = 0 has k positive roots b1, b2, …, bk. Then,
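The sign-change count at the heart of Descartes’s rule is straightforward to compute. A sketch (zero coefficients are skipped, as usual; the test polynomial is an arbitrary example):

```python
# Number of sign changes in a coefficient sequence a0, a1, ..., an,
# ignoring zero coefficients.
def sign_changes(coeffs):
    signs = [c for c in coeffs if c != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a * b < 0)

# P(x) = (x - 1)(x - 2) = x^2 - 3x + 2 has coefficients (2, -3, 1):
# two sign changes, matching its two positive roots.
assert sign_changes((2, -3, 1)) == 2
```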
Let A ∈ Mn(ℂ) be a Hermitian matrix. Then, by Theorem 6.2.22, we know that all the eigenvalues of A are real. So, we write λi(A) to mean the i-th smallest eigenvalue of A. That is, the i-th from the left in the list λ1(A) ≤ λ2(A) ≤ ⋯ ≤ λn(A).
Proof. Proof of Part 1: By the spectral theorem (see Theorem 6.2.22), there exists a unitary matrix U such that A = UDU*, where D = diag(λ1(A),…,λn(A)) is a real diagonal matrix. Thus, the set {U[:,1],…,U[:,n]} is a basis of ℂn. Hence, for each x ∈ ℂn, there exist scalars αi such that x = ∑ αiU[:,i]. So, note that x*x = ∑ |αi|² and
As an immediate corollary, we state the following result.
Corollary 9.7.2. Let A ∈ Mn(ℂ) be a Hermitian matrix and α = . Then, A has an eigenvalue in the interval (-∞,α] and has an eigenvalue in the interval [α,∞).
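A small numerical illustration of the Rayleigh-quotient bounds behind Corollary 9.7.2, for a real symmetric 2 × 2 matrix with closed-form eigenvalues (the matrix [[2, 1], [1, 3]] is an arbitrary example, not one from the text): every value x*Ax/x*x lies in [λ1, λ2], so any attainable value α pins one eigenvalue in (−∞, α] and one in [α, ∞).

```python
# Rayleigh quotients of a symmetric 2 x 2 matrix lie between its eigenvalues.
import math

def eig2(a, b, d):
    # eigenvalues of [[a, b], [b, d]] via the quadratic formula
    m, r = (a + d) / 2, math.hypot((a - d) / 2, b)
    return m - r, m + r

def rayleigh(a, b, d, x):
    num = a * x[0] * x[0] + 2 * b * x[0] * x[1] + d * x[1] * x[1]
    return num / (x[0] * x[0] + x[1] * x[1])

lam1, lam2 = eig2(2.0, 1.0, 3.0)
for x in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -1.0)]:
    q = rayleigh(2.0, 1.0, 3.0, x)
    assert lam1 - 1e-12 <= q <= lam2 + 1e-12
```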
We now generalize the second and third parts of Theorem 9.7.2.
Proposition 9.7.3. Let A ∈ Mn(ℂ) be a Hermitian matrix with A = UDU*, where U is a unitary matrix and D is a diagonal matrix consisting of the eigenvalues λ1 ≤ λ2 ≤ ⋯ ≤ λn. Then, for any positive integer k, 1 ≤ k ≤ n,
Proof. Let x ∈ ℂn be such that x is orthogonal to U[:,1],…,U[:,k − 1]. Then, we can write x = ∑i=kn αiU[:,i], for some scalars αi. In that case,
Theorem 9.7.4. [Courant-Fischer] Let A ∈ Mn(ℂ) be a Hermitian matrix with eigenvalues λ1 ≤ λ2 ≤ ⋯ ≤ λn. Then,
Proof. Let A = UDU*, where U is a unitary matrix and D = diag(λ1,…,λn). Now, choose a set of k − 1 linearly independent vectors from ℂn, say S = {w1,…,wk−1}. Then, some of the eigenvectors U[:,1],…,U[:,k − 1] may be elements of S⊥. Thus, using Proposition 9.7.3, we see that
Theorem 9.7.5. [Weyl Interlacing Theorem] Let A,B ∈ Mn(ℂ) be Hermitian matrices. Then, λk(A) + λ1(B) ≤ λk(A + B) ≤ λk(A) + λn(B). In particular, if B = P*P, for some matrix P, then λk(A + B) ≥ λk(A). Moreover, for z ∈ ℂn, λk(A + zz*) ≤ λk+1(A).
Proof. As A and B are Hermitian matrices, the matrix A + B is also Hermitian. Hence, by Courant-Fischer theorem and Lemma 9.7.1.1,
If B = P*P, then λ1(B) = min∥x∥=1 x*(P*P)x = min∥x∥=1 ∥Px∥² ≥ 0. Thus,
In particular, for z ∈ ℂn, we have
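The inequalities of Theorem 9.7.5 can be spot-checked numerically on 2 × 2 real symmetric matrices, where the eigenvalues have a closed form. A sketch (the matrices are arbitrary examples):

```python
# Spot-check of Weyl's inequalities lam_k(A) + lam_1(B) <= lam_k(A + B)
# <= lam_k(A) + lam_n(B) for 2 x 2 real symmetric matrices.
import math

def eig2(M):
    (a, b), (_, d) = M
    m, r = (a + d) / 2, math.hypot((a - d) / 2, b)
    return (m - r, m + r)

A = ((2.0, 1.0), (1.0, 3.0))
B = ((0.0, 2.0), (2.0, 1.0))
S = tuple(tuple(x + y for x, y in zip(ra, rb)) for ra, rb in zip(A, B))
lamA, lamB, lamS = eig2(A), eig2(B), eig2(S)
for k in range(2):
    assert lamA[k] + lamB[0] - 1e-12 <= lamS[k] <= lamA[k] + lamB[1] + 1e-12
```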
Theorem 9.7.6. [Cauchy Interlacing Theorem] Let A ∈ Mn(ℂ) be a Hermitian matrix. Define the (n + 1) × (n + 1) Hermitian matrix Â = [[A, y], [y*, a]], for some a ∈ ℝ and y ∈ ℂn. Then, λk(Â) ≤ λk(A) ≤ λk+1(Â), for 1 ≤ k ≤ n.
Proof. Note that
As an immediate corollary, one has the following result.
Corollary 9.7.7. [Inclusion principle] Let A ∈ Mn(ℂ) be a Hermitian matrix and r be a positive integer with 1 ≤ r ≤ n. If B is an r × r principal submatrix of A, then λk(A) ≤ λk(B) ≤ λk+n−r(A).
Theorem 9.7.8. [Poincaré Separation Theorem] Let A ∈ Mn(ℂ) be a Hermitian matrix and {u1,…,ur} ⊆ ℂn be an orthonormal set for some positive integer r, 1 ≤ r ≤ n. If further B = [bij] is an r × r matrix with bij = ui*Auj, 1 ≤ i,j ≤ r, then λk(A) ≤ λk(B) ≤ λk+n−r(A).
Proof. Let us extend the orthonormal set {u1,…,ur} to an orthonormal basis, say {u1,…,un}, of ℂn and write U = [u1, …, un]. Then, B is an r × r principal submatrix of U*AU. Thus, by the inclusion principle, λk(U*AU) ≤ λk(B) ≤ λk+n−r(U*AU). But, we know that σ(U*AU) = σ(A) and hence the required result follows. □
The proof of the next result is left for the reader.
Corollary 9.7.9. Let A ∈ Mn(ℂ) be a Hermitian matrix and r be a positive integer with 1 ≤ r ≤ n. Then,
Corollary 9.7.10. Let A ∈ Mn(ℂ) be a Hermitian matrix and W be a k-dimensional subspace of ℂn. Suppose, there exists a real number c such that x*Ax ≥ cx*x, for each x ∈ W. Then, λn-k+1(A) ≥ c. In particular, if x*Ax > 0, for each nonzero x ∈ W, then λn-k+1 > 0. Note that, a k-dimensional subspace need not contain an eigenvector of A. For example, the line y = 2x does not contain an eigenvector of .
Proof. Let {x1,…,xn−k} be a basis of W⊥. Then,
Now assume that x*Ax > 0 holds for each nonzero x ∈ W and that λn−k+1 = 0. Then, it follows that min{x*Ax : ∥x∥ = 1, x ⊥ x1,…,xn−k} = 0. Now, define f : ℂn → ℂ by f(x) = x*Ax.
Then, f is a continuous function and min{f(x) : x ∈ W, ∥x∥ = 1} = 0. Thus, f must attain this minimum on the unit sphere of W. That is, there exists y ∈ W with ∥y∥ = 1 such that y*Ay = 0, a contradiction. Thus, the required result follows. □