Recall the dot product on ℝ2 and ℝ3. The dot product allowed us to compute lengths of vectors and angles between vectors, and hence to rephrase geometric problems in ℝ2 and ℝ3 in the language of vectors. We now generalize the idea of the dot product to achieve a similar goal for a general vector space over ℝ or ℂ. So, in this chapter, F will denote either ℝ or ℂ.
Definition 5.1.1. [Inner Product] Let V be a vector space over F. An inner product over V,
denoted by ⟨,⟩, is a map from V × V to F satisfying
1. ⟨au + bv,w⟩ = a⟨u,w⟩ + b⟨v,w⟩, for all u,v,w ∈ V and a,b ∈ F,
2. ⟨u,v⟩ = \overline{⟨v,u⟩}, the complex conjugate of ⟨v,u⟩, for all u,v ∈ V, and
3. ⟨u,u⟩ ≥ 0 for all u ∈ V. Furthermore, equality holds if and only if u = 0.
Remark 5.1.2. Using the definition of inner product, we immediately observe that
1. ⟨v,αw⟩ = \overline{⟨αw,v⟩} = \overline{α}·\overline{⟨w,v⟩} = \overline{α}⟨v,w⟩, for all α ∈ F and v,w ∈ V.
2. If ⟨u,v⟩ = 0 for all v ∈ V then, in particular, ⟨u,u⟩ = 0. Hence, u = 0.
Definition 5.1.3. [Inner Product Space] Let V be a vector space with an inner product ⟨,⟩.
Then, (V,⟨,⟩) is called an inner product space (in short, ips).
Example 5.1.4. Examples 1 and 2 that appear below are called the standard inner product or
the dot product on ℝn and ℂn, respectively. Whenever an inner product is not clearly mentioned, it
will be assumed to be the standard inner product.
1. For u = (u1,…,un)T, v = (v1,…,vn)T ∈ ℝn define ⟨u,v⟩ = u1v1 + ⋯ + unvn = vT u. Then, ⟨,⟩ is indeed an inner product and hence (ℝn,⟨,⟩) is an ips.
2. For u = (u1,…,un)*, v = (v1,…,vn)* ∈ ℂn define ⟨u,v⟩ = u1\overline{v1} + ⋯ + un\overline{vn} = v*u. Then, (ℂn,⟨,⟩) is an ips.
3. For x = (x1,x2)T, y = (y1,y2)T ∈ ℝ2 and A = [4, -1; -1, 2], define ⟨x,y⟩ = yT Ax. Then, ⟨,⟩ is an inner product as ⟨x,x⟩ = (x1 - x2)2 + 3x12 + x22.
4. Fix A = [a, b; b, c] with a,c > 0 and ac > b2. Then, ⟨x,y⟩ = yT Ax is an inner product on ℝ2 as ⟨x,x⟩ = ax12 + 2bx1x2 + cx22 = a(x1 + (b/a)x2)2 + ((ac - b2)/a)x22.
5. Verify that for x = (x1,x2,x3)T, y = (y1,y2,y3)T ∈ ℝ3, ⟨x,y⟩ = 10x1y1 + 3x1y2 + 3x2y1 + 2x2y2 + x2y3 + x3y2 + x3y3 defines an inner product.
6. For x = (x1,x2)T, y = (y1,y2)T ∈ ℝ2, each of the three maps below satisfies at least one of the three conditions for an inner product. Determine the condition(s) that each map fails to satisfy. Give reasons for your answer.
   (a) ⟨x,y⟩ = x1y1.
   (b) ⟨x,y⟩ = x12 + y12 + x22 + y22.
   (c) ⟨x,y⟩ = x1y13 + x2y23.
7. Let A ∈ Mn(ℂ) be a Hermitian matrix. Then, for x,y ∈ ℂn, define ⟨x,y⟩ = y*Ax. Then, ⟨,⟩ satisfies ⟨x,y⟩ = \overline{⟨y,x⟩} and ⟨x + αz,y⟩ = ⟨x,y⟩ + α⟨z,y⟩, for all x,y,z ∈ ℂn and α ∈ ℂ. Does there exist a condition on A such that ⟨x,x⟩ ≥ 0 for all x ∈ ℂn? This will be answered in the affirmative in the chapter on eigenvalues and eigenvectors.
8. For A,B ∈ Mn(ℝ), define ⟨A,B⟩ = tr(BT A). Then, if A = [aij], we have ⟨A,A⟩ = tr(AT A) = ∑_{i=1}^n (AT A)ii = ∑_{i,j=1}^n aij·aij = ∑_{i,j=1}^n aij2 and therefore ⟨A,A⟩ > 0 for every nonzero matrix A.
9. Consider the complex vector space C[-1,1] of all continuous complex valued functions on [-1,1] and define ⟨f,g⟩ = ∫_{-1}^{1} f(x)\overline{g(x)} dx. Then,
   (a) ⟨f,f⟩ = ∫_{-1}^{1} |f(x)|2 dx ≥ 0 as |f(x)|2 ≥ 0, and this integral is 0 if and only if f ≡ 0, as f is continuous.
   (b) \overline{⟨g,f⟩} = \overline{∫_{-1}^{1} g(x)\overline{f(x)} dx} = ∫_{-1}^{1} \overline{g(x)} f(x) dx = ∫_{-1}^{1} f(x)\overline{g(x)} dx = ⟨f,g⟩.
   (c) ⟨f + g,h⟩ = ∫_{-1}^{1} (f + g)(x)\overline{h(x)} dx = ∫_{-1}^{1} [f(x)\overline{h(x)} + g(x)\overline{h(x)}] dx = ⟨f,h⟩ + ⟨g,h⟩.
   (d) ⟨αf,g⟩ = ∫_{-1}^{1} (αf(x))\overline{g(x)} dx = α ∫_{-1}^{1} f(x)\overline{g(x)} dx = α⟨f,g⟩.
10. Fix an ordered basis B = [u1,…,un] of a complex vector space V. Then, for any u,v ∈ V with [u]B = (a1,…,an)T and [v]B = (b1,…,bn)T, define ⟨u,v⟩ = ∑_{i=1}^n ai\overline{bi}. Then, ⟨,⟩ is indeed an inner product on V. So, any finite dimensional vector space can be endowed with an inner product. (A numerical illustration of some of the above inner products follows this example.)
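For readers who like to experiment, the finite dimensional examples above are easy to check numerically. The sketch below (Python with NumPy; the matrix A and the vectors are illustrative choices, not taken from the text) evaluates ⟨x,y⟩ = yT Ax for a symmetric matrix as in Example 4 and the trace inner product of Example 8.

```python
import numpy as np

# An illustrative symmetric matrix with a, c > 0 and ac > b^2 (as in Example 4).
A = np.array([[4.0, -1.0],
              [-1.0, 2.0]])

def ip(x, y, A):
    """Inner product <x, y> = y^T A x on R^2."""
    return y @ A @ x

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(ip(x, y, A), ip(y, x, A))      # symmetry: the two values agree
print(ip(x, x, A) > 0)               # positivity for a nonzero vector

# Trace (Frobenius) inner product on M_n(R) from Example 8.
M, N = np.random.rand(3, 3), np.random.rand(3, 3)
print(np.trace(N.T @ M), np.sum(M * N))   # the two expressions coincide
```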
As ⟨u,u⟩ > 0 for all u ≠ 0, we use the inner product to define the length of a vector.
Definition 5.1.5. [Length / Norm of a Vector] Let V be a vector space over F. Then, for any vector u ∈ V, we define the length (norm) of u as ∥u∥ = √⟨u,u⟩, the positive square root. A vector of norm 1 is called a unit vector. Thus, u/∥u∥ is called the unit vector in the direction of u.
Exercise 5.1.7.
1. Let u = (-1,1,2,3,7)T ∈ ℝ5. Find all α ∈ ℝ such that ∥αu∥ = 1.
2. Let u = (-1,1,2,3,7)T ∈ ℂ5. Find all α ∈ ℂ such that ∥αu∥ = 1.
3. Prove that ∥x + y∥2 + ∥x - y∥2 = 2(∥x∥2 + ∥y∥2), for all x,y ∈ ℝn. This equality is called the Parallelogram Law as, in a parallelogram, the sum of the squares of the lengths of the diagonals equals twice the sum of the squares of the lengths of the sides.
4. Apollonius' Identity: Let the lengths of the sides of a triangle be a,b,c ∈ ℝ and let the length of the median drawn to the side of length a be d ∈ ℝ. Then prove that b2 + c2 = 2(d2 + (a/2)2).
5. Let u = (1,2)T, v = (2,-1)T ∈ ℝ2. Then, does there exist an inner product in ℝ2 such that ∥u∥ = 1, ∥v∥ = 1 and ⟨u,v⟩ = 0? [Hint: Let A = [a, b; b, c] and define ⟨x,y⟩ = yT Ax. Use the given conditions to get a linear system of 3 equations in the variables a,b,c.]
6. Let x = (x1,x2)T, y = (y1,y2)T ∈ ℝ2. Then, ⟨x,y⟩ = 3x1y1 - x1y2 - x2y1 + x2y2 defines an inner product. Use this inner product to find
   (a) the angle between e1 = (1,0)T and e2 = (0,1)T.
   (b) v ∈ ℝ2 such that ⟨v,e1⟩ = 0.
   (c) x,y ∈ ℝ2 such that ∥x∥ = ∥y∥ = 1 and ⟨x,y⟩ = 0.
A very useful and fundamental inequality concerning the inner product, commonly called the Cauchy-Schwarz inequality, is proved next.
Theorem 5.1.8 (Cauchy-Bunyakovskii-Schwarz inequality). Let V be an inner product space over F. Then, for any u,v ∈ V,
|⟨v,u⟩| ≤ ∥u∥ ∥v∥.    (5.1.1)
Moreover, equality holds in Inequality (5.1.1) if and only if u and v are linearly dependent. Furthermore, if u ≠ 0 then v = (⟨v,u⟩/∥u∥2) u.
Proof. If u = 0 then Inequality (5.1.1) holds trivially. Hence, let u ≠ 0. Then, by Definition 5.1.1.3, ⟨λu + v, λu + v⟩ ≥ 0 for all λ ∈ F and v ∈ V. In particular, for λ = -⟨v,u⟩/∥u∥2,
0 ≤ ⟨λu + v, λu + v⟩ = |λ|2∥u∥2 + λ⟨u,v⟩ + \overline{λ}⟨v,u⟩ + ∥v∥2 = ∥v∥2 - |⟨v,u⟩|2/∥u∥2.
Or, in other words, |⟨v,u⟩|2 ≤ ∥u∥2∥v∥2 and the proof of the inequality is over.
Now, note that equality holds in Inequality (5.1.1) if and only if ⟨λu + v, λu + v⟩ = 0, or equivalently, λu + v = 0. Hence, u and v are linearly dependent. Moreover, λu + v = 0 implies that v = -λu = (⟨v,u⟩/∥u∥2) u. _
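The inequality is also easy to test numerically. A small sketch (Python/NumPy; random vectors, purely illustrative) checks |⟨v,u⟩| ≤ ∥u∥∥v∥ and the equality case for linearly dependent vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
# Check |<v, u>| <= ||u|| ||v|| for a few random complex vectors
# (an illustrative numerical check, not a proof).
for _ in range(5):
    u = rng.normal(size=4) + 1j * rng.normal(size=4)
    v = rng.normal(size=4) + 1j * rng.normal(size=4)
    lhs = abs(np.vdot(u, v))                 # absolute value of the standard inner product
    rhs = np.linalg.norm(u) * np.linalg.norm(v)
    assert lhs <= rhs + 1e-12

# Equality case: v a scalar multiple of u.
u = rng.normal(size=4)
v = 2.5 * u
print(abs(np.vdot(u, v)), np.linalg.norm(u) * np.linalg.norm(v))   # equal
```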
Let V be a real vector space. Then, for u,v ∈ V \ {0}, the Cauchy-Schwarz inequality implies that -1 ≤ ⟨u,v⟩/(∥u∥ ∥v∥) ≤ 1. We use this together with the properties of the cosine function to define the angle between two vectors in an inner product space.
Definition 5.1.10. [Angle between Vectors] Let V be a real vector space. If θ ∈ [0,π] is the angle between u,v ∈ V \ {0} then we define cos(θ) = ⟨u,v⟩/(∥u∥ ∥v∥).
Example 5.1.11.
1. Take (1,0)T, (1,1)T ∈ ℝ2. Then, cos θ = 1/√2. So θ = π/4.
2. Take (1,1,0)T, (1,1,1)T ∈ ℝ3. Then, the angle between them, say β, equals cos⁻¹(2/√6).
3. The angle depends on the inner product. Take ⟨x,y⟩ = 2x1y1 + x1y2 + x2y1 + x2y2 on ℝ2. Then, the angle between (1,0)T and (1,1)T equals cos⁻¹(3/√10).
4. As ⟨x,y⟩ = ⟨y,x⟩ for any real vector space, the angle between x and y is the same as the angle between y and x.
5. Let a,b ∈ ℝ with a,b > 0. Then, prove that (a + b)(1/a + 1/b) ≥ 4.
6. For 1 ≤ i ≤ n, let ai ∈ ℝ with ai > 0. Then, use Corollary 5.1.9 to show that (a1 + ⋯ + an)(1/a1 + ⋯ + 1/an) ≥ n2.
7. Prove that |z1 + ⋯ + zn| ≤ √n · √(|z1|2 + ⋯ + |zn|2), for z1,…,zn ∈ ℂ. When does equality hold?
8. Let V be an ips. If u,v ∈ V with ∥u∥ = 1, ∥v∥ = 1 and ⟨u,v⟩ = 1 then prove that u = αv for some α ∈ F. Is α = 1?
We will now prove that if A, B and C are the vertices of a triangle (see Figure 5.1) and a, b and c, respectively, are the lengths of the corresponding sides then cos(A) = (b2 + c2 - a2)/(2bc). This in turn implies that the angle between vectors has been rightly defined.
Lemma 5.1.12. Let A, B and C be the vertices of a triangle (see Figure 5.1) with corresponding side lengths a, b and c, respectively, in a real inner product space V. Then
a2 = b2 + c2 - 2bc cos(A).
Proof. Let 0, u and v be the coordinates of the vertices A, B and C, respectively, of the triangle ABC. Then, the vectors AB = u, AC = v and BC = v - u, so that c = ∥u∥, b = ∥v∥ and a = ∥v - u∥. Thus, we need to prove that ∥v - u∥2 = ∥v∥2 + ∥u∥2 - 2∥v∥∥u∥cos(A). Now, by definition ∥v - u∥2 = ∥v∥2 + ∥u∥2 - 2⟨v,u⟩ and hence ∥v∥2 + ∥u∥2 - ∥v - u∥2 = 2⟨u,v⟩. As ⟨v,u⟩ = ∥v∥∥u∥cos(A), the required result follows. _
Recall that two vectors u,v ∈ V are said to be orthogonal if ⟨u,v⟩ = 0 and that, for a subset S of V, S⊥ = {v ∈ V | ⟨v,s⟩ = 0 for all s ∈ S}.
Example 5.1.14.
1. 0 is orthogonal to every vector as ⟨0,x⟩ = 0 for all x ∈ V.
2. If V is a vector space over ℝ or ℂ then 0 is the only vector that is orthogonal to itself.
3. Let V = ℝ.
   (a) S = {0}. Then, S⊥ = ℝ.
   (b) S = ℝ. Then, S⊥ = {0}.
   (c) Let S be any subset of ℝ containing a nonzero real number. Then, S⊥ = {0}.
4. Let u = (1,2)T. What is u⊥ in ℝ2?
Solution: {(x,y)T ∈ ℝ2 | x + 2y = 0}. Is this Null(uT)? Note that {(2,-1)T} is a basis of u⊥ and, for any vector x = (x1,x2)T ∈ ℝ2,
x = ((x1 + 2x2)/5)(1,2)T + ((2x1 - x2)/5)(2,-1)T
is a decomposition of x into two vectors, one parallel to u and the other parallel to u⊥.
5. Fix u = (1,1,1,1)T, v = (1,1,-1,0)T ∈ ℝ4. Determine z,w ∈ ℝ4 such that u = z + w with the condition that z is parallel to v and w is orthogonal to v.
Solution: As z is parallel to v, z = kv = (k,k,-k,0)T, for some k ∈ ℝ. Since w is orthogonal to v, the vector w = (a,b,c,d)T satisfies a + b - c = 0. Thus, c = a + b and
(1,1,1,1)T = u = z + w = (k + a, k + b, -k + a + b, d)T.
Comparing the corresponding coordinates gives the linear system a + k = 1, b + k = 1, a + b - k = 1 and d = 1 in the variables a,b,d and k. Thus, solving for a,b,d and k gives z = (1/3)(1,1,-1,0)T and w = (1/3)(2,2,4,3)T.
6. Let x,y ∈ ℝn. Then prove that
   (a) ⟨x,y⟩ = 0 ⇐⇒ ∥x - y∥2 = ∥x∥2 + ∥y∥2 (Pythagoras Theorem).
   Solution: Use ∥x - y∥2 = ∥x∥2 + ∥y∥2 - 2⟨x,y⟩ to get the required result.
   (b) ∥x∥ = ∥y∥ ⇐⇒ ⟨x + y, x - y⟩ = 0 (x and y form adjacent sides of a rhombus as the diagonals x + y and x - y are orthogonal).
   Solution: Use ⟨x + y, x - y⟩ = ∥x∥2 - ∥y∥2 to get the required result.
   (c) 4⟨x,y⟩ = ∥x + y∥2 - ∥x - y∥2 (polarization identity in ℝn).
   Solution: Just expand the right hand side to get the required result.
   (d) ∥x + y∥2 + ∥x - y∥2 = 2∥x∥2 + 2∥y∥2 (parallelogram law: the sum of squares of the diagonals of a parallelogram equals twice the sum of squares of its sides).
   Solution: Just expand the left hand side to get the required result.
7. Let P = (1,1,1)T, Q = (2,1,3)T and R = (-1,1,2)T be three vertices of a triangle in ℝ3. Compute the angle between the sides PQ and PR.
Solution: Method 1: Note that PQ = (2,1,3)T - (1,1,1)T = (1,0,2)T, PR = (-2,0,1)T and QR = (-3,0,-1)T. As ⟨PQ, PR⟩ = 0, the angle between the sides PQ and PR is π/2.
Method 2: ∥PQ∥ = √5, ∥PR∥ = √5 and ∥QR∥ = √10. As ∥QR∥2 = ∥PQ∥2 + ∥PR∥2, by the Pythagoras theorem, the angle between the sides PQ and PR is π/2.
Exercise 5.1.15.
1. Let V be an ips.
   (a) If S ⊆ V then S⊥ is a subspace of V and S⊥ = (LS(S))⊥.
   (b) Furthermore, if V is finite dimensional then S⊥ and LS(S) are complementary. That is, V = LS(S) + S⊥. Equivalently, ⟨u,w⟩ = 0, for all u ∈ LS(S) and w ∈ S⊥.
2. Consider ℝ3 with the standard inner product. Find
   (a) S⊥ for S = {(1,1,1)T, (0,1,-1)T} and S = LS((1,1,1)T, (0,1,-1)T).
   (b) vectors v,w ∈ ℝ3 such that v, w and u = (1,1,1)T are mutually orthogonal.
   (c) the line passing through (1,1,-1)T and parallel to (a,b,c)T ≠ 0.
   (d) the plane containing (1,1,-1)T with (a,b,c)T ≠ 0 as the normal vector.
   (e) the area of the parallelogram with three vertices 0T, (1,2,-2)T and (2,3,0)T.
   (f) the area of the parallelogram when ∥x∥ = 5, ∥x - y∥ = 8 and ∥x + y∥ = 14.
   (g) the plane containing (2,-2,1)T and perpendicular to the line with parametric equation x = t - 1, y = 3t + 2, z = t + 1.
   (h) the plane containing the lines (1,2,-2)T + t(1,1,0)T and (1,2,-2)T + t(0,1,2)T.
   (i) k such that cos⁻¹(⟨u,v⟩/(∥u∥∥v∥)) = π/3, where u = (1,-1,1)T and v = (1,k,1)T.
   (j) the plane containing (1,1,2)T and orthogonal to the line with parametric equation x = 2 + t, y = 3 and z = 1 - t.
   (k) a parametric equation of a line containing (1,-2,1)T and orthogonal to x + 3y + 2z = 1.
3. Let P = (3,0,2)T, Q = (1,2,-1)T and R = (2,-1,1)T be three points in ℝ3. Then,
   (a) find the area of the triangle with vertices P, Q and R.
   (b) find the area of the parallelogram built on the vectors PQ and PR.
   (c) find a nonzero vector orthogonal to the plane of the above triangle.
   (d) find all vectors x orthogonal to both PQ and PR and having a given length ∥x∥.
   (e) find the volume of the parallelepiped built on the vectors PQ, PR and x, where x is one of the vectors found in Part 3d. Do you think the volume would be different if you choose the other vector x?
4. Let p1 be the plane containing the point A = (1,2,3)T and having (2,-1,1)T as its normal vector. Then,
   (a) find the equation of the plane p2 that is parallel to p1 and contains (-1,2,-3)T.
   (b) calculate the distance between the planes p1 and p2.
5. In the parallelogram ABCD, AB ∥ DC, AD ∥ BC, and A = (-2,1,3)T, B = (-1,2,2)T and C = (-3,1,5)T. Find the
   (a) coordinates of the point D,
   (b) cosine of the angle BCD,
   (c) area of the triangle ABC,
   (d) volume of the parallelepiped determined by AB, AD and (0,0,-7)T.
6. Let W = {(x,y,z,w)T ∈ ℝ4 : x + y + z - w = 0}. Find a basis of W⊥.
7. Recall the ips Mn(ℝ) (see Example 5.1.4.8). If W = {A ∈ Mn(ℝ) | AT = A} then what is W⊥?
To proceed further, recall that a vector space over ℝ or ℂ is also called a linear space.
Definition 5.1.16. [Normed Linear Space] Let V be a linear space.
1. A norm on V is a function f(x) = ∥x∥ from V to ℝ such that
   (a) ∥x∥ ≥ 0 for all x ∈ V, and if ∥x∥ = 0 then x = 0.
   (b) ∥αx∥ = |α|∥x∥ for all α ∈ F and x ∈ V.
   (c) ∥x + y∥ ≤ ∥x∥ + ∥y∥ for all x,y ∈ V (triangle inequality).
2. A linear space with a norm on it is called a normed linear space (nls).
As a consequence of the triangle inequality, we also have |∥x∥ - ∥y∥| ≤ ∥x - y∥, for all x,y ∈ V.
Proof. As ∥x∥ = ∥x - y + y∥ ≤ ∥x - y∥ + ∥y∥, one has ∥x∥ - ∥y∥ ≤ ∥x - y∥. Similarly, one obtains ∥y∥ - ∥x∥ ≤ ∥y - x∥ = ∥x - y∥. Combining the two, the required result follows. _
The next result is stated without proof as the proof is beyond the scope of this book.
Theorem 5.1.20. Let ∥⋅∥ be a norm on a nls V. Then, ∥⋅∥ is induced by some inner product
if and only if ∥⋅∥ satisfies the parallelogram law: ∥x + y∥2 + ∥x - y∥2 = 2∥x∥2 + 2∥y∥2.
Example 5.1.21.
1. For x = (x1,x2)T ∈ ℝ2, we define ∥x∥1 = |x1| + |x2|. Verify that ∥x∥1 is indeed a norm. But, for x = e1 and y = e2, 2(∥x∥1² + ∥y∥1²) = 4 whereas ∥x + y∥1² + ∥x - y∥1² = 4 + 4 = 8. So, the parallelogram law fails. Thus, ∥x∥1 is not induced by any inner product in ℝ2 (see also the sketch following this example).
2. Does there exist an inner product in ℝ2 such that ∥x∥ = max{|x1|,|x2|}?
3. If ∥⋅∥ is a norm on V then d(x,y) = ∥x - y∥, for x,y ∈ V, defines a distance function as
   (a) d(x,x) = 0, for each x ∈ V.
   (b) using the triangle inequality, for any z ∈ V, we have d(x,y) = ∥x - y∥ ≤ ∥x - z∥ + ∥z - y∥ = d(x,z) + d(z,y).
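The failure of the parallelogram law for ∥⋅∥1, and its validity for the norm induced by the standard inner product, can be checked directly; the following sketch (Python/NumPy) does exactly the computation of Example 5.1.21.1.

```python
import numpy as np

x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # e1 and e2

def check_parallelogram(norm):
    lhs = norm(x + y)**2 + norm(x - y)**2
    rhs = 2 * (norm(x)**2 + norm(y)**2)
    return lhs, rhs

print(check_parallelogram(lambda v: np.linalg.norm(v, 1)))  # (8.0, 4.0): law fails for the 1-norm
print(check_parallelogram(lambda v: np.linalg.norm(v, 2)))  # (4.0, 4.0): law holds for the 2-norm
```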
We start with the definition of an orthonormal set.
Definition 5.2.1. Let V be an ips. Then, a non-empty set S = {v1,…,vn} ⊆ V is called an orthogonal set if vi and vj are mutually orthogonal, for 1 ≤ i ≠ j ≤ n, i.e., ⟨vi,vj⟩ = 0, for 1 ≤ i ≠ j ≤ n. Further, if ∥vi∥ = 1, for 1 ≤ i ≤ n, then S is called an orthonormal set. If S is also a basis of V then S is called an orthonormal basis of V.
Example 5.2.2.
1. A few orthonormal sets in ℝ2 are {e1, e2}, {(1/√2)(1,1)T, (1/√2)(1,-1)T} and {(1/√5)(1,2)T, (1/√5)(2,-1)T}.
2. Let S = {e1,…,en} be the standard basis of ℝn. Then, S is an orthonormal set as
   (a) ∥ei∥ = 1, for 1 ≤ i ≤ n.
   (b) ⟨ei,ej⟩ = 0, for 1 ≤ i ≠ j ≤ n.
3. The set {(1/√6)(1,2,1)T, (1/√5)(2,-1,0)T, (1/√30)(1,2,-5)T} is an orthonormal set in ℝ3.
4. Recall that ⟨f,g⟩ = ∫_{-π}^{π} f(x)g(x)dx defines the standard inner product on C[-π,π]. Consider S = {1} ∪ {em | m ≥ 1} ∪ {fn | n ≥ 1}, where 1(x) = 1, em(x) = cos(mx) and fn(x) = sin(nx), for all m,n ≥ 1 and for all x ∈ [-π,π]. Then,
   (a) S is a linearly independent set.
   (b) ∥1∥2 = 2π, ∥em∥2 = π and ∥fn∥2 = π.
   (c) the functions in S are mutually orthogonal.
Hence, {1/√(2π)} ∪ {em/√π | m ≥ 1} ∪ {fn/√π | n ≥ 1} is an orthonormal set in C[-π,π].
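The orthogonality relations above can be verified numerically as well. A rough sketch (Python/NumPy, using a simple Riemann sum; purely illustrative):

```python
import numpy as np

# Numerically check a few of the orthogonality relations on [-pi, pi]
# with a crude Riemann sum (an illustrative check only, not a proof).
x = np.linspace(-np.pi, np.pi, 200001)
dx = x[1] - x[0]

def ip(f, g):
    return np.sum(f(x) * g(x)) * dx

print(ip(np.cos, np.sin))                          # ~0 : <e_1, f_1> = 0
print(ip(lambda t: np.cos(2 * t), np.cos))         # ~0 : <e_2, e_1> = 0
print(ip(np.cos, np.cos), ip(np.sin, np.sin))      # ~pi: ||e_1||^2 = ||f_1||^2 = pi
print(ip(lambda t: np.ones_like(t), lambda t: np.ones_like(t)))   # ~2*pi: ||1||^2
```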
To proceed further, we consider a few examples for better understanding.
Example 5.2.3. Which point on a plane P is closest to a given point, say Q?
Solution: Let y be the foot of the perpendicular from Q on P. Thus, by the Pythagoras Theorem, this point is unique. So, the question arises: how do we find y?
Note that the vector from y to Q gives a normal vector of the plane P. So, we need to decompose the vector from a point of P to Q into two vectors such that one of them lies on the plane P and the other is orthogonal to the plane.
Thus, we see that given u,v ∈ V \ {0}, we need to find two vectors, say y and z, such that y is parallel to u and z is perpendicular to u. Then, ∥y∥ = ∥v∥cos(θ) and ∥z∥ = ∥v∥sin(θ), where θ is the angle between u and v.
We do this as follows (see Figure 5.2). Let û = u/∥u∥ be the unit vector in the direction of u. Then, using trigonometry, cos(θ) = ∥y∥/∥v∥. Hence ∥y∥ = ∥v∥cos(θ). Now, using Definition 5.1.10, ∥y∥ = ∥v∥ · |⟨v,u⟩|/(∥u∥∥v∥) = |⟨v,u⟩|/∥u∥, where the absolute value is taken as the length/norm is a positive quantity. Thus,
y = (⟨v,u⟩/∥u∥2) u and z = v - (⟨v,u⟩/∥u∥2) u.
In the literature, the vector y = (⟨v,u⟩/∥u∥2) u is called the orthogonal projection of v on u, denoted Proju(v). Thus,
Proju(v) = (⟨v,u⟩/∥u∥2) u and ∥Proju(v)∥ = |⟨v,u⟩|/∥u∥.    (5.2.1)
Moreover, the distance of the point Q from the plane P equals ∥z∥ = ∥v - Proju(v)∥.
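Equation (5.2.1) translates directly into code. A minimal sketch (Python/NumPy; the vectors are those of Example 5.2.4.3):

```python
import numpy as np

def proj(v, u):
    """Orthogonal projection of v on u: Proj_u(v) = (<v,u>/||u||^2) u."""
    return (np.dot(v, u) / np.dot(u, u)) * u

u = np.array([1.0, 1.0, -1.0, 0.0])
v = np.array([1.0, 1.0, 1.0, 1.0])

y = proj(v, u)            # component of v parallel to u
z = v - y                 # component of v orthogonal to u
print(y, z, np.dot(z, u)) # <z, u> is (numerically) 0
```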
Example 5.2.4.
1. Determine the foot of the perpendicular from the point (1,2,3)T on the XY-plane.
Solution: Verify that the required point is (1,2,0)T.
2. Determine the foot of the perpendicular from the point Q = (1,2,3,4)T on the plane generated by (1,1,0,0)T, (1,0,1,0)T and (0,1,1,1)T.
Answer: A point (x,y,z,w)T lies on the plane if and only if x - y - z + 2w = 0, i.e., ⟨(1,-1,-1,2)T, (x,y,z,w)T⟩ = 0. So, the required point equals
(1,2,3,4)T - (⟨(1,2,3,4)T, (1,-1,-1,2)T⟩/7)(1,-1,-1,2)T = (1/7)(3,18,25,20)T.
3. Determine the projection of v = (1,1,1,1)T on u = (1,1,-1,0)T.
Solution: By Equation (5.2.1), we have Proju(v) = (⟨v,u⟩/∥u∥2) u = (1/3)(1,1,-1,0)T, and w = (1,1,1,1)T - Proju(v) = (1/3)(2,2,4,3)T is orthogonal to u.
4. Let u = (1,1,1,1)T, v = (1,1,-1,0)T, w = (1,1,0,-1)T ∈ ℝ4. Write v = v1 + v2, where v1 is parallel to u and v2 is orthogonal to u. Also, write w = w1 + w2 + w3 such that w1 is parallel to u, w2 is parallel to v2 and w3 is orthogonal to both u and v2.
Solution: Note that
   (a) v1 = Proju(v) = (⟨v,u⟩/∥u∥2) u = (1/4)(1,1,1,1)T is parallel to u.
   (b) v2 = v - v1 = (1/4)(3,3,-5,-1)T is orthogonal to u.
Note that Proju(w) is parallel to u and Projv2(w) is parallel to v2. Hence, we have
   (a) w1 = Proju(w) = (⟨w,u⟩/∥u∥2) u = (1/4)(1,1,1,1)T is parallel to u,
   (b) w2 = Projv2(w) = (⟨w,v2⟩/∥v2∥2) v2 = (7/44)(3,3,-5,-1)T is parallel to v2 and
   (c) w3 = w - w1 - w2 = (3/11)(1,1,2,-4)T is orthogonal to both u and v2.
We now prove the most important initial result of this section.
Theorem 5.2.5. Let S = {u1,…,un} be an orthonormal subset of an ips V(F).
1. Then, S is a linearly independent subset of V.
2. Suppose v ∈ LS(S) with v = ∑_{i=1}^n αiui, for some αi's in F. Then,
   (a) αi = ⟨v,ui⟩.
   (b) ∥v∥2 = ∥∑_{i=1}^n αiui∥2 = ∑_{i=1}^n |αi|2.
3. Let z ∈ V and w = ∑_{i=1}^n ⟨z,ui⟩ui. Then, z = w + (z - w) with ⟨z - w, w⟩ = 0, i.e., z - w ∈ LS(S)⊥. Further, ∥z∥2 = ∥w∥2 + ∥z - w∥2 ≥ ∥w∥2.
4. Let dim(V) = n. Then, ⟨v,ui⟩ = 0 for all i = 1,2,…,n if and only if v = 0.
Proof. Part 1: Consider the linear system c1u1 + ⋯ + cnun = 0 in the variables c1,…,cn. As ⟨0,u⟩ = 0 and ⟨uj,ui⟩ = 0, for all j ≠ i, we have
0 = ⟨0, ui⟩ = ⟨c1u1 + ⋯ + cnun, ui⟩ = ci⟨ui,ui⟩ = ci, for 1 ≤ i ≤ n.
Hence, ci = 0, for 1 ≤ i ≤ n. Thus, the above linear system has only the trivial solution. So, the set S is linearly independent.
Part 2: Note that ⟨v,ui⟩ = ⟨∑_{j=1}^n αjuj, ui⟩ = ∑_{j=1}^n αj⟨uj,ui⟩ = αi⟨ui,ui⟩ = αi. This completes the first sub-part. For the second sub-part, we have
∥v∥2 = ⟨∑_{i=1}^n αiui, ∑_{j=1}^n αjuj⟩ = ∑_{i,j=1}^n αi\overline{αj}⟨ui,uj⟩ = ∑_{i=1}^n αi\overline{αi} = ∑_{i=1}^n |αi|2.
Part 3: Note that for 1 ≤ i ≤ n,
⟨z - w, ui⟩ = ⟨z,ui⟩ - ⟨w,ui⟩ = ⟨z,ui⟩ - ⟨z,ui⟩ = 0.
So, z - w ∈ LS(S)⊥. Hence, ⟨z - w, w⟩ = 0 as w ∈ LS(S). Further,
∥z∥2 = ∥w + (z - w)∥2 = ∥w∥2 + ∥z - w∥2 ≥ ∥w∥2.
Part 4: Follows directly using Part 2b as {u1,…,un} is a basis of V. _
A rephrasing of Theorem 5.2.5.2b gives a generalization of the Pythagoras theorem, popularly known as Parseval's formula. The proof is left as an exercise for the reader.
Theorem 5.2.6. Let V be a finite dimensional ips with an orthonormal basis {v1,…,vn}. Then, for each x,y ∈ V,
⟨x,y⟩ = ∑_{i=1}^n ⟨x,vi⟩\overline{⟨y,vi⟩}.
Furthermore, if x = y then ∥x∥2 = ∑_{i=1}^n |⟨x,vi⟩|2 (generalizing the Pythagoras Theorem).
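As an illustration of Theorem 5.2.5.2 and Parseval's formula, the following sketch (Python/NumPy; the orthonormal basis of ℝ2 is an illustrative choice) recovers the coordinates of a vector from inner products and checks the norm identity.

```python
import numpy as np

# An orthonormal basis of R^2 (illustrative choice).
v1 = np.array([1.0, 1.0]) / np.sqrt(2)
v2 = np.array([1.0, -1.0]) / np.sqrt(2)

x = np.array([3.0, -2.0])
alpha = np.array([np.dot(x, v1), np.dot(x, v2)])   # alpha_i = <x, v_i>

print(alpha[0] * v1 + alpha[1] * v2)     # reconstructs x
print(np.sum(alpha**2), np.dot(x, x))    # Parseval: both equal ||x||^2
```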
As a corollary to Theorem 5.2.5, we have the following result: if {v1,…,vn} is a basis of V consisting of mutually orthogonal nonzero vectors, then ⟨v,vi⟩ = 0, for all i = 1,2,…,n, if and only if v = 0.
Proof. For 1 ≤ k ≤ n, define uk = vk/∥vk∥ and use Theorem 5.2.5.4 to get the required result. _
In view of the importance of Theorem 5.2.5, we inquire into the question of extracting an
orthonormal basis from a given basis. The process of extracting an orthonormal basis from a finite
linearly independent set is called the Gram-Schmidt Orthonormalization process. We first
consider a few examples. Note that Theorem 5.2.5 also gives us an algorithm for doing so, i.e., from
the given vector subtract all the orthogonal projections/components. If the new vector is nonzero then
this vector is orthogonal to the previous ones. The proof follows directly from Theorem 5.2.5 but we
give it again for the sake of completeness.
Theorem 5.2.10 (Gram-Schmidt Orthogonalization Process). Let V be an ips. If {v1,…,vn}
is a set of linearly independent vectors in V then there exists an orthonormal set {w1,…,wn}
in V. Furthermore, LS(w1,…,wi) = LS(v1,…,vi), for 1 ≤ i ≤ n.
Proof. Note that for orthonormality, we need ∥wi∥ = 1, for 1 ≤ i ≤ n, and ⟨wi,wj⟩ = 0, for 1 ≤ i ≠ j ≤ n. Also, by Corollary 3.3.8.2, vi ∉ LS(v1,…,vi-1), for 2 ≤ i ≤ n, as {v1,…,vn} is a linearly independent set. We are now ready to prove the result by induction.
Step 1: Define w1 = v1/∥v1∥. Then LS(v1) = LS(w1).
Step 2: Define u2 = v2 - ⟨v2,w1⟩w1. Then, u2 ≠ 0 as v2 ∉ LS(v1). So, let w2 = u2/∥u2∥. Note that {w1,w2} is orthonormal and LS(w1,w2) = LS(v1,v2).
Step 3: For induction, assume that we have obtained an orthonormal set {w1,…,wk-1} such that LS(v1,…,vk-1) = LS(w1,…,wk-1). Now, note that
uk = vk - ∑_{i=1}^{k-1}⟨vk,wi⟩wi = vk - ∑_{i=1}^{k-1} Projwi(vk) ≠ 0,
as vk ∉ LS(v1,…,vk-1). So, let us put wk = uk/∥uk∥. Then, {w1,…,wk} is orthonormal as ∥wk∥ = 1 and
⟨wk,w1⟩ = (1/∥uk∥)⟨vk - ∑_{i=1}^{k-1}⟨vk,wi⟩wi, w1⟩ = (1/∥uk∥)(⟨vk,w1⟩ - ⟨vk,w1⟩) = 0.
Similarly, ⟨wk,wi⟩ = 0, for 2 ≤ i ≤ k - 1. Clearly, wk = uk/∥uk∥ ∈ LS(w1,…,wk-1,vk). So, wk ∈ LS(v1,…,vk).
As vk = ∥uk∥wk + ∑_{i=1}^{k-1}⟨vk,wi⟩wi, we get vk ∈ LS(w1,…,wk). Hence, by the principle of mathematical induction, LS(w1,…,wk) = LS(v1,…,vk) and the required result follows. _
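The proof is constructive, and a direct transcription of its steps gives an algorithm. Below is a minimal sketch (Python/NumPy, standard inner product on ℝn; vectors whose residual is numerically zero are skipped, as in Example 5.2.11.2 below).

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal list w with LS(w_1..w_i) = LS(v_1..v_i)."""
    basis = []
    for v in vectors:
        u = v.astype(float)
        for w in basis:                  # subtract the projections on the earlier w's
            u = u - np.dot(u, w) * w
        if np.linalg.norm(u) > tol:      # u = 0 means v depends on the earlier vectors
            basis.append(u / np.linalg.norm(u))
    return basis

S = [np.array([1.0, 0, 1, 0]), np.array([0.0, 1, 0, 1]), np.array([1.0, -1, 1, 1])]
for w in gram_schmidt(S):
    print(w)
```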
We now illustrate the Gram-Schmidt process with a few examples.
Example 5.2.11.
1. Let S = {(1,-1,1,1)T, (1,0,1,0)T, (0,1,0,1)T} ⊆ ℝ4. Find an orthonormal set T such that LS(S) = LS(T).
Solution: Let v1 = (1,0,1,0)T, v2 = (0,1,0,1)T and v3 = (1,-1,1,1)T. Then, w1 = (1/√2)(1,0,1,0)T. As ⟨v2,w1⟩ = 0, we get w2 = (1/√2)(0,1,0,1)T. For the third vector, let u3 = v3 - ⟨v3,w1⟩w1 - ⟨v3,w2⟩w2 = (0,-1,0,1)T. Thus, w3 = (1/√2)(0,-1,0,1)T.
2. Let S = {v1, v2, v3, v4} ⊆ ℝ3. Find an orthonormal set T such that LS(S) = LS(T).
Solution: Take w1 = v1/∥v1∥ = e1. For the second vector, consider u2 = v2 - ⟨v2,w1⟩w1. So, put w2 = u2/∥u2∥ = e2.
For the third vector, let u3 = v3 - ∑_{i=1}^{2}⟨v3,wi⟩wi = (0,0,0)T. So, v3 ∈ LS(w1,w2). Or equivalently, the set {v1,v2,v3} is linearly dependent.
So, for computing the third element of T, define u4 = v4 - ∑_{i=1}^{2}⟨v4,wi⟩wi. Then, u4 = e3. So w4 = e3. Hence, T = {w1,w2,w4} = {e1,e2,e3}.
3. Find an orthonormal set in ℝ3 containing (1/√6)(1,2,1)T.
Solution: Let (x,y,z)T ∈ ℝ3 with ⟨(1,2,1)T, (x,y,z)T⟩ = 0. Thus, x + 2y + z = 0.
Observe that (-2,1,0)T and (-1,0,1)T are orthogonal to (1,2,1)T but are not orthogonal to each other.
Method 1: Apply the Gram-Schmidt process to {(1,2,1)T, (-2,1,0)T, (-1,0,1)T} ⊆ ℝ3.
Method 2: Valid only in ℝ3: use the cross product of two vectors.
In either case, verify that {(1/√6)(1,2,1)T, (1/√5)(2,-1,0)T, (1/√30)(1,2,-5)T} is the required set.
We now state two immediate corollaries without proof.
Corollary 5.2.12. Let V ≠ {0} be an ips. If
1. V is finite dimensional then V has an orthonormal basis.
2. S is a non-empty orthonormal set and dim(V) is finite then S can be extended to form an orthonormal basis of V.
Remark 5.2.13. Let S = {v1,…,vn} ≠ {0} be a non-empty subset of a finite dimensional vector space V. If we apply the Gram-Schmidt process to
1. S then we obtain an orthonormal basis of LS(v1,…,vn).
2. a re-arrangement of the elements of S then we may obtain another orthonormal basis of LS(v1,…,vn). But, observe that the size of the two bases will be the same.
Exercise 5.2.14.
1. Let V be an ips with basis B = {v1,…,vn}. Then, prove that B is orthonormal if and only if, for each x ∈ V, x = ∑_{i=1}^n ⟨x,vi⟩vi. [Hint: Since B is a basis, each x ∈ V has a unique linear combination in terms of the vi's.]
2. Let S be a subset of V having 101 elements. Suppose that the application of the Gram-Schmidt process yields u5 = 0. Does it imply that LS(v1,…,v5) = LS(v1,…,v4)? Give reasons for your answer.
3. Let B = {v1,…,vn} be an orthonormal set in ℝn. For 1 ≤ k ≤ n, define Ak = ∑_{i=1}^k viviT. Then, prove that AkT = Ak and AkAk = Ak. Thus, the Ak's are projection matrices.
4. Determine an orthonormal basis of ℝ4 containing (1,-2,1,3)T and (2,1,-3,1)T.
5. Let x ∈ ℝn with ∥x∥ = 1.
   (a) Then, prove that {x} can be extended to form an orthonormal basis of ℝn.
   (b) Let the extended basis be {x,x2,…,xn} and B = [e1,…,en] the standard ordered basis of ℝn. Prove that A = [[x]B, [x2]B, …, [xn]B] is an orthogonal matrix.
6. Let v,w ∈ ℝn, n ≥ 1, with ∥v∥ = ∥w∥ = 1. Prove that there exists an orthogonal matrix A such that Av = w. Prove also that A can be chosen such that det(A) = 1.
7. Let (V,⟨,⟩) be an n-dimensional ips. If u ∈ V with ∥u∥ = 1 then give reasons for the following statements.
   (a) Let S⊥ = {v ∈ V | ⟨v,u⟩ = 0}. Then, dim(S⊥) = n - 1.
   (b) Let 0 ≠ β ∈ F. Then, S = {v ∈ V : ⟨v,u⟩ = β} is not a subspace of V.
   (c) Let v ∈ V. Then, v = v0 + ⟨v,u⟩u for a vector v0 ∈ S⊥. That is, V = LS(u, S⊥).
We end this section by proving the fundamental theorem of linear algebra. So, the readers are advised
to recall the four fundamental subspaces and also to go through Theorem 3.5.9 (the rank-nullity
theorem for matrices). We start with the following result.
Lemma 5.2.15. Let A ∈ Mm,n(ℝ). Then, Null(A) = Null(AT A).
Proof. Let x ∈ Null(A). Then, Ax = 0. So, (AT A)x = AT (Ax) = AT 0 = 0. Thus, x ∈ Null(AT A).
That is, Null(A) ⊆ Null(AT A).
Suppose that x ∈ Null(AT A). Then, (AT A)x = 0 and 0 = xT 0 = xT (AT A)x = (Ax)T (Ax) = ∥Ax∥2.
Thus, Ax = 0 and the required result follows. _
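A quick numerical illustration of Lemma 5.2.15 (Python/NumPy; the rank deficient matrix is an illustrative choice):

```python
import numpy as np

# Check Null(A) = Null(A^T A) by comparing ranks: Null(A) ⊆ Null(A^T A) always holds,
# and equal ranks with the same number of columns force equal null spaces.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])          # rank 1, nontrivial null space
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T @ A))   # both 1

x = np.array([3.0, 0.0, -1.0])           # a vector in Null(A): x1 + 2*x2 + 3*x3 = 0
print(A @ x, (A.T @ A) @ x)              # both the zero vector
```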
Theorem 5.2.16 (Fundamental Theorem of Linear Algebra). Let A ∈ Mm,n(ℂ). Then,
1. dim(Null(A)) + dim(Col(A)) = n,
2. Null(A) = Col(A*)⊥ and Null(A*) = Col(A)⊥,
3. dim(Col(A)) = dim(Col(A*)).
Proof. Part 1: Proved in Theorem 3.5.9.
Part 2: We first prove that Null(A) ⊆ Col(A*)⊥. Let x ∈ Null(A). Then, Ax = 0 and hence, for every u ∈ ℂm,
⟨x, A*u⟩ = (A*u)*x = u*Ax = u*0 = 0.
But Col(A*) = {A*u | u ∈ ℂm}. Thus, x ∈ Col(A*)⊥ and Null(A) ⊆ Col(A*)⊥.
We now prove that Col(A*)⊥ ⊆ Null(A). Let x ∈ Col(A*)⊥. Then, for every y ∈ ℂm, 0 = ⟨x, A*y⟩ = (A*y)*x = y*Ax = ⟨Ax, y⟩. In particular, for y = Ax ∈ ℂm, we get ∥Ax∥2 = 0. Hence Ax = 0. That is, x ∈ Null(A). Thus, the proof of the first equality in Part 2 is over. We omit the second equality as it proceeds on the same lines as above.
Part 3: Use the first two parts to get the required result.
Hence the proof of the fundamental theorem is complete. _
Remark 5.2.17. Theorem 5.2.16.2 implies that Null(A) = Col(A*)⊥. This statement is
just stating the usual fact that if x ∈ Null(A) then Ax = 0 and hence the usual dot product of
every row of A with x equals 0.
As an implication of Theorem 5.2.16.2 and Theorem 5.2.16.3, we show the existence of an
invertible linear map T : Col(A*) → Col(A).
Corollary 5.2.18. Let A ∈ Mn(ℂ). Then, the function T : Col(A*) → Col(A) defined by
T(x) = Ax is invertible.
Proof. In view of Theorem 5.2.16.3 and the rank-nullity theorem, we just need to show that the
map is one-one. So, suppose that there exist x,y ∈ Col(A*) such that T(x) = T(y). Or
equivalently, Ax = Ay. Thus, x - y ∈ Null(A) = (Col(A*))⊥ (by Theorem 5.2.16.2). Therefore,
x-y ∈ (Col(A*))⊥∩Col(A*) = {0}. Thus, x = y and hence the map is one-one. Thus, the required
result follows. _
The readers should look at Example 3.2.3 and Remark 3.2.4. We give one more example.
Example 5.2.19. Let A = . Then, verify that
1. {(0,1,1)T, (1,1,2)T} is a basis of Col(A).
2. {(1,1,-1)T} is a basis of Null(AT).
3. Null(AT) = (Col(A))⊥.
Exercise 5.2.20.
1. Find distinct subspaces W1 and W2
   (a) in ℝ2 such that W1 and W2 are orthogonal but are not orthogonal complements of each other.
   (b) in ℝ3 such that W1 ≠ {0} and W2 ≠ {0} are orthogonal, but are not orthogonal complements of each other.
2. Let A ∈ Mm,n(ℂ). Then, Null(A) = Null(A*A).
3. Let A ∈ Mm,n(ℝ). Then, Col(AT) = Col(AT A).
4. Let A ∈ Mm,n(ℝ). Then, Rank(A) = n if and only if Rank(AT A) = n.
5. Let A ∈ Mm,n(ℝ). Then, for every
   (a) x ∈ ℝn, x = u + v, where u ∈ Col(AT) and v ∈ Null(A) are unique.
   (b) y ∈ ℝm, y = w + z, where w ∈ Col(A) and z ∈ Null(AT) are unique.
For more information related to the fundamental theorem of linear algebra, the interested readers are advised to see the article "The Fundamental Theorem of Linear Algebra, Gilbert Strang, The American Mathematical Monthly, Vol. 100, No. 9, Nov., 1993, pp. 848 - 855."
The next result gives the proof of the QR decomposition for real matrices. The readers are advised
to prove similar results for matrices with complex entries. This decomposition and its
generalizations are helpful in the numerical calculations related with eigenvalue problems (see
Chapter 6).
Theorem 5.2.1 (QR Decomposition). Let A ∈ Mn(ℝ) be invertible. Then, there exist
matrices Q and R such that Q is orthogonal and R is upper triangular with A = QR.
Furthermore, if det(A)≠0 then the diagonal entries of R can be chosen to be positive. Also, in
this case, the decomposition is unique.
Proof. As A is invertible, its columns form a basis of ℝn. So, an application of the Gram-Schmidt orthonormalization process to {A[:,1],…,A[:,n]} gives an orthonormal basis {v1,…,vn} of ℝn satisfying
LS(A[:,1],…,A[:,i]) = LS(v1,…,vi), for 1 ≤ i ≤ n.
Since A[:,i] ∈ LS(v1,…,vi), for 1 ≤ i ≤ n, there exist αji ∈ ℝ, 1 ≤ j ≤ i, such that A[:,i] = α1i v1 + ⋯ + αii vi. Thus, if Q = [v1,…,vn] and R = [αji] (with αji = 0 for j > i) then
1. Q is an orthogonal matrix (see Exercise 5.4.8.5),
2. R is an upper triangular matrix, and
3. A = QR.
This completes the proof of the first part. Note that
1. αii ≠ 0, for 1 ≤ i ≤ n, as A[:,1] ≠ 0 and A[:,i] ∉ LS(v1,…,vi-1).
2. if αii < 0, for some i, 1 ≤ i ≤ n, then we can replace vi in Q by -vi to get a new Q and R in which the diagonal entries of R are positive.
Uniqueness: Suppose Q1R1 = Q2R2 for some orthogonal matrices Qi's and upper triangular matrices Ri's with positive diagonal entries. As the Qi's and Ri's are invertible, we get Q2⁻¹Q1 = R2R1⁻¹. Now, using
1. Exercises 2.3.25.1 and 1.2.15.1, the matrix R2R1⁻¹ is an upper triangular matrix.
2. Exercise 1.3.2.3, Q2⁻¹Q1 is an orthogonal matrix.
So, the matrix R2R1⁻¹ is an orthogonal upper triangular matrix and hence, by Exercise 1.2.11.4, R2R1⁻¹ = In. So, R2 = R1 and therefore Q2 = Q1. _
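In practice one rarely performs the Gram-Schmidt computation by hand; a library routine returns the same factorization up to the signs discussed above. A small sketch (Python/NumPy; the invertible matrix is an illustrative choice):

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])          # an illustrative invertible matrix

Q, R = np.linalg.qr(A)                   # Q orthogonal, R upper triangular
print(np.allclose(Q @ R, A))             # True: A = QR
print(np.allclose(Q.T @ Q, np.eye(3)))   # True: Q^T Q = I

# np.linalg.qr may return negative diagonal entries in R; flipping the signs of
# the corresponding columns of Q and rows of R gives the factorization with a
# positive diagonal, which Theorem 5.2.1 shows is unique.
s = np.sign(np.diag(R))
Q, R = Q * s, (R.T * s).T
print(np.diag(R))
```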
Let A be an n × k matrix with Rank(A) = r. Then, by Remark 5.2.13, an application of the Gram-Schmidt orthonormalization process to the columns of A yields an orthonormal set {v1,…,vr} ⊆ ℝn such that
LS(A[:,1],…,A[:,k]) = LS(v1,…,vr).
Hence, proceeding on the lines of the above theorem, we have the following result.
Theorem 5.2.2 (Generalized QR Decomposition). Let A be an n × k matrix of rank r. Then, A = QR, where
1. Q = [v1,…,vr] is an n × r matrix with QT Q = Ir,
2. LS(A[:,1],…,A[:,j]) = LS(v1,…,vi), for 1 ≤ i ≤ j ≤ k, and
3. R is an r × k matrix with Rank(R) = r.
Example 5.2.3.
1. Let A be the 4 × 4 matrix whose columns are the vectors v1 = (1,0,1,0)T, v2 = (0,1,0,1)T, v3 = (1,-1,1,1)T of Example 5.2.11.1 and v4 = (2,1,1,1)T. Find an orthogonal matrix Q and an upper triangular matrix R such that A = QR.
Solution: From Example 5.2.11, we know that w1 = (1/√2)(1,0,1,0)T, w2 = (1/√2)(0,1,0,1)T and w3 = (1/√2)(0,-1,0,1)T. We now compute w4. As v4 = (2,1,1,1)T,
u4 = v4 - ∑_{i=1}^{3}⟨v4,wi⟩wi = (1/2)(1,0,-1,0)T.
Thus, w4 = (1/√2)(1,0,-1,0)T. Hence, we see that A = QR with Q = [w1, w2, w3, w4] and R = QT A.
2. Let A be the 4 × 4 matrix whose columns are (1,-1,1,1)T, (1,0,1,0)T, (1,-2,1,2)T and (0,1,0,1)T. Find a 4 × 3 matrix Q satisfying QT Q = I3 and an upper triangular matrix R such that A = QR.
Solution: Let us apply the Gram-Schmidt orthonormalization process to the columns of A. As v1 = (1,-1,1,1)T, we get w1 = (1/2)(1,-1,1,1)T. Let v2 = (1,0,1,0)T. Then,
u2 = v2 - ⟨v2,w1⟩w1 = (1/2)(1,1,1,-1)T.
Hence, w2 = (1/2)(1,1,1,-1)T. Let v3 = (1,-2,1,2)T. Then,
u3 = v3 - ⟨v3,w1⟩w1 - ⟨v3,w2⟩w2 = 0.
So, we next take v3 = (0,1,0,1)T. Then,
u3 = v3 - ⟨v3,w1⟩w1 - ⟨v3,w2⟩w2 = (0,1,0,1)T.
So, w3 = (1/√2)(0,1,0,1)T. Hence, A = QR with Q = [w1, w2, w3] and R = QT A.
The readers are advised to check the following:
   (a) Rank(A) = 3,
   (b) A = QR with QT Q = I3, and
   (c) R is a 3 × 4 upper triangular matrix with Rank(R) = 3.
Remark 5.2.4. Let A ∈ Mm,n(ℝ).
1. If A = QR with Q = [v1,…,vn] then R = QT A, i.e., the (i,j)-th entry of R is ⟨A[:,j], vi⟩. In case Rank(A) < n, a slight modification gives the matrix R.
2. Further, let Rank(A) = n.
   (a) Then, AT A is invertible (see Exercise 5.2.20.4).
   (b) By Theorem 5.2.2, A = QR with Q a matrix of size m × n and R an upper triangular matrix of size n × n. Also, QT Q = In and Rank(R) = n.
   (c) Thus, AT A = RT QT QR = RT R. As AT A is invertible, the matrix RT R is invertible. Since R is a square matrix, by Exercise 2.3.5.1, the matrix R itself is invertible. Hence, (RT R)⁻¹ = R⁻¹(RT)⁻¹.
   (d) So, if Q = [v1,…,vn] then
   A(AT A)⁻¹AT = QR(RT R)⁻¹RT QT = QR R⁻¹(RT)⁻¹RT QT = QQT = ∑_{i=1}^n viviT.
   (e) Hence, using Theorem 5.3.7, we see that the matrix A(AT A)⁻¹AT = QQT is the orthogonal projection matrix on Col(A) (see the sketch following this remark).
3. Further, let Rank(A) = r < n. If j1,…,jr are the pivot columns of A then Col(A) = Col(B), where B = [A[:,j1],…,A[:,jr]] is an m × r matrix with Rank(B) = r. So, using Part 2e, we see that B(BT B)⁻¹BT is the orthogonal projection matrix on Col(A). So, compute the RREF of A and choose the columns of A corresponding to the pivot columns.
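Part 2e and Part 3 are easy to check numerically. A brief sketch (Python/NumPy; the full column rank matrix is an illustrative choice):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])                        # full column rank

P = A @ np.linalg.inv(A.T @ A) @ A.T              # orthogonal projection onto Col(A)
Q, _ = np.linalg.qr(A)                            # orthonormal basis of Col(A) in the columns of Q

print(np.allclose(P, Q @ Q.T))                    # the two constructions agree
print(np.allclose(P @ P, P), np.allclose(P.T, P)) # idempotent and symmetric
b = np.array([1.0, 2.0, 3.0])
print(A.T @ (b - P @ b))                          # residual b - Pb is orthogonal to Col(A)
```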
Till now, our main interest was to understand the linear system Ax = b, for A ∈ Mm,n(ℂ), x ∈ ℂn and b ∈ ℂm, from different viewpoints. But, in most practical situations the system has no solution. So, we are interested in finding a point x0 ∈ ℝn for which the error ∥b - Ax0∥ is the least. Thus, we consider the problem of finding x0 ∈ ℝn such that
∥b - Ax0∥ = min{∥b - Ax∥ : x ∈ ℝn},
i.e., we try to find the vector Ax0 ∈ Col(A) which is nearest to b. Such an x0 is called a least square solution of Ax = b.
To begin with, recall the following result.
Theorem 5.3.1 (Decomposition). Let V be an ips having W as a finite dimensional subspace. Suppose {f1,…,fk} is an orthonormal basis of W. Then, for each b ∈ V, y = ∑_{i=1}^k ⟨b,fi⟩fi is the closest point in W to b. Furthermore, b - y ∈ W⊥.
We now give a definition and then an implication of Theorem 5.3.1.
Definition 5.3.2. [Orthogonal Projection] Let W be a finite dimensional subspace of an ips V. Then, by Theorem 5.3.1, for each v ∈ V there exist unique vectors w ∈ W and u ∈ W⊥ with v = w + u. We thus define the orthogonal projection of V onto W, denoted PW, by PW(v) = w. The vector w is called the projection of v on W.
Remark 5.3.3. Let A ∈ Mm,n(ℝ) and W = Col(A). Then, to find the orthogonal projection PW(b), we can use either of the following ideas:
1. Determine an orthonormal basis {f1,…,fk} of Col(A) and get PW(b) = ∑_{i=1}^k ⟨b,fi⟩fi.
2. By Theorem 5.2.16.2, Col(A) = Null(AT)⊥. Hence, for b ∈ ℝm there exist unique u ∈ Col(A) and v ∈ Null(AT) such that b = u + v. Thus, using Definition 5.3.2 and Theorem 5.3.1, PW(b) = u.
Before proceeding to projections, we give an application of Theorem 5.3.1 to a linear
system.
Corollary 5.3.4. Let A ∈ Mm,n(ℝ) and b ∈ ℝm. Then, every least square solution of Ax = b
is a solution of the system AT Ax = AT b. Conversely, every solution of AT Ax = AT b is a least
square solution of Ax = b.
Proof. As b ∈ ℝm, by Remark 5.3.3, there exist y ∈ Col(A) and v ∈ Null(AT) such that b = y + v and min{∥b - w∥ | w ∈ Col(A)} = ∥b - y∥. As y ∈ Col(A), there exists x0 ∈ ℝn such that Ax0 = y, i.e., x0 is a least square solution of Ax = b. Hence,
AT Ax0 = AT y = AT(y + v) = AT b,
so that x0 is a solution of AT Ax = AT b.
Conversely, let x1 ∈ ℝn be a solution of AT Ax = AT b, i.e., AT(Ax1 - b) = 0. We need to show that ∥b - Ax1∥ ≤ ∥b - Ax∥, for all x ∈ ℝn.
Note that AT(Ax1 - b) = 0 implies 0 = (x - x1)T AT(Ax1 - b) = (Ax - Ax1)T(Ax1 - b), and hence ⟨b - Ax1, A(x - x1)⟩ = 0. Thus,
∥b - Ax∥2 = ∥(b - Ax1) - A(x - x1)∥2 = ∥b - Ax1∥2 + ∥A(x - x1)∥2 ≥ ∥b - Ax1∥2.
Hence, the required result follows. _
The above corollary gives the following result.
Corollary 5.3.5. Let A ∈ Mm,n(ℝ) and b ∈ ℝm. If
1. AT A is invertible then the least square solution of Ax = b equals x = (AT A)⁻¹AT b.
2. AT A is not invertible then a least square solution of Ax = b equals x = (AT A)⁻AT b, where (AT A)⁻ is the pseudo-inverse of AT A.
Proof. Part 1 directly follows from Corollary 5.3.4. For Part 2, let b = y + v, for y ∈ Col(A) and v ∈ Null(AT). As y ∈ Col(A), there exists x0 ∈ ℝn such that Ax0 = y. Thus, by Remark 5.3.3, AT b = AT(y + v) = AT y = AT Ax0. Now, using the definition of the pseudo-inverse (see Exercise 1.3.7.14), we see that
AT A((AT A)⁻AT b) = AT A(AT A)⁻AT Ax0 = AT Ax0 = AT b.
Thus, we see that (AT A)⁻AT b is a solution of the system AT Ax = AT b. Hence, by Corollary 5.3.4, the required result follows. _
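Corollary 5.3.4 and Corollary 5.3.5.1 can be used directly for computation. A minimal sketch (Python/NumPy; the inconsistent system is an illustrative one) solves the normal equations and compares with a library least squares routine.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 0.0, 2.0])            # b is not in Col(A): Ax = b has no solution

x0 = np.linalg.solve(A.T @ A, A.T @ b)   # solve the normal equations A^T A x = A^T b
x1, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x0, x1)                            # the two least square solutions agree
print(A.T @ (b - A @ x0))                # the residual is orthogonal to Col(A)
```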
We now give a few examples to understand projections.
Example 5.3.6. Use the fundamental theorem of linear algebra to compute the vector of the
orthogonal projection.
1. Determine the projection of (1,1,1,1,1)T on Null([1,-1,1,-1,1]).
Solution: Here A = [1,-1,1,-1,1]. So, a basis of Col(AT) equals {(1,-1,1,-1,1)T} and that of Null(A) equals {(1,1,0,0,0)T, (1,0,-1,0,0)T, (1,0,0,1,0)T, (1,0,0,0,-1)T}. Write (1,1,1,1,1)T = α(1,-1,1,-1,1)T + u, where u ∈ Null(A); equivalently, solve the linear system Bx = (1,1,1,1,1)T, where the columns of B are the above basis vectors of Col(AT) and Null(A). This gives α = 1/5, and hence the required projection equals
u = (1,1,1,1,1)T - (1/5)(1,-1,1,-1,1)T = (2/5)(2,3,2,3,2)T.
2. Determine the projection of (1,1,1)T on Null([1,1,-1]).
Solution: Here A = [1,1,-1]. So, a basis of Null(A) equals {(1,-1,0)T, (1,0,1)T} and that of Col(AT) equals {(1,1,-1)T}. Proceeding as above, the projection equals (1,1,1)T - (1/3)(1,1,-1)T = (2/3)(1,1,2)T.
3. Determine the projection of (1,1,1)T on Col((1,2,1)T).
Solution: Here, AT = [1,2,1], a basis of Col(A) equals {(1,2,1)T} and that of Null(AT) equals {(1,0,-1)T, (2,-1,0)T}. Proceeding as above gives (2/3)(1,2,1)T as the required vector.
To use the first idea in Remark 5.3.3, we prove the following result, which helps us to get the matrix of the orthogonal projection from an orthonormal basis.
Theorem 5.3.7. Let {f1,…,fk} be an orthonormal basis of a finite dimensional subspace W of an ips V. Then PW = ∑_{i=1}^k fifi*.
Proof. Let v ∈ V. Then,
PW(v) = ∑_{i=1}^k ⟨v,fi⟩fi = ∑_{i=1}^k fi(fi*v) = (∑_{i=1}^k fifi*) v.
As PW(v) is the only closest point in W (see Theorem 5.3.1), the required result follows. _
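Theorem 5.3.7 is also easy to verify numerically. A short sketch (Python/NumPy, real case; the orthonormal pair spanning W is an illustrative choice):

```python
import numpy as np

# Orthonormal basis of a plane W in R^3 (illustrative choice).
f1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
f2 = np.array([0.0, 0.0, 1.0])

P = np.outer(f1, f1) + np.outer(f2, f2)   # P_W = sum_i f_i f_i^T

v = np.array([1.0, 2.0, 3.0])
print(P @ v)                                          # projection of v on W
print(np.dot(v - P @ v, f1), np.dot(v - P @ v, f2))   # residual orthogonal to W
```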
Example 5.3.8. In each of the following, determine the matrix of the orthogonal projection. Also,
verify that PW + PW⊥ = I. What can you say about Rank(PW⊥) and Rank(PW)? Also, verify the
orthogonal projection vectors obtained in Example 5.3.6.
1. W = {(x1,…,x5)T ∈ ℝ5 | x1 - x2 + x3 - x4 + x5 = 0} = Null([1,-1,1,-1,1]).
Solution: Here W⊥ = LS((1,-1,1,-1,1)T). Thus, with n = (1,-1,1,-1,1)T,
PW⊥ = (1/5) n nT and PW = I5 - PW⊥.
An orthonormal basis {f1,…,f4} of W can also be obtained via the Gram-Schmidt process, and then PW = ∑_{i=1}^4 fifiT.
2. W = {(x,y,z)T ∈ ℝ3 | x + y - z = 0} = Null([1,1,-1]).
Solution: Note that {(1,1,-1)T} is a basis of W⊥. So, with n = (1,1,-1)T,
PW⊥ = (1/3) n nT and PW = I3 - PW⊥.
Verify that PW + PW⊥ = I3, Rank(PW) = 2 and Rank(PW⊥) = 1.
3. W = LS((1,2,1)T) = Col((1,2,1)T) ⊆ ℝ3.
Solution: Using Example 5.2.11.3 and Equation (5.2.1), with u = (1,2,1)T,
PW = (1/6) u uT and PW⊥ = I3 - PW.
We advise the readers to give a proof of the next result.
Theorem 5.3.9. Let {f1,…,fk} be an orthonormal basis of a subspace W of ℝn. If {f1,…,fn} is an extended orthonormal basis of ℝn, PW = ∑_{i=1}^k fifiT and PW⊥ = ∑_{i=k+1}^n fifiT then prove that
1. In - PW = PW⊥.
2. (PW)T = PW and (PW⊥)T = PW⊥. That is, PW and PW⊥ are symmetric.
3. (PW)2 = PW and (PW⊥)2 = PW⊥. That is, PW and PW⊥ are idempotent.
4. PW ∘ PW⊥ = PW⊥ ∘ PW = 0.
Exercise 5.3.10.
1. Let W = {(x,y,z,w) ∈ ℝ4 : x = y, z = w} be a subspace of ℝ4. Determine the matrix of the orthogonal projection onto W.
2. Let PW1 and PW2 be the orthogonal projections of ℝ2 onto W1 = {(x,0) : x ∈ ℝ} and W2 = {(x,x) : x ∈ ℝ}, respectively. Note that PW1 ∘ PW2 is a projection onto W1, but it is not an orthogonal projection. Hence or otherwise, conclude that the composition of two orthogonal projections need not be an orthogonal projection.
3. Let A be a 2 × 2 idempotent matrix that is not symmetric (for example, A = [1, 1; 0, 0]). Now, define P : ℝ2 → ℝ2 by P(v) = Av, for all v ∈ ℝ2. Then,
   (a) P is idempotent.
   (b) Null(P) ∩ Rng(P) = Null(A) ∩ Col(A) = {0}.
   (c) ℝ2 = Null(P) + Rng(P). But, (Rng(P))⊥ = (Col(A))⊥ ≠ Null(A).
   (d) Since (Col(A))⊥ ≠ Null(A), the map P is not an orthogonal projector. In this case, P is called a projection of ℝ2 onto Rng(P) along Null(P).
4. Find all 2 × 2 real matrices A such that A2 = A. Hence, or otherwise, determine all projection operators of ℝ2.
5. Let W be an (n-1)-dimensional subspace of ℝn with ordered basis BW = [f1,…,fn-1]. Suppose B = [f1,…,fn-1,fn] is an orthonormal ordered basis of ℝn obtained by extending BW. Now, define a function Q : ℝn → ℝn by Q(v) = ⟨v,fn⟩fn - ∑_{i=1}^{n-1}⟨v,fi⟩fi. Then,
   (a) Q fixes every vector in W⊥.
   (b) Q sends every vector w ∈ W to -w.
   (c) Q ∘ Q = In.
The function Q is called the reflection operator with respect to W⊥.
Theorem 5.3.9 implies that the matrix of the projection operator is symmetric. We use this idea to
proceed further.
Definition 5.3.11. [Self-Adjoint Operator] Let V be an ips with inner product ⟨,⟩. A linear
operator P : V → V is called self-adjoint if ⟨P(v),u⟩ = ⟨v,P(u)⟩, for every u,v ∈ V.
A careful understanding of the examples given below shows that self-adjoint operators and Hermitian matrices are related. It also shows that the vector spaces ℂn and ℝn can be decomposed in terms of the null space and column space of Hermitian matrices. These decompositions also follow directly from the fundamental theorem of linear algebra.
Example 5.3.12.
1. Let A be an n × n real symmetric matrix. If P : ℝn → ℝn is defined by P(x) = Ax, for every x ∈ ℝn, then
   (a) P is a self-adjoint operator as A = AT implies, for every x,y ∈ ℝn,
   ⟨P(x),y⟩ = yT Ax = (AT y)T x = (Ay)T x = ⟨x,Ay⟩ = ⟨x,P(y)⟩.
   (b) Null(P) = (Rng(P))⊥ as A = AT. Thus, ℝn = Null(P) ⊕ Rng(P).
2. Let A be an n × n Hermitian matrix. If P : ℂn → ℂn is defined by P(z) = Az, for all z ∈ ℂn, then using similar arguments (see Example 5.3.12.1) prove the following:
   (a) P is a self-adjoint operator.
   (b) Null(P) = (Rng(P))⊥ as A = A*. Thus, ℂn = Null(P) ⊕ Rng(P).
We now state and prove the main result related with orthogonal projection operators.
Theorem 5.3.13. Let V be a finite dimensional ips. If V = W ⊕ W⊥ then the orthogonal projectors
PW : V → V on W and PW⊥ : V → V on W⊥ satisfy
1. Null(PW) = {v ∈ V : PW(v) = 0} = W⊥ = Rng(PW⊥).
2. Rng(PW) = {PW(v) : v ∈ V} = W = Null(PW⊥).
3. PW ∘ PW = PW and PW⊥ ∘ PW⊥ = PW⊥ (idempotence).
4. PW⊥ ∘ PW = 0V and PW ∘ PW⊥ = 0V, where 0V(v) = 0, for all v ∈ V.
5. PW + PW⊥ = IV, where IV(v) = v, for all v ∈ V.
6. The operators PW and PW⊥ are self-adjoint.
Proof. Part 1: As V = W ⊕ W⊥, for each u ∈ W⊥, one uniquely writes u = 0 + u, where 0 ∈ W and u ∈ W⊥. Hence, by definition, PW(u) = 0 and PW⊥(u) = u. Thus, W⊥ ⊆ Null(PW) and W⊥ ⊆ Rng(PW⊥).
Now suppose that v ∈ Null(PW). So, PW(v) = 0. As V = W ⊕ W⊥, v = w + u, for unique w ∈ W and unique u ∈ W⊥. So, by definition, PW(v) = w. Thus, w = PW(v) = 0. That is, v = 0 + u = u ∈ W⊥. Thus, Null(PW) ⊆ W⊥.
A similar argument implies Rng(PW⊥) ⊆ W⊥, thus completing the proof of the first part.
Part 2: Use an argument similar to the proof of Part 1.
Part 3, Part 4 and Part 5: Let v ∈ V. Then, v = w + u, for unique w ∈ W and unique u ∈ W⊥. Thus, by definition, PW(v) = w and PW⊥(v) = u, so that
(PW ∘ PW)(v) = PW(w) = w = PW(v), (PW⊥ ∘ PW)(v) = PW⊥(w) = 0 and (PW + PW⊥)(v) = w + u = v.
Hence, PW ∘ PW = PW, PW⊥ ∘ PW = 0V and IV = PW + PW⊥.
Part 6: Let u = w1 + x1 and v = w2 + x2, for unique w1,w2 ∈ W and unique x1,x2 ∈ W⊥. Then, by definition, ⟨wi,xj⟩ = 0 for 1 ≤ i,j ≤ 2. Thus,
⟨PW(u),v⟩ = ⟨w1, w2 + x2⟩ = ⟨w1,w2⟩ = ⟨w1 + x1, w2⟩ = ⟨u, PW(v)⟩,
and similarly for PW⊥. Hence, both operators are self-adjoint and the proof of the theorem is complete. _
The next theorem is a generalization of Theorem 5.3.13. We omit the proof as the arguments are similar and use the following:
Let V be a finite dimensional ips with V = W1 ⊕ ⋯ ⊕ Wk, for certain subspaces Wi's of V. Then, for each v ∈ V there exist unique vectors v1,…,vk such that
1. vi ∈ Wi, for 1 ≤ i ≤ k,
2. ⟨vi,vj⟩ = 0 for each vi ∈ Wi, vj ∈ Wj, 1 ≤ i ≠ j ≤ k, and
3. v = v1 + ⋯ + vk.
We now give the definition and a few properties of an orthogonal operator.
Definition 5.4.1. [Orthogonal Operator] Let V be a vector space. Then, a linear operator
T : V → V is said to be an orthogonal operator if ∥T(x)∥ = ∥x∥, for all x ∈ V.
Example 5.4.2. Each T ∈ L(V) given below is an orthogonal operator (a numerical check follows this example).
1. Fix a unit vector a ∈ V and define T(x) = 2⟨x,a⟩a - x, for all x ∈ V.
Solution: Note that Proja(x) = ⟨x,a⟩a. So, ⟨⟨x,a⟩a, x - ⟨x,a⟩a⟩ = 0. Also, by the Pythagoras theorem, ∥x - ⟨x,a⟩a∥2 = ∥x∥2 - (⟨x,a⟩)2. Thus,
∥T(x)∥2 = ∥⟨x,a⟩a + (⟨x,a⟩a - x)∥2 = (⟨x,a⟩)2 + ∥x - ⟨x,a⟩a∥2 = ∥x∥2.
2. Let n = 2, V = ℝ2 and 0 ≤ θ < 2π. Now define T(x) = (x1 cos θ - x2 sin θ, x1 sin θ + x2 cos θ)T.
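Both examples can be checked numerically. A short sketch (Python/NumPy; the unit vector a, the angle θ and the test vectors are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
a = np.array([1.0, 2.0, 2.0]) / 3.0          # a unit vector in R^3

def reflect(x):
    """T(x) = 2 <x, a> a - x (Example 5.4.2.1)."""
    return 2 * np.dot(x, a) * a - x

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation of R^2 (Example 5.4.2.2)

x3, x2 = rng.normal(size=3), rng.normal(size=2)
print(np.linalg.norm(reflect(x3)), np.linalg.norm(x3))   # norms agree
print(np.linalg.norm(R @ x2), np.linalg.norm(x2))        # norms agree
```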
We now show that an operator is orthogonal if and only if it preserves the inner product.
Theorem 5.4.3. Let T ∈ L(V). Then, the following statements are equivalent.
1. T is an orthogonal operator.
2. ⟨T(x),T(y)⟩ = ⟨x,y⟩, for all x,y ∈ V. That is, T preserves the inner product.
Proof. 1 ⇒ 2: Let T be an orthogonal operator. Then, ∥T(x + y)∥2 = ∥x + y∥2. So,
∥T(x)∥2 + ∥T(y)∥2 + 2⟨T(x),T(y)⟩ = ∥T(x) + T(y)∥2 = ∥T(x + y)∥2 = ∥x∥2 + ∥y∥2 + 2⟨x,y⟩.
Thus, using the definition again, ⟨T(x),T(y)⟩ = ⟨x,y⟩.
2 ⇒ 1: If ⟨T(x),T(y)⟩ = ⟨x,y⟩, for all x,y ∈ V, then T is an orthogonal operator as ∥T(x)∥2 = ⟨T(x),T(x)⟩ = ⟨x,x⟩ = ∥x∥2. _
As an immediate corollary, we obtain the following result.
Definition 5.4.5. [Isometry, Rigid Motion] Let V be a vector space. Then, a map T : V → V
is said to be an isometry or a rigid motion if ∥T(x) - T(y)∥ = ∥x - y∥, for all x,y ∈ V.
That is, an isometry is distance preserving.
Observe that if T and S are two rigid motions then ST is also a rigid motion. Furthermore, it is
clear from the definition that every rigid motion is invertible.
Example 5.4.6. The maps given below are rigid motions/isometries.
1. Let V be a linear space with norm ∥⋅∥. If a ∈ V then the translation map Ta : V → V (see Exercise 7), defined by Ta(x) = x + a for all x ∈ V, is an isometry/rigid motion as
∥Ta(x) - Ta(y)∥ = ∥(x + a) - (y + a)∥ = ∥x - y∥, for all x,y ∈ V.
2. Let V be an ips. Then, using Theorem 5.4.3, we see that every orthogonal operator is an isometry.
We now prove that every rigid motion that fixes origin is an orthogonal operator.
Theorem 5.4.7. Let V be a real ips. Then, the following statements are equivalent for any map T : V → V.
1. T is a rigid motion that fixes the origin.
2. T is linear and ⟨T(x),T(y)⟩ = ⟨x,y⟩, for all x,y ∈ V (T preserves the inner product).
3. T is an orthogonal operator.
Proof. We have already seen the equivalence of Part 2 and Part 3 in Theorem 5.4.3. Let us now prove
the equivalence of Part 1 and Part 2/Part 3.
If T is an orthogonal operator then T(0) = 0 and ∥T(x) - T(y)∥ = ∥T(x - y)∥ = ∥x - y∥. This
proves Part 3 implies Part 1.
We now prove Part 1 implies Part 2. So, let T be a rigid motion that fixes 0. Thus, T(0) = 0 and ∥T(x) - T(y)∥ = ∥x - y∥, for all x,y ∈ V. Hence, in particular for y = 0, we have ∥T(x)∥ = ∥x∥, for all x ∈ V. So,
∥T(x)∥2 + ∥T(y)∥2 - 2⟨T(x),T(y)⟩ = ∥T(x) - T(y)∥2 = ∥x - y∥2 = ∥x∥2 + ∥y∥2 - 2⟨x,y⟩.
Thus, using ∥T(x)∥ = ∥x∥, for all x ∈ V, we get ⟨T(x),T(y)⟩ = ⟨x,y⟩, for all x,y ∈ V. Now, to prove that T is linear, we use ⟨T(x),T(y)⟩ = ⟨x,y⟩ to get
∥T(x + y) - (T(x) + T(y))∥2 = ∥T(x + y)∥2 + ∥T(x)∥2 + ∥T(y)∥2 - 2⟨T(x + y),T(x)⟩ - 2⟨T(x + y),T(y)⟩ + 2⟨T(x),T(y)⟩ = ∥x + y∥2 + ∥x∥2 + ∥y∥2 - 2⟨x + y,x⟩ - 2⟨x + y,y⟩ + 2⟨x,y⟩ = ∥(x + y) - x - y∥2 = 0.
Thus, T(x + y) - (T(x) + T(y)) = 0 and hence T(x + y) = T(x) + T(y). A similar calculation gives T(αx) = αT(x) and hence T is linear. _
Exercise 5.4.8.
1. Let A,B ∈ Mn(ℂ). Then, A and B are said to be
   (a) Orthogonally Congruent if B = ST AS, for some orthogonal matrix S.
   (b) Unitarily Congruent if B = S*AS, for some unitary matrix S.
Prove that orthogonal and unitary congruence are equivalence relations on Mn(ℝ) and Mn(ℂ), respectively.
2. Let x ∈ ℝ2. Identify it with the complex number x = x1 + ix2. If we rotate x counterclockwise by an angle θ, 0 ≤ θ < 2π, then we have
xe^{iθ} = (x1 cos θ - x2 sin θ) + i(x1 sin θ + x2 cos θ).
Thus, the corresponding vector in ℝ2 is (x1 cos θ - x2 sin θ, x1 sin θ + x2 cos θ)T.
Is the matrix [cos θ, -sin θ; sin θ, cos θ] the matrix of the corresponding rotation? Justify.
3. Let T(θ) = [cos θ, -sin θ; sin θ, cos θ] and S(θ) = [cos θ, sin θ; sin θ, -cos θ], for θ ∈ ℝ. Then, A ∈ M2(ℝ) is an orthogonal matrix if and only if A = T(θ) or A = S(θ), for some θ ∈ ℝ.
4. Let A ∈ Mn(ℝ). Then, the following statements are equivalent.
   (a) A is an orthogonal matrix.
   (b) A⁻¹ = AT.
   (c) AT is orthogonal.
   (d) the columns of A form an orthonormal basis of the real vector space ℝn.
   (e) the rows of A form an orthonormal basis of the real vector space ℝn.
   (f) for any two vectors x,y ∈ ℝn, ⟨Ax,Ay⟩ = ⟨x,y⟩ (orthogonal matrices preserve angles).
   (g) for any vector x ∈ ℝn, ∥Ax∥ = ∥x∥ (orthogonal matrices preserve length).
5. Let U be an n × n matrix. Then, prove that the following statements are equivalent.
   (a) U is a unitary matrix.
   (b) U⁻¹ = U*.
   (c) U* is unitary.
   (d) the columns of U form an orthonormal basis of the complex vector space ℂn.
   (e) the rows of U form an orthonormal basis of the complex vector space ℂn.
   (f) for any two vectors x,y ∈ ℂn, ⟨Ux,Uy⟩ = ⟨x,y⟩ (unitary matrices preserve angles).
   (g) for any vector x ∈ ℂn, ∥Ux∥ = ∥x∥ (unitary matrices preserve length).
6. Let A be an n × n orthogonal matrix. Then, prove that det(A) = ±1.
7. Let A be an n × n upper triangular matrix. If A is also an orthogonal matrix then A is a diagonal matrix with diagonal entries ±1.
8. Prove that in M5(ℝ) there are infinitely many orthogonal matrices, of which only finitely many are diagonal (in fact, their number is just 32).
9. Prove that permutation matrices are real orthogonal.
10. Let A,B ∈ Mn(ℂ) be two unitary matrices. Then, prove that AB and BA are unitary matrices.
11. If A = [aij] and B = [bij] are unitarily equivalent then prove that ∑_{i,j}|aij|2 = ∑_{i,j}|bij|2.
12. Let U be a unitary matrix and for every x ∈ ℂn, define ∥x∥1 = ∑_{i=1}^n |xi|. Then, is it necessary that ∥Ux∥1 = ∥x∥1?
In the previous chapter, we learnt that if V is a vector space over F with dim(V) = n then V basically looks like Fn. Also, any subspace of Fn is either Col(A) or Null(A) or both, for some matrix A with entries from F.
So, we started this chapter with the inner product, a generalization of the dot product in ℝ3 or ℝn. We used the inner product to define the length/norm of a vector. The norm has the property that "the norm of a vector is zero if and only if the vector itself is the zero vector". We then proved the Cauchy-Bunyakovskii-Schwarz Inequality, which helped us in defining the angle between two vectors. Thus, one can talk of geometric problems in ℝn and prove some geometric results.
We then independently defined the notion of a norm in ℝn and showed that a norm is induced by an inner product if and only if the norm satisfies the parallelogram law (the sum of squares of the diagonals equals twice the sum of squares of the two non-parallel sides).
The next subsection dealt with the fundamental theorem of linear algebra, where we showed that if A ∈ Mm,n(ℂ) then
1. dim(Null(A)) + dim(Col(A)) = n.
2. Null(A) = Col(A*)⊥ and Null(A*) = Col(A)⊥.
3. dim(Col(A)) = dim(Col(A*)).
We then saw that having an orthonormal basis is an asset as determining the
1. coordinates of a vector boils down to computing inner products.
2. projection of a vector on a subspace boils down to finding an orthonormal basis of the subspace and then summing the corresponding rank 1 matrices.
So, the question arises: how do we compute an orthonormal basis? This is where we came across the Gram-Schmidt Orthonormalization process. This algorithm helps us to determine an orthonormal basis of LS(S) for any finite subset S of a vector space. This also led to the QR decomposition of a matrix.
Thus, we observe the following about the linear system Ax = b. If
1. b ∈ Col(A) then we can use the Gauss-Jordan method to get a solution.
2. b ∉ Col(A) then in most cases we need a vector x such that the least square error between b and Ax is minimum. We saw that this minimum is attained by the projection of b on Col(A). Also, this vector can be obtained either using the fundamental theorem of linear algebra or by computing the matrix B(BT B)⁻¹BT, where the columns of B are either the pivot columns of A or a basis of Col(A).