What if the matrix equation \(A \vec{x}=\vec{b}\) doesn’t have a solution?
Well, then we need to find the best approximate solution, and to do this, we will use the method of least squares.
Don’t worry. You have seen this idea before.
Remember back in Algebra how we created lines of “best fit”? Well, a best-fit line, sometimes called a linear regression line or trend line, is exactly a least-squares solution: the best approximate solution to an inconsistent matrix equation \(A \vec{x}=\vec{b}\).
Least Squares Definition and Normal Equations
So, our goal is to estimate a solution for a system that has no exact solution.
How?
Well, knowing that we can’t solve the equation \(A \vec{x}=\vec{b}\) using our usual methods, we’re going to find an \(\vec{x}\) that makes \(A \vec{x}\) as close as possible to \(\vec{b}\).
So, the smaller the distance between \(\vec{b}\) and \(A \vec{x}\), given by \(\|\vec{b}-A \vec{x}\|\), the better the approximation to the system.
Recall that the distance between two vectors \(\vec{a}\) and \(\vec{b}\), denoted \(\|\vec{a}-\vec{b}\|\), is the square root of the sum of the squares of the differences of their entries. That’s how we get the name “least squares”: we are looking for the \(\vec{x}\) that makes this sum of squares as small as possible.
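Written out for vectors in \(\mathbb{R}^{m}\), that distance is
\begin{equation}
\|\vec{a}-\vec{b}\|=\sqrt{\left(a_1-b_1\right)^2+\left(a_2-b_2\right)^2+\cdots+\left(a_m-b_m\right)^2}
\end{equation}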
- Okay, so our definition states that if we let \(A\) be an \(m \times n\) matrix and \(\vec{b}\) is in \(\mathbb{R}^{m}\), a least-squares solution of \(A \vec{x}=\vec{b}\) is an \(\hat{x}\) in \(\mathbb{R}^{n}\) such that \(\|\vec{b}-A \hat{x}\| \leq\|\vec{b}-A \vec{x}\|\) for all \(\vec{x}\) in \(\mathbb{R}^{n}\).
- Additionally, the set of least-squares solutions of \(A \vec{x}=\vec{b}\) coincides with the nonempty set of solutions of the normal equations \(A^{T} A \vec{x}=A^{T} \vec{b}\).
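Where do the normal equations come from? The best approximation \(A \hat{x}\) is the point of the column space closest to \(\vec{b}\), so the residual \(\vec{b}-A \hat{x}\) must be orthogonal to every column of \(A\). In symbols,
\begin{equation}
A^{T}(\vec{b}-A \hat{x})=\overrightarrow{0} \quad \Rightarrow \quad A^{T} A \hat{x}=A^{T} \vec{b}
\end{equation}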
Methods for Finding Least Squares Solutions
In other words, as long as the columns of \(A\) are linearly independent, or equivalently, the matrix \(A^{T} A\) is invertible, the least-squares solution \(\hat{x}\) is unique, and we can find it by using one of the following three methods.
Using the inverse of \(A^{T} A\):
\begin{equation}
\hat{x}=\left(A^T A\right)^{-1}\left(A^T \vec{b}\right)
\end{equation}
Using row reduction on the augmented matrix of the normal equations to find a consistent system:
\begin{equation}
\left[\begin{array}{c|c}
A^T A & A^T \vec{b}
\end{array}\right] \sim\left[\begin{array}{c|c}
I & \hat{x}
\end{array}\right]
\end{equation}
Or from the QR factorization \(A=Q R\):
\begin{equation}
\hat{x}=R^{-1} Q^T \vec{b}
\end{equation}
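If you like to double-check this kind of computation with software, here is a minimal NumPy sketch, not part of the lesson itself, showing all three formulas side by side. The matrix and vector here are hypothetical, chosen only so the columns are linearly independent.

```python
import numpy as np

# Hypothetical overdetermined system (3 equations, 2 unknowns), chosen
# only for illustration; any A with linearly independent columns works.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 2.0])

# Method 1: inverse of the normal equations, x = (A^T A)^{-1} A^T b
x_inverse = np.linalg.inv(A.T @ A) @ (A.T @ b)

# Method 2: solve A^T A x = A^T b directly (what row reduction does by hand)
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Method 3: QR factorization, x = R^{-1} Q^T b
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

print(x_inverse, x_normal, x_qr)  # all three agree: [0.6667 1.6667]
```

In practice, solving the normal equations directly or using QR is preferred over forming the inverse explicitly, since computing an inverse is more work and less numerically stable.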
Example: Row Reduction Method
Alright, let’s work through an example together.
Find the least-squares solution of the inconsistent system \(A \vec{x}=\vec{b}\) for:
\begin{equation}
A=\left[\begin{array}{cc}
-1 & 2 \\
2 & -3 \\
-1 & 3
\end{array}\right]
\end{equation}
\begin{equation}
\vec{b}=\left[\begin{array}{l}
4 \\
1 \\
2
\end{array}\right]
\end{equation}
First, we will calculate the matrix products \(A^{T} A\) and \(A^{T} \vec{b}\).
\begin{equation}
A^T A=\left[\begin{array}{ccc}
-1 & 2 & -1 \\
2 & -3 & 3
\end{array}\right]\left[\begin{array}{cc}
-1 & 2 \\
2 & -3 \\
-1 & 3
\end{array}\right]
\end{equation}
\begin{equation}
=\left[\begin{array}{cc}
6 & -11 \\
-11 & 22
\end{array}\right]
\end{equation}
\begin{equation}
A^T \vec{b}=\left[\begin{array}{ccc}
-1 & 2 & -1 \\
2 & -3 & 3
\end{array}\right]\left[\begin{array}{l}
4 \\
1 \\
2
\end{array}\right]
\end{equation}
\begin{equation}
=\left[\begin{array}{c}
-4 \\
11
\end{array}\right]
\end{equation}
Next, we will create an augmented matrix and use row echelon techniques to find a consistent system.
\begin{equation}
\left[\begin{array}{c|c}
A^T A & A^T \vec{b}
\end{array}\right]=\left[\begin{array}{cc|c}
6 & -11 & -4 \\
-11 & 22 & 11
\end{array}\right]
\end{equation}
\begin{equation}
\sim\left[\begin{array}{cc|c}
1 & 0 & 3 \\
0 & 1 & 2
\end{array}\right]
\end{equation}
Consequently, the least-squares solution, and therefore the best approximation to the system, is
\begin{equation}
\hat{x}=\left[\begin{array}{l}
3 \\
2
\end{array}\right]
\end{equation}
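For those who want to verify by computer, here is a quick NumPy sketch, again not part of the lesson, that confirms the answer and also reports the size of the error \(\|\vec{b}-A \hat{x}\|\), which we will study more later.

```python
import numpy as np

# The matrix and vector from the worked example above
A = np.array([[-1.0,  2.0],
              [ 2.0, -3.0],
              [-1.0,  3.0]])
b = np.array([4.0, 1.0, 2.0])

# lstsq returns the least-squares solution directly
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_hat)  # [3. 2.]

# The error of the approximation is the distance between b and A x-hat
print(np.linalg.norm(b - A @ x_hat))  # sqrt(11), about 3.3166
```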
I have found that using row reduction is the quickest way to find the least-squares solution, but as we will see in our video, using inverses and QR factorization are also excellent choices when the problem allows for them.
Orthogonal Projections Method
In fact, there is a fourth method for finding the least-squares solution when the columns of \(A\) form an orthogonal set. This process involves the orthogonal projection of \(\vec{b}\) onto the column space of \(A\).
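The key fact making this work is that the closest point in \(\operatorname{Col}(A)\) to \(\vec{b}\) is precisely the orthogonal projection \(\hat{b}\), so the least-squares solution satisfies
\begin{equation}
A \hat{x}=\hat{b}=\operatorname{proj}_{\operatorname{Col}(A)} \vec{b}
\end{equation}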
Example: Orthogonal Projections Method
For example, let’s find the least-squares solution of \(A \vec{x}=\vec{b}\) for:
\begin{equation}
A=\left[\begin{array}{cc}
1 & 5 \\
3 & 1 \\
-2 & 4
\end{array}\right]
\end{equation}
\begin{equation}
\vec{b}=\left[\begin{array}{c}
4 \\
-2 \\
-3
\end{array}\right]
\end{equation}
First, we notice that the columns of \(A\) form an orthogonal set.
\begin{equation}
A=\left[\begin{array}{cc}
1 & 5 \\
3 & 1 \\
-2 & 4
\end{array}\right] \Rightarrow \underbrace{\left[\begin{array}{c}
1 \\
3 \\
-2
\end{array}\right]}_{a_1}, \underbrace{\left[\begin{array}{c}
5 \\
1 \\
4
\end{array}\right]}_{a_2}
\end{equation}
\begin{equation}
\Rightarrow \overrightarrow{a_1} \cdot \vec{a}_2=(1)(5)+(3)(1)+(-2)(4)=5+3-8=0
\end{equation}
Therefore, we can find the orthogonal projection of \(\vec{b}\) onto \(\operatorname{Col}(A)\) to assist us in finding our least-squares solution.
If we recall from our previous lessons, the formula we need is
\begin{equation}
\hat{b}=\frac{\vec{b} \cdot \overrightarrow{a_1}}{\overrightarrow{a_1} \cdot \overrightarrow{a_1}} \vec{a}_1+\frac{\vec{b} \cdot \overrightarrow{a_2}}{\overrightarrow{a_2} \cdot \overrightarrow{a_2}} \vec{a}_2
\end{equation}
where \(\hat{b}\) denotes the projection of \(\vec{b}\) onto \(\operatorname{Col}(A)\).
So, let’s find the orthogonal projection by computing our dot products.
\begin{equation}
\vec{b} \cdot \overrightarrow{a_1}=(\vec{b})^T \cdot \overrightarrow{a_1}
\end{equation}
\begin{equation}
=\left[\begin{array}{lll}
4 & -2 & -3
\end{array}\right]\left[\begin{array}{c}
1 \\
3 \\
-2
\end{array}\right]
\end{equation}
\begin{equation}
=4
\end{equation}
\begin{equation}
\vec{b} \cdot \overrightarrow{a_2}=(\vec{b})^T \cdot \overrightarrow{a_2}
\end{equation}
\begin{equation}
=\left[\begin{array}{lll}
4 & -2 & -3
\end{array}\right]\left[\begin{array}{l}
5 \\
1 \\
4
\end{array}\right]
\end{equation}
\begin{equation}
=6
\end{equation}
\begin{equation}
\overrightarrow{a_1} \cdot \overrightarrow{a_1}=\left(\overrightarrow{a_1}\right)^T \cdot \overrightarrow{a_1}
\end{equation}
\begin{equation}
=\left[\begin{array}{lll}
1 & 3 & -2
\end{array}\right]\left[\begin{array}{c}
1 \\
3 \\
-2
\end{array}\right]
\end{equation}
\begin{equation}
=14
\end{equation}
\begin{equation}
\overrightarrow{a_2} \cdot \overrightarrow{a_2}=\left(\overrightarrow{a_2}\right)^T \cdot \overrightarrow{a_2}
\end{equation}
\begin{equation}
=\left[\begin{array}{lll}
5 & 1 & 4
\end{array}\right]\left[\begin{array}{l}
5 \\
1 \\
4
\end{array}\right]
\end{equation}
\begin{equation}
=42
\end{equation}
Now let’s substitute our values into our projection formula.
\begin{equation}
\hat{b}=\frac{4}{14} \vec{a}_1+\frac{6}{42} \vec{a}_2
\end{equation}
\begin{equation}
\Rightarrow \hat{b}=\frac{2}{7}\left[\begin{array}{c}
1 \\
3 \\
-2
\end{array}\right]+\frac{1}{7}\left[\begin{array}{l}
5 \\
1 \\
4
\end{array}\right]=\left[\begin{array}{c}
2 / 7+5 / 7 \\
6 / 7+1 / 7 \\
-4 / 7+4 / 7
\end{array}\right]
\end{equation}
\begin{equation}
\Rightarrow \hat{b}=\left[\begin{array}{l}
1 \\
1 \\
0
\end{array}\right]
\end{equation}
And finally, now that \(\hat{b}\) is known, we can approximate the solution to \(A \vec{x}=\vec{b}\) by using the weights from our projection equation above.
Because if we let \(A\) be an \(m \times n\) matrix with orthogonal columns \(\vec{a}_1, \ldots, \vec{a}_n\) and \(\vec{b}\) be a vector in \(\mathbb{R}^{m}\), then
\begin{equation}
\hat{x}=\left[\begin{array}{c}
\frac{\vec{b} \cdot \overrightarrow{a_1}}{\overrightarrow{a_1} \cdot \overrightarrow{a_1}} \\
\frac{\vec{b} \cdot \overrightarrow{a_2}}{\overrightarrow{a_2} \cdot \overrightarrow{a_2}} \\
\vdots \\
\frac{\vec{b} \cdot \overrightarrow{a_n}}{\overrightarrow{a_n} \cdot \overrightarrow{a_n}}
\end{array}\right]
\end{equation}
So, the solution to our least-squares problem is
\begin{equation}
\hat{x}=\left[\begin{array}{c}
\left(\frac{\vec{b} \cdot \overrightarrow{a_1}}{\overrightarrow{a_1} \cdot \overrightarrow{a_1}}\right) \\
\left(\frac{\vec{b} \cdot \overrightarrow{a_2}}{\overrightarrow{a_2} \cdot \overrightarrow{a_2}}\right)
\end{array}\right] \Rightarrow \hat{x}=\left[\begin{array}{c}
2 / 7 \\
1 / 7
\end{array}\right]
\end{equation}
Not bad, right?
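As a sanity check, here is one more short NumPy sketch, again not from the lesson, that computes the projection weights directly and confirms both \(\hat{x}\) and the projection \(\hat{b}\).

```python
import numpy as np

# The matrix and vector from the orthogonal-projection example above
A = np.array([[ 1.0, 5.0],
              [ 3.0, 1.0],
              [-2.0, 4.0]])
b = np.array([4.0, -2.0, -3.0])
a1, a2 = A[:, 0], A[:, 1]

# Because the columns are orthogonal, each weight is computed independently
x_hat = np.array([b @ a1 / (a1 @ a1),   # 4/14 = 2/7
                  b @ a2 / (a2 @ a2)])  # 6/42 = 1/7
print(x_hat)      # [0.2857... 0.1428...]
print(A @ x_hat)  # [1. 1. 0.] -- the projection b-hat
```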
Next Steps
Together, we will work through numerous examples to find the least-squares solution. We’ll use various methods, such as:
- Row reduction
- Inverses
- QR factorization
- Orthogonal projections
Additionally, we’ll find the error for our approximation, given by the distance \(\|\vec{b}-A \hat{x}\|\).
It’s going to be fun, so let’s get to it!