Did you know that the relationships among quantities that vary can be modeled using something called Least-Squares Lines?
In fact, a least-squares line can be used to predict future values, which can be pretty useful in science, business, and engineering.
Finding the Linear Regression Line
Okay, so what is the least-squares regression line (LSRL)?
It is a line that “best fits” the data, which is why least-squares regression is sometimes called the line of best fit.
Suffice it to say the simplest relationship between two variables is a line, and a least-squares regression line matches the pattern or relationship of a set of paired data as closely as possible.
Therefore, if a straight-line relationship is observed, we can describe this association with a regression line. This trend line minimizes the prediction errors, called residuals, as discussed by Shafer and Zhang.
Regression Equation
And the regression equation provides a rule for predicting or estimating the response variable’s values when the two variables are linearly related.
Residuals
Wait! What are residuals?
Residuals are the differences between the observed and predicted values. A residual measures the distance between the regression line (the predicted value) and the actual observed value. In other words, residuals help us measure error, or how well our least-squares line “fits” our data.
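In symbols, if \(\hat{y}_{i}=\beta_{0}+\beta_{1} x_{i}\) is the predicted value at \(x_{i}\), then the residual for the \(i\)th data point is \(e_{i}=y_{i}-\hat{y}_{i}\), and the least-squares line is precisely the line that makes the sum of the squared residuals, \(\sum_{i=1}^{n} e_{i}^{2}\), as small as possible.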
Okay, so we know we are looking for an equation of the form:
\(y=\beta_{0}+\beta_{1} x\)
How do we find the y-intercept \(\beta_{0}\) and the slope \(\beta_{1}\)?
Computing the Least-Squares Solution
By computing the least-squares solution of \(X \beta=\vec{y}\).
Remember how in our previous lesson we learned that the least-squares solution of the inconsistent matrix equation \(A \vec{x}=\vec{b}\) is found by row reducing the augmented matrix \(\left[\begin{array}{ll}A^{T} A & A^{T} \vec{b}\end{array}\right]\)?
Well, the same thing is happening here.
\begin{equation}
A \vec{x}=\vec{b} \leftrightarrow X \beta=\vec{y}
\end{equation}
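Written out, this means we are solving the normal equations
\begin{align*}
X^{T} X \beta=X^{T} \vec{y}
\end{align*}
which is exactly the system that row reducing the augmented matrix \(\left[\begin{array}{ll}X^{T} X & X^{T} \vec{y}\end{array}\right]\) solves.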
Constructing the Design Matrix and Observation Vector
Therefore, we find \(\beta\) by row reducing the augmented matrix \(\left[\begin{array}{ll}X^{T} X & X^{T} \vec{y}\end{array}\right]\), where we use the \(x\)-coordinates of the data to build the design matrix \(X\) and the \(y\)-coordinates of the data to construct the observation vector \(\vec{y}\).
\begin{align*}
X=\left[\begin{array}{cc}
1 & x_{1} \\
1 & x_{2} \\
\vdots & \vdots \\
1 & x_{n}
\end{array}\right] \quad \vec{y}=\left[\begin{array}{c}
y_{1} \\
y_{2} \\
\vdots \\
y_{n}
\end{array}\right] \underbrace{\beta=\left[\begin{array}{c}
\beta_{0} \\
\beta_{1}
\end{array}\right]}_{\text {Parameter Vector }}
\end{align*}
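Notice what this setup accomplishes: the \(i\)th entry of \(X \beta\) is \(\beta_{0}+\beta_{1} x_{i}\), which is exactly the predicted \(y\)-value at \(x_{i}\). So asking for the least-squares solution of \(X \beta=\vec{y}\) is asking for the intercept and slope that make the predicted values as close as possible to the observed ones.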
Working Through an Example
Let’s look at an example to get a feel for how this works.
Find the equation \(y=\beta_{0}+\beta_{1} x\) of the least-squares line that best fits the given data points:
\begin{align*}
(0,1),(1,1),(2,2),(3,2)
\end{align*}
Step 1: Creating the Design Matrix
First, we will create our Design Matrix. Since there are four data points, we will have a \(4 \times 2\) matrix, where the first column is all 1s and the second column is made up of the \(x\)-coordinates from the data points.
\begin{align*}
(0,1),(1,1),(2,2),(3,2) \rightarrow X=\left[\begin{array}{ll}
1 & 0 \\
1 & 1 \\
1 & 2 \\
1 & 3
\end{array}\right]
\end{align*}
Step 2: Forming the Observation Vector
Next, we will create our Observation vector, a column matrix consisting of the \(y\)-coordinates from our data points.
\begin{align*}
(0,1),(1,1),(2,2),(3,2) \rightarrow \vec{y}=\left[\begin{array}{l}
1 \\
1 \\
2 \\
2
\end{array}\right]
\end{align*}
Step 3: Performing Matrix Multiplication
Now we will compute our matrix multiplication to find \(X^{T} X\) and \(X^{T} \vec{y}\).
\begin{align*}
X^{T} X=\left[\begin{array}{llll}
1 & 1 & 1 & 1 \\
0 & 1 & 2 & 3
\end{array}\right]\left[\begin{array}{ll}
1 & 0 \\
1 & 1 \\
1 & 2 \\
1 & 3
\end{array}\right]=\left[\begin{array}{cc}
4 & 6 \\
6 & 14
\end{array}\right]
\end{align*}
\begin{align*}
X^{T} \vec{y}=\left[\begin{array}{llll}
1 & 1 & 1 & 1 \\
0 & 1 & 2 & 3
\end{array}\right]\left[\begin{array}{l}
1 \\
1 \\
2 \\
2
\end{array}\right]=\left[\begin{array}{c}
6 \\
11
\end{array}\right]
\end{align*}
Step 4: Determining the Least-Squares Line
And lastly, we will find our least-squares solution by row reducing our augmented matrix.
\begin{align*}
\begin{aligned}
& \left[\begin{array}{ll}
X^{T} X & X^{T} \vec{y}
\end{array}\right] \rightarrow\left[\begin{array}{ccc}
4 & 6 & 6 \\
6 & 14 & 11
\end{array}\right] \sim\left[\begin{array}{ccc}
1 & 0 & 9 / 10 \\
0 & 1 & 2 / 5
\end{array}\right] \\
& \beta=\left[\begin{array}{l}
\beta_{0} \\
\beta_{1}
\end{array}\right]=\left[\begin{array}{c}
9 / 10 \\
2 / 5
\end{array}\right]
\end{aligned}
\end{align*}
Finally, we can write our least-squares line as follows:
\begin{align*}
y=\beta_{0}+\beta_{1} x \Rightarrow y=\frac{9}{10}+\frac{2}{5} x
\end{align*}
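And now we can use this line to make predictions. For instance, to estimate the response at \(x=4\), we simply plug in:
\begin{align*}
y=\frac{9}{10}+\frac{2}{5}(4)=\frac{9}{10}+\frac{16}{10}=\frac{25}{10}=2.5
\end{align*}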
Awesome!
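If you'd like to check these computations with software, here is a minimal Python/NumPy sketch (assuming NumPy is available; the variable names are just illustrative) that reproduces each step of the example:

```python
# A minimal NumPy check of the worked example above; assumes Python with NumPy
# installed, and the variable names are just illustrative.
import numpy as np

# The data points (0,1), (1,1), (2,2), (3,2)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 2.0, 2.0])

# Steps 1 and 2: design matrix (a column of 1s, then the x-coordinates)
# and observation vector
X = np.column_stack([np.ones_like(x), x])

# Step 3: the matrix products that appear in the normal equations
print(X.T @ X)  # [[ 4.  6.]
                #  [ 6. 14.]]
print(X.T @ y)  # [ 6. 11.]

# Step 4: solve (X^T X) beta = X^T y instead of row reducing by hand
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)     # [0.9 0.4]  ->  y = 9/10 + (2/5) x
```

Notice that np.linalg.solve is simply carrying out the same computation our row reduction did by hand.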
Expanding to Regression Curves
And here’s the exciting part: we aren’t limited to just regression lines; we can also create regression curves!
Quadratic Regression
Yep, we can use the least-squares process to fit parabolic curves, cubic curves, and more.
Quadratics: \(y=\beta_{0}+\beta_{1} x+\beta_{2} x^{2}\)
\begin{equation}
X=\left[\begin{array}{ccc}
1 & x_1 & x_1^2 \\
1 & x_2 & x_2^2 \\
\vdots & \vdots & \vdots \\
1 & x_n & x_n^2
\end{array}\right] \quad \vec{y}=\left[\begin{array}{c}
y_1 \\
y_2 \\
\vdots \\
y_n
\end{array}\right] \quad \beta=\left[\begin{array}{c}
\beta_0 \\
\beta_1 \\
\beta_2
\end{array}\right]
\end{equation}
Cubic Regression
Cubics: \(y=\beta_{0}+\beta_{1} x+\beta_{2} x^{2}+\beta_{3} x^{3}\)
\begin{equation}
X=\left[\begin{array}{cccc}
1 & x_1 & x_1^2 & x_1^3 \\
1 & x_2 & x_2^2 & x_2^3 \\
\vdots & \vdots & \vdots & \vdots \\
1 & x_n & x_n^2 & x_n^3
\end{array}\right] \quad \vec{y}=\left[\begin{array}{c}
y_1 \\
y_2 \\
\vdots \\
y_n
\end{array}\right] \quad \beta=\left[\begin{array}{c}
\beta_0 \\
\beta_1 \\
\beta_2 \\
\beta_3
\end{array}\right]
\end{equation}
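As a quick sketch of how little the computation changes for curves, here is one way to write the same normal-equations approach for a general polynomial fit; fit_polynomial is just a hypothetical helper name for this illustration, not a library function:

```python
# A minimal sketch of polynomial least squares using the normal equations;
# fit_polynomial is a hypothetical helper written for illustration.
import numpy as np

def fit_polynomial(x, y, degree):
    """Return [beta_0, beta_1, ..., beta_degree] for a least-squares polynomial fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with columns 1, x, x^2, ..., x^degree
    X = np.column_stack([x**k for k in range(degree + 1)])
    # Solve the normal equations (X^T X) beta = X^T y
    return np.linalg.solve(X.T @ X, X.T @ y)

# Sanity check: degree 1 recovers the line from the worked example
print(fit_polynomial([0, 1, 2, 3], [1, 1, 2, 2], degree=1))  # [0.9 0.4]
```

For a quadratic or cubic fit, you would pass degree=2 or degree=3 with your own data; only the design matrix grows, exactly as the matrices above show.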
Next Steps
Don’t worry. We’ll work through numerous examples together and learn how to find least-squares lines and curves given data values and use our regression to predict future values.
Let’s jump on in!