Least Squares Regression Solver

Math Linear Algebra • Applications and Advanced Linear Algebra (capstone)

Written by STEM Calculators Team

Solve the least squares problem min ||Ax − b||² using the normal equations \(A^{\mathsf T}A\,x=A^{\mathsf T}b\). For data points \((x_i,y_i)\), the calculator fits a best-fit line \(y\approx c_0+c_1x\), reports residual error, and plots the fit.

Input type

Display decimals

Solver tolerance

Data points

Fit \(y\approx c_0+c_1x\) from \((x_i,y_i)\).

Number of points

Tip: points are converted to \(A=[\mathbf{1}\ \mathbf{x}]\) and \(b=\mathbf{y}\).

Ready

Results

Solution \(x\)

—

Residual norm \(\|Ax-b\|\)

—

SSE \(\sum r_i^2\) and RMSE

—

Fit equation (line mode)

—

Graph

Zoom with mouse wheel/trackpad, drag to pan, double-click to reset view. Click Play for animation.

—

Step-by-step

Enter inputs and click “Calculate”.

Rate this calculator

0.0 /5 (0 ratings)

Be the first to rate.

Your rating

Name (optional) Review (optional)

You can update your rating any time.

Theory

Least squares and the normal equations

In an overdetermined linear system, there are typically more equations than unknowns, so an exact solution to \(Ax=b\) may not exist. Least squares replaces “solve exactly” with “fit as closely as possible” by minimizing the squared error:

\[ \min_x \ \lVert Ax-b\rVert^2. \]

The vector \(r=Ax-b\) is called the residual. Least squares chooses \(x\) so that the residual norm \(\lVert r\rVert\) is as small as possible. A standard derivation sets the gradient of \(\lVert Ax-b\rVert^2\) to zero, which leads to the normal equations:

\[ \begin{aligned} \lVert Ax-b\rVert^2 &= (Ax-b)^{\mathsf T}(Ax-b) \\ \frac{\partial}{\partial x}\lVert Ax-b\rVert^2 &= 2A^{\mathsf T}(Ax-b)=0 \\ A^{\mathsf T}A\,x &= A^{\mathsf T}b. \end{aligned} \]

Interpretation: the residual at the solution is orthogonal to the columns of \(A\). This is a projection onto the column space of \(A\).

Line of best fit from points

For data points \((x_i,y_i)\) and a best-fit line \(y\approx c_0+c_1x\), we write:

\[ \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_m \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \end{bmatrix} \approx \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}. \]

This matches the general least squares form \(Ax\approx b\). After solving, the predicted values are \(\hat{y}_i=c_0+c_1x_i\), and residuals are \(r_i=\hat{y}_i-y_i\). Common summary errors include SSE \(=\sum r_i^2\) and RMSE \(=\sqrt{\text{SSE}/m}\).

Numerical note

Solving via \(A^{\mathsf T}A\) is simple and matches many textbook presentations, but it can be sensitive when columns of \(A\) are nearly dependent (ill-conditioned). In more advanced numerical linear algebra, QR factorization or SVD is preferred. The calculator reports a clear error if \(A^{\mathsf T}A\) is singular (not invertible).

University extension: weighted least squares replaces \(\lVert Ax-b\rVert^2\) with \(\lVert W(Ax-b)\rVert^2\) to emphasize certain equations or points.

Frequently Asked Questions

What does least squares regression compute?

It finds the vector x that minimizes the squared error ||Ax - b||^2, producing the best approximate solution when Ax = b cannot be satisfied exactly.

How are the normal equations used?

The calculator forms A^T A x = A^T b and solves this system for x. This x minimizes ||Ax - b||^2 when A^T A is invertible.

What do SSE and RMSE mean in the output?

SSE is the sum of squared residuals sum r_i^2. RMSE is sqrt(SSE/m) and gives an average-sized error per equation or data point.

Why do I see a residual bar chart instead of a fit line?

A line plot is only shown when the problem matches a 1D predictor line model (points mode or A with columns [1, x]). Otherwise the calculator plots residual magnitudes.