Linear Regression (Trendline) and Correlation


Written by STEM Calculators Team • Published January 2, 2026 • Updated February 24, 2026

Paired data (x, y)

Blanks are ignored. Rows without at least two numeric values are skipped and reported. You can load a CSV or text file, or paste data to replace the textarea contents.
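The row-handling rule above can be illustrated with a minimal sketch. This is not the calculator's actual code: the function name `parse_pairs`, the separator set (commas, semicolons, tabs, spaces), and the choice to take the first two numeric fields of each row are all assumptions made for illustration.

```python
import re

def parse_pairs(text):
    """Return (pairs, skipped) from raw text.

    pairs   : list of (x, y) floats from the first two numeric fields of each row
    skipped : 1-based line numbers of rows with fewer than two numeric values
    """
    pairs, skipped = [], []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are ignored silently
        # Assumption: fields are separated by commas, semicolons, tabs, or spaces.
        fields = [f for f in re.split(r"[,;\s]+", line.strip()) if f]
        numbers = []
        for field in fields:
            try:
                numbers.append(float(field))
            except ValueError:
                pass  # non-numeric field, e.g. a header label
        if len(numbers) >= 2:
            pairs.append((numbers[0], numbers[1]))
        else:
            skipped.append(line_no)  # reported back to the user
    return pairs, skipped

# Hypothetical input: a header row, a blank line, mixed separators, one incomplete row
sample = "x,y\n1,2.0\n\n2;4.1\n3\t5.9\n4,\n"
pairs, skipped = parse_pairs(sample)
print(pairs)    # [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]
print(skipped)  # [1, 6]
```

Because this sketch treats the comma as a separator, it would misread files that use a decimal comma; that is the kind of case a delimiter hint is meant for.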

Options

Delimiter hint

Auto-detects common separators on each line. Use a hint if your file is unusual.

Decimals • Trim trailing zeros

Predict at \(x_0\)

You can also drag the \(x_0\) marker on the scatter plot (after calculating) to update the prediction.

Show equation on graph • Show residual segments • Show residual plot

Residuals help spot patterns (nonlinearity, changing variance). A random cloud around 0 is a good sign.


Linear regression (trendline) + correlation — Theory


On this page:

1) What regression answers • 2) The model and the line • 3) Least squares fitting • 4) Correlation \(r\) and \(r^2\) • 5) Residuals, SSE, RMSE • 6) Interpreting the graph • 7) Common pitfalls

1) What linear regression answers

In many biology labs, you collect paired measurements \((x_i, y_i)\): for example, dose vs response, time vs concentration, temperature vs enzyme activity, or body length vs mass. Simple linear regression summarizes the relationship by fitting a straight line that best predicts \(y\) from \(x\).

What you get

A trendline \(\hat{y} = a + b\cdot x\).
Strength and direction of association via correlation \(r\).
How much variation is explained via \(r^2\).
Predictions: \(\hat{y}(x_0)\) for a chosen \(x_0\).

When it makes sense

The scatter plot suggests a roughly straight-line pattern.
Residuals do not show obvious structure (see below).
You are not extrapolating far beyond the observed \(x\)-range.

2) The model and the fitted line

The regression line used in this calculator is written as:

\[ \hat{y} = a + b\cdot x \]

Slope \(b\): expected change in \(\hat{y}\) for a 1-unit increase in \(x\).
Intercept \(a\): predicted value when \(x = 0\) (sometimes meaningful, sometimes not).

The “hat” in \(\hat{y}\) means “predicted y”. The actual observed value is \(y\), and the difference \(y-\hat{y}\) is the residual.

3) Least squares fitting (how \(a\) and \(b\) are chosen)

The fitted line is the one that minimizes the sum of squared vertical errors:

\[ \mathrm{SSE} = \sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2 \qquad \text{where } \hat{y}_i = a + b\cdot x_i \]

To compute \(a\) and \(b\), we first define sample means and centered sums:

\[ \bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i, \qquad \bar{y}=\frac{1}{n}\sum_{i=1}^{n}y_i \]

\[ S_{xx}=\sum_{i=1}^{n}(x_i-\bar{x})^2, \qquad S_{xy}=\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) \]

Then the least-squares slope and intercept are:

\[ b=\frac{S_{xy}}{S_{xx}}, \qquad a=\bar{y}-b\cdot \bar{x} \]

If all \(x\) values are identical, then \(S_{xx}=0\) and the slope cannot be computed. In practice, you need variation in \(x\).
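The formulas for \(b\) and \(a\) translate almost line for line into code. The following is a minimal sketch in plain Python, not the calculator's own implementation; the function name `fit_line` and the data values are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Centered sums S_xx and S_xy
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical paired measurements
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")  # a = 2.200, b = 0.600

# Prediction at x0 = 3.5, inside the observed x-range
x0 = 3.5
print(f"y-hat({x0}) = {a + b * x0:.3f}")  # y-hat(3.5) = 4.300
```

Here \(\bar{x}=3\), \(\bar{y}=4\), \(S_{xx}=10\), and \(S_{xy}=6\), so \(b=0.6\) and \(a=4-0.6\cdot 3=2.2\).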

4) Correlation \(r\) and determination \(r^2\)

Correlation measures the strength and direction of the linear relationship between \(x\) and \(y\). It is computed using:

\[ r=\frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}} \qquad \text{where } \quad S_{yy}=\sum_{i=1}^{n}(y_i-\bar{y})^2 \]

\(r\in[-1,1]\).
\(r>0\): increasing trend. \(r<0\): decreasing trend.
\(|r|\) close to 1 indicates a strong linear pattern; close to 0 indicates weak linear association.

The coefficient of determination is:

\[ r^2 = \frac{S_{xy}^2}{S_{xx}\,S_{yy}} = 1-\frac{\mathrm{SSE}}{S_{yy}} \]

In simple linear regression, \(r^2\) can be interpreted as the fraction of variability in \(y\) explained by a linear model in \(x\). It does not guarantee causation.
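As a small worked check, the correlation formula can be computed directly from the centered sums. This is a sketch with hypothetical data and a hypothetical function name `correlation`, not the calculator's own code.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when x or y has no variation")
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical paired measurements
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(f"r   = {r:.4f}")      # r   = 0.7746
print(f"r^2 = {r * r:.4f}")  # r^2 = 0.6000
```

With \(S_{xx}=10\), \(S_{yy}=6\), and \(S_{xy}=6\), this gives \(r=6/\sqrt{60}\approx 0.775\), so about 60% of the variability in \(y\) is accounted for by the fitted line.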

5) Residuals, SSE, and RMSE

The residual for observation \(i\) is:

\[ e_i = y_i - \hat{y}_i \]

The calculator reports the sum of squared errors and a basic measure of typical residual size:

\[ \mathrm{SSE}=\sum_{i=1}^{n} e_i^2 \]

\[ \mathrm{RMSE}=\sqrt{\frac{\mathrm{SSE}}{n-2}} \]

The \(n-2\) appears because two parameters \((a,b)\) are estimated from the data, so at least three data points are needed for RMSE to be defined. RMSE is best viewed as a “typical” vertical deviation from the line, in the units of \(y\).
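The residual quantities in this section can be sketched as follows. The function name `residual_summary` and the data are hypothetical, and the line is refit inside the function only to keep the example self-contained.

```python
import math

def residual_summary(xs, ys):
    """Return (residuals, SSE, RMSE) for the least-squares line through the data."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("an n - 2 denominator needs at least three (x, y) pairs")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]  # e_i = y_i - y-hat_i
    sse = sum(e * e for e in residuals)
    rmse = math.sqrt(sse / (n - 2))
    return residuals, sse, rmse

# Hypothetical paired measurements
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
residuals, sse, rmse = residual_summary(xs, ys)
print([round(e, 2) for e in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(f"SSE  = {sse:.3f}")               # SSE  = 2.400
print(f"RMSE = {rmse:.3f}")              # RMSE = 0.894
```

Note that the residuals sum to zero (up to rounding), which always holds for a least-squares line with an intercept.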

6) Interpreting the graphs in the calculator

Scatter plot + trendline

Points show observed \((x,y)\).
The line shows \(\hat{y}=a+b\cdot x\).
The \(x_0\) marker shows the prediction \(\hat{y}(x_0)\).
Optional residual segments show each \(y_i-\hat{y}_i\) vertically.

Residual plot

Plots \((x_i, e_i)\) where \(e_i=y_i-\hat{y}_i\).
Good fit: residuals scattered around 0 with no systematic curve.
Curvature suggests a nonlinear relationship.
Funnel shape suggests changing variance.
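To see what "curvature in the residuals" looks like numerically, the sketch below fits a straight line to deliberately curved, made-up data (\(y=x^2\)). The helper name `line_residuals` is hypothetical, and this is an illustration rather than the calculator's own diagnostic.

```python
def line_residuals(xs, ys):
    """Fit y-hat = a + b*x by least squares and return the residuals e_i."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx  # assumes the x values are not all identical
    a = y_bar - b * x_bar
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Made-up curved data: y = x^2
xs = [0, 1, 2, 3, 4]
ys = [x * x for x in xs]

residuals = line_residuals(xs, ys)
signs = "".join("+" if e > 0 else "-" for e in residuals)
print(residuals)  # [2.0, -1.0, -2.0, -1.0, 2.0]
print(signs)      # +---+
```

The residuals are positive at both ends and negative in the middle. A systematic U-shape like this, rather than a random scatter around 0, is the signature of a nonlinear relationship.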

The calculator’s interactivity (hover, zoom, pan, click-to-highlight) is meant to help you connect each plotted point with its numerical row in the table.

7) Common pitfalls (important in lab reporting)

Correlation is not causation. A strong \(r\) does not prove that changes in \(x\) cause changes in \(y\).
Outliers can dominate the slope. Always inspect the scatter plot and residuals.
Extrapolation risk. Predicting far beyond the observed \(x\)-range can be misleading.
Hidden groups. If data come from different conditions (e.g., different subjects or batches), a single line may be inappropriate.
Nonlinear relationships. If the pattern is curved, consider transformations or nonlinear models.
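The effect of a single outlier on the slope can be shown with a small sketch. The data are made up and the helper name `slope` is hypothetical.

```python
def slope(xs, ys):
    """Least-squares slope b = S_xy / S_xx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / s_xx  # assumes the x values are not all identical

# Made-up data lying exactly on y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
b_clean = slope(xs, ys)

# The same data plus one aberrant point at (6, 0)
b_outlier = slope(xs + [6], ys + [0])

print(f"slope without outlier: {b_clean:.3f}")    # slope without outlier: 2.000
print(f"slope with outlier:    {b_outlier:.3f}")  # slope with outlier:    0.286
```

One point at the edge of the \(x\)-range cuts the slope from 2 to about 0.29. Least squares offers no protection against this, so the scatter plot and residuals should be inspected before the fitted line is reported.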

For future upgrades (beyond this basic calculator), confidence or prediction intervals can be added using the standard error of the regression and the \(t\) distribution.

Frequently Asked Questions

How does the calculator find the linear regression trendline?

It uses least squares to minimize the sum of squared residuals. The slope is b = Sxy / Sxx and the intercept is a = ybar - b xbar, where Sxy = sum((x - xbar)(y - ybar)) and Sxx = sum((x - xbar)^2).

What is the difference between correlation r and r squared (r^2)?

r measures the strength and direction of a linear association and ranges from -1 to 1. r^2 is the fraction of variability in y explained by a linear model in x for simple linear regression.

How do I predict y at a specific x value (x0)?

Enter x0 in the Predict at x0 field and calculate to get y-hat(x0) = a + b x0. After calculation, you can also drag the x0 marker on the scatter plot to update the prediction.

Why is a residual plot useful for checking a trendline?

Residuals are e = y - y-hat. A good linear fit usually shows residuals scattered around 0 without clear structure, while curvature or a funnel shape can indicate nonlinearity or changing variance.

What happens if all x values are the same?

Then Sxx = 0, so the slope cannot be computed because there is no variation in x. You need at least two distinct x values to fit a regression line.
