A linear regression calculator fits a straight-line relationship between two quantitative variables \(x\) (explanatory) and \(y\) (response). In simple linear regression, the fitted line is \[ \hat y = b_0 + b_1 \cdot x, \] where \(b_1\) is the slope and \(b_0\) is the intercept.
Main outputs: regression equation \(\hat y=b_0+b_1 \cdot x\), predicted values \(\hat y\), residuals \(e=y-\hat y\), and goodness-of-fit summaries such as \(SSE\) and \(R^2\).
1) How the least-squares line is computed
Given paired data \((x_1,y_1),\ldots,(x_n,y_n)\), define the sample means \(\bar x\) and \(\bar y\), and the sums
\[ S_{xx}=\sum_{i=1}^{n}(x_i-\bar x)^2,\qquad S_{xy}=\sum_{i=1}^{n}(x_i-\bar x)\cdot(y_i-\bar y). \]
The least-squares slope and intercept are \[ b_1=\frac{S_{xy}}{S_{xx}},\qquad b_0=\bar y-b_1\cdot\bar x. \]
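The formulas above translate almost line for line into code. The following is a minimal sketch in Python, not the implementation of any particular calculator; the function name `least_squares_line` and the error handling are illustrative assumptions.

```python
from statistics import fmean


def least_squares_line(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x.

    Hypothetical helper: the name and the validation rules are assumptions
    made for this sketch.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")

    x_bar, y_bar = fmean(xs), fmean(ys)

    # S_xx and S_xy exactly as defined in the text.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    if s_xx == 0:
        # All x values are equal, so the slope S_xy / S_xx is undefined.
        raise ValueError("all x values are identical; slope is undefined")

    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1
```

For points that lie exactly on a line, such as the made-up data `[0, 1, 2]` and `[1, 3, 5]`, the function recovers that line's intercept and slope (here 1 and 2).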
2) Predictions and residuals
For each observed \(x_i\), \[ \hat y_i=b_0+b_1\cdot x_i,\qquad e_i=y_i-\hat y_i. \]
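Residuals measure vertical deviations of the observed points from the fitted line: a positive residual means the point lies above the line, a negative residual means it lies below, and a residual near 0 means the line predicts that observation closely.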
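Given a slope and intercept, fitted values and residuals are one-line computations. The sketch below assumes the coefficients have already been obtained; the function names `predict` and `residuals` and the sample numbers in the usage lines are hypothetical.

```python
def predict(b0, b1, xs):
    """Fitted values y_hat_i = b0 + b1 * x_i for each x_i."""
    return [b0 + b1 * x for x in xs]


def residuals(b0, b1, xs, ys):
    """Residuals e_i = y_i - y_hat_i (observed minus fitted)."""
    return [y - y_hat for y, y_hat in zip(ys, predict(b0, b1, xs))]


# Illustrative line y_hat = 1 + 2x with made-up observations.
fitted = predict(1.0, 2.0, [0, 1, 2])
errors = residuals(1.0, 2.0, [0, 1, 2], [1.5, 2.5, 5.0])
```

Here the first observation lies above the line (positive residual), the second lies below it (negative residual), and the third lies exactly on it.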
3) Measuring fit: \(SSE\), \(SST\), \(SSR\), and \(R^2\)
The total variation in \(y\) is \[ SST=\sum_{i=1}^{n}(y_i-\bar y)^2. \] The unexplained variation after fitting the line is \[ SSE=\sum_{i=1}^{n}(y_i-\hat y_i)^2. \] The explained variation is \(SSR=SST-SSE\), and the coefficient of determination is \[ R^2=\frac{SSR}{SST}=1-\frac{SSE}{SST}. \]
Interpretation of \(R^2\): the proportion of variability in \(y\) explained by the linear relationship with \(x\). For example, \(R^2=0.80\) indicates 80% of the variation in \(y\) is explained by the fitted line.
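The three sums of squares and \(R^2\) can be computed from the observed and fitted values alone. The following is a rough sketch; the function name `fit_summary` and the dictionary keys are assumptions made for illustration.

```python
from statistics import fmean


def fit_summary(ys, y_hats):
    """Return SST, SSE, SSR and R^2 for observed ys and fitted y_hats.

    Assumes y_hats come from a least-squares line with an intercept;
    only then is SSR = SST - SSE guaranteed to lie between 0 and SST.
    """
    y_bar = fmean(ys)
    sst = sum((y - y_bar) ** 2 for y in ys)                       # total variation
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))   # unexplained
    if sst == 0:
        # Every y is identical, so R^2 = SSR / SST is undefined.
        raise ValueError("y has no variation; R^2 is undefined")
    ssr = sst - sse                                               # explained
    return {"SST": sst, "SSE": sse, "SSR": ssr, "R2": ssr / sst}
```

A perfect fit (every fitted value equal to its observation) gives \(SSE=0\) and \(R^2=1\), while fitted values that all equal \(\bar y\) give \(SSE=SST\) and \(R^2=0\).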
4) Worked example (numbers a linear regression calculator would compute)
Data (five observations):
| \(i\) | \(x_i\) | \(y_i\) |
|---|---|---|
| 1 | 1 | 2 |
| 2 | 2 | 2 |
| 3 | 3 | 4 |
| 4 | 4 | 4 |
| 5 | 5 | 5 |
Compute means: \[ \bar x=\frac{1+2+3+4+5}{5}=3,\qquad \bar y=\frac{2+2+4+4+5}{5}=3.4. \]
Compute \(S_{xx}\) and \(S_{xy}\): \[ S_{xx}=(1-3)^2+(2-3)^2+(3-3)^2+(4-3)^2+(5-3)^2 =4+1+0+1+4=10, \] \[ S_{xy}=(1-3)\cdot(2-3.4)+(2-3)\cdot(2-3.4)+(3-3)\cdot(4-3.4)+(4-3)\cdot(4-3.4)+(5-3)\cdot(5-3.4) =2.8+1.4+0+0.6+3.2=8. \]
Slope and intercept: \[ b_1=\frac{8}{10}=0.8,\qquad b_0=3.4-0.8\cdot 3=1.0. \] Regression equation: \[ \hat y = 1.0 + 0.8\cdot x. \]
Predicted values and residuals:
| \(x_i\) | \(y_i\) | \(\hat y_i=1.0+0.8\cdot x_i\) | \(e_i=y_i-\hat y_i\) | \(e_i^2\) |
|---|---|---|---|---|
| 1 | 2 | 1.8 | 0.2 | 0.04 |
| 2 | 2 | 2.6 | -0.6 | 0.36 |
| 3 | 4 | 3.4 | 0.6 | 0.36 |
| 4 | 4 | 4.2 | -0.2 | 0.04 |
| 5 | 5 | 5.0 | 0.0 | 0.00 |
Compute sums of squares: \[ SSE=0.04+0.36+0.36+0.04+0.00=0.80. \] Also, \[ SST=\sum (y_i-\bar y)^2 =(2-3.4)^2+(2-3.4)^2+(4-3.4)^2+(4-3.4)^2+(5-3.4)^2 =1.96+1.96+0.36+0.36+2.56=7.20. \] Then \[ R^2=1-\frac{SSE}{SST}=1-\frac{0.80}{7.20}=1-\frac{1}{9}=\frac{8}{9}\approx 0.8889. \]
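As a cross-check, the explained variation can also be computed directly from the fitted values, using the identity \(SSR=\sum(\hat y_i-\bar y)^2\), which holds for a least-squares line with an intercept: \[ SSR=(1.8-3.4)^2+(2.6-3.4)^2+(3.4-3.4)^2+(4.2-3.4)^2+(5.0-3.4)^2 =2.56+0.64+0+0.64+2.56=6.40. \] This agrees with \(SST-SSE=7.20-0.80=6.40\), and \[ R^2=\frac{SSR}{SST}=\frac{6.40}{7.20}=\frac{8}{9}. \]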
A sample prediction: at \(x=6\), \[ \hat y = 1.0 + 0.8\cdot 6 = 5.8. \]
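The whole worked example can be reproduced in a few lines. The script below is a sketch that follows the formulas from the earlier sections step by step; the variable names are illustrative choices.

```python
from statistics import fmean

# The five observations from the worked example.
xs = [1, 2, 3, 4, 5]
ys = [2, 2, 4, 4, 5]

x_bar, y_bar = fmean(xs), fmean(ys)
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b1 = s_xy / s_xx            # slope
b0 = y_bar - b1 * x_bar     # intercept

y_hats = [b0 + b1 * x for x in xs]
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))
sst = sum((y - y_bar) ** 2 for y in ys)
r2 = 1 - sse / sst

y_at_6 = b0 + b1 * 6        # prediction at x = 6

print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")
print(f"SSE = {sse:.4f}, SST = {sst:.4f}, R^2 = {r2:.4f}")
print(f"prediction at x = 6: {y_at_6:.4f}")
```

Rounded to four decimals, the printed values match the hand computation: slope 0.8000, intercept 1.0000, \(SSE\) 0.8000, \(SST\) 7.2000, \(R^2\) 0.8889, and a prediction of 5.8000.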
5) Visualization: scatterplot with fitted line and residuals
6) Practical interpretation of calculator outputs
- Slope \(b_1\): change in the predicted value \(\hat y\) per 1-unit increase in \(x\); here \(b_1=0.8\), so each additional unit of \(x\) raises the prediction by 0.8.
- Intercept \(b_0\): predicted value when \(x=0\); here \(b_0=1.0\) (interpretation depends on whether \(x=0\) is meaningful).
- \(R^2\): proportion of the variation in \(y\) explained by the fitted line; here \(R^2\approx 0.8889\), so the line accounts for about 89% of the variation, which indicates a strong linear fit.
- Residual pattern: random scatter around 0 supports a linear model; systematic curvature suggests a nonlinear relationship.
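To illustrate the point about residual patterns, the sketch below fits a line to deliberately curved data. The data (\(y=x^2\) for \(x=1,\ldots,5\)) are made up for this illustration and are not part of the worked example.

```python
from statistics import fmean

# Hypothetical curved data: y = x^2, chosen only to show the pattern.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

x_bar, y_bar = fmean(xs), fmean(ys)
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residuals e_i = y_i - (b0 + b1 * x_i), in order of increasing x.
curve_residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
signs = ["+" if e > 0 else "-" for e in curve_residuals]
print(curve_residuals)
print(signs)
```

The residuals are positive at both ends of the \(x\)-range and negative in the middle, a U-shaped pattern rather than random scatter around 0. That kind of systematic run of signs is the signal that a straight line is missing curvature.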
7) Common limitations
- Linearity: a straight line can be misleading if the relationship is curved.
- Outliers and leverage: a single extreme point can strongly change \(b_1\) and \(b_0\).
- Extrapolation: predictions far outside the observed \(x\)-range are often unreliable.
- Association vs causation: a good fit does not imply that changes in \(x\) cause changes in \(y\).
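The sensitivity to a single extreme point is easy to demonstrate numerically. The sketch below refits the worked-example data after appending one hypothetical high-leverage observation, \((10, 2)\); that point and the helper name `slope` are assumptions introduced purely for illustration.

```python
from statistics import fmean


def slope(xs, ys):
    """Least-squares slope b1 = S_xy / S_xx (hypothetical helper)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / s_xx


xs = [1, 2, 3, 4, 5]
ys = [2, 2, 4, 4, 5]

b1_original = slope(xs, ys)

# Append one made-up point far to the right of the observed x-range.
b1_with_outlier = slope(xs + [10], ys + [2])

print(f"slope without extra point: {b1_original:.4f}")
print(f"slope with (10, 2) added:  {b1_with_outlier:.4f}")
```

With the extra point the slope falls from 0.8 to roughly zero (slightly negative), even though the other five observations are unchanged. A point far from \(\bar x\) carries a large weight in both \(S_{xx}\) and \(S_{xy}\), which is why it can dominate the fit.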