A sample correlation coefficient calculator computes the Pearson sample correlation coefficient \(r\) from paired observations \((x_i,y_i)\). The value \(r\) is unitless and lies between \(-1\) and \(1\), describing the direction (positive or negative) and strength of a linear relationship.
For \(n\) paired observations \((x_1,y_1),\dots,(x_n,y_n)\) with sample means \(\bar{x}\) and \(\bar{y}\), \[ r=\frac{\sum_{i=1}^{n}(x_i-\bar{x})\cdot(y_i-\bar{y})}{\sqrt{\left(\sum_{i=1}^{n}(x_i-\bar{x})^2\right)\cdot\left(\sum_{i=1}^{n}(y_i-\bar{y})^2\right)}}. \]
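The definition above can be sketched directly in code. The following is a minimal illustration rather than any particular calculator's implementation; the function name `pearson_r` and its input checks are assumptions made for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson sample correlation computed from the deviation formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two paired observations are required")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of cross-products of deviations from the means.
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations.
    sum_xx = sum((x - x_bar) ** 2 for x in xs)
    sum_yy = sum((y - y_bar) ** 2 for y in ys)
    if sum_xx == 0 or sum_yy == 0:
        # The denominator is zero when either variable is constant.
        raise ValueError("r is undefined when either variable is constant")
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# The five pairs used in the worked example of this article.
print(pearson_r([1, 2, 3, 4, 5], [2, 3, 5, 4, 6]))  # 0.9
```

Raising an error for constant input is one possible design choice; a calculator could equally return a "not defined" message.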
What the calculator returns and how to interpret it
- Range: \(-1 \le r \le 1\). If all \(x_i\) or all \(y_i\) are equal, the denominator is zero and \(r\) is undefined.
- Sign: \(r>0\) indicates a positive linear association; \(r<0\) indicates a negative linear association.
- Strength: \(|r|\) close to 1 indicates a strong linear pattern; \(|r|\) near 0 indicates weak linear association.
- Important limitation: \(r\) measures linear association; it does not prove causation and can be distorted by outliers.
Efficient computational form (often used by calculators)
Many implementations accumulate running sums in a single pass over the data, so the means do not have to be computed before the deviations are formed:
Let \(S_x=\sum x_i\), \(S_y=\sum y_i\), \(S_{xx}=\sum x_i^2\), \(S_{yy}=\sum y_i^2\), \(S_{xy}=\sum x_i y_i\). Then \[ r=\frac{n\cdot S_{xy}-S_x\cdot S_y}{\sqrt{\left(n\cdot S_{xx}-S_x^2\right)\cdot\left(n\cdot S_{yy}-S_y^2\right)}}. \]
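As a rough sketch of the running-sum approach, the function below accumulates the five sums in one pass and then applies the computational formula. The name `pearson_r_running` and the error handling are assumptions for illustration only.

```python
import math

def pearson_r_running(pairs):
    """One-pass Pearson r from the running sums S_x, S_y, S_xx, S_yy, S_xy."""
    n = 0
    s_x = s_y = s_xx = s_yy = s_xy = 0.0
    for x, y in pairs:
        n += 1
        s_x += x
        s_y += y
        s_xx += x * x
        s_yy += y * y
        s_xy += x * y
    numerator = n * s_xy - s_x * s_y
    denom_sq = (n * s_xx - s_x ** 2) * (n * s_yy - s_y ** 2)
    if denom_sq <= 0:
        # Zero (or, through rounding, slightly negative) when a variable
        # is constant or fewer than two pairs were supplied.
        raise ValueError("r is undefined for this input")
    return numerator / math.sqrt(denom_sq)

# The five pairs from the worked example in this article.
data = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)]
print(pearson_r_running(data))  # 0.9
```

One caveat with this form: when the values are large relative to their spread, the subtractions \(n\cdot S_{xx}-S_x^2\) and \(n\cdot S_{yy}-S_y^2\) can lose precision in floating-point arithmetic, so the deviation-based definition may be the safer choice for such data.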
Worked example (step-by-step)
Consider the paired dataset:
| i | \(x_i\) | \(y_i\) |
|---|---|---|
| 1 | 1 | 2 |
| 2 | 2 | 3 |
| 3 | 3 | 5 |
| 4 | 4 | 4 |
| 5 | 5 | 6 |
- Compute sample means: \[ \bar{x}=\frac{1+2+3+4+5}{5}=3,\quad \bar{y}=\frac{2+3+5+4+6}{5}=4. \]
- Compute deviations, cross-products, and sums:
| i | \(x_i\) | \(y_i\) | \(x_i-\bar{x}\) | \(y_i-\bar{y}\) | \((x_i-\bar{x})\cdot(y_i-\bar{y})\) | \((x_i-\bar{x})^2\) | \((y_i-\bar{y})^2\) |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 2 | \(-2\) | \(-2\) | 4 | 4 | 4 |
| 2 | 2 | 3 | \(-1\) | \(-1\) | 1 | 1 | 1 |
| 3 | 3 | 5 | 0 | 1 | 0 | 0 | 1 |
| 4 | 4 | 4 | 1 | 0 | 0 | 1 | 0 |
| 5 | 5 | 6 | 2 | 2 | 4 | 4 | 4 |
| Sums | | | | | 9 | 10 | 10 |
- Substitute into the definition: \[ r=\frac{9}{\sqrt{10\cdot 10}}=\frac{9}{10}=0.9. \]
- Interpret the result: \(r=0.9\) indicates a strong positive linear association for this sample.
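The steps above can be checked numerically. The short script below recomputes the three column sums from the table and, as a cross-check, evaluates the running-sum form on the same data. Variable names are chosen for this sketch only.

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]
n = len(xs)

# Deviation form: reproduces the "Sums" row of the table.
x_bar = sum(xs) / n  # 3.0
y_bar = sum(ys) / n  # 4.0
sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # 9.0
sum_xx = sum((x - x_bar) ** 2 for x in xs)                       # 10.0
sum_yy = sum((y - y_bar) ** 2 for y in ys)                       # 10.0
r_dev = sum_xy / math.sqrt(sum_xx * sum_yy)

# Running-sum form on the same data.
S_x, S_y = sum(xs), sum(ys)               # 15, 20
S_xx = sum(x * x for x in xs)             # 55
S_yy = sum(y * y for y in ys)             # 90
S_xy = sum(x * y for x, y in zip(xs, ys)) # 69
r_sums = (n * S_xy - S_x * S_y) / math.sqrt(
    (n * S_xx - S_x ** 2) * (n * S_yy - S_y ** 2)
)  # 45 / sqrt(50 * 50)

print(r_dev, r_sums)  # 0.9 0.9
```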
Visualization: scatter plot and direction of correlation
Connection to covariance and regression
The sample correlation coefficient standardizes the sample covariance by the sample standard deviations: \[ s_{xy}=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})\cdot(y_i-\bar{y}),\quad s_x=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2},\quad s_y=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2}. \] Then \[ r=\frac{s_{xy}}{s_x\cdot s_y}. \] The factors \(\frac{1}{n-1}\) cancel between numerator and denominator, so this expression is identical to the definition of \(r\) in terms of deviation sums.
In simple linear regression of \(y\) on \(x\), the least-squares slope satisfies \[ b_1=r\cdot\frac{s_y}{s_x}, \] so \(b_1\) has the same sign as \(r\), and its magnitude is \(|r|\) rescaled by the ratio of the standard deviations.
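To make these relations concrete, the sketch below computes \(s_{xy}\), \(s_x\), and \(s_y\) for the worked-example data, then derives \(r\) and the slope \(b_1\). It uses `statistics.stdev` from the Python standard library for the sample standard deviations. The comparison against the direct least-squares slope is added here as a sanity check and is not part of the original text.

```python
import statistics

xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]
n = len(xs)
x_bar = statistics.mean(xs)
y_bar = statistics.mean(ys)

# Sample covariance with the n - 1 divisor.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
# Sample standard deviations (statistics.stdev also divides by n - 1).
s_x = statistics.stdev(xs)
s_y = statistics.stdev(ys)

r = s_xy / (s_x * s_y)
b1 = r * s_y / s_x

# Sanity check: least-squares slope as cross-product sum over sum of squares.
b1_direct = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))

print(round(r, 6), round(b1, 6), round(b1_direct, 6))
```

For this dataset \(s_x=s_y\), so the slope and the correlation coincide at approximately 0.9; in general they differ by the factor \(s_y/s_x\).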
- Plot the data first; correlation summarizes the scatter plot but does not replace it.
- Check for outliers; a single extreme point can change \(r\) substantially.
- Use \(r\) for linear association; curved relationships can have \(r\) near 0 even when strongly related.
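The last two points can be illustrated with small made-up datasets. In the sketch below, the helper `pearson_r`, the added outlier \((10, 0)\), and the parabola data are all assumptions chosen for demonstration.

```python
import math

def pearson_r(xs, ys):
    """Pearson sample correlation from the deviation formula (no input checks)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_xx = sum((x - x_bar) ** 2 for x in xs)
    sum_yy = sum((y - y_bar) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Outlier effect: the worked-example data, then the same data plus one
# extreme point (10, 0) invented for this demonstration.
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]
r_clean = pearson_r(xs, ys)
r_outlier = pearson_r(xs + [10], ys + [0])

# Curved relationship: y = x^2 on x-values symmetric about zero.
cx = [-2, -1, 0, 1, 2]
cy = [x * x for x in cx]
r_curve = pearson_r(cx, cy)

print(round(r_clean, 3), round(r_outlier, 3), round(r_curve, 3))
```

Here a single added point moves \(r\) from 0.9 to roughly \(-0.42\), and the parabola yields \(r=0\) even though \(y\) is completely determined by \(x\).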