Linear Correlation

Linear Correlation (Pearson r)

Compute the sample linear correlation coefficient, visualize the relationship, and (optionally) test whether the population correlation is zero using the t distribution.

Paste two numeric columns (x, y), one pair per line, with the two values separated by a comma, tab, semicolon, or spaces. Example: 10, 22. You can also upload a CSV file.
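The accepted input format can be illustrated with a small parser. This is only a sketch of one way to read such input, not the calculator's actual code; the function name `parse_pairs` is hypothetical, and it assumes a period as the decimal mark (a decimal comma would be read as a separator).

```python
import re

def parse_pairs(text):
    """Parse lines of 'x<sep>y' into two lists of floats.

    Accepted separators: comma, tab, semicolon, or spaces.
    Blank lines are skipped; a line without exactly two values raises ValueError.
    """
    xs, ys = [], []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        # Split on any run of separators and drop empty fields.
        fields = [f for f in re.split(r"[,;\t ]+", line) if f]
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 values, got {len(fields)}")
        xs.append(float(fields[0]))
        ys.append(float(fields[1]))
    return xs, ys

# One pair per line, each line using a different separator.
xs, ys = parse_pairs("10, 22\n12\t25\n15;31\n18 36")
```

A header row or any non-numeric field makes `float` raise `ValueError`, so a real loader would probably want to detect and skip headers first.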

Tip: the correlation coefficient always lies between −1 and 1. Values near 1 indicate strong positive linear association, near −1 strong negative linear association, and near 0 little to no linear association.

What different values of r look like

The panels below are illustrative patterns (perfect/none, and strong/weak). Your own data plot appears after you calculate.

[Figure: correlation patterns gallery]

Note: correlation describes the strength and direction of a linear association. A large |r| does not, by itself, prove causation.

Frequently Asked Questions

What does the Pearson correlation coefficient r measure?

Pearson r measures the strength and direction of a linear relationship between two variables using paired (x, y) data. It ranges from −1 to 1: values near 1 indicate strong positive linear association, values near −1 indicate strong negative linear association, and values near 0 indicate little to no linear association.

How is r calculated from x and y data?

The calculator uses r = SSxy / sqrt(SSxx × SSyy), where SSxx = sum((x_i − xbar)^2), SSyy = sum((y_i − ybar)^2), and SSxy = sum((x_i − xbar)(y_i − ybar)). These are corrected sums of squares and cross-products, computed as deviations from the sample means xbar and ybar.
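As a minimal sketch, the formula above translates into Python as follows. The function name `pearson_r` and the sample data are illustrative assumptions, not taken from the calculator itself.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation r = SSxy / sqrt(SSxx * SSyy)."""
    if len(xs) != len(ys):
        raise ValueError("x and y must have the same number of values")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two (x, y) pairs are required")
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    # Corrected sums of squares and cross-products.
    ss_xx = sum((x - xbar) ** 2 for x in xs)
    ss_yy = sum((y - ybar) ** 2 for y in ys)
    ss_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    if ss_xx == 0 or ss_yy == 0:
        raise ValueError("r is undefined: all x values or all y values are identical")
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Hypothetical data: SSxx = 10, SSyy = 6, SSxy = 6, so r = 6 / sqrt(60).
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4))
```

Centering on the means before squaring, rather than using the raw-sum shortcut formulas, tends to lose less precision when the values are large relative to their spread.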

How do you test whether the population correlation rho is zero?

For n paired observations, the test uses t = r × sqrt((n − 2) / (1 − r^2)) with degrees of freedom df = n − 2. The statistic requires n ≥ 3 and |r| < 1; otherwise df or the denominator is zero. The p-value depends on whether the alternative is two-sided, right-tailed, or left-tailed.
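The test can be sketched with the standard library alone. The names `t_upper_tail` and `correlation_t_test` are hypothetical, and the tail probability is approximated here by Simpson's rule on the t density, which is a stand-in chosen for self-containment rather than the method the calculator necessarily uses. A statistics library would normally be preferred, especially for very small p-values.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) by integrating the density over [0, |t|] (Simpson's rule)."""
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = max(0.0, 0.5 - total * h / 3)  # loses precision for tiny tail areas
    return upper if t >= 0 else 1.0 - upper

def correlation_t_test(r, n, alternative="two-sided"):
    """Return (t, df, p) for H0: rho = 0."""
    if n < 3:
        raise ValueError("the test needs at least 3 pairs (df = n - 2 > 0)")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    if alternative == "two-sided":
        p = 2 * t_upper_tail(abs(t), df)
    elif alternative == "right":
        p = t_upper_tail(t, df)
    elif alternative == "left":
        p = 1 - t_upper_tail(t, df)
    else:
        raise ValueError("alternative must be 'two-sided', 'right', or 'left'")
    return t, df, p

# Hypothetical inputs: r = 6 / sqrt(60) from n = 5 pairs gives t = sqrt(4.5), df = 3.
t_stat, df, p_value = correlation_t_test(6 / math.sqrt(60), 5)
```

For the same data, the right-tailed p-value is half the two-sided one when r is positive, and the left-tailed p-value is its complement.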

What does r^2 mean in a correlation report?

r^2 is the coefficient of determination, interpreted as the proportion of variation in y that is explained by a straight-line relationship with x. It summarizes explained variability for a linear model but does not prove causation.
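The "explained variation" reading can be checked numerically: for a least-squares line, r^2 equals 1 − SSE/SSyy, where SSE is the sum of squared residuals. The sketch below uses a hypothetical function name and made-up data to compute r^2 both ways.

```python
def r_squared_two_ways(xs, ys):
    """Return r^2 computed from the correlation and from the fitted line."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    ss_xx = sum((x - xbar) ** 2 for x in xs)
    ss_yy = sum((y - ybar) ** 2 for y in ys)
    ss_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))

    # Route 1: square of the correlation coefficient.
    r2_from_r = ss_xy ** 2 / (ss_xx * ss_yy)

    # Route 2: fit y = intercept + slope * x by least squares,
    # then compare unexplained variation (SSE) with total variation (SSyy).
    slope = ss_xy / ss_xx
    intercept = ybar - slope * xbar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r2_from_fit = 1 - sse / ss_yy
    return r2_from_r, r2_from_fit

# Hypothetical data: both routes give 0.6, i.e. 60% of the variation in y
# is accounted for by the straight-line fit.
r2_a, r2_b = r_squared_two_ways([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```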

Why can r be undefined for some datasets?

If all x values are identical then SSxx = 0, and if all y values are identical then SSyy = 0, so the denominator in the r formula becomes zero. In those cases the correlation coefficient cannot be computed.
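A guard for these cases might look like the following sketch. The function name `undefined_reason` is hypothetical; it only checks for exactly identical values, so nearly constant floating-point data would still pass and could give a numerically unstable r.

```python
def undefined_reason(xs, ys):
    """Return a message explaining why r cannot be computed, or None if it can."""
    if len(xs) != len(ys):
        return "x and y have different lengths"
    # A single distinct value (or no data) means every deviation from the mean is 0.
    if len(set(xs)) <= 1:
        return "all x values are identical, so SSxx = 0"
    if len(set(ys)) <= 1:
        return "all y values are identical, so SSyy = 0"
    return None

print(undefined_reason([3, 3, 3], [1, 2, 3]))  # x is constant
print(undefined_reason([1, 2, 3], [1, 2, 4]))  # None: r can be computed
```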