Enter X and Y data pairs (minimum 3, maximum 30)
⚠️ Note: Pearson r measures linear correlation only. A result of r = 0 does not mean no relationship -- non-linear relationships will not be detected. Always visualize your data before relying solely on r. Correlation does not imply causation.

Sources & Methodology

The Pearson correlation formula was verified against standard statistical textbooks and the NIST Engineering Statistics Handbook. Results agree with reference calculations to six or more decimal places. Testing used known datasets, including perfect positive (r=1.000), perfect negative (r=-1.000), and moderate (r=0.529809) cases.
NIST Engineering Statistics Handbook — Pearson Correlation
National Institute of Standards and Technology reference for the Pearson product-moment correlation formula, computation method, and interpretation guidelines. Authoritative source for the r formula and t-statistic significance test used in this calculator.
Statology — Pearson Correlation Coefficient Explained
Statistical reference for correlation strength classification thresholds (negligible, weak, moderate, strong, very strong) and interpretation guidelines used in this calculator's classification system.
Verified Formula (Pearson r):
r = Sum((xi - x_mean)(yi - y_mean)) / sqrt(Sum((xi - x_mean)^2) × Sum((yi - y_mean)^2))
R-squared = r^2 (proportion of variance in Y explained by X)
t-statistic = r × sqrt(n - 2) / sqrt(1 - r^2), with degrees of freedom = n - 2
Covariance (Sxy) = Sum((xi - x_mean)(yi - y_mean)) / (n - 1) [for reference]
Tests:
Perfect positive, [1,2,3,4,5] vs [2,4,6,8,10]: r = 1.000000
Perfect negative: r = -1.000000
Textbook dataset, [43,21,25,42,57,59] vs [99,65,79,75,87,81]: r = 0.529809
All correct.

Correlation Coefficient (Pearson r) — Complete 2026 Guide

The Pearson correlation coefficient (r) is one of the most widely used statistics in data analysis, research, and machine learning. It measures both the strength and direction of the linear relationship between two variables on a standardised scale from -1 to +1. Understanding how to calculate r, how to interpret it, and its limitations is a fundamental skill for any student, researcher, or data analyst.

The Pearson r Formula — Step by Step

r = Sum((xi - x_mean)(yi - y_mean)) / sqrt(Sum((xi - x_mean)^2) × Sum((yi - y_mean)^2))
Step 1: Calculate the mean of X (x_mean) and the mean of Y (y_mean).
Step 2: For each pair (xi, yi), compute the deviations (xi - x_mean) and (yi - y_mean).
Step 3: Multiply each pair of deviations, (xi - x_mean) × (yi - y_mean), and sum these products. This is the numerator.
Step 4: Sum (xi - x_mean)^2 and sum (yi - y_mean)^2. Multiply these two sums and take the square root. This is the denominator.
Step 5: Divide the Step 3 result by the Step 4 result. This is r.
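
The five steps above can be sketched in Python as follows. This is a minimal illustration, not the calculator's own code: the function name pearson_r and its error handling are assumptions made for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences (hypothetical helper)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two data pairs are required")

    # Step 1: means of X and Y.
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n

    # Step 2: deviations from the means.
    dx = [x - x_mean for x in xs]
    dy = [y - y_mean for y in ys]

    # Step 3: sum of the products of paired deviations (numerator).
    numerator = sum(a * b for a, b in zip(dx, dy))

    # Step 4: square root of the product of the summed squared deviations.
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")

    # Step 5: divide numerator by denominator.
    return numerator / denominator


# Textbook dataset quoted in the methodology notes.
print(round(pearson_r([43, 21, 25, 42, 57, 59], [99, 65, 79, 75, 87, 81]), 6))  # 0.529809
```

The explicit check for a zero denominator is a design choice for the sketch: when either variable is constant, r is undefined, and raising an error is clearer than returning a misleading number.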

Correlation Strength Classification

|r| Value      Strength      Example Real-World Correlation
0.00 – 0.19    Negligible    Shoe size vs. IQ scores
0.20 – 0.39    Weak          Height vs. academic performance
0.40 – 0.59    Moderate      Advertising spend vs. sales revenue
0.60 – 0.79    Strong        Study hours vs. exam score
0.80 – 0.99    Very Strong   Temperature vs. ice cream sales
1.00           Perfect       Only in theory or identical variables
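
As a rough sketch, the classification above could be applied in code like this. The function name classify_strength is hypothetical, and treating each band's lower bound as a cutoff (so that a value such as 0.195 counts as negligible) is an assumption, since the table does not say how values between bands are handled.

```python
def classify_strength(r):
    """Map r to the strength labels in the table above (hypothetical helper)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    magnitude = abs(r)  # strength ignores the sign of r
    if magnitude == 1.0:
        return "Perfect"
    # Assumption: each band's lower bound acts as the cutoff.
    for lower_bound, label in [
        (0.80, "Very Strong"),
        (0.60, "Strong"),
        (0.40, "Moderate"),
        (0.20, "Weak"),
    ]:
        if magnitude >= lower_bound:
            return label
    return "Negligible"


print(classify_strength(0.529809))  # Moderate
print(classify_strength(-0.85))     # Very Strong
```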

R-Squared: The Coefficient of Determination

R-squared (R²) = r² is the proportion of variance in Y explained by the linear relationship with X, often quoted as a percentage. If r = 0.8, then R² = 0.64, meaning 64% of the variance in Y is explained by X. The remaining 36% is due to other factors or random variation. R² is non-negative (ranges 0 to 1), unlike r, which can be negative. In simple linear regression, R² equals the square of the correlation coefficient and is a standard measure of goodness of fit.

Statistical Significance: The t-Test for r

A correlation should be tested for statistical significance before being interpreted as meaningful. The test statistic is t = r × sqrt(n-2) / sqrt(1-r²), with n-2 degrees of freedom. For small samples, even high r values may not be statistically significant; for large samples, even small r values (such as 0.2) can be. Always report both r and its significance level (p-value) in academic and professional work.
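
The effect of sample size on the t-statistic can be illustrated with a short Python sketch. The function name t_statistic is an assumption for this example, and the critical values in the comments are taken from standard two-tailed t tables at the 0.05 level rather than computed by the code.

```python
import math


def t_statistic(r, n):
    """Return (t, degrees_of_freedom) for a correlation r from n pairs."""
    if n < 3:
        raise ValueError("n must be at least 3 so that n - 2 is positive")
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    df = n - 2
    if abs(r) == 1.0:
        # A perfect correlation gives an infinite t-statistic.
        return math.copysign(math.inf, r), df
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return t, df


# Small sample, high r: t is below the critical value of about 3.182 (df = 3).
t_small, df_small = t_statistic(0.8, 5)
print(round(t_small, 3), df_small)   # 2.309 3

# Large sample, low r: t is above the critical value of about 1.984 (df = 98).
t_large, df_large = t_statistic(0.2, 100)
print(round(t_large, 3), df_large)   # 2.021 98
```

Converting t to an exact p-value needs the t-distribution, which the Python standard library does not provide, so this sketch stops at the statistic itself.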

💡 Correlation does not imply causation: Ice cream sales and drowning deaths are strongly positively correlated (illustratively, r ≈ 0.9) because both peak in summer heat, not because one causes the other. Exercise frequency and body weight are negatively correlated (illustratively, r ≈ -0.5), but reverse causation is also possible: heavier people may exercise less. Always ask: could a third variable explain this correlation? Could the causal direction be reversed? Is this a coincidental time-series correlation?
Frequently Asked Questions
What does the Pearson correlation coefficient measure?
Pearson r measures the strength and direction of the linear relationship between two variables, on a scale from -1 to +1. r=+1: perfect positive linear relationship. r=-1: perfect negative linear relationship. r=0: no linear relationship. Values between these extremes indicate the degree of linear association. Only linear relationships are captured, so a perfect non-linear relationship can still produce r=0.
How do you calculate Pearson r?
r = Sum((xi - x_mean)(yi - y_mean)) / sqrt(Sum((xi - x_mean)^2) × Sum((yi - y_mean)^2)). Steps: (1) Calculate the means of X and Y. (2) Compute the deviations from the mean for each pair. (3) Sum the products of paired deviations (numerator). (4) Sum the squared deviations for X and Y separately, multiply the two sums together, and take the square root (denominator). (5) Divide the numerator by the denominator.
How do you interpret the value of r?
|r| of 0.00-0.19 is negligible, 0.20-0.39 weak, 0.40-0.59 moderate, 0.60-0.79 strong, 0.80-0.99 very strong, and 1.00 perfect. Positive r: the variables move in the same direction. Negative r: they move in opposite directions. Context matters: r=0.5 is considered strong in social science but weak in physics. Always report significance (t-test) alongside r, especially with small samples.
What is R-squared?
R-squared = r^2, the proportion of variance in Y explained by X. If r=0.8, then R^2=0.64, so 64% of the variance in Y is explained by the linear relationship with X and the remaining 36% is unexplained. R^2 ranges from 0 to 1 (0-100%) and, unlike r, is always non-negative. A higher R^2 means the linear model fits the data more closely.
How do you test whether a correlation is statistically significant?
t = r × sqrt(n-2) / sqrt(1-r^2), with df = n-2. Compare |t| to the t-distribution critical value at your significance level (0.05 is the usual choice). A larger |t| is stronger evidence of a real correlation. With small n (below 10), even r=0.6 may not be significant. With large n (above 100), even r=0.2 can be statistically significant though practically weak.
What does a negative correlation mean?
As one variable increases, the other tends to decrease. Illustrative examples: exercise and body weight (r ≈ -0.5), temperature and heating costs (r ≈ -0.9), errors and study time (r ≈ -0.7). Strength is interpreted exactly as for positive correlation; only the direction differs. For example, r=-0.8 is a very strong negative linear relationship.
What is the difference between correlation and causation?
Correlation measures statistical association: whether variables move together. Causation means one variable directly causes changes in the other. Correlation does not imply causation. Classic example: ice cream sales and drowning deaths correlate strongly because both peak in summer heat, not because ice cream causes drowning. Establishing causation requires controlled experiments, correct temporal ordering (cause precedes effect), and eliminating confounders.
When should you use Pearson r versus Spearman rho?
Pearson r: for continuous, roughly normally distributed data with a linear relationship. Spearman rho: for ordinal data, skewed distributions, or relationships that are monotonic but not linear. Spearman works on ranks rather than raw values, which makes it more robust to outliers. If the data is clearly non-normal or ordinal, use Spearman. For roughly normal continuous data with a linear relationship, use Pearson.
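
To make the "works on ranks" idea concrete, here is a minimal Python sketch that computes Spearman rho as Pearson r applied to ranks. The helper names (average_ranks, pearson_r, spearman_rho) are hypothetical, and giving tied values their average rank is an assumption reflecting the common convention.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    dx = [x - x_mean for x in xs]
    dy = [y - y_mean for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman rho: Pearson r computed on the ranks of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A monotonic but non-linear relationship (y = x cubed).
xs = [1, 2, 3, 4, 5]
ys = [1, 8, 27, 64, 125]
print(spearman_rho(xs, ys))      # 1.0, the ranks agree perfectly
print(pearson_r(xs, ys) < 1.0)   # True, the raw values are not on a line
```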
How many data points do you need?
Minimum 3 to calculate (but the result is meaningless). Minimum 5-10 for any real interpretation. Recommended: n ≥ 30 for reliable estimates. With very small samples, even r=0.8 may not be statistically significant: at n=5, t ≈ 2.31, below the two-tailed 0.05 critical value of about 3.18. With n > 100, even r=0.2 can be statistically significant but practically weak. Always report significance alongside r.
What does r = 0 mean?
r=0 means no LINEAR relationship. It does NOT mean no relationship at all. X=[-3,-2,-1,0,1,2,3] and Y=[9,4,1,0,1,4,9] give r=0 even though Y is exactly X squared, because that perfect relationship is quadratic rather than linear. Always plot your data before interpreting r. A scatter plot showing a U-shape, S-curve, or other non-linear pattern alongside r=0 should prompt the use of non-linear correlation measures.
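
The quadratic example can be checked directly. The short Python sketch below redefines a hypothetical pearson_r helper so that it runs on its own.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    dx = [x - x_mean for x in xs]
    dy = [y - y_mean for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]  # [9, 4, 1, 0, 1, 4, 9], a perfect quadratic

r = pearson_r(xs, ys)
print(abs(r) < 1e-12)  # True: the positive and negative products cancel
```

The cancellation happens because the X values are symmetric around their mean: each product on the left half of the parabola is matched by an equal and opposite product on the right half.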
How do outliers affect Pearson r?
Outliers can dramatically inflate or deflate Pearson r because a point far from the means contributes a very large deviation product and squared deviation. A single extreme outlier can move r from 0.9 to 0.5, or even reverse its sign. Always plot your data first. If outliers are present: verify they are not data errors, consider reporting r with and without the outlier, or use Spearman rho, which is more robust to outliers because it works on ranks rather than raw values.
Can correlation be used for prediction?
Yes, via linear regression. The model is Y = a + bX, where the slope b = r × (Sy/Sx) and Sy, Sx are the standard deviations of Y and X. R-squared tells you what fraction of the variance in Y is predictable from X: if r=0.8, then R^2=0.64, so 64% of the variance in Y is predictable from X. A higher |r| means better predictive accuracy and lower prediction errors.
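
A minimal Python sketch of the slope relationship follows. The function name regression_from_r is hypothetical, and the intercept formula a = y_mean - b × x_mean is the standard least-squares result, added here as an assumption because it is not stated above.

```python
import statistics


def regression_from_r(xs, ys):
    """Return (slope, intercept, r) for the line Y = a + bX."""
    n = len(xs)
    x_mean = statistics.mean(xs)
    y_mean = statistics.mean(ys)
    s_x = statistics.stdev(xs)  # sample standard deviation of X
    s_y = statistics.stdev(ys)  # sample standard deviation of Y
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when either variable is constant")

    # r as the sample covariance divided by the product of standard deviations.
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / (n - 1)
    r = covariance / (s_x * s_y)

    slope = r * s_y / s_x                  # b = r × (Sy / Sx)
    intercept = y_mean - slope * x_mean    # a = y_mean - b × x_mean (assumption)
    return slope, intercept, r


slope, intercept, r = regression_from_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(slope, 6), round(abs(intercept), 6), round(r, 6))  # 2.0 0.0 1.0
```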