Question 1

What is the correlation coefficient?

Accepted Answer

The correlation coefficient (Pearson r) measures the strength and direction of the linear relationship between two variables. It ranges from -1 to +1. r = +1 means a perfect positive linear relationship (all points lie on a straight line with positive slope, so Y increases whenever X increases). r = -1 means a perfect negative linear relationship (all points lie on a straight line with negative slope, so Y decreases whenever X increases). r = 0 means no linear relationship. Values between these extremes indicate the degree of linear association.

Question 2

What is the formula for Pearson r?

Accepted Answer

Pearson r = Sum((xi - x_mean)(yi - y_mean)) / sqrt(Sum((xi - x_mean)^2) x Sum((yi - y_mean)^2)). In computational form: r = (n x Sum(xy) - Sum(x) x Sum(y)) / sqrt((n x Sum(x^2) - (Sum(x))^2) x (n x Sum(y^2) - (Sum(y))^2)). The two formulas are algebraically equivalent and give identical results. The numerator is proportional to the covariance of X and Y (how much they vary together); the denominator is proportional to the product of their standard deviations. The proportionality factors cancel, which is why r has no units and always lies between -1 and +1.
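To check that the two formulas really agree, here is a minimal sketch that implements both and evaluates them on a small hypothetical dataset. The function names are illustrative, not from any library. One practical note: the raw-sums form subtracts large, nearly equal numbers, so with very large values it can lose floating-point precision; the deviation form is usually the safer choice in code.

```python
import math

def pearson_deviation(xs, ys):
    """Pearson r from deviations about the means (definitional form)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - x_mean) ** 2 for x in xs) *
                    sum((y - y_mean) ** 2 for y in ys))
    return num / den

def pearson_raw_sums(xs, ys):
    """Pearson r from raw sums (computational form)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    num = n * sum_xy - sum_x * sum_y
    den = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return num / den

# Hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_deviation(x, y), 4))  # 0.7746
print(round(pearson_raw_sums(x, y), 4))   # 0.7746
```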

Question 3

How do you interpret the correlation coefficient?

Accepted Answer

Absolute value of r indicates strength: 0.00-0.19: negligible or no correlation. 0.20-0.39: weak correlation. 0.40-0.59: moderate correlation. 0.60-0.79: strong correlation. 0.80-1.00: very strong correlation. The sign indicates direction: positive r means both variables move in the same direction; negative r means they move in opposite directions. These thresholds are general guidelines and context-dependent -- a correlation of 0.5 might be considered strong in social science and weak in physics.
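The guideline bands above translate directly into a lookup. This sketch assumes the bands are half-open (for example, |r| = 0.195 counts as negligible and 0.20 as weak) and labels r = 0 as having no direction; both choices are assumptions, and as the answer notes, the labels themselves are context-dependent.

```python
def classify_strength(r):
    """Label r with the guideline strength bands and its direction."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    a = abs(r)
    if a < 0.20:
        strength = "negligible"
    elif a < 0.40:
        strength = "weak"
    elif a < 0.60:
        strength = "moderate"
    elif a < 0.80:
        strength = "strong"
    else:
        strength = "very strong"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction

print(classify_strength(0.53))   # ('moderate', 'positive')
print(classify_strength(-0.85))  # ('very strong', 'negative')
```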

Question 4

What is R-squared (coefficient of determination)?

Accepted Answer

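R-squared (R2), the coefficient of determination, represents the proportion of variance in Y that is explained by X. For simple linear regression with a single predictor, R2 = r^2, and it is often expressed as a percentage. If r = 0.8, then R2 = 0.64 = 64%: 64% of the variation in Y can be explained by the linear relationship with X, and the remaining 36% is left unexplained (other factors or random variation). R2 ranges from 0 to 1 (0% to 100%). A higher R2 means the linear model fits the data more closely. Unlike r, R2 is never negative, so it does not tell you the direction of the relationship.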
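One way to see that R2 = r^2 in the single-predictor case is to compute R2 independently as 1 - SS_res / SS_tot from the least-squares line and compare it with the square of r. This is a minimal sketch on hypothetical toy data; the helper names are illustrative.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_squared_from_fit(xs, ys):
    """R^2 = 1 - SS_res / SS_tot for the least-squares line of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y) ** 2, 4))      # 0.6
print(round(r_squared_from_fit(x, y), 4))  # 0.6
```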

Question 5

How do you test if a correlation is statistically significant?

Accepted Answer

Use the t-test: t = r x sqrt(n-2) / sqrt(1 - r^2), with degrees of freedom = n - 2. Compare |t| to the two-tailed critical value from the t-distribution at your chosen significance level (typically α = 0.05). A larger |t| indicates stronger evidence against the null hypothesis of no correlation. For n = 6 and r = 0.53: t = 0.53 x sqrt(4) / sqrt(1 - 0.2809) = 1.06 / 0.848 ≈ 1.25. This is not significant at α = 0.05 for df = 4 (two-tailed critical t ≈ 2.78).
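The worked example above can be reproduced with a few lines of standard-library Python. This sketch only computes the t-statistic and compares it with a critical value copied from a t-table (2.776 for df = 4, two-tailed, α = 0.05); converting t to an exact p-value needs the t-distribution CDF, which a library such as SciPy provides (for example `scipy.stats.t.sf`, or `scipy.stats.pearsonr`, which returns r together with its p-value).

```python
import math

def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with df = n - 2."""
    if n < 3:
        raise ValueError("need at least 3 pairs")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Two-tailed critical t for alpha = 0.05 and df = 4, taken from a t-table
T_CRIT_DF4 = 2.776

t = t_statistic(0.53, 6)
print(round(t, 2))          # 1.25
print(abs(t) > T_CRIT_DF4)  # False, so not significant at 0.05
```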

Question 6

What does a negative correlation mean?

Accepted Answer

A negative correlation means that as one variable increases, the other tends to decrease. Examples: as exercise increases, body weight tends to decrease (r ≈ -0.5). As study time decreases, error rate tends to increase (r ≈ -0.7). As temperature increases, heating costs decrease (r ≈ -0.9). The strength interpretation is the same as for positive correlations -- only the sign (direction) changes. r = -0.8 indicates a very strong negative linear relationship.
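The claim that only the sign changes can be checked directly: negating every Y value reverses the direction of the relationship without changing its spread, and r flips sign while keeping the same magnitude. A minimal sketch with hypothetical data:

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y tends to rise with x
x = [1, 2, 3, 4, 5, 6]
y = [3, 5, 4, 7, 8, 9]
y_negated = [-v for v in y]  # same spread, opposite direction

r_pos = pearson_r(x, y)
r_neg = pearson_r(x, y_negated)
print(round(r_pos, 4), round(r_neg, 4))  # 0.9487 -0.9487
```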

Question 7

What is the difference between correlation and causation?

Accepted Answer

Correlation measures statistical association between two variables: it tells you whether they tend to move together. Causation means that changes in one variable directly produce changes in the other. Correlation does not imply causation. Classic example: ice cream sales and drowning deaths are positively correlated (r ≈ 0.9) because both are driven by a third variable (hot weather). Establishing causation typically requires controlled experiments, temporal ordering (cause precedes effect), and ruling out confounders. Correlation on its own is never sufficient evidence of causation, and a lack of linear correlation does not rule causation out either, since a causal relationship can be non-linear (see Question 12).
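A small simulation makes the confounding story concrete. In this sketch, a hypothetical "heat" variable Z drives both X and Y, and X and Y have no direct link. With unit-variance Z and noise standard deviation 0.5, theory gives a raw correlation of 1 / 1.25 = 0.8. Regressing each variable on Z and correlating the residuals (the partial correlation controlling for Z) removes the shared driver, leaving a value close to 0. All variable names and parameters are illustrative assumptions.

```python
import math
import random

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def residuals(ys, zs):
    """Residuals of the least-squares regression of ys on zs."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [y - (my + slope * (z - mz)) for z, y in zip(zs, ys)]

rng = random.Random(42)
n = 2000
heat = [rng.gauss(0, 1) for _ in range(n)]         # hypothetical confounder Z
ice_cream = [h + rng.gauss(0, 0.5) for h in heat]  # X depends only on Z
drownings = [h + rng.gauss(0, 0.5) for h in heat]  # Y depends only on Z

raw_r = pearson_r(ice_cream, drownings)
partial_r = pearson_r(residuals(ice_cream, heat), residuals(drownings, heat))
print(round(raw_r, 2), round(partial_r, 2))
```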

Question 8

What is the difference between Pearson and Spearman correlation?

Accepted Answer

Pearson r measures linear association between two continuous variables; its standard significance test assumes the variables are (bivariate) normally distributed. Spearman's rho measures monotonic association (not necessarily linear) and works on ranked data, making it appropriate for ordinal data or when the linearity assumption is violated. Use Pearson when the data are continuous, roughly normally distributed, and the relationship appears linear. Use Spearman when the data are ordinal, skewed, or contain outliers. For the same dataset, Pearson r and Spearman rho are often similar but can differ when the relationship is non-linear or outliers are present.
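Spearman's rho is Pearson r applied to the ranks of the data, with tied values given the average of the ranks they span. This sketch implements that definition and shows a monotonic but non-linear relationship (y = x^3) where Spearman gives exactly 1 while Pearson falls below 1. The helper names are illustrative.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # monotonic but not linear
print(round(pearson_r(x, y), 3))     # below 1
print(round(spearman_rho(x, y), 3))  # 1.0
```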

Question 9

How many data points do I need for a meaningful correlation?

Accepted Answer

At minimum, 5-10 pairs are needed for any meaningful interpretation. With fewer than 5 pairs, even r = 0.9 may not be statistically significant and could be spurious. A common rule of thumb is n ≥ 30 for reliable correlation estimates. With small samples (n < 20), always report significance (p-value or t-statistic) alongside r, because high correlations can arise by chance. With large samples (n > 100), even small r values (around 0.2) can be statistically significant, even though a relationship that weak may have little practical importance.
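Rearranging the t-test formula from Question 5 gives the smallest |r| that reaches significance for a given sample size: r_crit = t_crit / sqrt(df + t_crit^2). This sketch uses two-tailed α = 0.05 critical t values copied from a standard t-table (4.303 for df = 2 and 1.984 for df = 99) to check both claims above: with n = 4, r = 0.9 falls short, while with n = 101, r = 0.2 just clears the bar.

```python
import math

def critical_r(t_crit, df):
    """Smallest |r| reaching significance, from t = r*sqrt(df)/sqrt(1 - r^2)."""
    return t_crit / math.sqrt(df + t_crit ** 2)

# Two-tailed alpha = 0.05 critical t values from a standard t-table
cases = {
    4: 4.303,    # n = 4   -> df = 2
    101: 1.984,  # n = 101 -> df = 99
}
for n, t_crit in cases.items():
    print(n, round(critical_r(t_crit, n - 2), 3))
```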

Question 10

What is a spurious correlation?

Accepted Answer

A spurious correlation is a statistically significant correlation that does not represent a meaningful causal relationship. It arises from: (1) Confounding variables: both X and Y are caused by a third variable Z. (2) Coincidence: in large datasets, unrelated variables can correlate by chance. (3) Shared trends in time series analyzed without differencing or detrending: many time series trend upward together simply because they cover the same time period, not because they are related. Always consider whether a correlation makes theoretical sense and whether confounding variables could explain the association.
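Point (3) is easy to reproduce. In this sketch, two hypothetical series share nothing except an upward time trend; their noise terms are independent. Correlating the raw levels gives r close to 1, while correlating the period-to-period differences (first differencing) removes the shared trend and leaves a value near 0. The slopes and noise levels are arbitrary assumptions.

```python
import math
import random

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def first_differences(series):
    """Change from each period to the next."""
    return [b - a for a, b in zip(series, series[1:])]

rng = random.Random(0)
steps = range(100)
# Two hypothetical series that share only a time trend
a = [0.5 * k + rng.gauss(0, 1) for k in steps]
b = [2.0 * k + rng.gauss(0, 3) for k in steps]

level_r = pearson_r(a, b)
diff_r = pearson_r(first_differences(a), first_differences(b))
print(round(level_r, 3))  # close to 1, driven by the shared trend
print(round(diff_r, 3))   # near 0 after differencing
```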

Question 11

Can correlation be used to predict values?

Accepted Answer

Yes, through linear regression. Once r is calculated, a regression line (y = mx + b) can be fitted to predict Y from X. The slope is m = r x (Sy / Sx), where Sy and Sx are the standard deviations of Y and X, and the intercept is b = y_mean - m x x_mean, so the line always passes through the point (x_mean, y_mean). R-squared tells you how much of the variation in Y is predictable from X. If r = 0.8, then R2 = 0.64, meaning the regression model explains 64% of Y's variation; the remaining 36% is unexplained variance (residuals). The higher |r| is, the more closely the line fits the data and the better its predictions.
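Here is a minimal sketch of building the prediction line from r and the two standard deviations, on hypothetical toy data. It uses `statistics.stdev` (sample standard deviation); since only the ratio Sy / Sx enters the slope, using population standard deviations for both would give the same line.

```python
import math
import statistics

def pearson_r(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line_from_r(xs, ys):
    """Least-squares line: m = r * (Sy / Sx), b = y_mean - m * x_mean."""
    r = pearson_r(xs, ys)
    m = r * statistics.stdev(ys) / statistics.stdev(xs)
    b = statistics.fmean(ys) - m * statistics.fmean(xs)
    return m, b

# Hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
m, b = fit_line_from_r(x, y)
print(round(m, 3), round(b, 3))  # 0.6 2.2
print(round(m * 6 + b, 3))       # prediction at x = 6: 5.8
```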

Question 12

What does r = 0 mean exactly?

Accepted Answer

r = 0 means there is no linear relationship between X and Y. However, it does not mean no relationship at all -- X and Y could have a perfect non-linear relationship (e.g., U-shaped or circular) and still produce r = 0. For example, X = [-3, -2, -1, 0, 1, 2, 3] and Y = [9, 4, 1, 0, 1, 4, 9] have r = 0 because the relationship is perfectly quadratic (not linear), even though Y is completely determined by X. Always plot your data before interpreting r.
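The quadratic example can be verified directly. Because X is symmetric about 0, every positive product of deviations is cancelled by a matching negative one, so the numerator of r is exactly zero even though Y is fully determined by X.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]  # [9, 4, 1, 0, 1, 4, 9]
print(pearson_r(x, y))   # 0.0
```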

Question 13

How do outliers affect correlation?

Accepted Answer

Outliers can dramatically inflate or deflate Pearson r because the formula is built from products and squares of deviations from the mean, so a point far from the rest of the data carries outsized weight. A single outlier far from the regression line can pull r toward 0 or even reverse its sign. For example, a dataset with r = 0.9 might drop to r = 0.5 after adding one outlier. This is why plotting the data (scatter plot) before calculating r is essential. If outliers are present, consider: removing values that are demonstrably data errors; for genuine outliers, using Spearman rho (rank-based and much less sensitive to outliers); or reporting results with and without the outlier.
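To see how much leverage a single point has, this sketch starts from a perfectly linear hypothetical dataset (r = 1) and appends one extreme point. Pearson r reverses sign, while Spearman rho (Pearson r on ranks) stays at 0.5. The simple ranking helper assumes there are no ties, which holds for this example.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; assumes no ties (true for this example)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result

x_clean = list(range(1, 11))
y_clean = list(range(1, 11))     # perfectly linear: r = 1
x_out = x_clean + [11]
y_out = y_clean + [-20]          # one hypothetical extreme point

print(round(pearson_r(x_clean, y_clean), 3))            # 1.0
print(round(pearson_r(x_out, y_out), 3))                # -0.165
print(round(pearson_r(ranks(x_out), ranks(y_out)), 3))  # Spearman: 0.5
```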

|r| Value	Strength	Example Real-World Correlation
0.00 – 0.19	Negligible	Shoe size vs. IQ scores
0.20 – 0.39	Weak	Height vs. academic performance
0.40 – 0.59	Moderate	Advertising spend vs. sales revenue
0.60 – 0.79	Strong	Study hours vs. exam score
0.80 – 0.99	Very Strong	Temperature vs. ice cream sales
1.00	Perfect	Only in theory or identical variables

Correlation Coefficient Calculator 2026

Sources & Methodology

Correlation Coefficient (Pearson r) — Complete 2026 Guide

The Pearson r Formula — Step by Step

Correlation Strength Classification

R-Squared: The Coefficient of Determination

Statistical Significance: The t-Test for r
