What does the correlation coefficient tell you?

It measures the strength and direction of a linear relationship between two variables. Values near +1 indicate a strong positive correlation (both increase together). Values near -1 indicate a strong negative correlation (one decreases as the other increases). Values near 0 indicate a weak linear relationship or none at all. r = 0 doesn't mean there is no relationship; the relationship could be non-linear.
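A quick way to see that r = 0 does not rule out a relationship is shown below. The data are illustrative, and `pearson_r` is a hand-written helper, not a library function. Here y is exactly x², yet Pearson r comes out as 0 because the falling and rising halves of the parabola cancel out.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r: sum of cross-products over the root of the squared-deviation sums."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# y is completely determined by x, but the relationship is a parabola
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]
print(pearson_r(xs, ys))  # 0.0
```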

Does correlation prove causation?

No. Correlation indicates association, not causation, and it does not show which way any causal effect runs. A strong correlation could arise because A causes B, because B causes A, because both are caused by a third variable C (a lurking variable), or by coincidence. Establishing causation requires controlled experiments or careful causal inference methods.

When should I use Pearson vs Spearman?

Pearson: linear relationships between continuous variables. Its standard significance test assumes roughly normal data. Spearman: monotonic (consistent direction) relationships and ordinal data. Because it works on ranks, Spearman is also robust to outliers. For non-linear relationships that are not monotonic, plot the data and consider transformations or non-parametric tests.
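The sketch below contrasts the two measures on curved but monotonic data (y = x³, chosen for illustration). Spearman's ρ is computed as Pearson r applied to the ranks. Tied values get the average of their positions, and with no ties this agrees with the 6Σd² shortcut formula. The helper names `pearson_r`, `ranks` and `spearman_rho` are hypothetical, not a library API.

```python
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank values 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """Spearman's rho as Pearson r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]  # monotonic but curved
print(round(pearson_r(xs, ys), 3))  # 0.938
print(spearman_rho(xs, ys))         # 1.0
```

Pearson falls short of 1 because the points do not lie on a straight line. Spearman reports a perfect monotonic relationship.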

How is r affected by outliers?

Strongly. A single outlier can dramatically change r. For example, a dataset might have r = 0.5 without the outlier and r = 0.9 with it. Always investigate outliers. If they are legitimate but distort r, use a robust method such as Spearman.
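Here is a small illustration with made-up numbers. Five ordinary points give a weak positive r. Adding one extreme point at (20, 20) pushes r close to 1, even though the original five points are unchanged. The `pearson_r` helper is hand-written for this sketch.

```python
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: five ordinary points with a weak positive trend
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 4, 1, 5]
print(round(pearson_r(xs, ys), 2))                # 0.35

# The same data plus one extreme point far from the rest
print(round(pearson_r(xs + [20], ys + [20]), 2))  # 0.97
```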

What's the difference between correlation and regression?

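Correlation measures the strength of a relationship (r). Regression gives the equation of the best-fit line (slope and intercept) and predicts y from x. In simple linear regression the two are linked: R² = r², and the slope equals r × (s_y / s_x), where s_x and s_y are the standard deviations of x and y. Correlation says a linear relationship exists; regression describes it and makes predictions.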
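The link between r and the regression line can be checked directly. The sketch below assumes the five points (1,2), (2,4), (3,5), (4,4), (5,5) from the Formula section's worked example. It derives the slope as r × s_y / s_x and the intercept from the means, which should reproduce the y = 0.6000x + 2.2000 line shown in the results. `pearson_r` is a hand-written helper; `mean` and `stdev` come from Python's standard `statistics` module.

```python
from math import sqrt
from statistics import mean, stdev

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
slope = r * stdev(ys) / stdev(xs)        # b = r * (s_y / s_x)
intercept = mean(ys) - slope * mean(xs)  # the line passes through (x̄, ȳ)
print(f"y = {slope:.4f}x + {intercept:.4f}")  # y = 0.6000x + 2.2000
```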

Can r be greater than 1?

No. By definition, -1 ≤ r ≤ +1, so any calculation giving |r| > 1 indicates an error, and correct software always reports a value between -1 and 1. r = 1 means perfect positive linear correlation; r = -1 means perfect negative linear correlation.

Correlation Coefficient Calculator

Q: What is R-squared?

In simple linear regression (one predictor), R-squared is the square of the correlation coefficient (R² = r²). It represents the proportion of variance in one variable explained by a linear fit on the other. An R² of 0.81 means 81% of the variance is explained by the linear relationship. R² ranges from 0 to 1.
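R² can also be computed without r at all, as 1 - SS_res / SS_tot for the least-squares line. For a single predictor the two routes agree. The sketch below checks this on the study-time data from the worked examples. Both function names are hypothetical helpers, not a library API.

```python
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def r_squared_from_fit(xs, ys):
    """R² = 1 - SS_res / SS_tot for the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    ss_tot = sum((y - my) ** 2 for y in ys)                       # total variation
    return 1 - ss_res / ss_tot

hours = [2, 4, 6, 8, 10]
grades = [65, 75, 82, 88, 95]
print(round(pearson_r(hours, grades) ** 2, 4))      # 0.9905
print(round(r_squared_from_fit(hours, grades), 4))  # 0.9905
```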

Enter up to 8 paired (x, y) data points to calculate the Pearson correlation coefficient. Values range from -1 (perfect negative) to +1 (perfect positive). Also shows the coefficient of determination (R-squared).

The Pearson correlation coefficient (r) measures the strength and direction of a linear relationship between two variables. Values range from -1 (perfect negative correlation) through 0 (no linear correlation) to +1 (perfect positive correlation). The further r is from 0, the stronger the linear relationship. The famous "correlation is not causation" caveat applies: a strong r doesn't prove that one variable causes the other.

This calculator returns Pearson r and R² (the coefficient of determination, or proportion of variance explained). For a single pair of variables, R² = r². Both describe how well a straight line fits the data.

Pearson r assumes linearity. For non-linear monotonic relationships, use Spearman's rank correlation. For two categorical variables, use a chi-square test, with phi or Cramér's V as the measure of association. When one variable is binary and the other continuous, use the point-biserial correlation. Applying the wrong correlation measure to inappropriate data can be misleading.

Common applications: assessing whether two variables move together (height-weight), evaluating predictor strength in regression, testing scientific hypotheses about relationships, identifying redundant variables in datasets, and exploratory data analysis. Always plot the data first; correlation alone can mislead in cases of non-linearity, outliers, or restricted ranges.


Results

Correlation (r): 0.774597
R-Squared: 0.600000
Strength: Strong
Direction: Positive
Regression Line: y = 0.6000x + 2.2000

Last updated: May 29, 2026

Formula

**Pearson correlation coefficient:**

r = Σ((xᵢ - x̄)(yᵢ - ȳ)) / √(Σ(xᵢ - x̄)² × Σ(yᵢ - ȳ)²)

Where:
- **x̄**, **ȳ**: sample means
- **Σ**: sum across all observations

**Properties:**
- **Range**: -1 ≤ r ≤ +1
- **No units**: dimensionless
- **Symmetric**: r(X, Y) = r(Y, X)
- **Scale invariant**: r is unchanged when either variable is shifted or multiplied by a positive constant (e.g., converting °C to °F); multiplying by a negative constant flips the sign of r
- **Sign**: indicates the direction of the linear relationship

**R² (coefficient of determination):**

R² = r² (for simple linear regression with one predictor)

Interpretation: the proportion of variance in one variable explained by a linear fit on the other.

**Worked example: (1,2), (2,4), (3,5), (4,4), (5,5)**

- Means: x̄ = 3, ȳ = 4
- Sum of (x-x̄)(y-ȳ) = (-2)(-2) + (-1)(0) + (0)(1) + (1)(0) + (2)(1) = 6
- Sum of (x-x̄)² = 10
- Sum of (y-ȳ)² = 6
- r = 6/√60 ≈ 0.775
- R² = 0.60 (60% of variance explained)

**Strength interpretation:**

| Absolute r | Strength | Description |
|---|---|---|
| 0.0 - 0.1 | Negligible | Essentially no linear relationship |
| 0.1 - 0.3 | Weak | Small relationship |
| 0.3 - 0.5 | Moderate | Detectable relationship |
| 0.5 - 0.7 | Strong | Substantial relationship |
| 0.7 - 0.9 | Very strong | Important relationship |
| 0.9 - 1.0 | Near perfect | Excellent linear fit |

**Sign interpretation:**
- **r > 0**: positive correlation. As x increases, y tends to increase.
- **r < 0**: negative correlation. As x increases, y tends to decrease.
- **r ≈ 0**: no linear relationship (a non-linear relationship may still exist).

**Common applications:**
- **Test scores** vs **study time**: positive correlation typical.
- **Age** vs **reflex time**: positive (reflexes slow with age).
- **Income** vs **happiness**: positive but weak.
- **Sunscreen use** vs **sunburn**: negative correlation.
- **Air pressure** vs **altitude**: strong negative.
- **GDP** vs **inflation**: complex; varies by economy.

**Correlation vs causation:**

❌ "Ice cream sales and shark attacks are correlated, so ice cream causes shark attacks."
✓ Both depend on summer weather (lurking variable).

❌ "Smoking correlates with lung cancer; therefore smoking causes lung cancer."
✓ The conclusion is true, but the correlation alone doesn't prove it. The causal case rests on further evidence, such as prospective cohort studies, dose-response relationships and biological mechanisms.

**Statistical significance of r:**

Test H₀: the population correlation is 0 (no linear relationship).

t = r × √((n-2) / (1-r²))

Compare t to the t-distribution with n-2 degrees of freedom.

**Limitations:**
- Only detects linear relationships.
- Affected by outliers.
- A restricted range usually reduces |r|.
- Doesn't imply causation.
- Strong relationships can be missed if they are non-linear.

**Spearman's rank correlation:**

For monotonic (consistent direction) relationships:

ρ = 1 - (6 × Σdᵢ²) / (n × (n²-1))

Here dᵢ is the difference between the ranks of xᵢ and yᵢ. This shortcut is exact when there are no tied ranks; with ties, compute Pearson r on the ranks instead. Spearman's ρ is robust to outliers and works for ordinal data.

**Tests for correlation:**

| Test | Use |
|---|---|
| Pearson r | Linear, continuous |
| Spearman ρ | Monotonic, ordinal/continuous |
| Kendall τ | Robust ordinal |
| Point-biserial | One binary, one continuous |
| Phi/Cramér's V | Categorical |

**Anscombe's quartet:** Four datasets with nearly identical r and regression lines but very different scatter patterns. They demonstrate that summary statistics can be misleading, so always plot the data.

**Real-world examples:**

| Relationship | Typical r |
|---|---|
| Height and weight (adults) | 0.7-0.8 |
| Twin IQ correlation | 0.85-0.95 |
| Father-son height | 0.5-0.6 |
| GDP and life expectancy | 0.6-0.8 |
| Spousal age correlation | 0.85-0.95 |
| Daily temperature and ice cream sales | 0.4-0.7 |
| Stock and S&P 500 (high beta) | 0.5-0.9 |
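The worked example can be checked by recomputing each sum in the Pearson formula with plain Python arithmetic (no statistics library assumed). The (2)(1) term contributes 2, so the cross-product sum is 4 + 2 = 6. That gives r = 6/√60 ≈ 0.7746 and R² = 0.60.

```python
from math import sqrt

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n  # 3.0
y_bar = sum(ys) / n  # 4.0

# Cross-product and squared-deviation sums from the Pearson formula
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # 6.0
s_xx = sum((x - x_bar) ** 2 for x in xs)                       # 10.0
s_yy = sum((y - y_bar) ** 2 for y in ys)                       # 6.0

r = s_xy / sqrt(s_xx * s_yy)
print(f"r = {r:.4f}, R^2 = {r * r:.4f}")  # r = 0.7746, R^2 = 0.6000
```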

How to use this calculator

Enter paired (x, y) data values.
Leave unused fields at 0.
Calculator returns Pearson r and R².
Always plot the data to check that the relationship is linear.
For monotonic non-linear data, use Spearman; for other patterns, use other tests.
r = 0 doesn't mean there is no relationship; it could be non-linear.

Worked examples

Studying time vs grades

**Scenario:** 5 students: (study hours, exam grade) = (2, 65), (4, 75), (6, 82), (8, 88), (10, 95). **Calculation:** r ≈ 0.995. R² ≈ 0.99. **Result:** Very strong positive correlation; about 99% of the variance in grades is explained by study hours. The relationship is strongly linear, so grades can be predicted from study time within the observed range.

Random correlation

**Scenario:** 5 random pairs: (1, 5), (3, 2), (5, 8), (7, 1), (9, 4). **Calculation:** r ≈ -0.17. R² ≈ 0.03. **Result:** At most a weak linear association, and with only five points it is far from statistically significant. The variables may well be unrelated, so be wary of any predictive claims based on this relationship.

Perfect negative correlation

**Scenario:** Temperature vs. heating cost over 5 days: (5°C, $100), (10°C, $80), (15°C, $60), (20°C, $40), (25°C, $20). **Calculation:** r = -1.0 (perfect negative). **Result:** Perfect negative linear correlation. As temperature increases by 5°C, heating cost decreases by $20. Strong, predictable relationship useful for billing forecasts.
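All three worked examples can be reproduced with one small script. `pearson_r` is a hand-written helper, and the data are copied from the scenarios above.

```python
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

examples = {
    "Studying time vs grades": ([2, 4, 6, 8, 10], [65, 75, 82, 88, 95]),
    "Random correlation": ([1, 3, 5, 7, 9], [5, 2, 8, 1, 4]),
    "Perfect negative correlation": ([5, 10, 15, 20, 25], [100, 80, 60, 40, 20]),
}

for name, (xs, ys) in examples.items():
    r = pearson_r(xs, ys)
    print(f"{name}: r = {r:.3f}, R^2 = {r * r:.3f}")
```

Running it should print r = 0.995, -0.173 and -1.000 for the three scenarios.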

When to use this calculator

**Use Pearson correlation for:**

- **Quantifying linear relationships** between continuous variables.
- **Initial exploration** of two-variable relationships.
- **Validation of regression models**.
- **Identifying redundant variables** in feature selection.
- **Hypothesis testing** about relationships.

**Common applications:**

- **Education research**: study time vs grades.
- **Healthcare**: drug dose vs effect.
- **Economics**: variables in macroeconomic models.
- **Psychology**: trait correlations.
- **Marketing**: campaign metrics correlations.

**When NOT to use Pearson:**

- **Categorical data**: use chi-square or other measures.
- **Non-linear relationships**: use Spearman (if the relationship is monotonic) or visualize differently.
- **Influential outliers**: use Spearman or robust methods.
- **Restricted range**: r typically underestimates the strength of the relationship across the full range.
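The restricted-range effect is easy to reproduce with synthetic data. In the example below, y tracks x with a fixed ±3 wobble (an arbitrary choice for illustration). Over the full range the trend dominates the wobble. Within the narrow window x = 9 to 12 the wobble is comparable to the trend, so r drops even though the underlying relationship is identical. `pearson_r` is a hand-written helper.

```python
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Synthetic data: y follows x, plus a fixed +/-3 wobble
xs = list(range(1, 21))
ys = [x + (3 if x % 2 == 0 else -3) for x in xs]

# Keep only the middle of the x range (9 to 12)
pairs = [(x, y) for x, y in zip(xs, ys) if 9 <= x <= 12]
xs_mid = [x for x, _ in pairs]
ys_mid = [y for _, y in pairs]

print(round(pearson_r(xs, ys), 2))          # 0.9  (full range)
print(round(pearson_r(xs_mid, ys_mid), 2))  # 0.68 (restricted range)
```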

**Steps for correlation analysis:**

1. **Plot data**: scatter plot of x vs y.
2. **Check assumptions**: linearity, no extreme outliers.
3. **Calculate r**: Pearson if linear, Spearman if monotonic.
4. **Test significance**: confidence interval or p-value.
5. **Interpret**: strength + direction + practical importance.
6. **Report**: with sample size, visualization, alternative interpretations.

**Statistical significance:**

t = r × √((n-2)/(1-r²)), df = n-2

For n = 20, |r| > 0.444 is significant at the 0.05 level (two-tailed). For n = 30, the threshold is |r| > 0.361; for n = 100, it is |r| > 0.197.

Even a tiny r becomes statistically significant with a very large n, so significance alone says nothing about practical importance.
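The thresholds above come from solving t = r × √((n-2)/(1-r²)) for r, which gives r = t / √(t² + df). The sketch below implements both directions. The critical t value (about 2.101 for df = 18, two-tailed 0.05) is taken from a standard t table rather than computed. Both function names are hypothetical.

```python
from math import sqrt

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), compared against n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r ** 2))

def critical_r(t_crit, df):
    """Smallest |r| reaching a given critical t: r = t / sqrt(t^2 + df)."""
    return t_crit / sqrt(t_crit ** 2 + df)

# Two-tailed 0.05 critical t for df = 18 (n = 20), from a t table
print(round(critical_r(2.101, 18), 3))  # 0.444
```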

**Confidence intervals for r:**

Use Fisher's z-transformation:

z = 0.5 × ln((1+r)/(1-r))
SE(z) = 1/√(n-3)

For a 95% interval, compute z ± 1.96 × SE(z), then transform both ends back to r with r = (e^(2z) - 1)/(e^(2z) + 1).
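The sketch below computes a Fisher z confidence interval. It relies on the fact that 0.5 × ln((1+r)/(1-r)) is the inverse hyperbolic tangent, so `math.atanh` and `math.tanh` handle the forward and back transformations. The inputs r = 0.5 and n = 30 are illustrative, and 1.96 is the two-sided 95% normal critical value.

```python
from math import atanh, tanh, sqrt

def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a correlation via Fisher's z."""
    z = atanh(r)          # same as 0.5 * ln((1 + r) / (1 - r))
    se = 1 / sqrt(n - 3)  # standard error of z
    # Build the interval on the z scale, then map both ends back to r
    return tanh(z - z_crit * se), tanh(z + z_crit * se)

low, high = fisher_ci(0.5, 30)
print(f"95% CI for r: ({low:.2f}, {high:.2f})")  # roughly (0.17, 0.73)
```

The interval is asymmetric around r = 0.5 because the back-transformation compresses values near ±1.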

**Common errors:**

- Confusing correlation with causation.
- Reporting r without sample size.
- Treating a high r as proof of a strong linear relationship without plotting the data.
- Using Pearson on non-linear data.
- Drawing conclusions from small samples.

**Anscombe's quartet warning:**

Four datasets with nearly identical r ≈ 0.82, but very different scatter patterns:

1. Linear relationship with scattered noise.
2. Non-linear (curved) relationship.
3. Linear relationship distorted by one outlier.
4. All points share one x value except a single outlier, which alone creates the correlation.

Same r, drastically different real relationships. Always visualize.
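The claim can be checked numerically using Anscombe's published quartet values, reproduced below. Datasets I to III share the same x values. All four print r ≈ 0.816 despite their very different shapes. `pearson_r` is a hand-written helper.

```python
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Anscombe's quartet; datasets I-III share the same x values
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

for name, (xs, ys) in quartet.items():
    print(name, round(pearson_r(xs, ys), 3))
```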

**Effect size and correlation:**

Cohen's r-based effect size:

- 0.1: small effect.
- 0.3: medium effect.
- 0.5: large effect.

These are guidelines; context matters.

**Beyond Pearson:**

- **Spearman rank**: monotonic non-linear relationships.
- **Kendall tau**: more robust for ordinal data.
- **Partial correlation**: two variables while controlling for others.
- **Multiple correlation**: one variable against a combination of several others.
- **Canonical correlation**: between two sets of variables.
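The simplest case of partial correlation controls for a single variable z, using only the three pairwise correlations: r_xy·z = (r_xy - r_xz × r_yz) / √((1 - r_xz²)(1 - r_yz²)). The sketch below implements that formula. The input correlations are hypothetical, chosen to show how much of an apparent x-y correlation can be accounted for by a shared link to z.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical pairwise correlations: x and y both correlate with z
print(round(partial_r(0.5, 0.6, 0.7), 3))  # 0.14

# If z is unrelated to x and y, controlling for it changes nothing
print(partial_r(0.5, 0.0, 0.0))  # 0.5
```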

**Common interpretations:**

| Field | Typical r range | Notes |
|---|---|---|
| Physics laws | 0.9+ | Strict deterministic relationships |
| Engineering specs | 0.8-0.95 | High accuracy expected |
| Medical studies | 0.5-0.85 | Biological variability |
| Education | 0.3-0.7 | Many factors at play |
| Social science | 0.2-0.5 | Inherent variability |
| Psychology | 0.1-0.4 | High individual variation |

Common mistakes to avoid

Confusing correlation with causation. They're different.
Using Pearson for non-linear or categorical data.
Interpreting r without considering sample size.
Forgetting outliers can dramatically affect r.
Confusing r with R² in interpretation.
Ignoring restriction of range issues.
Reporting r without context or visualization.
