Linear Regression Calculator
Enter up to 8 paired (x, y) data points to compute the least-squares linear regression line (y = mx + b), R-squared, correlation coefficient, and predicted values.
Linear regression is the workhorse of predictive modeling and one of the simplest examples of statistical learning. Given a set of paired observations (x, y), it finds the straight line y = mx + b that best fits the data, meaning the line that minimizes the sum of squared vertical distances (residuals) between each point and the line. The slope m tells you how much y changes per unit change in x; the intercept b is the predicted y at x = 0, where the line crosses the y-axis.
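The slope and intercept described above have a closed-form solution. The following is a minimal sketch in plain Python; the function name `fit_line` and the sample points are illustrative assumptions, not the calculator's own code.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # s_xx: spread of x about its mean; s_xy: co-variation of x and y.
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical points lying exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(f"y = {slope:.2f}x + {intercept:.2f}")  # y = 2.00x + 1.00
```

Centering on the means before summing keeps the arithmetic readable and avoids the large intermediate sums of the expanded textbook formula.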
This calculator returns the slope, intercept, R² (coefficient of determination), correlation coefficient, and predicted y for any new x value. R² ranges from 0 (the line explains none of the variation in y) to 1 (the line explains all of it). Real-world data rarely achieves R² = 1. As a rule of thumb, values above 0.9 indicate a strong fit, 0.7-0.9 moderately strong, 0.5-0.7 moderate, 0.3-0.5 weak, and below 0.3 very weak or possibly non-linear; acceptable thresholds vary by field.
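R² can be computed as one minus the ratio of unexplained to total variation. The sketch below assumes a line has already been fitted; the data points are hypothetical, and y = 1.9x + 0 is their least-squares line.

```python
def r_squared(xs, ys, slope, intercept):
    """Share of the variation in y explained by the line y = slope*x + intercept."""
    mean_y = sum(ys) / len(ys)
    # Total variation of y about its mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # Variation left over after the line has been subtracted (residuals).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / ss_tot


# Hypothetical data; slope 1.9 and intercept 0.0 are its least-squares fit.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
slope, intercept = 1.9, 0.0

r2 = r_squared(xs, ys, slope, intercept)
y_at_5 = slope * 5 + intercept  # predicted y for a new x value
print(f"R^2 = {r2:.4f}, predicted y at x=5: {y_at_5:.2f}")
```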
Linear regression assumes: linearity (the relationship is actually linear), independence (observations don't influence each other), homoscedasticity (constant residual variance across x), and normality of residuals. Real data often violates these assumptions; check residual plots to verify. For non-linear relationships, transforming a variable (log, square root) or adding polynomial terms can often restore a linear fit.
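A residual check can be done numerically before plotting. As a hedged illustration, the sketch below fits a line to hypothetical data that is actually quadratic (y = x²); the residuals come out positive, negative, then positive again, which is the kind of systematic pattern that signals a violated linearity assumption.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys):
    """Observed y minus fitted y for each point."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Hypothetical curved data: y = x squared.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

res = residuals(xs, ys)
print([round(r, 2) for r in res])  # [2.0, -1.0, -2.0, -1.0, 2.0]
```

Residuals from a well-specified linear fit should scatter around zero with no visible shape; a U-shaped run like this one suggests a curve.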
Common applications include sales forecasting, scientific calibration (instrument readings vs. true values), house price prediction from square footage, height vs. weight, and any continuous-continuous prediction problem.
Inputs
Results
Equation
y = 1.9900x + 0.0500
Slope (m)
1.9900
Intercept (b)
0.0500
R-Squared
0.997305
Correlation (r)
0.998652
Predicted Y (x=6)
11.9900
Standard Error
0.1889
Data Points
5
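The inputs behind the results above are not listed. As an assumption, the five points in the sketch below were chosen because they reproduce every displayed figure; the "standard error" is taken here to be the residual standard error with n − 2 degrees of freedom. The function name `summarize` is hypothetical.

```python
import math


def summarize(xs, ys, x_new):
    """Compute the quantities shown in the results panel for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": 1 - ss_res / s_yy,
        "r": s_xy / math.sqrt(s_xx * s_yy),
        "predicted_y": slope * x_new + intercept,
        # Residual standard error: assumes n - 2 degrees of freedom.
        "standard_error": math.sqrt(ss_res / (n - 2)),
        "n": n,
    }


# Assumed inputs (not shown on the page); they match the displayed results.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

result = summarize(xs, ys, x_new=6)
for name, value in result.items():
    print(name, round(value, 6))
```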
Formula
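The standard closed-form expressions for simple least-squares regression can be written as below. The symbols $\bar{x}$, $\bar{y}$, $\hat{y}_i$ and $s_e$ are notation introduced here for convenience, and the standard error is assumed to be the residual standard error with $n - 2$ degrees of freedom.

```latex
\begin{aligned}
m &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad b = \bar{y} - m\,\bar{x}, \\[4pt]
r &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
          {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \,\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = r^2, \\[4pt]
\hat{y}_i &= m\,x_i + b,
\qquad s_e = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}.
\end{aligned}
```

The identity $R^2 = r^2$ holds for a single predictor with an intercept, which is the case this calculator covers.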
How to use this calculator
- Enter your paired (x, y) data values.
- Leave unused fields at 0; the calculator detects and skips them.
- Calculator returns slope (m), intercept (b), R², and correlation.
- For prediction: enter target x to get predicted y.
- Check R² to assess fit quality.
- For larger datasets: use Excel TREND() or LINEST(), or R lm().
Worked examples
Sales vs advertising
**Scenario:** Track advertising spend (x in $K) vs sales (y in $K) for past 5 months: (10, 50), (15, 70), (20, 85), (25, 100), (30, 120). **Calculation:** Slope = 3.4, intercept = 17. y = 3.4x + 17. R² ≈ 0.997 (excellent fit). Predict sales for $35K advertising: y = 3.4(35) + 17 = $136K. **Result:** Each additional $1K in advertising is associated with ~$3.4K in sales. Strong relationship. Forecast helps budget allocation, though $35K lies just beyond the observed range, so treat it with some caution.
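Recomputing from the five listed points gives a slope of 3.4, an intercept of 17, and a forecast of 136 at x = 35. The sketch below shows the arithmetic; the helper name `fit_with_r2` is an illustrative assumption.

```python
def fit_with_r2(xs, ys):
    """Return (slope, intercept, r_squared) for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    # For one predictor, R^2 equals the squared correlation.
    r_squared = s_xy ** 2 / (s_xx * s_yy)
    return slope, intercept, r_squared


ad_spend = [10, 15, 20, 25, 30]   # x, in $K
sales = [50, 70, 85, 100, 120]    # y, in $K

slope, intercept, r2 = fit_with_r2(ad_spend, sales)
forecast = slope * 35 + intercept  # sales forecast for $35K of advertising
print(f"y = {slope:.2f}x + {intercept:.2f}, R^2 = {r2:.4f}, forecast = {forecast:.1f}")
```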
Height-weight relationship
**Scenario:** Heights (cm) and weights (kg) of 5 adults: (165, 60), (170, 70), (175, 75), (180, 80), (185, 90). **Calculation:** m = 1.4, b = -170. y = 1.4x - 170. R² = 0.98. Predicts: 178 cm → 79.2 kg. **Result:** Strong linear relationship. Each additional cm of height corresponds to ~1.4 kg of weight in this sample (a rough trend, not a general rule for adults). Useful for rough weight estimates within the observed height range.
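Recomputing from the five listed points gives a slope of 1.4 and an intercept of -170. The sketch below also illustrates why the intercept should not be read literally: it is the fitted value at a height of 0 cm, far outside the 165-185 cm range the data covers.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


heights_cm = [165, 170, 175, 180, 185]
weights_kg = [60, 70, 75, 80, 90]

slope, intercept = fit_line(heights_cm, weights_kg)

inside_range = slope * 178 + intercept  # 178 cm lies within the observed heights
at_zero = slope * 0 + intercept         # extrapolated to 0 cm: a negative weight
print(f"178 cm -> {inside_range:.1f} kg; 0 cm -> {at_zero:.1f} kg")
```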
Manufacturing calibration
**Scenario:** Sensor reading (x) vs true measurement (y): (1, 1.02), (2, 1.98), (3, 3.05), (4, 3.97). **Calculation:** m ≈ 0.99, b ≈ 0.03. Essentially y = x. R² ≈ 0.999. **Result:** Sensor reads accurately with tiny bias of +0.03. Linear calibration adequate. Can use sensor reading directly with minimal correction.
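In practice a calibration line is applied to later readings. A minimal sketch under that assumption follows; the function name `correct` and the reading of 2.5 are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


sensor = [1, 2, 3, 4]                # x: raw sensor readings
truth = [1.02, 1.98, 3.05, 3.97]     # y: reference measurements

slope, intercept = fit_line(sensor, truth)


def correct(reading):
    """Map a raw sensor reading to an estimated true value."""
    return slope * reading + intercept


# Hypothetical new reading inside the calibrated range.
print(f"m = {slope:.3f}, b = {intercept:.3f}, corrected(2.5) = {correct(2.5):.3f}")
```

The correction is only supported between readings of 1 and 4; outside that range the sensor's behavior has not been checked.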
When to use this calculator
**Use linear regression for:**
- **Forecasting**: predict outcomes from input variables.
- **Modeling relationships**: quantify how variables relate.
- **Trend analysis**: detect patterns over time.
- **Calibration**: relate measurements to true values.
- **Decision-making**: optimize input for target output.
**Limitations:**
- Only models linear relationships.
- Sensitive to outliers.
- Assumes errors normally distributed.
- Correlation ≠ causation.
- Extrapolation beyond data range risky.
**Non-linear data?**
Transform variables or change the model:
- **Log transformation**: for exponential growth.
- **Square root**: for moderate non-linearity.
- **Polynomial terms**: for curved relationships.
- **Logistic regression**: for binary outcomes (a different model rather than a transformation).
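The log transformation can be sketched as follows: taking the logarithm of y turns exponential growth into a straight line, which is then fitted as usual. The data below is hypothetical, generated from y = 3 · 2^x.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


# Hypothetical exponential data: y doubles with each unit of x, starting at 3.
xs = [0, 1, 2, 3, 4]
ys = [3, 6, 12, 24, 48]

# Fit ln(y) = slope * x + intercept instead of y itself.
log_ys = [math.log(y) for y in ys]
slope, intercept = fit_line(xs, log_ys)

growth_factor = math.exp(slope)      # multiplicative change in y per unit x
initial_value = math.exp(intercept)  # fitted y at x = 0
print(f"growth factor = {growth_factor:.3f}, initial value = {initial_value:.3f}")
```

Predictions must be transformed back with `math.exp`, and the approach requires all y values to be positive.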
**Multiple regression:**
For multiple input variables: y = b + m₁x₁ + m₂x₂ + ... Use specialized software.
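For intuition only, the multiple-regression coefficients can be obtained by solving the normal equations. The sketch below is a small pure-Python illustration with hypothetical, noise-free data; the function names are assumptions, and dedicated statistical software remains the better choice for real work.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple(rows, ys):
    """Return [b, m1, m2, ...] minimizing squared error via the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 carries the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 with no noise.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

b, m1, m2 = fit_multiple(rows, ys)
print(f"b = {b:.3f}, m1 = {m1:.3f}, m2 = {m2:.3f}")
```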
**R-squared interpretation:**
| R² | Interpretation |
|---|---|
| 0.9+ | Strong fit; line explains most variation |
| 0.7-0.9 | Moderate-strong fit |
| 0.5-0.7 | Moderate fit; substantial unexplained variation |
| 0.3-0.5 | Weak fit |
| <0.3 | Very weak; consider non-linear models |
**Important caveats:**
- High R² doesn't mean line is correct (could be misspecified).
- Low R² doesn't mean no relationship (could be nonlinear).
- Outliers can dramatically affect line and R².
- Extrapolation risky beyond data range.
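The effect of a single outlier can be seen with a small experiment. In the hypothetical data below, four points lie on y = 2x and only the last y value is changed from 10 to 30; the fitted slope triples and R² falls from 1 to about 0.69.

```python
def fit_with_r2(xs, ys):
    """Return (slope, intercept, r_squared) for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, 1 - ss_res / ss_tot


xs = [1, 2, 3, 4, 5]
clean = [2, 4, 6, 8, 10]         # hypothetical points exactly on y = 2x
with_outlier = [2, 4, 6, 8, 30]  # same data with one corrupted value

clean_fit = fit_with_r2(xs, clean)
outlier_fit = fit_with_r2(xs, with_outlier)
print("clean:       ", [round(v, 4) for v in clean_fit])
print("with outlier:", [round(v, 4) for v in outlier_fit])
```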
**Software:**
- **Excel**: TREND(), LINEST(), SLOPE(), INTERCEPT().
- **R**: lm() function.
- **Python**: scipy.stats.linregress(), sklearn.linear_model.
- **SPSS**: Linear regression menu.
**Common misuses:**
- Predicting outside data range.
- Assuming linear when relationship is non-linear.
- Ignoring outliers.
- Equating R² with significance.
- Causation from correlation.
**Practical tips:**
- Always plot data before regression.
- Check residual plots for patterns.
- Investigate outliers (don't just delete).
- Validate predictions on independent data.
- Report uncertainty (prediction intervals).
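A prediction interval for a single new observation can be sketched with the textbook formula ŷ ± t · s_e · sqrt(1 + 1/n + (x_new − x̄)² / S_xx). The example reuses the advertising data from the worked examples; the critical value 3.182 is hard-coded from a t-table for 95% coverage with n − 2 = 3 degrees of freedom and must be looked up again for any other sample size. The function name is a hypothetical choice.

```python
import math

T_CRIT_95_DF3 = 3.182  # two-sided 95% t critical value, 3 degrees of freedom


def prediction_interval(xs, ys, x_new, t_crit):
    """Return (low, high) bounds for one new observation at x_new."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(ss_res / (n - 2))  # residual standard error
    y_hat = slope * x_new + intercept
    # The interval widens as x_new moves away from the mean of x.
    half_width = t_crit * s_e * math.sqrt(1 + 1 / n + (x_new - mean_x) ** 2 / s_xx)
    return y_hat - half_width, y_hat + half_width


ad_spend = [10, 15, 20, 25, 30]   # x, in $K
sales = [50, 70, 85, 100, 120]    # y, in $K

low, high = prediction_interval(ad_spend, sales, 35, T_CRIT_95_DF3)
print(f"95% prediction interval at x=35: {low:.2f} to {high:.2f}")
```

The formula relies on the assumptions of independent, normally distributed residuals with constant variance, so it is only as trustworthy as those checks.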
Common mistakes to avoid
- Predicting outside the data range. Extrapolation is risky.
- Assuming linear when data shows curve. Plot first.
- Ignoring outliers. They strongly influence the line.
- Equating high R² with significance. Test statistical significance separately.
- Concluding causation from correlation.
- Forgetting to validate predictions on new data.
- Not checking residuals for patterns.