What is linear regression?

Linear regression finds the best-fit straight line (y = mx + b) through a set of data points using the least-squares method. The slope (m) shows how much y changes per unit change in x, and the intercept (b) is where the line crosses the y-axis. R-squared measures how well the line fits the data.

What does R-squared mean?

R-squared measures how well the line fits the data and ranges from 0 to 1. A value of 1 means the line passes through every data point (no error). R² = 0.8 means the line explains 80% of the variance in y. R² close to 0 means the line explains almost none of the variation; the relationship may be non-linear, or there may be no relationship at all.
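A minimal Python sketch of R² = 1 - SSE/SST, assuming the ordinary least-squares line; `fit_line` and `r_squared` are illustrative names, not the calculator's own code.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar


def r_squared(xs, ys):
    """R² = 1 - SSE/SST: the share of the variance in y explained by the line."""
    m, b = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    sst = sum((y - y_bar) ** 2 for y in ys)                     # total variation around the mean
    return 1 - sse / sst


print(r_squared([1, 2, 3, 4], [2, 4, 6, 8]))        # 1.0: every point lies on the line
print(r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # about 0.6: 60% of the variance explained
```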

How is linear regression different from correlation?

Correlation measures the strength and direction of a linear relationship (r from -1 to +1). Linear regression gives the equation of the best-fit line (slope and intercept), which you can use for predictions. The two are related: with a single x variable, R² = r². Correlation tells you whether a linear relationship exists and how strong it is; regression tells you what that relationship is.
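The R² = r² relationship (one x variable) can be checked numerically. This is a sketch with made-up data; it computes Pearson's r and the regression R² by separate routes so the agreement is not built in.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r (strength and direction, -1 to +1)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def regression_r_squared(xs, ys):
    """R² of the least-squares line, computed independently as 1 - SSE/SST."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    b = y_bar - m * x_bar
    sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


xs = [1, 2, 3, 4, 5]
ys = [5, 4, 4, 2, 1]          # downward trend
r = pearson_r(xs, ys)
print(r, r ** 2, regression_r_squared(xs, ys))  # r < 0; r² and R² agree
```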

When can I use linear regression?

Use it when the relationship between the variables appears linear (plot a scatter chart first). For non-linear relationships, use polynomial regression, a log transformation, or a non-linear model. Linear regression assumes linearity, normally distributed residuals, equal variance, and independent observations.

How accurate is linear regression?

It depends on R², the sample size, and the spread of the residuals. With a high R² (> 0.8) and a large sample, predictions within the data range are reasonably accurate. R² < 0.5 indicates substantial unexplained variation, so predictions are uncertain. Always report prediction intervals, not just point estimates.
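A sketch of the textbook prediction interval for a new observation, ŷ₀ ± t · s · √(1 + 1/n + (x₀ - x̄)²/Σ(xᵢ - x̄)²), where s is the residual standard error. To stay within the standard library, the Student-t critical value `t_crit` (n - 2 degrees of freedom) is passed in by the caller; 3.182 is the two-sided 95% value for 3 degrees of freedom. The data are made up.

```python
import math


def prediction_interval(xs, ys, x0, t_crit):
    """Return (low, high) prediction interval for a new y at x0.

    t_crit: two-sided Student-t critical value with n - 2 degrees of freedom,
    looked up by the caller (e.g. about 3.182 for 95% with n = 5).
    """
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b = y_bar - m * x_bar
    sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # residual standard error
    y_hat = m * x0 + b
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return y_hat - half_width, y_hat + half_width


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(prediction_interval(xs, ys, 3, 3.182))  # at the mean of x: narrowest interval
print(prediction_interval(xs, ys, 5, 3.182))  # at the edge of the data: wider
```

The interval widens as x₀ moves away from x̄, which is one reason predictions near the edges of the data are less certain.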

What if my data has outliers?

Outliers can strongly affect the regression line. Investigate them first: is it a data error or a legitimate extreme value? Robust regression methods (Huber, Tukey biweight) reduce the influence of outliers. Don't delete outliers without a good reason; sometimes they reveal important phenomena.
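To illustrate how much one point can move an ordinary least-squares fit, here is a sketch comparing it with the Theil-Sen estimator (slope = median of all pairwise slopes). Theil-Sen is a different robust method from the Huber and Tukey estimators named above; it is used here only because it fits in a few lines. The data are synthetic.

```python
from itertools import combinations
from statistics import median


def ols_fit(xs, ys):
    """Ordinary least squares: slope and intercept."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return m, y_bar - m * x_bar


def theil_sen_fit(xs, ys):
    """Theil-Sen estimator: median of all pairwise slopes, then median intercept."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
              if x2 != x1]  # skip pairs with identical x
    m = median(slopes)
    b = median([y - m * x for x, y in zip(xs, ys)])
    return m, b


xs = list(range(1, 11))
ys = [2 * x for x in xs]
ys[-1] = 50  # one extreme value; the other points follow y = 2x exactly

print(ols_fit(xs, ys))        # slope pulled well above 2 by the single outlier
print(theil_sen_fit(xs, ys))  # (2.0, 0.0)
```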

Can I extrapolate beyond my data?

Risky. The linear relationship might not hold outside the observed range: boundary effects, saturation, or different mechanisms may apply. Predictions within the data range are relatively safe. For extrapolation, use extreme caution and report the uncertainty.
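One simple safeguard is to flag any prediction whose x lies outside the observed range. This is a minimal sketch; `predict_with_range_check` is a hypothetical helper, and the data are made up.

```python
def predict_with_range_check(xs, ys, x0):
    """Return (predicted y at x0, True if x0 is outside the observed x range)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    b = y_bar - m * x_bar
    extrapolating = not (min(xs) <= x0 <= max(xs))
    return m * x0 + b, extrapolating


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
for x0 in (3.5, 10):
    y_hat, outside = predict_with_range_check(xs, ys, x0)
    note = "EXTRAPOLATION: treat with caution" if outside else "within data range"
    print(f"x = {x0}: predicted y = {y_hat:.2f} ({note})")
```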

Linear Regression Calculator

Enter up to 8 paired (x, y) data points to compute the least-squares linear regression line (y = mx + b), R-squared, correlation coefficient, and predicted values.

Linear regression is the workhorse of predictive modeling and the simplest example of statistical learning. Given a set of paired observations (x, y), it finds the straight line y = mx + b that best fits the data, meaning the line that minimizes the sum of the squared vertical distances (residuals) between each observed y and the line's predicted y. The slope m tells you how much y changes per unit change in x; the intercept b is the value of y where the line crosses the y-axis (x = 0).
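The "least squares" criterion can be made concrete: compute the sum of squared vertical residuals (SSE) for the least-squares line, then for slightly nudged lines. This is an illustrative sketch with made-up data; every nudge should increase the SSE.

```python
def sse(xs, ys, m, b):
    """Sum of squared vertical distances (residuals) from the points to y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


def least_squares(xs, ys):
    """Slope and intercept that minimize sse()."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return m, y_bar - m * x_bar


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = least_squares(xs, ys)
best = sse(xs, ys, m, b)
# Nudge the slope or intercept slightly; each nudged line has a larger SSE.
nudged = [sse(xs, ys, m + dm, b + db)
          for dm, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]]
print(round(best, 3), [round(v, 3) for v in nudged])
```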

This calculator returns the slope, intercept, R² (coefficient of determination), and predicted y for any new x value. R² ranges from 0 (the line explains none of the variance in y) to 1 (it explains all of it). Real-world data rarely achieves R² = 1. As a rough rule of thumb (acceptable values vary by field), R² above 0.7 indicates a strong linear relationship, 0.5-0.7 moderate, 0.3-0.5 weak, and below 0.3 very weak or possibly non-linear.

Linear regression assumes linearity (the relationship is actually linear), independence (observations don't influence each other), homoscedasticity (constant variance across x), and normality of residuals. Real data often violates these assumptions; check residual plots to verify them. For non-linear relationships, transformations (log, square root, polynomial) often restore linearity.
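Why residual plots matter: in this sketch a straight line is fitted to curved data (y = x²). R² still comes out around 0.96, but the residuals form a clear U shape (positive, negative, negative, negative, positive), which signals that the linearity assumption is violated. The data are synthetic.

```python
def residuals(xs, ys):
    """Residuals y - ŷ from the least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    b = y_bar - m * x_bar
    return [y - (m * x + b) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]  # curved (quadratic) data
res = residuals(xs, ys)
y_bar = sum(ys) / len(ys)
r2 = 1 - sum(r * r for r in res) / sum((y - y_bar) ** 2 for y in ys)

print(res)           # [2.0, -1.0, -2.0, -1.0, 2.0]: a U-shaped pattern
print(round(r2, 3))  # 0.963 despite the curvature
```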

Common applications include sales forecasting, scientific calibration (instrument readings vs. true values), house price prediction from square footage, height vs. weight, and any continuous-continuous prediction problem.

Last updated: May 29, 2026

Formula

**Linear regression equation:**

y = mx + b

Where:
- **m**: slope (rate of change)
- **b**: y-intercept (value of y at x = 0)

**Least-squares formulas:**

m = Σ(xᵢ - x̄)(yᵢ - ȳ) / Σ(xᵢ - x̄)²

b = ȳ - m × x̄

Where x̄ and ȳ are the sample means of x and y.

**R-squared (coefficient of determination):**

R² = 1 - SSE/SST

Where:
- **SSE** (sum of squared errors): Σ(yᵢ - ŷᵢ)²
- **SST** (total sum of squares): Σ(yᵢ - ȳ)²
- **ŷᵢ**: predicted y for xᵢ

**Correlation coefficient (r):**

For simple linear regression (one x variable): r = ±√R²

The sign of r matches the sign of the slope. r = 1 is a perfect positive relationship; r = -1 is a perfect negative one.

**Worked example: data (1,2), (2,4), (3,6), (4,8)**

- Mean x = 2.5, mean y = 5
- Σ(x - x̄)(y - ȳ) = (-1.5)(-3) + (-0.5)(-1) + (0.5)(1) + (1.5)(3) = 4.5 + 0.5 + 0.5 + 4.5 = 10
- Σ(x - x̄)² = 1.5² + 0.5² + 0.5² + 1.5² = 5
- m = 10/5 = 2
- b = 5 - 2(2.5) = 0

Equation: y = 2x. Every point lies on the line, so SSE = 0 and R² = 1 (a perfect linear relationship).

**Common applications:**

| Field | Application |
|---|---|
| Sales | Forecasting sales from advertising spend |
| Real estate | Price prediction from floor area |
| Healthcare | Drug dose response (dose vs. effect) |
| Engineering | Stress-strain curves |
| Finance | Asset price models |
| Education | Test scores from study time |
| Science | Calibration curves (sensor reading vs. true value) |
| Manufacturing | Quality control (input vs. output) |
| Economics | Forecasting (interest rate vs. inflation) |

**Assumptions:**

1. Linearity: the relationship is linear.
2. Independence: observations are not related to each other.
3. Homoscedasticity: equal residual variance across x.
4. Normality: residuals are normally distributed.
5. No outliers significantly affecting the results.

Check with:
- Scatter plot.
- Residuals vs. fitted values plot.
- Q-Q plot of residuals.
- Outlier detection.
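The least-squares formulas in this section translate almost line for line into code. A minimal sketch (not the calculator's implementation; `fit` is an illustrative name) that reproduces the worked example; it assumes the x values are not all equal and the y values are not all equal, since otherwise Σ(xᵢ - x̄)² or SST would be zero.

```python
import math


def fit(xs, ys):
    """Least-squares slope m, intercept b, R² and r for y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # Σ(xᵢ - x̄)(yᵢ - ȳ)
    sxx = sum((x - x_bar) ** 2 for x in xs)                         # Σ(xᵢ - x̄)²
    m = sxy / sxx
    b = y_bar - m * x_bar
    sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))      # Σ(yᵢ - ŷᵢ)²
    sst = sum((y - y_bar) ** 2 for y in ys)                         # Σ(yᵢ - ȳ)²
    r2 = 1 - sse / sst
    r = math.copysign(math.sqrt(max(r2, 0.0)), m)  # sign of r follows the slope
    return m, b, r2, r


print(fit([1, 2, 3, 4], [2, 4, 6, 8]))  # (2.0, 0.0, 1.0, 1.0)
```

The `max(r2, 0.0)` guard only protects the square root from tiny negative values caused by floating-point rounding when there is essentially no relationship.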

How to use this calculator

Enter your paired (x, y) data values.
Leave unused fields at 0; the calculator skips them.
The calculator returns the slope (m), intercept (b), R², and correlation.
For prediction, enter a target x to get the predicted y.
Check R² to assess fit quality.
For larger datasets, use Excel's TREND() or LINEST(), or R's lm().

Worked examples

Sales vs advertising

**Scenario:** Track advertising spend (x, in $K) against sales (y, in $K) for the past 5 months: (10, 50), (15, 70), (20, 85), (25, 100), (30, 120). **Calculation:** Slope = 3.4, intercept = 17, so y = 3.4x + 17. R² ≈ 0.997 (excellent fit). Predicted sales for $35K of advertising: y = 3.4(35) + 17 = $136K. Because $35K lies outside the observed $10K-$30K range, this forecast is an extrapolation and should be treated with caution. **Result:** Each additional $1K in advertising is associated with about $3.4K in additional sales. The relationship is strong, and the forecast can inform budget allocation.

Height-weight relationship

**Scenario:** Heights (cm) and weights (kg) of 5 adults: (165, 60), (170, 70), (175, 75), (180, 80), (185, 90). **Calculation:** m = 1.4, b = -170, so y = 1.4x - 170. R² = 0.98. Prediction: 178 cm → 1.4(178) - 170 ≈ 79.2 kg. **Result:** Strong linear relationship in this sample: each additional cm of height is associated with about 1.4 kg more weight. With only 5 people, this slope should not be generalized to adults as a whole.

Manufacturing calibration

**Scenario:** Sensor reading (x) vs true measurement (y): (1, 1.02), (2, 1.98), (3, 3.05), (4, 3.97). **Calculation:** m ≈ 0.99, b ≈ 0.03. Essentially y = x. R² ≈ 0.999. **Result:** Sensor reads accurately with tiny bias of +0.03. Linear calibration adequate. Can use sensor reading directly with minimal correction.

When to use this calculator

**Use linear regression for:**

- **Forecasting**: predict outcomes from input variables.
- **Modeling relationships**: quantify how variables relate.
- **Trend analysis**: detect patterns over time.
- **Calibration**: relate measurements to true values.
- **Decision-making**: optimize input for target output.

**Limitations:**

- Only models linear relationships.
- Sensitive to outliers.
- Assumes errors normally distributed.
- Correlation ≠ causation.
- Extrapolation beyond data range risky.

**Non-linear data?**

Transform variables or switch to a different model:
- **Log transformation**: for exponential growth.
- **Square root**: for moderate non-linearity.
- **Polynomial**: for curved relationships.
- **Logistic regression**: for binary outcomes.
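For exponential growth, y = a·e^(kx) becomes linear after taking logs: ln(y) = ln(a) + kx. This sketch regresses ln(y) on x and converts the intercept back to a; it requires all y > 0. Note that fitting on the log scale minimizes errors in ln(y), which is not the same as a non-linear least-squares fit on the original scale. The data are synthetic.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return m, y_bar - m * x_bar


def fit_exponential(xs, ys):
    """Fit y = a * exp(k * x) by regressing ln(y) on x (requires y > 0)."""
    k, ln_a = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(ln_a), k


xs = [0, 1, 2, 3, 4]
ys = [3 * math.exp(0.5 * x) for x in xs]  # synthetic exponential growth
a, k = fit_exponential(xs, ys)
print(round(a, 6), round(k, 6))  # 3.0 0.5
```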

**Multiple regression:**

For multiple input variables: y = b + m₁x₁ + m₂x₂ + ... Use specialized software.
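For intuition about what that software computes, here is a small sketch that solves the least-squares normal equations (XᵀX)β = Xᵀy by Gaussian elimination, returning [b, m₁, m₂, ...]. It is for illustration only: statistical packages typically use numerically more stable methods such as a QR decomposition. The data are synthetic, generated exactly from y = 1 + 2x₁ + 3x₂.

```python
def multiple_regression(rows, ys):
    """Least-squares coefficients [b, m1, m2, ...] for y = b + m1*x1 + m2*x2 + ..."""
    X = [[1.0, *row] for row in rows]  # prepend a column of 1s for the intercept b
    n, p = len(X), len(X[0])
    # Augmented matrix [XᵀX | Xᵀy] of the normal equations.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * ys[i] for i in range(n))]
         for r in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        rhs = A[r][p] - sum(A[r][c] * beta[c] for c in range(r + 1, p))
        beta[r] = rhs / A[r][r]
    return beta


rows = [(1, 1), (2, 1), (1, 2), (3, 2), (2, 3), (4, 1)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
beta = multiple_regression(rows, ys)
print([round(c, 6) for c in beta])  # [1.0, 2.0, 3.0]
```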

**R-squared interpretation:**

| R² | Interpretation |
|---|---|
| 0.9+ | Strong fit; line explains most variation |
| 0.7-0.9 | Moderate-strong fit |
| 0.5-0.7 | Moderate fit; substantial unexplained variation |
| 0.3-0.5 | Weak fit |
| <0.3 | Very weak; consider non-linear models |

**Important caveats:**

- High R² doesn't mean line is correct (could be misspecified).
- Low R² doesn't mean no relationship (could be nonlinear).
- Outliers can dramatically affect line and R².
- Extrapolation risky beyond data range.

**Software:**

- **Excel**: TREND(), LINEST(), SLOPE(), INTERCEPT().
- **R**: lm() function.
- **Python**: scipy.stats.linregress(), sklearn.linear_model.LinearRegression.
- **SPSS**: Linear regression menu.

**Common misuses:**

- Predicting outside data range.
- Assuming linear when relationship is non-linear.
- Ignoring outliers.
- Equating R² with significance.
- Causation from correlation.

**Practical tips:**

- Always plot data before regression.
- Check residual plots for patterns.
- Investigate outliers (don't just delete).
- Validate predictions on independent data.
- Report uncertainty (prediction intervals).
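One way to validate on independent data is a holdout check: fit on some points, then measure the prediction error (RMSE) on points the fit never saw. This sketch uses hypothetical data and holds out two interior points so the check does not mix in extrapolation; `fit_line` and `rmse` are illustrative helpers.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return m, y_bar - m * x_bar


def rmse(xs, ys, m, b):
    """Root-mean-square prediction error of y = m*x + b on the given points."""
    return math.sqrt(sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.8, 16.2]
holdout = {2, 5}  # indices held back from fitting (interior points)
training = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in holdout]
held_out = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i in holdout]

m, b = fit_line(*zip(*training))
print(f"training RMSE: {rmse(*zip(*training), m, b):.3f}")
print(f"held-out RMSE: {rmse(*zip(*held_out), m, b):.3f}")
```

A held-out error much larger than the training error suggests the fit will not generalize well.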

Common mistakes to avoid

Predicting outside the data range. Extrapolation is risky.
Assuming linear when data shows curve. Plot first.
Ignoring outliers. They strongly influence the line.
Equating high R² with significance. Test statistical significance separately.
Concluding causation from correlation.
Forgetting to validate predictions on new data.
Not checking residuals for patterns.
