# Fit a linear model -----------------------------------------------------------
lm_fit <- lm(mpg ~ cyl + hp + wt, data = mtcars)
Introduction
The residual standard error is a measure of fit for linear regression models. Conceptually, it can be thought of as the variability of the prediction error for a linear model. It is usually calculated as:
\[ SE_{resid} = \sqrt{ \frac{ \sum^{n}_{i = 1}(y_i - \hat{y}_i)^2 }{df_{resid}} } \]
where:
- \(n\) is the sample size
- \(k\) is the number of regression coefficients to estimate, excluding the intercept
- \(-1\) is the degree of freedom lost to estimate the intercept
- \(y_i\) is the observed \(y\) value for the \(i\)-th individual
- \(\hat{y}_i\) is the fitted \(y\) value for the \(i\)-th individual
- \(df_{resid}\) is the degrees of freedom of the residuals (\(n - k - 1\))
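The residual standard error is expressed in the units of the response variable. For models of the same response, a smaller residual standard error indicates that the model fits the data more closely.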
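To make the formula concrete, here is a small worked example with made-up numbers (they are assumptions for illustration, not taken from any dataset): suppose \(n = 10\) observations, \(k = 2\) coefficients besides the intercept, and a sum of squared residuals of 28.

\[ df_{resid} = 10 - 2 - 1 = 7, \qquad SE_{resid} = \sqrt{ \frac{28}{7} } = \sqrt{4} = 2 \]

In this hypothetical case, the prediction errors vary by about 2 units of the response variable.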
Learn by coding
We can compute the residual standard error manually after estimating a linear model in R. To get a better grasp of the residual standard error, let’s start by regressing the miles per gallon (mpg) on the number of cylinders (cyl), horsepower (hp), and weight (wt) of cars from the standard mtcars R dataset.
We can compute the residual standard error following the formula described above:
# Compute the residual standard error manually ---------------------------------
# Define elements of the formula
n <- nrow(mtcars) # sample size
k <- 3 # number of parameters (regression coefficients)
yhat <- fitted(lm_fit) # fitted y values
y <- mtcars$mpg
# Compute rse
rse <- sqrt(sum((y - yhat)^2) / (n - k - 1))
# Print rse
rse
[1] 2.511548
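The two pieces of the formula, the sum of squared residuals and the residual degrees of freedom, can also be read off the fitted model instead of being built by hand. The following is a minimal sketch using the base R accessors `residuals()` and `df.residual()`. The names `fit`, `rss`, `df_resid`, and `rse_pieces` are chosen for this example only, and the model is refit so the block runs on its own.

```r
# Refit the model so this block is self-contained
fit <- lm(mpg ~ cyl + hp + wt, data = mtcars)

# Numerator: sum of squared residuals (observed minus fitted values)
rss <- sum(residuals(fit)^2)

# Denominator: residual degrees of freedom, n - k - 1
df_resid <- df.residual(fit)

# Residual standard error assembled from the two pieces
rse_pieces <- sqrt(rss / df_resid)
rse_pieces
```

Reading the degrees of freedom from the model avoids having to count the coefficients by hand, which is easy to get wrong when the model includes factors or interactions.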
We can also extract it directly from any lm object:
# residual standard error from lm output ---------------------------------------
# Use the sigma function to extract it from an lm object
sigma(lm_fit)
[1] 2.511548
# Compare with the manual computation
sigma(lm_fit) - rse
[1] 0
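Since the residual standard error is a measure of fit, one natural use is to compare models of the same response. The sketch below is one possible illustration, not part of the original walkthrough: it fits a smaller model with weight only next to the model used above and collects both values. The names `fit_small`, `fit_full`, and `rse_compare` are assumptions made for this example.

```r
# Two models for the same response (mpg), fit on the same data
fit_small <- lm(mpg ~ wt, data = mtcars)
fit_full  <- lm(mpg ~ cyl + hp + wt, data = mtcars)

# Residual standard error of each model, in miles per gallon
rse_compare <- c(small = sigma(fit_small), full = sigma(fit_full))
rse_compare
```

On mtcars the larger model should show the smaller value. This is not guaranteed in general: each added predictor removes one residual degree of freedom, so the residual standard error can go up when a predictor explains very little.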
TL;DR, just give me the code!
# Fit a linear model -----------------------------------------------------------
lm_fit <- lm(mpg ~ cyl + hp + wt, data = mtcars)
# Compute the residual standard error manually ---------------------------------
# Define elements of the formula
n <- nrow(mtcars) # sample size
k <- 3 # number of parameters (regression coefficients)
yhat <- fitted(lm_fit) # fitted y values
y <- mtcars$mpg
# Compute rse
rse <- sqrt(sum((y - yhat)^2) / (n - k - 1))
# Print rse
rse
# residual standard error from lm output ---------------------------------------
# Use the sigma function to extract it from an lm object
sigma(lm_fit)
# Compare with the manual computation
sigma(lm_fit) - rse