Edoardo Costantini - Understanding the residual standard error

Introduction

The residual standard error is a measure of fit for linear regression models. Conceptually, it estimates the standard deviation of the model's prediction errors (the residuals). It is usually calculated as:

\[ SE_{resid} = \sqrt{ \frac{ \sum^{n}_{i = 1}(y_i - \hat{y}_i)^2 }{df_{resid}} } \]

where:

\(n\) is the sample size
\(k\) is the number of predictors in the model (the regression coefficients, excluding the intercept)
\(-1\) accounts for the degree of freedom lost to estimating the intercept
\(y_i\) is the observed \(y\) value for the \(i\)-th individual
\(\hat{y}_i\) is the fitted \(y\) value for the \(i\)-th individual
\(df_{resid}\) is the residual degrees of freedom (\(n - k - 1\))

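The smaller the residual standard error, the closer the fitted values are to the observed data, and the better the model fits. Because it is expressed in the units of \(y\), it should only be compared across models of the same outcome variable.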
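As a rough illustration of this comparison, the sketch below fits two models to R's built-in mtcars data and extracts their residual standard errors with sigma(). The object names fit_small, fit_large, rse_small, and rse_large are chosen here only for illustration.

```r
# Two models for the same outcome (mpg), differing in their predictors
fit_small <- lm(mpg ~ wt, data = mtcars)
fit_large <- lm(mpg ~ cyl + hp + wt, data = mtcars)

# Residual standard errors, in miles per gallon
rse_small <- sigma(fit_small)
rse_large <- sigma(fit_large)

# The model with the smaller value has fitted values closer to the data
c(small = rse_small, large = rse_large)
```

On these data, the model with three predictors has the smaller residual standard error of the two.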

Learn by coding

We can compute the residual standard error manually after estimating a linear model in R. To get a better grasp of the residual standard error, let’s start by regressing the miles per gallon (mpg) on the number of cylinders (cyl), horsepower (hp), and weight (wt) of cars from the standard mtcars R dataset.

# Fit a linear model -----------------------------------------------------------

  lm_fit <- lm(mpg ~ cyl + hp + wt, data = mtcars)

We can compute the residual standard error following the formula described above:

# Compute the residual standard error manually ---------------------------------

  # Define elements of the formula
  n <- nrow(mtcars)      # sample size
  k <- 3                 # number of predictors (cyl, hp, wt), excluding the intercept
  yhat <- fitted(lm_fit) # fitted y values
  y <- mtcars$mpg        # observed y values

  # Compute rse
  rse <- sqrt(sum((y - yhat)^2) / (n - k - 1))

  # Print rse
  rse

[1] 2.511548
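The same computation can be written without hard-coding k, since R can report the residuals and the residual degrees of freedom of a fitted model. The following is a minimal sketch: the helper name manual_rse is hypothetical and not part of any package, while residuals(), df.residual(), nobs(), and coef() are standard R functions.

```r
# Hypothetical helper: residual standard error of a fitted lm object
manual_rse <- function(fit) {
  rss <- sum(residuals(fit)^2)   # residual sum of squares
  sqrt(rss / df.residual(fit))   # divide by the residual degrees of freedom
}

fit <- lm(mpg ~ cyl + hp + wt, data = mtcars)

# Residual degrees of freedom, computed two ways
n <- nobs(fit)                   # sample size
k <- length(coef(fit)) - 1       # predictors, excluding the intercept
c(by_hand = n - k - 1, from_r = df.residual(fit))

# Residual standard error from the helper
manual_rse(fit)
```

Reading the degrees of freedom off the fitted object avoids having to update k by hand whenever the model formula changes.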

We can also extract it directly from any lm object:

# residual standard error from lm output ---------------------------------------

  # Use the sigma function to extract it from an lm object
  sigma(lm_fit)

[1] 2.511548

  # Compare with the manual computation
  sigma(lm_fit) - rse

[1] 0
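A difference of exactly 0 is not guaranteed in general, because two floating-point computations of the same quantity can differ in their last digits. A more tolerant check, sketched below, uses all.equal(); it also shows that the same value is stored in the sigma element of the summary() output. The object names fit, rse_manual, same_as_sigma, and same_as_summary are chosen here for illustration.

```r
fit <- lm(mpg ~ cyl + hp + wt, data = mtcars)

# Manual computation: root of the residual sum of squares over the residual df
rse_manual <- sqrt(sum(residuals(fit)^2) / df.residual(fit))

# Compare up to a small numerical tolerance instead of expecting exactly 0
same_as_sigma <- isTRUE(all.equal(rse_manual, sigma(fit)))
same_as_sigma

# summary() stores the same quantity in its sigma element
same_as_summary <- isTRUE(all.equal(summary(fit)$sigma, sigma(fit)))
same_as_summary
```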

TL;DR, just give me the code!

# Fit a linear model -----------------------------------------------------------

  lm_fit <- lm(mpg ~ cyl + hp + wt, data = mtcars)

# Compute the residual standard error manually ---------------------------------

  # Define elements of the formula
  n <- nrow(mtcars)      # sample size
  k <- 3                 # number of predictors (cyl, hp, wt), excluding the intercept
  yhat <- fitted(lm_fit) # fitted y values
  y <- mtcars$mpg        # observed y values

  # Compute rse
  rse <- sqrt(sum((y - yhat)^2) / (n - k - 1))

  # Print rse
  rse

# residual standard error from lm output ---------------------------------------

  # Use the sigma function to extract it from an lm object
  sigma(lm_fit)

  # Compare with the manual computation
  sigma(lm_fit) - rse
