Understanding quantiles

Statistics
Distributions
Author

Edoardo Costantini

Published

September 6, 2022

Quantiles, percentiles, and quartiles

Descriptive statistics are a fundamental tools for a data analysis. Their purpose is to describe and summarize data. Quantitative variables such as weight, age, and income can be described by distributions (e.g., normal distribution) with measures of center (the typical value of a variable) and variability (spread around the center). The mean is a measure of center. The standard deviation is a measure of variability.

There are special descriptive statistics that help us describe simultaneously center and spread of a variable’s distribution. These measures are often known as measures of positions. In general, they tell us the point of the distribution of a variable at which a given percentage of the data falls below or above that point. The minimum and maximum values of a variable are measures of positions defining the point at which no data, or all data, fall below it, respectively. The median is a measure of position that defines the point at which half of the data falls below it (and above it). The median is a special case of the measure of position called percentile. The p-th percentile is the point such that p% of the observations fall below or at that point and (100 - p)% fall above it. The 50-th percentile is the median.

Quartiles are commonly used percentiles. The first quartile is the 25-th percentile, the value of the variable leaving to its left 25% of the distribution. The third quartile is the 75-th percentile, the value of the variable leaving to its left 75% of the distribution. You can think of quartiles as diving the probability distribution in 4 intervals with equal probabilities (e.g., area under the probability density function). Similarly, you can think of percentiles as diving the probability distribution in 100 intervals with equal probabilities.

We use the word quantile to describe the general measure of position that divides the probability distribution in intervals with equal probabilities. Percentiles divide the probability distribution of a variable in 100 intervals. Deciles divide it in 10 intervals, and quartiles in four. As such, percentiles, deciles, quartiles, and the median are all special cases of quantiles.

TL;DR, just give me the code!

# Set a seed

set.seed(20220906)

# Sample from a normal distribution

age <- rnorm(1e5, mean = 27, sd = 2)

# Define the 1st and 2nd quartile

quartiles <- quantile(age, probs = c(.25, .75))

# Plot density distribution

plot(density(age),
     main = "Quartiles for the probability distribution of age",
     xlab = NA)

# Costum x ticks
axis(side = 1, at = c(27, round(quartiles, 1)), labels = TRUE)

# Add points for quantiles
points(c(quartiles, median(age)),
        y = rep(0, length(quartiles)+1))
points(c(quartiles, median(age)),
       y = dnorm(c(quartiles, median(age)), mean = mean(age), sd = sd(age)))

# Add segments to devide plot
segments(x0 = quartiles[1], y0 = 0,
         x1 = quartiles[1], y1 = dnorm(quartiles[1],
                                       mean = mean(age), sd = sd(age)))
segments(x0 = median(age), y0 = 0,
         x1 = median(age), y1 = max(dnorm(age,
                                       mean = mean(age), sd = sd(age))))
segments(x0 = quartiles[2], y0 = 0,
         x1 = quartiles[2], y1 = dnorm(quartiles[2],
                                       mean = mean(age), sd = sd(age)))

# Add quartile labels
text(x = quartiles[1],
     y = -.005,
     "1st quartile")
text(x = quartiles[2],
     y = -.005,
     "3rd quartile")

# Add percentage under the curve labels
text(x = c(24, 30, 26.3, 27.7),
     y = c(.03, .03, .06, .06),
     "25 %")