Compute minimum, Q1, median, Q3, maximum, and identify outliers in your dataset.
The five number summary is a fundamental descriptive statistic that provides a quick overview of a dataset's distribution with just five key values: minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum. These five numbers capture the essential characteristics of data, allowing visualization through box plots and enabling rapid comparison across multiple datasets. The summary is robust because the median and quartiles are resistant to extreme outliers, unlike the mean.
Originally developed by statistician John Tukey, the five number summary forms the foundation of exploratory data analysis. By dividing data into four equal parts (quartiles), it reveals the spread, skewness, and concentration of values. The interquartile range (IQR), calculated as Q3 − Q1, represents where the middle 50% of data lies. Values beyond the "fences" (Q1 − 1.5×IQR and Q3 + 1.5×IQR) are flagged as potential outliers, often indicating unusual or extreme observations worth investigating.
This summary is particularly useful in quality control, medical research, and any field requiring rapid data exploration. It requires no assumptions about data distribution and works equally well for symmetric and skewed datasets.
Consider test scores: 7, 15, 36, 39, 40, 41 (already sorted)
Data: 7, 15, 36, 39, 40, 41 (n = 6)
• Min = 7 (smallest value)
• Q1 = 15 + 0.25×(36−15) = 20.25 (between 15 and 36)
• Median = (36 + 39) / 2 = 37.5 (average of middle two values)
• Q3 = 39 + 0.75×(40−39) = 39.75 (between 39 and 40)
• Max = 41 (largest value)
• IQR = 39.75 − 20.25 = 19.5
• Lower fence = 20.25 − 1.5×19.5 = −9
• Upper fence = 39.75 + 1.5×19.5 = 69
• Outliers: None (all values between −9 and 69)
Result: The five number summary shows a left-skewed distribution with no outliers. The lower half of the data (7 to 37.5) is spread much more widely than the upper half (37.5 to 41). The score of 7 is the lowest value, but it lies inside the lower fence, so it is not flagged as an outlier.
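The worked example can be reproduced with a short Python sketch. It assumes the linear interpolation rule used for Q1 above, where the quartile sits at position p×(n−1) in the sorted data, counting from zero. The function names `percentile` and `five_number_summary` are illustrative, not part of any library. Under this rule Q3 lands three quarters of the way from 39 to 40.

```python
import math

def percentile(sorted_values, p):
    """Linearly interpolated percentile for p in [0, 1].

    Assumes sorted_values is non-empty and already sorted.
    The target position is p * (n - 1), counting from zero.
    """
    position = p * (len(sorted_values) - 1)
    lower = math.floor(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

def five_number_summary(values):
    """Return min, Q1, median, Q3, and max for a non-empty list of numbers."""
    if not values:
        raise ValueError("need at least one value")
    data = sorted(values)
    return {
        "min": data[0],
        "q1": percentile(data, 0.25),
        "median": percentile(data, 0.50),
        "q3": percentile(data, 0.75),
        "max": data[-1],
    }

scores = [7, 15, 36, 39, 40, 41]
summary = five_number_summary(scores)
print(summary)
```

Other quartile conventions exist, so a different tool may report slightly different Q1 and Q3 values for the same small sample.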
What's the difference between quartiles and percentiles?
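Quartiles divide sorted data into four equal parts: Q1 is the 25th percentile, Q2 (the median) is the 50th percentile, and Q3 is the 75th percentile. Percentiles are more general: the p-th percentile, for any p from 0 to 100, is the value below which roughly p% of the data falls.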
How is the median calculated?
For odd n: median is the middle value. For even n: median is the average of the two middle values. Always sort data first.
Why use quartiles instead of mean and standard deviation?
Quartiles are resistant to outliers. The mean and standard deviation can be heavily distorted by a few extreme values, while quartiles remain stable and give a clearer picture of typical values.
What does IQR measure?
IQR (Interquartile Range) measures the spread of the middle 50% of data. Larger IQR means more variability; smaller IQR means data is tightly clustered.
How do you determine outliers?
The 1.5×IQR rule: values below Q1 − 1.5×IQR or above Q3 + 1.5×IQR are potential outliers. This method identifies approximately 0.7% of data as outliers in a normal distribution.
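As a rough illustration of the 1.5×IQR rule, the sketch below flags values outside the fences and then checks the 0.7% figure against a standard normal distribution. It uses `statistics.quantiles` with `method="inclusive"` (linear interpolation between data points) as one possible quartile convention. The `iqr_outliers` helper and the `sample` data, including the value 120, are hypothetical.

```python
from statistics import NormalDist, quantiles

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, flagged_values) using the k*IQR rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    flagged = [v for v in values if v < lower_fence or v > upper_fence]
    return lower_fence, upper_fence, flagged

# Hypothetical sample: the example scores plus one extreme value.
sample = [7, 15, 36, 39, 40, 41, 120]
lower_fence, upper_fence, flagged = iqr_outliers(sample)
print(lower_fence, upper_fence, flagged)

# Share of a standard normal distribution that falls outside the fences.
normal = NormalDist()
q1 = normal.inv_cdf(0.25)
q3 = normal.inv_cdf(0.75)
iqr = q3 - q1
outside = normal.cdf(q1 - 1.5 * iqr) + (1 - normal.cdf(q3 + 1.5 * iqr))
print(f"{outside:.2%}")
```

The second part is only a sanity check on the theoretical rate. Real samples, especially small ones, can have a noticeably different share of flagged values.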
Can the five number summary be drawn as a picture?
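Yes! The five number summary is the basis for box plots. The box spans Q1 to Q3, a line inside the box marks the median, and the whiskers extend to the minimum and maximum. When outliers are present, the whiskers usually stop at the most extreme values inside the fences, and the outliers are plotted as separate points.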
What if data has an even number of values?
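For even n, the median is the average of the two middle values. Under the median-of-halves convention, Q1 is the median of the lower half and Q3 is the median of the upper half. Other conventions, such as the linear interpolation used in the worked example, can give slightly different quartiles for small samples.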
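The median-of-halves approach can be sketched as follows. The function name `quartiles_by_halves` is illustrative. For odd n, leaving the middle value out of both halves is an assumption made here; it is one of several conventions in use.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, median, Q3) using the median of the lower and upper halves.

    Assumption: for odd n, the middle value is excluded from both halves.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    data = sorted(values)
    half = len(data) // 2
    lower_half = data[:half]
    upper_half = data[len(data) - half:]
    return median(lower_half), median(data), median(upper_half)

scores = [7, 15, 36, 39, 40, 41]
halves_result = quartiles_by_halves(scores)
print(halves_result)
```

For these six scores the halves are 7, 15, 36 and 39, 40, 41, so this convention gives Q1 = 15 and Q3 = 40, which differs from the interpolated Q1 of 20.25 in the worked example.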
Is the five number summary useful for normally distributed data?
Yes, equally useful. For normal data, the five number summary will show symmetric spacing around the median. Asymmetry hints at skewed data.