Detect outliers using IQR or Z-score methods. Identify extreme values that deviate significantly from your dataset.
Last updated: March 2026
Outliers are data points that deviate significantly from other observations in a dataset. They can arise from measurement errors, data entry mistakes, experimental errors, or genuine extreme values that represent rare but real phenomena.
The IQR (Interquartile Range) method is robust and distribution-free. It flags values below Q1 - 1.5×IQR or above Q3 + 1.5×IQR as outliers, where Q1 is the 25th percentile, Q3 is the 75th percentile, and IQR = Q3 - Q1. This method is preferred when the distribution is unknown or skewed.
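As a rough illustration of the fences described above, here is a minimal Python sketch using only the standard library. The function name iqr_outliers and the sample values are hypothetical (the sample is not this page's dataset), and the "inclusive" quartile convention is an assumption: other quartile definitions give slightly different fences.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) for fences at k * IQR."""
    # Assumption: "inclusive" quartiles (linear interpolation between data points).
    # Other quartile conventions shift Q1 and Q3 slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

# Hypothetical sample for illustration only.
sample = [10, 12, 11, 13, 12, 14, 11, 50]
lower, upper, flagged = iqr_outliers(sample)
print(f"fences: [{lower}, {upper}], flagged: {flagged}")
```

Passing k=3.0 instead of the default widens the fences, so only more extreme values are flagged.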
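The Z-score method flags points with |z| > threshold (typically 2 or 3) as outliers, where z = (x - μ)/σ, μ is the mean, and σ is the standard deviation. It assumes approximate normality and is itself sensitive to outliers, because extreme values shift μ and inflate σ, which can mask the very points being tested. Under a normal distribution, about 95% of data falls within |z| ≤ 2 and about 99.7% within |z| ≤ 3.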
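The Z-score rule can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the function name zscore_outliers and the sample are hypothetical, and using the population standard deviation (pstdev) rather than the sample standard deviation is an assumption.

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mu = statistics.fmean(values)
    # Assumption: population SD. Swap in statistics.stdev for the sample SD.
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing can be flagged
    return [v for v in values if abs((v - mu) / sigma) > threshold]

# Hypothetical sample for illustration only.
sample = [10, 12, 11, 13, 12, 14, 11, 50]
flagged_at_2 = zscore_outliers(sample, threshold=2.0)
flagged_at_3 = zscore_outliers(sample, threshold=3.0)
print(flagged_at_2, flagged_at_3)
```

On this small sample the value 50 has a z-score of roughly 2.6, so it is flagged at threshold 2 but not at threshold 3. The extreme value inflates σ enough to partly hide itself, which is the sensitivity noted above.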
Dataset with Potential Outlier
IQR Method Analysis:
Z-Score Method Analysis (threshold = 2):
Interpretation:
Both methods identify 50 as an outlier - it's dramatically higher than the rest of the data. This could be a data entry error (maybe 5.0 was mistyped as 50), a measurement error, or a genuine extreme value. Further investigation is needed before deciding whether to keep or remove it.
No! First investigate the cause. Data entry error? Remove. Measurement error? Fix or remove. Genuine extreme value? Keep it - it may contain important information. Removing outliers can bias results and hide real phenomena. When in doubt, report results both with and without outliers.
IQR method: robust, distribution-free, standard practice for exploratory analysis. Z-score method: assumes normality, good for quick flagging of extremes. If distribution is unknown or skewed, use IQR. If data is approximately normal and you want standardized thresholds, use Z-score.
The 1.5 multiplier is a convention (Tukey's fences). Under a normal distribution it places the fences at about ±2.7σ, so roughly 0.7% of points are flagged, and it works empirically across many other distributions. Points beyond Q1 - 1.5×IQR or Q3 + 1.5×IQR are considered 'mild outliers.' For stricter detection, use 3.0×IQR to flag only 'extreme outliers.'
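To see how the 1.5 and 3.0 multipliers relate to a normal distribution, the following sketch computes where each fence falls on a standard normal curve and what fraction of points would lie outside it. It uses statistics.NormalDist from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()          # mean 0, standard deviation 1
q3 = std_normal.inv_cdf(0.75)      # 75th percentile; Q1 is -q3 by symmetry
iqr = 2 * q3

fence_stats = {}
for k in (1.5, 3.0):
    fence = q3 + k * iqr                            # upper fence, in units of sigma
    frac_outside = 2 * (1 - std_normal.cdf(fence))  # both tails
    fence_stats[k] = (fence, frac_outside)
    print(f"k={k}: fence at ±{fence:.2f} sigma, {frac_outside:.5%} of points outside")
```

For normal data, the 1.5 fences land near ±2.7σ and the 3.0 fences near ±4.7σ, which is why the latter flags only very extreme points.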
Yes! IQR: use 2.0×IQR or 2.2×IQR for stricter detection (fewer outliers flagged). Z-score: |z| > 1.5 flags the most points, |z| > 2 is a common default (about 95% coverage under normality), and |z| > 3 is strict (99.7% coverage). More conservative thresholds reduce false positives but may miss genuine outliers.
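The coverage figures for Z-score thresholds can be checked directly, assuming the data is exactly normal (real data rarely is, so treat these as rough guides). A short sketch with the standard library:

```python
from statistics import NormalDist

std_normal = NormalDist()
# Fraction of a normal distribution with |z| <= threshold.
coverage = {z: 2 * std_normal.cdf(z) - 1 for z in (1.5, 2.0, 3.0)}
for z, frac in coverage.items():
    print(f"|z| <= {z}: {frac:.1%} inside, {1 - frac:.1%} flagged")
```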
Could indicate: (1) Heavy-tailed distribution (not outliers, just wide spread), (2) Multiple subpopulations mixed together, (3) Data collection issues, (4) Incorrect method choice. Check histogram and Q-Q plot. Consider transforming data (log, sqrt) if skewed.
Good! Means your data is relatively homogeneous within the detection thresholds. This doesn't mean the data is perfect - there could still be subtle issues. Always inspect histograms and summary statistics beyond just outlier detection.
Mean: very sensitive, pulled toward outliers. Median: robust, largely unaffected. Standard deviation: inflated by outliers. IQR: robust. Correlation: sensitive. Regression: a few outliers can dominate the fit. That's why robust methods (median, IQR) are preferred when outliers are present.
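A quick way to see this sensitivity is to compute the same statistics with and without one extreme value. The sample below is hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical sample for illustration only.
clean = [10, 12, 11, 13, 12, 14, 11]
with_outlier = clean + [50]

summary = {}
for name, fn in [("mean", statistics.fmean),
                 ("median", statistics.median),
                 ("stdev", statistics.stdev)]:
    summary[name] = (fn(clean), fn(with_outlier))
    print(f"{name:>6}: {summary[name][0]:.2f} -> {summary[name][1]:.2f}")
```

In this sample the mean and standard deviation both jump when 50 is added, while the median stays where it was.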
Use robust statistics that downweight rather than remove: median instead of mean, MAD instead of SD, robust regression (Huber, RANSAC). Winsorize (cap extreme values at percentiles). Transform data (log, Box-Cox) to reduce skew. Report sensitivity analysis with/without outliers.
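As one example of swapping MAD for SD, the sketch below scores points with a MAD-based "modified z-score". The function name, the 0.6745 scaling constant (which makes MAD comparable to σ for normal data), and the 3.5 cutoff are assumptions here, following a commonly cited convention rather than anything stated on this page. The sample is hypothetical.

```python
import statistics

def mad_outliers(values, threshold=3.5):
    """Flag values whose MAD-based modified z-score exceeds the threshold."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # more than half the values are identical; score is undefined
    # 0.6745 rescales MAD so the score is comparable to a z-score under normality.
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]

# Hypothetical sample for illustration only.
sample = [10, 12, 11, 13, 12, 14, 11, 50]
flagged_mad = mad_outliers(sample)
print(flagged_mad)
```

Because the median and MAD barely move when an extreme value is present, this score is much less prone to the masking that affects the ordinary Z-score.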