Finding Modes Using Kernel Density Estimates

Examples of finding the mode of a univeriate distribution in R and Python.

R
python
kernel-density
pdf
probability-density
programming
Author

Robert M Flight

Published

July 19, 2018

TL; DR

If you have a unimodal distribution of values, you can use R’s density or Scipy’s gaussian_kde to create density estimates of the data, and then take the maxima of the density estimate to get the mode. See below for actual examples in R and Python.

Mode in R

First, lets do this in R. Need some values to work with.

library(ggplot2)
set.seed(1234)
n_point <- 1000
data_df <- data.frame(values = rnorm(n_point))

ggplot(data_df, aes(x = values)) + geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(data_df, aes(x = values)) + geom_density()

We can do a kernel density, which will return an object with a bunch of peices. One of these is y, which is the actual density value for each value of x that was used! So we can find the mode by querying x for the maxima in y!

density_estimate <- density(data_df$values)

mode_value <- density_estimate$x[which.max(density_estimate$y)]
mode_value
[1] -0.04599328

Plot the density estimate with the mode location.

density_df <- data.frame(value = density_estimate$x, density = density_estimate$y)

ggplot(density_df, aes(x = value, y = density)) + geom_line() + geom_vline(xintercept = mode_value, color = "red")

Python

Lets do something similar in Python. Start by generating a set of random values.

import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

values = np.random.normal(size = 1000)

plt.hist(values)
(array([  1.,   7.,  44., 105., 217., 296., 207.,  87.,  27.,   9.]), array([-3.80368902, -3.10698209, -2.41027516, -1.71356823, -1.01686129,
       -0.32015436,  0.37655257,  1.07325951,  1.76996644,  2.46667337,
        3.16338031]), <BarContainer object of 10 artists>)
plt.show()

And then use gaussian_kde to get a kernel estimator of the density, and then call the pdf method on the original values.

kernel = stats.gaussian_kde(values)
height = kernel.pdf(values)

mode_value = values[np.argmax(height)]
print(mode_value)
-0.08045792953113866

Plot to show indeed we have it right. Note we sort the values first so the PDF looks right.

values2 = np.sort(values.copy())
height2 = kernel.pdf(values2)

plt.clf()
plt.cla()
plt.close()
<string>:1: MatplotlibDeprecationWarning: The close_event function was deprecated in Matplotlib 3.6 and will be removed two minor releases later. Use callbacks.process('close_event', CloseEvent(...)) instead.
plt.plot(values2, height2)
plt.axvline(mode_value)
plt.show()

Reuse

Citation

BibTeX citation:
@online{mflight2018,
  author = {Robert M Flight},
  title = {Finding {Modes} {Using} {Kernel} {Density} {Estimates}},
  date = {2018-07-19},
  url = {https://rmflight.github.io/posts/2018-07-19-finding-modes-using-kernel-density-estimates},
  langid = {en}
}
For attribution, please cite this work as:
Robert M Flight. 2018. “Finding Modes Using Kernel Density Estimates.” July 19, 2018. https://rmflight.github.io/posts/2018-07-19-finding-modes-using-kernel-density-estimates.