Finding Modes Using Kernel Density Estimates

R python kernel-density pdf probability-density programming

Examples of finding the mode of a univeriate distribution in R and Python.

Robert M Flight
2018-07-19

TL; DR

If you have a unimodal distribution of values, you can use R’s density or Scipy’s gaussian_kde to create density estimates of the data, and then take the maxima of the density estimate to get the mode. See below for actual examples in R and Python.

Mode in R

First, lets do this in R. Need some values to work with.

library(ggplot2)
set.seed(1234)
n_point <- 1000
data_df <- data.frame(values = rnorm(n_point))

ggplot(data_df, aes(x = values)) + geom_histogram()
ggplot(data_df, aes(x = values)) + geom_density()

We can do a kernel density, which will return an object with a bunch of peices. One of these is y, which is the actual density value for each value of x that was used! So we can find the mode by querying x for the maxima in y!

density_estimate <- density(data_df$values)

mode_value <- density_estimate$x[which.max(density_estimate$y)]
mode_value
[1] -0.04599328

Plot the density estimate with the mode location.

density_df <- data.frame(value = density_estimate$x, density = density_estimate$y)

ggplot(density_df, aes(x = value, y = density)) + geom_line() + geom_vline(xintercept = mode_value, color = "red")

Python

Lets do something similar in Python. Start by generating a set of random values.

import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

values = np.random.normal(size = 1000)

plt.hist(values)
(array([  5.,  19.,  70., 145., 233., 254., 173.,  86.,  10.,   5.]), array([-3.40466956, -2.73500426, -2.06533895, -1.39567364, -0.72600834,
       -0.05634303,  0.61332228,  1.28298759,  1.95265289,  2.6223182 ,
        3.29198351]), <a list of 10 Patch objects>)
plt.show()

And then use gaussian_kde to get a kernel estimator of the density, and then call the pdf method on the original values.

kernel = stats.gaussian_kde(values)
height = kernel.pdf(values)

mode_value = values[np.argmax(height)]
print(mode_value)
0.05435206135566903

Plot to show indeed we have it right. Note we sort the values first so the PDF looks right.

values2 = np.sort(values.copy())
height2 = kernel.pdf(values2)

plt.clf()
plt.cla()
plt.close()

plt.plot(values2, height2)
plt.axvline(mode_value)
plt.show()

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/rmflight/researchBlog_distill, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Flight (2018, July 19). Deciphering Life: One Bit at a Time: Finding Modes Using Kernel Density Estimates. Retrieved from https://rmflight.github.io/posts/2018-07-19-finding-modes-using-kernel-density-estimates/

BibTeX citation

@misc{flight2018finding,
  author = {Flight, Robert M},
  title = {Deciphering Life: One Bit at a Time: Finding Modes Using Kernel Density Estimates},
  url = {https://rmflight.github.io/posts/2018-07-19-finding-modes-using-kernel-density-estimates/},
  year = {2018}
}