Statistical distributions

Understanding statistical distributions is key when working with data, as they help describe how values behave in a dataset. Different distributions are useful for modelling different types of real-world scenarios. This guide provides a brief overview of some of the most commonly used statistical distributions, along with their key properties and how to generate random samples from them using Python.


Contents:

  1. Normal
  2. Uniform
  3. Bernoulli
  4. Binomial
  5. Poisson

Normal distribution

The normal distribution, also called the Gaussian distribution, is symmetric and bell-shaped. It is commonly used to model natural phenomena in which values cluster around a central mean with a given spread.

Type: Continuous
Parameters: \( \mu \) (mean), \( \sigma^2 \) (variance)
PDF: \( f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \)
E(X): \( \mu \)
Var(X): \( \sigma^2 \)

Probability Density Function of a Normal Distribution.

Draw a random sample:

from scipy.stats import norm

# loc is the mean; scale is the standard deviation (sigma), not the variance
sample = norm.rvs(loc=0, scale=1, size=10)
print(sample)
# [ 0.65335105 -1.47934229  0.32367792  1.00588696  0.9904004  -0.83857692
#  -0.16714073  0.89918586 -1.94588769 -0.31817522]
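
As a quick sanity check on the properties listed above, the PDF formula can be written out by hand and compared with SciPy. This is a minimal sketch: the values of `mu`, `variance` and `x` and the helper `normal_pdf` are illustrative assumptions, not part of the original guide. Note that the guide parameterises the distribution by the variance, whereas SciPy's `scale` expects the standard deviation.

```python
import math
from scipy.stats import norm

mu, variance = 1.0, 4.0        # illustrative values (assumed)
sigma = math.sqrt(variance)    # SciPy's `scale` is sigma, not sigma^2

def normal_pdf(x, mu, variance):
    """Normal PDF written directly from the formula (hypothetical helper)."""
    return math.exp(-((x - mu) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)

x = 2.5
pdf_manual = normal_pdf(x, mu, variance)
pdf_scipy = norm.pdf(x, loc=mu, scale=sigma)
print(math.isclose(pdf_manual, pdf_scipy))

# A "frozen" distribution exposes the theoretical mean and variance.
dist = norm(loc=mu, scale=sigma)
print(dist.mean(), dist.var())
```

Passing `scale=variance` by mistake would silently give a much wider distribution, so it may be worth converting explicitly as above.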

(Continuous) Uniform distribution

The continuous uniform distribution describes a scenario in which all values within a given interval are equally likely. It is the simplest way to represent complete uncertainty about where a value falls within a known range.

Type: Continuous
Parameters: \( a \) (minimum), \( b \) (maximum)
PDF: \( f(x) = \frac{1}{b-a}, \quad a \leq x \leq b \) (and \( f(x) = 0 \) otherwise)
E(X): \( \frac{1}{2}(a + b) \)
Var(X): \( \frac{1}{12}(b - a)^2 \)

Probability Density Function of a Uniform Distribution.

Draw a random sample:

from scipy.stats import uniform

# SciPy's uniform is defined on [loc, loc + scale], so this samples from [0, 2]
sample = uniform.rvs(loc=0, scale=2, size=10)
print(sample)
# [1.8145654  1.46990848 1.91692528 1.43172368 1.87616664 1.15829366
#  1.91751693 1.73449682 1.53206957 0.40146235]
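
Because SciPy uses `loc` and `scale` rather than \( a \) and \( b \), the mapping is `loc = a` and `scale = b - a`. The sketch below uses assumed endpoints `a = 2` and `b = 5` (illustrative values only) and checks the mean and variance formulas above against SciPy.

```python
from scipy.stats import uniform

a, b = 2.0, 5.0                      # illustrative endpoints (assumed)
dist = uniform(loc=a, scale=b - a)   # uniform on [a, b]

expected_mean = 0.5 * (a + b)        # E(X) = (a + b) / 2
expected_var = (b - a) ** 2 / 12     # Var(X) = (b - a)^2 / 12

print(dist.mean(), expected_mean)
print(dist.var(), expected_var)

# The density is constant inside the interval and zero outside it.
pdf_inside = dist.pdf(3.0)
pdf_outside = dist.pdf(1.0)

# A seeded sample should stay within [a, b].
sample = dist.rvs(size=1000, random_state=0)
print(sample.min() >= a, sample.max() <= b)
```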

Bernoulli distribution

The Bernoulli distribution models a single trial with two possible outcomes: success (1) and failure (0).

Type: Discrete
Parameters: \( p \) (probability of success)
PMF: \( P(X = x) = \begin{cases} p, & \text{if } x = 1 \\ 1 - p, & \text{if } x = 0 \end{cases} \)
E(X): \( p \)
Var(X): \( p(1-p) \)

Probability Mass Function of a Bernoulli Distribution.

Draw a random sample:

from scipy.stats import bernoulli

sample = bernoulli.rvs(p=0.7, size=10)
print(sample)
# [1 0 1 1 1 0 1 1 1 1]
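
With a larger sample, the proportion of ones should settle near \( p \) and the sample variance near \( p(1-p) \). The following is a rough sketch; the sample size and the seed passed to `random_state` are arbitrary choices made here for reproducibility.

```python
from scipy.stats import bernoulli

p = 0.7
dist = bernoulli(p)

# The PMF takes only two values: 1 - p at x = 0 and p at x = 1.
pmf_0, pmf_1 = dist.pmf(0), dist.pmf(1)
print(pmf_0, pmf_1)

# Large seeded sample (size and seed are assumptions for illustration).
sample = dist.rvs(size=100_000, random_state=0)
sample_mean = sample.mean()   # should be close to p
sample_var = sample.var()     # should be close to p * (1 - p)
print(sample_mean, sample_var)
```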

Binomial distribution

The binomial distribution models the number of successes in a fixed number of independent trials, each with the same success probability.

Type: Discrete
Parameters: \( n \) (number of trials), \( p \) (probability of success)
PMF: \( P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k = 0, 1, 2, \dots, n \)
E(X): \( np \)
Var(X): \( np(1-p) \)

Probability Mass Function of a Binomial Distribution.

Draw a random sample:

from scipy.stats import binom

sample = binom.rvs(n=10, p=0.5, size=10)
print(sample)
# [5 6 5 5 8 6 3 3 4 4]
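
A binomial count can be viewed as the sum of \( n \) independent Bernoulli trials. The sketch below illustrates this by summing simulated Bernoulli draws, and also compares the PMF formula above with `binom.pmf`. The helper `binom_pmf`, the number of simulated experiments and the seed are assumptions made for illustration.

```python
import math

import numpy as np
from scipy.stats import bernoulli, binom

n, p = 10, 0.5

def binom_pmf(k, n, p):
    """Binomial PMF written directly from the formula (hypothetical helper)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

ks = np.arange(n + 1)
manual_pmf = np.array([binom_pmf(int(k), n, p) for k in ks])
scipy_pmf = binom.pmf(ks, n, p)
print(np.allclose(manual_pmf, scipy_pmf))

# Each row holds n Bernoulli trials; the row sum is one binomial draw.
trials = bernoulli.rvs(p, size=(20_000, n), random_state=0)
successes = trials.sum(axis=1)
print(successes.mean(), n * p)            # sample mean vs np
print(successes.var(), n * p * (1 - p))   # sample variance vs np(1-p)
```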

Poisson distribution

The Poisson distribution models the number of events occurring in a fixed interval, assuming events happen independently and at a constant average rate. It is commonly used to model the number of claims occurring within a fixed time period.

Type: Discrete
Parameters: \( \lambda \) (rate parameter)
PMF: \( P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \dots \)
E(X): \( \lambda \)
Var(X): \( \lambda \)

Probability Mass Function of a Poisson Distribution.

Draw a random sample:

from scipy.stats import poisson

# SciPy calls the rate parameter mu; it plays the role of lambda above
sample = poisson.rvs(mu=4, size=10)
print(sample)
# [1 3 4 2 4 2 2 4 5 7]
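
A distinctive property of the Poisson distribution is that its mean and variance are both equal to \( \lambda \). The sketch below checks this, and the PMF formula, against SciPy. The helper `poisson_pmf`, the sample size and the seed are illustrative assumptions.

```python
import math

from scipy.stats import poisson

lam = 4                    # rate parameter (lambda); SciPy calls it mu
dist = poisson(mu=lam)

def poisson_pmf(k, lam):
    """Poisson PMF written directly from the formula (hypothetical helper)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

# Compare the formula with SciPy for the first few counts.
max_diff = max(abs(poisson_pmf(k, lam) - dist.pmf(k)) for k in range(10))
print(max_diff < 1e-12)

# Theoretical mean and variance coincide.
print(dist.mean(), dist.var())

# A large seeded sample should show roughly equal mean and variance.
sample = dist.rvs(size=100_000, random_state=0)
print(sample.mean(), sample.var())
```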
