Understanding statistical distributions is key when working with data, as they help describe how values behave in a dataset. Different distributions are useful for modelling different types of real-world scenarios. This guide provides a brief overview of some of the most commonly used statistical distributions, along with their key properties and how to generate random samples from them using Python.
Normal distribution
The normal distribution, also called the Gaussian distribution, is symmetric and bell-shaped. It is commonly used to model natural phenomena where values cluster around a central mean with a given spread.
Type: | Continuous |
Parameters: | \( \mu \) (mean), \( \sigma^2 \) (variance) |
PDF: | \( f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \) |
E(X): | \( \mu \) |
Var(X): | \( \sigma^2 \) |
Draw a random sample:
from scipy.stats import norm
# loc is the mean; scale is the standard deviation (sigma), not the variance
sample = norm.rvs(loc=0, scale=1, size=10)
print(sample)
# [ 0.65335105 -1.47934229 0.32367792 1.00588696 0.9904004 -0.83857692
# -0.16714073 0.89918586 -1.94588769 -0.31817522]
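One detail worth checking in the call above: the parameter table lists the variance \( \sigma^2 \), while SciPy's `norm` takes the standard deviation through `scale`. The sketch below uses illustrative values (\( \mu = 2 \), \( \sigma^2 = 9 \), chosen here and not taken from this guide) to compare the PDF formula with `norm.pdf` and to confirm the mean and variance.

```python
import math
from scipy.stats import norm

mu, sigma2 = 2.0, 9.0        # illustrative values, not from the guide
sigma = math.sqrt(sigma2)    # SciPy's `scale` expects the standard deviation

x = 1.5                      # arbitrary point at which to evaluate the density
# PDF written out exactly as in the formula above
manual_pdf = math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
scipy_pdf = norm.pdf(x, loc=mu, scale=sigma)

dist_mean = norm.mean(loc=mu, scale=sigma)   # should equal mu
dist_var = norm.var(loc=mu, scale=sigma)     # should equal sigma2

print(manual_pdf, scipy_pdf)
print(dist_mean, dist_var)
```

Passing `scale=sigma2` by mistake would silently produce a much wider distribution, so converting explicitly with `math.sqrt` is a cheap safeguard.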
(Continuous) Uniform distribution
The uniform distribution describes a scenario where all values within a given interval are equally likely. It is the simplest way to represent complete uncertainty about where a value falls within a known range.
Type: | Continuous |
Parameters: | \( a \) (minimum), \( b \) (maximum) |
PDF: | \( f(x) = \frac{1}{b-a}, \quad a \leq x \leq b \) |
E(X): | \( \frac{1}{2}(a + b) \) |
Var(X): | \( \frac{1}{12}(b - a)^2 \) |
Draw a random sample:
from scipy.stats import uniform
# SciPy's uniform covers [loc, loc + scale], so this is a = 0, b = 2
sample = uniform.rvs(loc=0, scale=2, size=10)
print(sample)
# [1.8145654 1.46990848 1.91692528 1.43172368 1.87616664 1.15829366
# 1.91751693 1.73449682 1.53206957 0.40146235]
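The mapping between \( a \), \( b \) and SciPy's arguments is easy to get wrong when the interval does not start at zero: `loc` is \( a \), but `scale` is the width \( b - a \), not \( b \). A minimal sketch, assuming an illustrative interval \( [1, 4] \) that does not appear in this guide, checks the mean and variance formulas against a frozen `uniform` object.

```python
from scipy.stats import uniform

a, b = 1.0, 4.0                       # illustrative endpoints, not from the guide
dist = uniform(loc=a, scale=b - a)    # support is [loc, loc + scale]

support = dist.support()              # (lower bound, upper bound)
mean_formula = 0.5 * (a + b)          # E(X) = (a + b) / 2
var_formula = (b - a) ** 2 / 12       # Var(X) = (b - a)^2 / 12

print(support)
print(dist.mean(), mean_formula)
print(dist.var(), var_formula)
```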
Bernoulli distribution
Models a single trial with two possible outcomes: success (1) and failure (0).
Type: | Discrete |
Parameters: | \( p \) (probability of success) |
PMF: | \( P(X = x) = \begin{cases} p, & \text{if } x = 1 \\ 1 - p, & \text{if } x = 0 \end{cases} \) |
E(X): | \( p \) |
Var(X): | \( p(1-p) \) |
Draw a random sample:
from scipy.stats import bernoulli
sample = bernoulli.rvs(p=0.7, size=10)
print(sample)
# [1 0 1 1 1 0 1 1 1 1]
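With only ten draws, the proportion of ones can sit some way from \( p \). The following sketch draws a much larger sample with a fixed seed (both the sample size and the seed are arbitrary choices made here) and compares the sample mean and variance with \( p \) and \( p(1-p) \).

```python
from scipy.stats import bernoulli

p = 0.7
# random_state fixes the seed so reruns give the same draws (seed value is arbitrary)
sample = bernoulli.rvs(p=p, size=100_000, random_state=0)

sample_mean = sample.mean()   # should be close to p
sample_var = sample.var()     # should be close to p * (1 - p)

print(sample_mean, p)
print(sample_var, p * (1 - p))
```

The same `random_state` argument is accepted by the `rvs` method of the other distributions in this guide, which is handy when results need to be reproducible.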
Binomial distribution
Models the number of successes in a fixed number of independent trials with the same success probability.
Type: | Discrete |
Parameters: | \( n \) (number of trials), \( p \) (probability of success) |
PMF: | \( P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k = 0, 1, 2, \dots, n \) |
E(X): | \( np \) |
Var(X): | \( np(1-p) \) |
Draw a random sample:
from scipy.stats import binom
sample = binom.rvs(n=10, p=0.5, size=10)
print(sample)
# [5 6 5 5 8 6 3 3 4 4]
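To see how the PMF above relates to what SciPy computes, the sketch below evaluates the formula term by term with `math.comb` and compares it with `binom.pmf`, reusing \( n = 10 \) and \( p = 0.5 \) from the sample. It also recovers \( E(X) = np \) by summing \( k \, P(X = k) \). This is an illustrative check, not part of the original guide.

```python
from math import comb

import numpy as np
from scipy.stats import binom

n, p = 10, 0.5
ks = np.arange(n + 1)   # every possible number of successes: 0, 1, ..., n

# PMF written out as in the formula: C(n, k) * p^k * (1 - p)^(n - k)
manual_pmf = np.array([comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)])
scipy_pmf = binom.pmf(ks, n, p)

total_prob = manual_pmf.sum()              # probabilities over all k sum to 1
mean_from_pmf = (ks * manual_pmf).sum()    # sum of k * P(X = k), equals n * p

print(np.max(np.abs(manual_pmf - scipy_pmf)))
print(total_prob, mean_from_pmf)
```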
Poisson distribution
Models the number of events occurring in a fixed interval, assuming events happen independently and at a constant average rate. A common application is modelling the number of insurance claims that occur within a fixed time period.
Type: | Discrete |
Parameters: | \( \lambda \) (rate parameter) |
PMF: | \( P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \dots \) |
E(X): | \( \lambda \) |
Var(X): | \( \lambda \) |
Draw a random sample:
from scipy.stats import poisson
# SciPy names the rate parameter lambda `mu`
sample = poisson.rvs(mu=4, size=10)
print(sample)
# [1 3 4 2 4 2 2 4 5 7]
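As an illustration of the claims use case, suppose (hypothetically) that claims arrive at an average rate of \( \lambda = 4 \) per period and we want the probability of seeing more than 6 in one period. The sketch below computes that tail probability from the PMF formula and compares it with SciPy's survival function `poisson.sf`. The threshold of 6 is an arbitrary choice made here.

```python
from math import exp, factorial

from scipy.stats import poisson

lam = 4.0        # hypothetical average number of claims per period
threshold = 6    # arbitrary threshold for illustration

# P(X > 6) = 1 - sum over k = 0..6 of lambda^k * e^(-lambda) / k!
manual_tail = 1.0 - sum(lam**k * exp(-lam) / factorial(k) for k in range(threshold + 1))
# sf(k) is the survival function, P(X > k)
scipy_tail = poisson.sf(threshold, mu=lam)

# For a Poisson distribution the mean and variance are both lambda
dist_mean = poisson.mean(mu=lam)
dist_var = poisson.var(mu=lam)

print(manual_tail, scipy_tail)
print(dist_mean, dist_var)
```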