Statistical distributions

Understanding statistical distributions is key when working with data, as they help describe how values behave in a dataset. Different distributions are useful for modelling different types of real-world scenarios. This guide provides a brief overview of some of the most commonly used statistical distributions, along with their key properties and how to generate random samples from them using Python.


Contents:

  1. Normal
  2. t-distribution
  3. Uniform
  4. Bernoulli
  5. Gamma
  6. Binomial
  7. Poisson
  8. Chi-squared
  9. Exponential
  10. Log-normal

Normal distribution

The normal distribution, also called the Gaussian distribution, is symmetric and bell-shaped, commonly used to model natural phenomena where values cluster around a central mean with a given spread.

Type: Continuous
Parameters: \( \mu \) (mean), \( \sigma^2 \) (variance)
PDF: \( f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \)
E(X): \( \mu \)
Var(X): \( \sigma^2 \)

Probability Density Function of a Normal Distribution.

Draw a random sample:

from scipy.stats import norm

sample = norm.rvs(loc=0, scale=1, size=10)
print(sample)
# [ 0.65335105 -1.47934229  0.32367792  1.00588696  0.9904004  -0.83857692
#  -0.16714073  0.89918586 -1.94588769 -0.31817522]
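The familiar 68-95-99.7 rule follows directly from the CDF and makes the bell shape concrete; a quick check:

```python
from scipy.stats import norm

# Probability mass within 1, 2 and 3 standard deviations of the mean
for k in [1, 2, 3]:
    p = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} sigma: {p:.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```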

t-distribution

The t-distribution is similar to the normal distribution but has heavier tails, meaning it is more likely to produce extreme values. It is often used when the sample size is small and the population variance is unknown, making it a key tool in statistical inference. The shape of the t-distribution depends on the degrees of freedom \( \nu \); as \( \nu \) increases, it approaches the normal distribution.

Type: Continuous
Parameters: \( \nu \) (degrees of freedom)
PDF: \( f(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu \pi}\, \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}} \)
E(X): \( \begin{cases} 0 & \text{if } \nu > 1 \\ \text{undefined} & \text{if } \nu \leq 1 \end{cases} \)
Var(X): \( \begin{cases} \frac{\nu}{\nu - 2} & \text{if } \nu > 2 \\ \infty & \text{if } 1 < \nu \leq 2 \\ \text{undefined} & \text{if } \nu \leq 1 \end{cases} \)

Probability Density Function of a t-distribution.

Draw a random sample:

from scipy.stats import t

sample = t.rvs(df=10, size=10)
print(sample)

# [ 1.56994299  0.18342034 -0.80179643  0.08913814  1.30832327 -2.24107071
#   0.78455719 -1.77650584 -0.15166975 -0.10756829]
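The convergence to the normal distribution mentioned above can be observed numerically; this sketch compares the two PDFs over a grid for increasing degrees of freedom:

```python
import numpy as np
from scipy.stats import t, norm

# The maximum gap between the t PDF and the standard normal PDF
# shrinks as the degrees of freedom grow
x = np.linspace(-4, 4, 201)
gaps = {}
for df in [1, 5, 30, 100]:
    gaps[df] = np.max(np.abs(t.pdf(x, df=df) - norm.pdf(x)))
    print(f"df={df:>3}: max PDF difference = {gaps[df]:.4f}")
```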

(Continuous) Uniform distribution

The uniform distribution describes a scenario where all values within a given interval are equally likely. It is the simplest distribution representing complete uncertainty within a range.

Type: Continuous
Parameters: \( a \) (minimum), \( b \) (maximum)
PDF: \( f(x) = \frac{1}{b-a}, \quad a \leq x \leq b \)
E(X): \( \frac{1}{2}(a + b) \)
Var(X): \( \frac{1}{12}(b - a)^2 \)

Probability Density Function of a Uniform Distribution.

Draw a random sample:

from scipy.stats import uniform

sample = uniform.rvs(loc=0, scale=2, size=10) 
print(sample)
# [1.8145654  1.46990848 1.91692528 1.43172368 1.87616664 1.15829366
#  1.91751693 1.73449682 1.53206957 0.40146235]
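Note that scipy parameterizes the uniform distribution with loc and scale rather than \( a \) and \( b \): the samples fall in [loc, loc + scale], so the call above draws from [0, 2]. To sample from a general interval [a, b], set loc = a and scale = b - a:

```python
from scipy.stats import uniform

# Uniform on [a, b] = [2, 5] corresponds to loc=2, scale=3
a, b = 2, 5
dist = uniform(loc=a, scale=b - a)
print(dist.mean())  # (a + b) / 2 = 3.5
print(dist.var())   # (b - a)^2 / 12 = 0.75
```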

Bernoulli distribution

Models a single trial with two possible outcomes: success (1) and failure (0).

Type: Discrete
Parameters: \( p \) (probability of success)
PMF: \( P(X = x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \end{cases} \)
E(X): \( p \)
Var(X): \( p(1-p) \)

Probability Mass Function of a Bernoulli Distribution.

Draw a random sample:

from scipy.stats import bernoulli

sample = bernoulli.rvs(p=0.7, size=10)
print(sample)
# [1 0 1 1 1 0 1 1 1 1]

Gamma distribution

The gamma distribution is a versatile distribution that is commonly used to model the time until a fixed number of independent events occur, like the waiting time until a machine fails. It has two parameters: \( \alpha \) (shape), which corresponds to the number of events being waited for, and \( \beta \) (scale), which sets the typical waiting time per event. (Some texts instead use a rate parameter \( \lambda = 1/\beta \).) As \( \alpha \) increases, the gamma distribution starts to resemble a normal distribution more closely.

Type: Continuous
Parameters: \( \alpha \) (shape), \( \beta \) (scale)
PDF: \( f(x) = \frac{x^{\alpha-1} e^{-\frac{x}{\beta}}}{\beta^\alpha \Gamma(\alpha)} \)
E(X): \( \alpha \beta \)
Var(X): \( \alpha \beta^2 \)

Probability Density Function of a Gamma Distribution.

Draw a random sample:

from scipy.stats import gamma

# a is the shape parameter alpha; scale is beta
sample = gamma.rvs(a=2, scale=2, size=10)
print(sample)

# [10.74173324  2.26014167  4.76710189  4.02444952  8.27867765  6.28606449
#   2.06291586  5.51807963  5.49954115  7.01190656]
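The waiting-time interpretation can be checked by simulation: with integer shape \( \alpha \), a gamma variable is the sum of \( \alpha \) independent exponential waiting times. A sketch (the seed is fixed only for reproducibility):

```python
from scipy.stats import expon, gamma

# With integer shape alpha, Gamma(alpha, beta) is the total waiting
# time for alpha independent events, each Exponential with scale beta
alpha, beta = 2, 2
waits = expon.rvs(scale=beta, size=(100_000, alpha), random_state=0).sum(axis=1)
print(waits.mean())                     # close to alpha * beta = 4
print(gamma.mean(a=alpha, scale=beta))  # exactly 4.0
```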

Binomial distribution

Models the number of successes in a fixed number of independent trials with the same success probability.

Type: Discrete
Parameters: \( n \) (number of trials), \( p \) (probability of success)
PMF: \( P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k = 0, 1, 2, \dots, n \)
E(X): \( np \)
Var(X): \( np(1-p) \)

Probability Mass Function of a Binomial Distribution.

Draw a random sample:

from scipy.stats import binom

sample = binom.rvs(n=10, p=0.5, size=10)
print(sample)
# [5 6 5 5 8 6 3 3 4 4]
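The link to the Bernoulli distribution can be made explicit: a binomial draw is the number of successes in \( n \) Bernoulli trials, and the sample mean of many draws approaches \( np \). A sketch with a fixed seed:

```python
from scipy.stats import bernoulli, binom

# A Binomial(n, p) draw counts the successes in n independent
# Bernoulli(p) trials, so summing Bernoulli draws reproduces it
n, p = 10, 0.5
trials = bernoulli.rvs(p=p, size=n, random_state=0)
print(trials, "-> successes:", trials.sum())

# The sample mean of many binomial draws approaches E(X) = n * p = 5
sample = binom.rvs(n=n, p=p, size=100_000, random_state=0)
print(sample.mean())
```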

Poisson distribution

Models the number of events occurring in a fixed interval, assuming events happen independently and at a constant average rate. It is commonly used to model counts such as the number of claims occurring within a fixed time period.

Type: Discrete
Parameters: \( \lambda \) (rate parameter)
PMF: \( P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \dots \)
E(X): \( \lambda \)
Var(X): \( \lambda \)

Probability Mass Function of a Poisson Distribution.

Draw a random sample:

from scipy.stats import poisson

sample = poisson.rvs(mu=4, size=10)
print(sample)
# [1 3 4 2 4 2 2 4 5 7]
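A distinctive property of the Poisson distribution is that its mean and variance are both \( \lambda \), which a large sample makes visible:

```python
from scipy.stats import poisson

# For a Poisson distribution, mean and variance are both lambda
sample = poisson.rvs(mu=4, size=100_000, random_state=1)
print(sample.mean())  # close to lambda = 4
print(sample.var())   # also close to 4
```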

Chi-squared distribution

The chi-squared distribution describes the sum of the squares of \( k \) independent standard normal variables. It is often used in hypothesis testing, for example to determine whether there is a significant difference between observed and expected counts in categorical data. Its shape depends on the degrees of freedom \( k \), the number of independent variables being summed; as the degrees of freedom increase, the distribution becomes more symmetric and resembles a normal distribution.

Type: Continuous
Parameters: \( k \) (degrees of freedom)
PDF: \( f(x) = \frac{x^{(k/2) - 1} e^{-x/2}}{2^{k/2} \Gamma(k/2)} \)
E(X): \( k \)
Var(X): \( 2k \)

Probability Density Function of a Chi-squared Distribution.

Draw a random sample:

from scipy.stats import chi2

sample = chi2.rvs(df=5, size=10)
print(sample)

# [2.54937623 3.03072252 6.79169992 2.5822831  2.33811496 3.7880903
#  1.30649021 2.73864993 6.66913851 8.59093504]
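The definition as a sum of squared standard normals can be verified by simulation (a sketch with a fixed seed for reproducibility):

```python
from scipy.stats import norm

# Summing the squares of k independent standard normals yields a
# chi-squared variable with k degrees of freedom; check the moments
k = 5
z = norm.rvs(size=(100_000, k), random_state=0)
sample = (z ** 2).sum(axis=1)
print(sample.mean())  # close to E(X) = k = 5
print(sample.var())   # close to Var(X) = 2k = 10
```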

Exponential distribution

The exponential distribution is a continuous probability distribution that is often used to model the time between independent events occurring at a constant average rate. It is characterized by its rate parameter \( \lambda \), the average number of events per unit of time. Its probability density decreases exponentially, so shorter waiting times are more likely than longer ones. The distribution is memoryless: the remaining waiting time does not depend on how long you have already waited.

Type: Continuous
Parameters: \( \lambda \) (rate)
PDF: \( f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \geq 0 \\ 0 & \text{if } x < 0 \end{cases} \)
E(X): \( \frac{1}{\lambda} \)
Var(X): \( \frac{1}{\lambda^2} \)

Probability Density Function of an Exponential Distribution.

Draw a random sample:

from scipy.stats import expon

# scipy parameterizes the exponential by scale = 1 / lambda
sample = expon.rvs(scale=1, size=10)
print(sample)

# [0.10187759 1.39959414 0.31290062 0.16490726 0.43281394 1.01861464
#  0.94142538 0.14548016 1.42020756 0.14661094]
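The memoryless property can be checked exactly with the survival function \( P(X > x) \):

```python
from scipy.stats import expon

# Memorylessness: P(X > s + t | X > s) = P(X > t)
# sf is the survival function P(X > x); default scale=1 means lambda=1
s, t = 1.0, 2.0
conditional = expon.sf(s + t) / expon.sf(s)
print(conditional)  # exp(-2), about 0.1353
print(expon.sf(t))  # the same value
```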

Log-normal distribution

The log-normal distribution describes values that, when you take the logarithm, have a normal (bell-shaped) distribution. This means it is useful for modeling things that can't be negative, like income, stock prices, or lifetimes, where growth often happens by multiplication rather than addition.

Type: Continuous
Parameters: \( \mu \) (mean of \( \ln X \)), \( \sigma^2 \) (variance of \( \ln X \))
PDF: \( f(x) = \frac{1}{\sqrt{2\pi \sigma^2 x^2}} e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}} \)
E(X): \( e^{\mu + \frac{\sigma^2}{2}} \)
Var(X): \( e^{2\mu + \sigma^2} \left( e^{\sigma^2} - 1 \right) \)

Probability Density Function of a Log-normal Distribution.

Draw a random sample:

from scipy.stats import lognorm

# scipy parameterizes lognorm with s = sigma and scale = exp(mu)
sample = lognorm.rvs(s=0.5, scale=1, size=10)
print(sample)

# [1.39938684 1.12673983 1.04218664 1.0467863  0.63449255 1.02603089
#  0.97080305 0.95030233 0.43810272 0.90024099]

