Insurance with riders (multiple model point sets)

Before we dive into building an insurance model with riders, let's briefly outline what this post will cover. Insurance products often come with optional coverages — also known as riders — that allow policyholders to extend their protection. Modelling such products requires a structured approach to handling multiple coverages efficiently.

In this post, we will build a Python model using the cashflower package. A key aspect of this model is that we will use two model point sets: one for policyholder data and another for coverage details. This approach will help us handle multiple coverages without cluttering the main dataset.

We will cover everything from preparing the data to implementing actuarial calculations, such as survival rates, expected premiums, and best estimate liabilities. By the end of this post, you will have a functional model that can be extended to real-world insurance applications.

Content:

Data format
Preparation
Model
Results

Data format

In this post, we will model an insurance product where a policyholder can have multiple coverages. For example, the policyholder can be insured against death and illness within the same policy.

In terms of model point data, the easiest approach would be to have additional columns in the main model point set. These columns would contain information on the additional coverages.

However, with many coverages, the model point file might grow out of proportion. Also, some policyholders can have one coverage and others multiple. If the policyholder is insured only against death then the remaining columns would remain empty.

The model point set could look like this.

print(policy.data)

id  age    sex  premium sum_assured_death sum_assured_illness
 1   32   MALE    140.0        100 000.00           30 000.00
 2   48 FEMALE    150.0        120 000.00                 NaN

A more efficient solution is to store coverage data in the long format in a separate file. If the policyholder has 2 coverages, then there will be 1 record in the main model point set and 2 in the additional one. If the policyholder has only 1 coverage, there will be 1 record in both of the files.

print(policy.data)

id  age    sex  premium
 1   32   MALE    140.0
 2   48 FEMALE    150.0

print(coverage.data)

id  sum_assured     type
 1   100 000.00    DEATH
 1    30 000.00  ILLNESS
 2   120 000.00    DEATH

The first policyholder has two coverages (against death and illness) and the second policyholder has one coverage (only against death).

Preparation

Install the cashflower package using terminal, if you don't have it yet.

pip install cashflower

Create a new model named riders using Python.

from cashflower import create_model

create_model("riders")

The new folder riders has been created with the initial files structure.

Let's prepare model point sets and assumptions for the model. For this model, we will use two model point sets: one for main policy data and the second for coverages.

# input.py
import pandas as pd
from cashflower import ModelPointSet

policy = ModelPointSet(data=pd.DataFrame({
    "id": [1, 2],
    "age": [32, 42],
    "sex": ["MALE", "FEMALE"],
    "premium": [140, 150]
}), id_column="id")

coverage = ModelPointSet(data=pd.DataFrame({
    "id": [1, 1, 2],
    "sum_assured": [100_000, 30_000, 120_000],
    "type": ["DEATH", "ILLNESS", "DEATH"],
}), id_column="id", main=False)

The first policyholder is a 32-year-old male. The insurance company will pay a benefit of €100 000 in case of death and €30 000 in case of illness. The second policyholder is a 42-year-old female. She has only coverage for the event of death with a sum assured of €120 000. They pay €140 and €150 premiums respectively.

We will use 3 assumptions for the model: mortality rate, morbidity rate and interest rate. For simplicity, we will assume that each of these assumptions is a single value. In production models, these assumptions will form a table. For mortality and morbidity, the rates depend on age and sex. For the interest rate, the value depends on the term.

# input.py
assumption = {
    "DEATH_PROB": 0.003,
    "MORB_PROB": 0.008,
    "INTEREST_RATE": 0.005
}

We have all the input data that we need. Let's start modelling!

Model

Let's start with modelling survival rate and expected premium.

# model.py
from cashflower import variable

from input import assumption, policy, coverage
from settings import settings


@variable()
def survival_rate(t):
    if t == 0:
        return 1 - assumption["DEATH_PROB"]
    return survival_rate(t-1) * (1 - assumption["DEATH_PROB"])


@variable()
def expected_premium(t):
    if t == 0:
        return 0
    return policy.get("premium") * survival_rate(t-1)

Now we will calculate the expected benefit.

We have multiple records in the coverage file, so we will loop over the data.

# model.py
@variable()
def expected_benefit(t):
    if t == 0:
        return 0

    benefit = 0
    for i in range(coverage.model_point_data.shape[0]):
        sum_assured = coverage.get("sum_assured", i)
        _type = coverage.get("type", i)

        if _type == "DEATH":
            benefit += survival_rate(t - 1) * assumption["DEATH_PROB"] * sum_assured
        elif _type == "ILLNESS":
            benefit += survival_rate(t - 1) * assumption["MORB_PROB"] * sum_assured
        else:
            raise ValueError(f"Unknown coverage type {_type}.")

    return benefit

We use coverage.model_point_data.shape[0] to get the number of records for the given model point.

We can now calculate the present value of expected cash flows and the best estimate of liabilities.

# model.py
@variable()
def pv_expected_premium(t):
    if t == settings["T_MAX_CALCULATION"]:
        return expected_premium(t)
    return expected_premium(t) + pv_expected_premium(t+1) * 1/(1+assumption["INTEREST_RATE"])


@variable()
def pv_expected_benefit(t):
    if t == settings["T_MAX_CALCULATION"]:
        return expected_benefit(t)
    return expected_benefit(t) + pv_expected_benefit(t+1) * 1/(1+assumption["INTEREST_RATE"])


@variable()
def best_estimate_liabilities(t):
    return pv_expected_benefit(t) - pv_expected_premium(t)

Results

We will produce the individual output, to see clearly how the model evaluated variables for each of the model points.

# settings.py
settings = {
    # ...
    "GROUP_BY": "id",
    # ...
}

To run the model, source the run.py script.

# output  
  t  id  survival_rate  expected_benefit  expected_premium  pv_expected_benefit  pv_expected_premium  best_estimate_liabilities
  0   1           1.00              0.00              0.00             67084.22             17392.21                   49692.02
  1   1           0.99            538.38            139.58             67419.64             17479.17                   49940.48
  2   1           0.99            536.76            139.16             67215.67             17426.28                   49789.38
  3   1           0.99            535.15            138.74             67012.30             17373.56                   49638.74
  4   1           0.99            533.55            138.33             66809.53             17320.99                   49488.54
  5   1           0.98            531.95            137.91             66607.36             17268.57                   49338.79
  6   1           0.98            530.35            137.50             66405.79             17216.32                   49189.47
...
714   1           0.12             63.20             16.39               431.99               112.00                     320.00
715   1           0.12             63.01             16.34               370.63                96.09                     274.54
716   1           0.12             62.82             16.29               309.16                80.15                     229.01
717   1           0.12             62.64             16.24               247.57                64.18                     183.38
718   1           0.12             62.45             16.19               185.86                48.18                     137.67
719   1           0.11             62.26             16.14               124.03                32.15                      91.87
720   1           0.11             62.07             16.09                62.07                16.09                      45.98
  0   2           1.00              0.00              0.00             44722.81             18634.51                   26088.31
  1   2           0.99            358.92            149.55             44946.43             18727.68                   26218.75
  2   2           0.99            357.84            149.10             44810.45             18671.02                   26139.43
  3   2           0.99            356.77            148.65             44674.87             18614.53                   26060.34
  4   2           0.99            355.70            148.21             44539.69             18558.20                   25981.48
  5   2           0.98            354.63            147.76             44404.91             18502.04                   25902.86
  6   2           0.98            353.57            147.32             44270.53             18446.05                   25824.47
...
714   2           0.12             42.14             17.56               288.00               120.00                     168.00
715   2           0.12             42.01             17.50               247.09               102.95                     144.14
716   2           0.12             41.88             17.45               206.11                85.88                     120.23
717   2           0.12             41.76             17.40               165.04                68.77                      96.28
718   2           0.12             41.63             17.35               123.90                51.63                      72.28
719   2           0.11             41.51             17.29                82.68                34.45                      48.23
720   2           0.11             41.38             17.24                41.38                17.24                      24.14

We have dealt with riders by creating a model point set with a long format.

Thank you for reading! If you have any questions, feel free to leave a comment below or on the github repository.

Insurance with riders (multiple model point sets)

Data format

Preparation

Model

Results

Read also:

Comments