In this post, we will build a model for an insurance product with riders. The model will be built in Python using the cashflower package. An interesting part of this project will be that we will use two model point sets.
List of content:
Data format
In this post, we will model an insurance product where a policyholder can have multiple coverages. For example, the policyholder can be insured against death and illness within the same policy.
In terms of model point data, the easiest approach would be to have additional columns in the main model point set. These columns would contain information on the additional coverages.
However, with many coverages, the model point file might grow out of proportion. Also, some policyholders can have one coverage and others multiple. If the policyholder is insured only against death then the remaining columns would remain empty.
The model point set could look like this.
print(policy.data)
id age sex premium sum_assured_death sum_assured_illness
1 32 MALE 140.0 100 000.00 30 000.00
2 48 FEMALE 150.0 120 000.00 NaN
A more efficient solution is to store coverage data in the long format in a separate file. If the policyholder has 2 coverages, then there will be 1 record in the main model point set and 2 in the additional one. If the policyholder has only 1 coverage, there will be 1 record in both of the files.
print(policy.data)
id age sex premium
1 32 MALE 140.0
2 48 FEMALE 150.0
print(coverage.data)
id sum_assured type
1 100 000.00 DEATH
1 30 000.00 ILLNESS
2 120 000.00 DEATH
The first policyholder has two coverages (against death and illness) and the second policyholder has one coverage (only against death).
Preparation
Install the cashflower package using terminal, if you don't have it yet.
pip install cashflower
Create a new model named riders using Python.
from cashflower import create_model
create_model("riders")
The new folder riders has been created with the initial files structure.
Let's prepare model point sets and assumptions for the model. For this model, we will use two model point sets: one for main policy data and the second for coverages.
# input.py
import pandas as pd
from cashflower import ModelPointSet
policy = ModelPointSet(data=pd.DataFrame({
"id": [1, 2],
"age": [32, 42],
"sex": ["MALE", "FEMALE"],
"premium": [140, 150]
}), id_column="id")
coverage = ModelPointSet(data=pd.DataFrame({
"id": [1, 1, 2],
"sum_assured": [100_000, 30_000, 120_000],
"type": ["DEATH", "ILLNESS", "DEATH"],
}), id_column="id", main=False)
The first policyholder is a 32-year-old male. The insurance company will pay a benefit of €100 000 in case of death and €30 000 in case of illness. The second policyholder is a 42-year-old female. She has only coverage for the event of death with a sum assured of €120 000. They pay €140 and €150 premiums respectively.
We will use 3 assumptions for the model: mortality rate, morbidity rate and interest rate. For simplicity, we will assume that each of these assumptions is a single value. In production models, these assumptions will form a table. For mortality and morbidity, the rates depend on age and sex. For the interest rate, the value depends on the term.
# input.py
assumption = {
"DEATH_PROB": 0.003,
"MORB_PROB": 0.008,
"INTEREST_RATE": 0.005
}
We have all the input data that we need. Let's start modelling!
Model
Let's start with modelling survival rate and expected premium.
# model.py
from cashflower import variable
from input import assumption, policy, coverage
from settings import settings
@variable()
def survival_rate(t):
if t == 0:
return 1 - assumption["DEATH_PROB"]
return survival_rate(t-1) * (1 - assumption["DEATH_PROB"])
@variable()
def expected_premium(t):
if t == 0:
return 0
return policy.get("premium") * survival_rate(t-1)
Now we will calculate the expected benefit.
We have multiple records in the coverage file, so we will loop over the data.
# model.py
@variable()
def expected_benefit(t):
if t == 0:
return 0
benefit = 0
for i in range(coverage.model_point_data.shape[0]):
sum_assured = coverage.get("sum_assured", i)
_type = coverage.get("type", i)
if _type == "DEATH":
benefit += survival_rate(t - 1) * assumption["DEATH_PROB"] * sum_assured
elif _type == "ILLNESS":
benefit += survival_rate(t - 1) * assumption["MORB_PROB"] * sum_assured
else:
raise ValueError(f"Unknown coverage type {_type}.")
return benefit
We use coverage.model_point_data.shape[0] to get the number of records for the given model point.
We can now calculate the present value of expected cash flows and the best estimate of liabilities.
# model.py
@variable()
def pv_expected_premium(t):
if t == settings["T_MAX_CALCULATION"]:
return expected_premium(t)
return expected_premium(t) + pv_expected_premium(t+1) * 1/(1+assumption["INTEREST_RATE"])
@variable()
def pv_expected_benefit(t):
if t == settings["T_MAX_CALCULATION"]:
return expected_benefit(t)
return expected_benefit(t) + pv_expected_benefit(t+1) * 1/(1+assumption["INTEREST_RATE"])
@variable()
def best_estimate_liabilities(t):
return pv_expected_benefit(t) - pv_expected_premium(t)
Results
We will produce the individual output, to see clearly how the model evaluated variables for each of the model points.
# settings.py
settings = {
# ...
"GROUP_BY": "id",
# ...
}
To run the model, source the run.py script.
# output
t id survival_rate expected_benefit expected_premium pv_expected_benefit pv_expected_premium best_estimate_liabilities
0 1 1.00 0.00 0.00 67084.22 17392.21 49692.02
1 1 0.99 538.38 139.58 67419.64 17479.17 49940.48
2 1 0.99 536.76 139.16 67215.67 17426.28 49789.38
3 1 0.99 535.15 138.74 67012.30 17373.56 49638.74
4 1 0.99 533.55 138.33 66809.53 17320.99 49488.54
5 1 0.98 531.95 137.91 66607.36 17268.57 49338.79
6 1 0.98 530.35 137.50 66405.79 17216.32 49189.47
...
714 1 0.12 63.20 16.39 431.99 112.00 320.00
715 1 0.12 63.01 16.34 370.63 96.09 274.54
716 1 0.12 62.82 16.29 309.16 80.15 229.01
717 1 0.12 62.64 16.24 247.57 64.18 183.38
718 1 0.12 62.45 16.19 185.86 48.18 137.67
719 1 0.11 62.26 16.14 124.03 32.15 91.87
720 1 0.11 62.07 16.09 62.07 16.09 45.98
0 2 1.00 0.00 0.00 44722.81 18634.51 26088.31
1 2 0.99 358.92 149.55 44946.43 18727.68 26218.75
2 2 0.99 357.84 149.10 44810.45 18671.02 26139.43
3 2 0.99 356.77 148.65 44674.87 18614.53 26060.34
4 2 0.99 355.70 148.21 44539.69 18558.20 25981.48
5 2 0.98 354.63 147.76 44404.91 18502.04 25902.86
6 2 0.98 353.57 147.32 44270.53 18446.05 25824.47
...
714 2 0.12 42.14 17.56 288.00 120.00 168.00
715 2 0.12 42.01 17.50 247.09 102.95 144.14
716 2 0.12 41.88 17.45 206.11 85.88 120.23
717 2 0.12 41.76 17.40 165.04 68.77 96.28
718 2 0.12 41.63 17.35 123.90 51.63 72.28
719 2 0.11 41.51 17.29 82.68 34.45 48.23
720 2 0.11 41.38 17.24 41.38 17.24 24.14
We have dealt with riders by creating a model point set with a long format.
Thank you for reading! If you have any questions, feel free to leave a comment below or on the github repository.