This post is a brief tutorial on using Amazon S3 (Simple Storage Service) to manage inputs and outputs in actuarial cash flow models. S3 is a cloud storage service that makes it easy to store and retrieve data at any time; it is commonly used for large datasets, file sharing, and backups. By using S3, we can connect different processes and teams, centralising input files, model results, and assumptions so that everything stays organised and accessible.
Contents:
- .env
- Utility functions
- Input
- Output
.env
To connect to AWS, we need to provide our program with the necessary credentials. We can store this information either in the operating system's environment variables or in a .env file.
A typical .env file looks like this:
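(The values below are placeholders; the AWS_* variable names are the standard credential variables that boto3 reads from the environment, while BUCKET_NAME is a name we choose for this tutorial.)

```
AWS_ACCESS_KEY_ID=your-access-key-id
AWS_SECRET_ACCESS_KEY=your-secret-access-key
AWS_DEFAULT_REGION=eu-west-1
BUCKET_NAME=my-cash-flow-model-bucket
```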
This file allows our application to securely access AWS without hardcoding sensitive information into our code.
Utility functions
We will define three functions to help us work with S3 buckets:
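Here is a minimal sketch, assuming the inputs and outputs are CSV files handled with pandas and that the bucket name is stored in the .env file; the exact function signatures are a convention of this tutorial rather than a fixed API:

```python
import io
import os

import boto3
import pandas as pd
from dotenv import load_dotenv

# Load AWS credentials and the bucket name from the .env file
load_dotenv()

BUCKET_NAME = os.getenv("BUCKET_NAME")


def get_s3_client():
    """Connect to AWS S3 using the credentials from the environment."""
    return boto3.client(
        "s3",
        aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
        aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
        region_name=os.getenv("AWS_DEFAULT_REGION"),
    )


def load_from_s3(s3_client, key, bucket=None):
    """Read a CSV file from the S3 bucket into a pandas dataframe."""
    bucket = bucket or BUCKET_NAME
    obj = s3_client.get_object(Bucket=bucket, Key=key)
    return pd.read_csv(io.BytesIO(obj["Body"].read()))


def save_to_s3(s3_client, df, key, bucket=None):
    """Save a pandas dataframe to the S3 bucket as a CSV file."""
    bucket = bucket or BUCKET_NAME
    buffer = io.StringIO()
    df.to_csv(buffer, index=False)
    s3_client.put_object(Bucket=bucket, Key=key, Body=buffer.getvalue().encode("utf-8"))
```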
First, we load our AWS credentials from the .env file using the load_dotenv() function.
Then we define:
- get_s3_client() - connects to AWS S3,
- load_from_s3() - reads a file from the S3 bucket into a dataframe,
- save_to_s3() - saves a dataframe to the S3 bucket.
With these utilities ready, we can move on to handling the model input.
Input
We will load data from the S3 bucket for the model point set and three assumption tables:
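A sketch under the same assumptions as above; the object keys and the names of the three assumption tables (mortality, lapse, and expense assumptions are a common choice in cash flow models) are illustrative:

```python
s3_client = get_s3_client()

# Load the model point set and the assumption tables from the bucket
model_point_set = load_from_s3(s3_client, "input/model_point_set.csv")
assumption_mortality = load_from_s3(s3_client, "input/mortality.csv")
assumption_lapse = load_from_s3(s3_client, "input/lapse.csv")
assumption_expense = load_from_s3(s3_client, "input/expense.csv")
```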
First, we create an S3 client. Then we use the load_from_s3() function to load the model point set and assumption tables directly from the bucket.
Output
We will use a similar approach to save the model output to the S3 bucket:
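Another sketch, again with an illustrative object key; here output is assumed to be the dataframe produced by the model run:

```python
s3_client = get_s3_client()

# Upload the model results to the bucket
save_to_s3(s3_client, output, "output/results.csv")
```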
Again, we start by creating an S3 client. Then we use the save_to_s3() function to upload the output dataframe to the bucket.
In this short tutorial, we showed how to use S3 buckets for the inputs and outputs of actuarial cash flow models. Storing data in S3 improves flexibility, supports version control, and makes it easier to share data across teams.