5.2. Training & Validation

Training and validation pipelines are the backbone of model development.

The goal is to make training repeatable, configurable, and transparent, so that multiple analysts can run models consistently without rewriting code each time.


Abstract Training Pipelines & Config Files

Most training and validation logic can be written once and reused.
Instead of hardcoding settings into scripts, store them in configuration files (e.g. YAML, JSON).

Configuration might include:
- Target (response variable)
- Exposure/offset (if relevant, e.g. earned premium)
- Weighting (e.g. per-policy weights)
- Feature list
- Loss function
- Hyperparameters

This approach ensures:
- Consistency across models
- Flexibility to test variations quickly
- Clear documentation of what was run

For example, a frequency model configuration might look like this:

{
    "target": "ClaimCount",
    "exposure": "Exposure",
    "features": [
        "VehPower",
        "VehAge",
        "DrivAge",
        "BonusMalus",
        "VehBrand",
        "VehGas",
        "Area",
        "Density",
        "Region"
    ],
    "split": {
        "field": "Group",
        "assignment": {
            "1": "Train",
            "2": "Train",
            "3": "Train",
            "4": "Test",
            "5": "Holdout"
        }
    },
    "gbm_params": {
        "learning_params": {
            "loss_function": "Poisson",
            "learning_rate": 0.1,
            "depth": 3,
            "l2_leaf_reg": 2,
            "random_strength": 2,
            "bagging_temperature": 1,
            "verbose": 0
        },
        "num_rounds": 10000,
        "early_stopping_rounds": 10
    }
}

This can be read in as follows:

import json

with open('./config/frequency_config.json', 'r') as f:
    config = json.load(f)
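
The split block can then be used to divide the modelling data into train, test and holdout sets. A minimal sketch, assuming the data is already loaded as a pandas DataFrame df containing the Group column named in the config:

# df is assumed to be the modelling DataFrame, containing the column named
# in config['split']['field'] (here 'Group')
split_field = config['split']['field']
assignment = config['split']['assignment']

# Keys in the config are strings, so cast the group column before mapping
df['Split'] = df[split_field].astype(str).map(assignment)

train_df = df[df['Split'] == 'Train']
test_df = df[df['Split'] == 'Test']
holdout_df = df[df['Split'] == 'Holdout']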

Gradient Boosting Machines (GBMs)

GBMs are generally the best-performing models for tabular data, are easy to set up, and are well supported by the wider tooling ecosystem. The most popular libraries include:

- XGBoost – Historically the most popular; requires numeric input, with categorical features one-hot encoded.
- LightGBM – Requires numeric input, though categorical features can be passed as string-indexed (integer-coded) columns.
- CatBoost – Strong performance without tuning; handles string categories natively.

Because CatBoost requires no preprocessing of categorical features and performs well out of the box without hyperparameter tuning, its modelling pipelines are simpler, which often makes it the preferred option.
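
Before fitting, the data can be wrapped in CatBoost Pool objects. A minimal sketch, assuming train_df and test_df from the split above, with non-numeric feature columns treated as categorical:

import numpy as np
from catboost import Pool

features = config['features']
target = config['target']
exposure = config['exposure']

# CatBoost handles string categories natively, so just flag the non-numeric columns
cat_features = [f for f in features if train_df[f].dtype == 'object']

def make_pool(data):
    return Pool(
        data=data[features],
        label=data[target],
        cat_features=cat_features,
        # One way to treat exposure as an offset under a Poisson loss:
        # supply log(exposure) as the baseline (additive on the log scale)
        baseline=np.log(data[exposure]),
    )

train_pool = make_pool(train_df)
test_pool = make_pool(test_df)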

from catboost import CatBoostRegressor

# Pull the learning settings out of the config
params = config.get('gbm_params').get('learning_params')
num_round = config.get('gbm_params').get('num_rounds')
early_stopping_rounds = config.get('gbm_params').get('early_stopping_rounds')

# Fit on the training pool, evaluating on the test pool for early stopping
FrequencyModel = CatBoostRegressor(iterations=num_round, **params)
FrequencyModel.fit(train_pool, eval_set=[test_pool], early_stopping_rounds=early_stopping_rounds)

Plots & Validation

Validation is not just about metrics - it’s about understanding the model's behaviour.

Unlike GLMs, where each feature is inspected and fitted manually, GBMs fit features automatically, so reviewing the fitted effect of each feature is important.

Modules should be created in your repository to generate these plots automatically for every model run.

Common validation plots:
- Feature Importance / SHAP – explainability, showing which factors matter most.
- Partial Dependence / SHAP dependence plots – how factors influence predictions.
- Calibration plots – compare predicted vs. actual outcomes (critical for pricing).
- Residual plots – highlight systematic biases.
- Lift/Gain charts – for classification tasks like conversion.

These can be compiled into a PDF report for review before a model is pushed to production.
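
As a rough sketch of such a module (assuming the fitted FrequencyModel, test_pool and test_df from above; the helper name and output path are illustrative), a feature importance chart and a calibration plot could be written to a single PDF:

import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages

def validation_report(model, pool, data, target, path='frequency_model_report.pdf'):
    # NOTE: for a Poisson loss, check that predictions are on the claim-count
    # (mean) scale for your CatBoost version before plotting calibration
    preds = model.predict(pool)

    with PdfPages(path) as pdf:
        # Feature importance from the fitted CatBoost model
        importance = pd.Series(model.get_feature_importance(pool),
                               index=pool.get_feature_names()).sort_values()
        ax = importance.plot.barh(title='Feature Importance')
        pdf.savefig(ax.figure)
        plt.close(ax.figure)

        # Calibration: average predicted vs actual by prediction decile
        calib = (pd.DataFrame({'Predicted': preds, 'Actual': data[target].values})
                 .assign(decile=lambda d: pd.qcut(d['Predicted'], 10, labels=False, duplicates='drop'))
                 .groupby('decile')[['Predicted', 'Actual']].mean())
        ax = calib.plot(marker='o', title='Calibration by Prediction Decile')
        pdf.savefig(ax.figure)
        plt.close(ax.figure)

validation_report(FrequencyModel, test_pool, test_df, config['target'])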


Metrics

Different types of models require different evaluation metrics.

| Problem Type | Common Metrics | Insurance Relevance |
| --- | --- | --- |
| Binary classification (e.g. conversion, cancellation) | ROC-AUC, Log-loss, Lift/Gain, Precision/Recall | Assess ability to rank risks and select profitable business |
| Multiclass classification (e.g. product selection, competitor choice) | Accuracy, Cross-Entropy, Macro/Micro AUC | Evaluate across multiple outcomes |
| Regression (continuous response) | RMSE, MAE, R² | Predicting claim costs, severity |
| Poisson regression (e.g. claim counts) | Deviance, AIC/BIC, Gini for ranking | Standard in frequency modelling |
| Gamma regression (e.g. claim severity) | Deviance, RMSE, Calibration plots | Standard in severity modelling |
| Combined GLM/GBM models | Gini, Lift, Calibration, Profit Curves | Align with pricing/business KPIs |
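
For the Poisson frequency model above, two of these metrics could be computed roughly as follows (a sketch, assuming the objects from the earlier sections; the Gini here is the simple ordering-based version rather than an exposure-weighted one):

import numpy as np
from sklearn.metrics import mean_poisson_deviance

y_true = test_df[config['target']]
y_pred = FrequencyModel.predict(test_pool)  # assumed to be on the mean (claim-count) scale

# Mean Poisson deviance: lower is better; predictions must be strictly positive
deviance = mean_poisson_deviance(y_true, y_pred)

def gini(actual, pred):
    # Lorenz curve of cumulative actual claims, sorted by increasing prediction;
    # Gini is twice the area between the curve and the line of equality
    actual = np.asarray(actual, dtype=float)
    lorenz = np.cumsum(actual[np.argsort(pred)]) / actual.sum()
    return 1 - 2 * lorenz.mean()

# Normalise by the Gini of a perfect ranking (sorting by the actuals themselves)
normalised_gini = gini(y_true, y_pred) / gini(y_true, y_true)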