5.2. Training & Validation
Training and validation pipelines are the backbone of model development.
The goal is to make training repeatable, configurable, and transparent, so that multiple analysts can run models consistently without rewriting code each time.
Abstract Training Pipelines & Config Files
Most training and validation logic can be written once and reused.
Instead of hardcoding settings into scripts, store them in configuration files (e.g. YAML, JSON).
Configuration might include:
- Target (response variable)
- Exposure/offset (if relevant, e.g. earned premium)
- Weighting (e.g. per-policy weights)
- Feature list
- Loss function
- Hyperparameters
This approach ensures:
- Consistency across models
- Flexibility to test variations quickly
- Clear documentation of what was run
{
  "target": "ClaimCount",
  "exposure": "Exposure",
  "features": [
    "VehPower",
    "VehAge",
    "DrivAge",
    "BonusMalus",
    "VehBrand",
    "VehGas",
    "Area",
    "Density",
    "Region"
  ],
  "split": {
    "field": "Group",
    "assignment": {
      "1": "Train",
      "2": "Train",
      "3": "Train",
      "4": "Test",
      "5": "Holdout"
    }
  },
  "gbm_params": {
    "learning_params": {
      "loss_function": "Poisson",
      "learning_rate": 0.1,
      "depth": 3,
      "l2_leaf_reg": 2,
      "random_strength": 2,
      "bagging_temperature": 1,
      "verbose": 0
    },
    "num_rounds": 10000,
    "early_stopping_rounds": 10
  }
}
The configuration can then be loaded as follows:
import json

with open('./config/frequency_config.json', 'r') as f:
    config = json.load(f)
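Once loaded, the configuration can be checked and used to drive the data split. The following is a minimal sketch, not a definitive implementation: `REQUIRED_KEYS`, `validate_config`, and `assign_split` are hypothetical names introduced here, the set of required keys is an assumption, and the config is parsed from an inline string (trimmed to the keys used) so that the example is self-contained.

```python
import json

# Hypothetical minimum set of keys; adjust to whatever your pipeline relies on.
REQUIRED_KEYS = ("target", "features", "split", "gbm_params")


def validate_config(config):
    """Raise ValueError if the config lacks keys this sketch treats as required."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"Config is missing keys: {missing}")
    if not config["features"]:
        raise ValueError("Config must list at least one feature")
    return config


def assign_split(rows, split_config):
    """Group rows by the split label the config assigns to each row's group value."""
    field = split_config["field"]
    assignment = split_config["assignment"]
    splits = {}
    for row in rows:
        # JSON object keys are always strings, so cast the group value before lookup.
        label = assignment[str(row[field])]
        splits.setdefault(label, []).append(row)
    return splits


# Inline stand-in for the file contents, trimmed to the keys used here.
config = validate_config(json.loads("""
{
  "target": "ClaimCount",
  "features": ["VehPower", "VehAge"],
  "split": {
    "field": "Group",
    "assignment": {"1": "Train", "2": "Train", "3": "Train", "4": "Test", "5": "Holdout"}
  },
  "gbm_params": {"num_rounds": 10000, "early_stopping_rounds": 10}
}
"""))

rows = [{"Group": group, "ClaimCount": 0} for group in (1, 2, 3, 4, 5)]  # toy data
splits = assign_split(rows, config["split"])
print({label: len(members) for label, members in splits.items()})
```

The cast to `str` matters because the group column is often stored as an integer while JSON keys are strings. With a pandas DataFrame, the same idea would be `df[field].astype(str).map(assignment)`.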
Gradient Boosting Machines (GBMs)
GBMs are often among the best-performing models for tabular data, are easy to set up, and are well supported by other tools. The most popular libraries are:
- XGBoost – The earliest of the three to gain wide adoption; traditionally requires numeric input, with categorical features one-hot-encoded (recent versions add optional native categorical support).
- LightGBM – Requires numeric input; categorical features can be passed as integer codes (i.e. string-indexed).
- CatBoost – Strong performance without tuning, handles string categories natively.
Because CatBoost needs no preprocessing of categorical features and performs well out of the box without hyperparameter tuning, it keeps modelling pipelines simpler and is often the preferred option.
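from catboost import CatBoostRegressor

# train_pool and test_pool are catboost.Pool objects built from the split defined in the config.
gbm_config = config['gbm_params']
params = gbm_config['learning_params']
num_rounds = gbm_config['num_rounds']
early_stopping_rounds = gbm_config['early_stopping_rounds']
# num_rounds caps the number of trees; early stopping on the eval set usually ends training sooner.
FrequencyModel = CatBoostRegressor(iterations=num_rounds, **params)
FrequencyModel.fit(train_pool, eval_set=[test_pool], early_stopping_rounds=early_stopping_rounds)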
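The training call assumes that `train_pool` and `test_pool` already exist. The sketch below shows one way the arguments for `catboost.Pool` might be assembled from the config. It is illustrative only: `build_pool_kwargs` is a hypothetical helper, the rule that any string-valued feature is categorical is an assumption, and passing `log(exposure)` as the Pool's `baseline` is one common way to supply an offset for a Poisson objective that should be verified against your own pipeline. The helper is pure Python so it runs without CatBoost installed; in practice a pandas DataFrame is the more usual input.

```python
import math


def build_pool_kwargs(rows, config):
    """Assemble keyword arguments for catboost.Pool from a list of row dicts."""
    features = config["features"]
    data = [[row[name] for name in features] for row in rows]
    # Assumption of this sketch: a feature holding any string value is categorical.
    cat_features = [
        index for index, name in enumerate(features)
        if any(isinstance(row[name], str) for row in rows)
    ]
    kwargs = {
        "data": data,
        "label": [row[config["target"]] for row in rows],
        "feature_names": features,
        "cat_features": cat_features,
    }
    exposure = config.get("exposure")
    if exposure:
        # Offset for a log-link model: log(exposure) enters as a fixed starting score.
        # Exposure must be strictly positive for the logarithm to be defined.
        kwargs["baseline"] = [math.log(row[exposure]) for row in rows]
    return kwargs


# Toy config and data, invented for illustration.
toy_config = {"target": "ClaimCount", "exposure": "Exposure", "features": ["VehAge", "VehGas"]}
toy_rows = [
    {"VehAge": 2, "VehGas": "Diesel", "ClaimCount": 0, "Exposure": 1.0},
    {"VehAge": 7, "VehGas": "Regular", "ClaimCount": 1, "Exposure": 0.5},
]
pool_kwargs = build_pool_kwargs(toy_rows, toy_config)
# With CatBoost installed: train_pool = catboost.Pool(**pool_kwargs)
```

Keeping the Pool construction in a single helper means the train, test, and holdout sets are all built the same way from the same config.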
Plots & Validation
Validation is not just about metrics - it’s about understanding the model's behaviour.
Unlike GLMs, where each feature is inspected and fitted manually, GBMs fit every feature automatically, so it is important to review the fit of each feature afterwards.
Create modules in your repository that generate these plots automatically for every model run.
Common validation plots:
- Feature Importance / SHAP – explainability, showing which factors matter most.
- Partial Dependence / SHAP dependence plots – how factors influence predictions.
- Calibration plots – compare predicted vs. actual outcomes (critical for pricing).
- Residual plots – highlight systematic biases.
- Lift/Gain charts – for classification tasks like conversion.
These can be generated as a PDF report that can be reviewed before pushing a model to production.
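As an illustration of the data behind a calibration plot, the sketch below sorts observations by prediction, cuts them into equal-count bands, and compares total actual with total predicted in each band. `calibration_table` is a hypothetical helper and the toy numbers are invented; a real pipeline might band by exposure rather than by row count.

```python
def calibration_table(actual, predicted, n_bins=10):
    """Return one row per prediction band with total actual and total predicted."""
    # Sort observation indices from lowest to highest prediction.
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    table = []
    for b in range(n_bins):
        # Equal-count bands; integer arithmetic spreads any remainder across bands.
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not idx:
            continue
        total_actual = sum(actual[i] for i in idx)
        total_predicted = sum(predicted[i] for i in idx)
        table.append({
            "band": b + 1,
            "count": len(idx),
            "actual": total_actual,
            "predicted": total_predicted,
            "ratio": total_actual / total_predicted if total_predicted else float("nan"),
        })
    return table


# Toy example: predictions identical to the actuals give a ratio of 1 in every band.
predicted = [0.05, 0.10, 0.20, 0.40]
actual = [0.05, 0.10, 0.20, 0.40]
table = calibration_table(actual, predicted, n_bins=2)
for row in table:
    print(row)
```

Ratios that drift away from 1 in the lowest or highest bands suggest the model is systematically over- or under-predicting at the extremes.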
Metrics
Different types of models require different evaluation metrics.
Problem Type | Common Metrics | Insurance Relevance |
---|---|---|
Binary Classification (e.g. conversion, cancellation) | ROC-AUC, Log-loss, Lift/Gain, Precision/Recall | Assess ability to rank risks and select profitable business |
Multiclass Classification (e.g. product selection, competitor choice) | Accuracy, Cross-Entropy, Macro/Micro AUC | Evaluate across multiple outcomes |
Regression (Continuous Response) | RMSE, MAE, R² | Predicting claim costs, severity |
Poisson Regression (e.g. claim counts) | Deviance, AIC/BIC, Gini for ranking | Standard in frequency modelling |
Gamma Regression (e.g. claim severity) | Deviance, RMSE, Calibration plots | Standard in severity modelling |
Combined GLM/GBM Models | Gini, Lift, Calibration, Profit Curves | Align with pricing/business KPIs |
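Two of the metrics in the table, Poisson deviance and Gini, can be sketched in a few lines. This is a simplified, unweighted illustration rather than a reference implementation: the function names are hypothetical, exposure weights are ignored, and tied predictions are handled only by sort order. The mean Poisson deviance used here is the average of 2 * (y * log(y / mu) - (y - mu)), and the Gini is normalised by the Gini of a perfect ranking.

```python
import math


def mean_poisson_deviance(actual, predicted):
    """Mean Poisson deviance; predicted values must be strictly positive."""
    total = 0.0
    for y, mu in zip(actual, predicted):
        # The y * log(y / mu) term is taken as 0 when y == 0 (its limiting value).
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += 2.0 * (log_term - (y - mu))
    return total / len(actual)


def gini(actual, predicted):
    """Unnormalised Gini: how far the ranking by prediction concentrates the actuals."""
    n = len(actual)
    # Rank observations from highest to lowest prediction.
    order = sorted(range(n), key=lambda i: -predicted[i])
    total = sum(actual)
    running = 0.0
    cumulative_share_sum = 0.0
    for i in order:
        running += actual[i]
        cumulative_share_sum += running / total
    return cumulative_share_sum / n - (n + 1) / (2 * n)


def normalised_gini(actual, predicted):
    """Gini scaled so that a perfect ranking scores 1."""
    return gini(actual, predicted) / gini(actual, actual)


# Toy data, invented for illustration.
claims = [0, 0, 1, 2]
good_ranking = [0.1, 0.2, 0.5, 0.9]
bad_ranking = [0.9, 0.5, 0.2, 0.1]

print(normalised_gini(claims, good_ranking))
print(normalised_gini(claims, bad_ranking))
print(mean_poisson_deviance(claims, good_ranking))
```

Gini measures only how well the model ranks risks, while deviance also penalises predictions at the wrong overall level, so the two are best read together.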