5.1. Training & Validation
Training and validation pipelines are the backbone of model development.
The goal is to make training repeatable, configurable, and transparent, so that multiple analysts can run models consistently without rewriting code each time.
Abstract Training Pipelines & Config Files
Most training and validation logic can be written once and reused.
Instead of hardcoding settings into scripts, store them in configuration files (e.g. YAML, JSON).
Configuration might include:
- Target (response variable)
- Exposure/offset (if relevant, e.g. earned premium)
- Weighting (e.g. per-policy weights)
- Feature list
- Loss function
- Hyperparameters
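For illustration, a minimal config of this shape might look like the following, loaded here with PyYAML. The field names are hypothetical, not a prescribed schema:

```python
# Minimal sketch of a run config held in YAML and loaded in Python.
# Field names (target, exposure, weight, features, loss, hyperparameters)
# are illustrative, not a fixed standard.
import yaml  # PyYAML

CONFIG_YAML = """
model_name: motor_frequency_gbm
target: claim_count
exposure: earned_exposure        # offset column, if relevant
weight: policy_weight            # per-policy weights
features:
  - driver_age
  - vehicle_type
  - region
loss: poisson
hyperparameters:
  learning_rate: 0.05
  num_leaves: 31
  n_estimators: 500
"""

config = yaml.safe_load(CONFIG_YAML)
print(config["features"], config["hyperparameters"]["learning_rate"])
```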
This approach ensures:
- Consistency across models
- Flexibility to test variations quickly
- Clear documentation of what was run
Gradient Boosting Machines (GBMs)
GBMs are widely regarded as state-of-the-art for predictive modelling on tabular data (the dominant data structure in insurance).
Popular libraries:
- CatBoost – often outperforms others on categorical-heavy datasets, handles missing values and categories natively.
- LightGBM – fast, memory efficient, strong on large datasets.
- XGBoost – mature and well-documented, but typically requires more preprocessing (e.g. encoding categorical features).
In insurance pricing, GBMs are particularly strong at capturing nonlinear interactions (e.g. age × vehicle type), often outperforming traditional GLMs when raw predictive power is the main goal.
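As a rough sketch, a frequency GBM driven by a config like the one above could be trained as follows. The LightGBM calls are standard, but the DataFrame `df`, its column names, and the use of a log-exposure offset via `init_score` are illustrative assumptions:

```python
# Sketch: train a LightGBM frequency model from a config dict.
# Assumes a pandas DataFrame `df` containing the columns named in the config;
# exposure enters as a log offset, a common convention for Poisson models.
import numpy as np
import lightgbm as lgb

def train_from_config(df, config):
    X = df[config["features"]].copy()
    for col in X.select_dtypes(include="object").columns:
        X[col] = X[col].astype("category")        # let LightGBM handle categoricals natively

    y = df[config["target"]]
    offset = np.log(df[config["exposure"]].clip(lower=1e-6))   # log-exposure offset
    weights = df[config["weight"]] if config.get("weight") else None

    train_set = lgb.Dataset(X, label=y, weight=weights, init_score=offset)
    params = {"objective": config["loss"], **config["hyperparameters"]}
    n_rounds = params.pop("n_estimators", 500)     # boosting rounds taken from the config
    return lgb.train(params, train_set, num_boost_round=n_rounds)
```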
Plots & Validation
Validation is not just about metrics — it’s about understanding the model’s behaviour.
Common validation plots:
- Feature Importance / SHAP – explainability, showing which factors matter most.
- Partial Dependence / SHAP dependence plots – how factors influence predictions.
- Calibration plots – compare predicted vs. actual outcomes (critical for pricing).
- Residual plots – highlight systematic biases.
- Lift/Gain charts – for classification tasks like conversion.
These plots should ideally be generated dynamically for every model run, based on the config file feature list.
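A minimal sketch of such per-run diagnostics, assuming a fitted tree-based model and a held-out validation set (`model`, `X_val`, `y_val` and the output file names are placeholders):

```python
# Sketch: generate two of the diagnostics above for a single model run.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import shap

def validation_plots(model, X_val, y_val, out_prefix="run_001"):
    # Feature importance via a SHAP summary plot
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X_val)
    shap.summary_plot(shap_values, X_val, show=False)
    plt.savefig(f"{out_prefix}_shap_summary.png", bbox_inches="tight")
    plt.close()

    # Calibration: mean predicted vs. mean actual by prediction decile
    pred = model.predict(X_val)
    frame = pd.DataFrame({
        "pred": pred,
        "actual": np.asarray(y_val),
        "decile": pd.qcut(pred, 10, labels=False, duplicates="drop"),
    })
    calib = frame.groupby("decile")[["pred", "actual"]].mean()
    calib.plot(marker="o", title="Calibration by prediction decile")
    plt.savefig(f"{out_prefix}_calibration.png", bbox_inches="tight")
    plt.close()
```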
Metrics
Different types of models require different evaluation metrics.
| Problem Type | Common Metrics | Insurance Relevance |
|---|---|---|
| Binary Classification (e.g. conversion, cancellation) | ROC-AUC, Log-loss, Lift/Gain, Precision/Recall | Assess ability to rank risks and select profitable business |
| Multiclass Classification (e.g. product selection, competitor choice) | Accuracy, Cross-Entropy, Macro/Micro AUC | Evaluate across multiple outcomes |
| Regression (continuous response) | RMSE, MAE, R² | Predicting claim costs, severity |
| Poisson Regression (e.g. claim counts) | Deviance, AIC/BIC, Gini for ranking | Standard in frequency modelling |
| Gamma Regression (e.g. claim severity) | Deviance, RMSE, Calibration plots | Standard in severity modelling |
| Combined GLM/GBM Models | Gini, Lift, Calibration, Profit Curves | Align with pricing/business KPIs |
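For example, two of the frequency metrics above can be computed as follows. The arrays are made up, and the simple Lorenz-curve Gini shown here is only one of several conventions in use:

```python
# Sketch: Poisson deviance via scikit-learn plus a simple ranking Gini.
import numpy as np
from sklearn.metrics import mean_poisson_deviance

def ranking_gini(y_true, y_pred):
    """Gini from a simple Lorenz curve: how well predictions rank actual outcomes."""
    order = np.argsort(y_pred)                        # sort by predicted value, ascending
    cum_share = np.cumsum(y_true[order]) / np.sum(y_true)
    return 1.0 - 2.0 * np.mean(cum_share)             # ~0 for random ranking, higher is better

# Hypothetical claim counts and predicted frequencies
y_true = np.array([0, 0, 1, 0, 2, 0, 1])
y_pred = np.array([0.10, 0.20, 0.80, 0.15, 1.50, 0.30, 0.60])
print("Poisson deviance:", mean_poisson_deviance(y_true, y_pred))
print("Ranking Gini:", ranking_gini(y_true, y_pred))
```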
Experiment Tracking
Experiment tracking means logging every model run with its:
- Dataset version
- Feature list
- Parameters & hyperparameters
- Metrics
- Plots & diagnostics
Tools like MLflow, Weights & Biases, or even structured spreadsheets can be used.
This enables:
- Full reproducibility
- Easy comparison across models
- Governance/audit trail for model approval
This is a common gap in pricing teams, where models are often trained ad hoc and are not easily traceable.
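A minimal MLflow sketch of this kind of logging; the experiment name, tags, and artifact paths are illustrative, and the same fields could equally go to Weights & Biases or a structured spreadsheet:

```python
# Sketch: log one model run (config, metrics, diagnostics) with MLflow.
import mlflow

def log_run(config, metrics, plot_paths, dataset_version):
    mlflow.set_experiment("pricing-frequency-models")          # illustrative experiment name
    with mlflow.start_run(run_name=config["model_name"]):
        mlflow.set_tag("dataset_version", dataset_version)
        mlflow.log_param("features", ",".join(config["features"]))
        mlflow.log_params(config["hyperparameters"])
        mlflow.log_metrics(metrics)                            # e.g. {"poisson_deviance": 0.31}
        for path in plot_paths:                                # diagnostic plots saved earlier
            mlflow.log_artifact(path)
```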
Model Metadata
Each model should carry metadata about:
- Which features were used
- Which dataset version it was trained on
- Dependencies on other models (e.g. conversion model referencing competitor models)
- Date built and by whom
- Software/library versions
Why it matters:
- Ensures that scoring code supplies exactly the features the model expects
- Supports model chaining (e.g. profitability models using outputs from conversion models)
- Simplifies monitoring and retraining processes
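One lightweight way to do this is to write a metadata file next to the model artefact. The fields below are illustrative rather than a fixed schema:

```python
# Sketch: model metadata saved alongside the trained model, so scoring,
# chaining and monitoring code can read it instead of guessing.
import json
import platform
from datetime import date

import lightgbm

metadata = {
    "model_name": "motor_frequency_gbm",
    "features": ["driver_age", "vehicle_type", "region"],
    "dataset_version": "quotes_2024_q2_v3",
    "upstream_models": ["competitor_price_model_v7"],          # model chaining
    "built_on": date.today().isoformat(),
    "built_by": "j.bloggs",
    "library_versions": {
        "python": platform.python_version(),
        "lightgbm": lightgbm.__version__,
    },
}

with open("motor_frequency_gbm.metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```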
Extensions for Pricing Teams
- Champion/Challenger Testing – run multiple models in parallel to evaluate uplift.
- Business KPIs – track metrics like combined ratio, loss ratio, or quote volume impact alongside statistical metrics.
- Scenario Testing – simulate how the model performs under market shocks (e.g. premium increase, competitor withdrawal).
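A simple champion/challenger comparison on a shared holdout might look like the sketch below; the models and the choice of Poisson deviance as the metric are placeholders:

```python
# Sketch: score champion and challenger on the same holdout set.
from sklearn.metrics import mean_poisson_deviance

def compare_models(champion, challenger, X_holdout, y_holdout):
    """Lower Poisson deviance is better; positive uplift means the challenger wins."""
    scores = {
        name: mean_poisson_deviance(y_holdout, model.predict(X_holdout))
        for name, model in {"champion": champion, "challenger": challenger}.items()
    }
    scores["uplift"] = scores["champion"] - scores["challenger"]
    return scores
```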