5.1. Training & Validation
Training and validation pipelines are the backbone of model development.
The goal is to make training repeatable, configurable, and transparent, so that multiple analysts can run models consistently without rewriting code each time.
Abstract Training Pipelines & Config Files
Most training and validation logic can be written once and reused.
Instead of hardcoding settings into scripts, store them in configuration files (e.g. YAML, JSON).
Configuration might include:
- Target (response variable)
- Exposure/offset (if relevant, e.g. earned premium)
- Weighting (e.g. per-policy weights)
- Feature list
- Loss function
- Hyperparameters
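For illustration, a minimal config of this shape might look like the following, loaded here with PyYAML. The field names are hypothetical, not a prescribed schema:

```python
# Minimal sketch of a run config held in YAML and loaded in Python.
# Field names (target, exposure, weight, features, loss, hyperparameters)
# are illustrative, not a fixed standard.
import yaml  # PyYAML

CONFIG_YAML = """
model_name: motor_frequency_gbm
target: claim_count
exposure: earned_exposure        # offset column, if relevant
weight: policy_weight            # per-policy weights
features:
  - driver_age
  - vehicle_type
  - region
loss: poisson
hyperparameters:
  learning_rate: 0.05
  num_leaves: 31
  n_estimators: 500
"""

config = yaml.safe_load(CONFIG_YAML)
print(config["features"], config["hyperparameters"]["learning_rate"])
```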
This approach ensures:
- Consistency across models
- Flexibility to test variations quickly
- Clear documentation of what was run
Gradient Boosting Machines (GBMs)
GBMs are widely regarded as state-of-the-art for predictive modelling on tabular data (the dominant data structure in insurance).
Popular libraries:
- CatBoost – often outperforms others on categorical-heavy datasets, handles missing values and categories natively.
- LightGBM – fast, memory efficient, strong on large datasets.
- XGBoost – mature and well-documented, but typically requires more preprocessing (e.g. encoding categorical features).
In insurance pricing, GBMs are particularly strong at capturing nonlinear interactions (e.g. age × vehicle type), often outperforming traditional GLMs when raw predictive power is the main goal.
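As a rough sketch, a frequency GBM driven by a config like the one above could be trained as follows. The LightGBM calls are standard, but the DataFrame `df`, its column names, and the use of a log-exposure offset via `init_score` are illustrative assumptions:

```python
# Sketch: train a LightGBM frequency model from a config dict.
# Assumes a pandas DataFrame `df` containing the columns named in the config;
# exposure enters as a log offset, a common convention for Poisson models.
import numpy as np
import lightgbm as lgb

def train_from_config(df, config):
    X = df[config["features"]].copy()
    for col in X.select_dtypes(include="object").columns:
        X[col] = X[col].astype("category")        # let LightGBM handle categoricals natively

    y = df[config["target"]]
    offset = np.log(df[config["exposure"]].clip(lower=1e-6))   # log-exposure offset
    weights = df[config["weight"]] if config.get("weight") else None

    train_set = lgb.Dataset(X, label=y, weight=weights, init_score=offset)
    params = {"objective": config["loss"], **config["hyperparameters"]}
    n_rounds = params.pop("n_estimators", 500)     # boosting rounds taken from the config
    return lgb.train(params, train_set, num_boost_round=n_rounds)
```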
Plots & Validation
Validation is not just about metrics — it’s about understanding the model’s behaviour.
Common validation plots:
- Feature Importance / SHAP – explainability, showing which factors matter most.
- Partial Dependence / SHAP dependence plots – how factors influence predictions.
- Calibration plots – compare predicted vs. actual outcomes (critical for pricing).
- Residual plots – highlight systematic biases.
- Lift/Gain charts – for classification tasks like conversion.
These plots should ideally be generated dynamically for every model run, based on the config file feature list.
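A minimal sketch of such per-run diagnostics, assuming a fitted tree-based model and a held-out validation set (`model`, `X_val`, `y_val` and the output file names are placeholders):

```python
# Sketch: generate two of the diagnostics above for a single model run.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import shap

def validation_plots(model, X_val, y_val, out_prefix="run_001"):
    # Feature importance via a SHAP summary plot
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X_val)
    shap.summary_plot(shap_values, X_val, show=False)
    plt.savefig(f"{out_prefix}_shap_summary.png", bbox_inches="tight")
    plt.close()

    # Calibration: mean predicted vs. mean actual by prediction decile
    pred = model.predict(X_val)
    frame = pd.DataFrame({
        "pred": pred,
        "actual": np.asarray(y_val),
        "decile": pd.qcut(pred, 10, labels=False, duplicates="drop"),
    })
    calib = frame.groupby("decile")[["pred", "actual"]].mean()
    calib.plot(marker="o", title="Calibration by prediction decile")
    plt.savefig(f"{out_prefix}_calibration.png", bbox_inches="tight")
    plt.close()
```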
Metrics
Different types of models require different evaluation metrics.
| Problem Type | Common Metrics | Insurance Relevance |
|---|---|---|
| Binary Classification (e.g. conversion, cancellation) | ROC-AUC, Log-loss, Lift/Gain, Precision/Recall | Assess ability to rank risks and select profitable business |
| Multiclass Classification (e.g. product selection, competitor choice) | Accuracy, Cross-Entropy, Macro/Micro AUC | Evaluate across multiple outcomes |
| Regression (continuous response) | RMSE, MAE, R² | Predicting claim costs, severity |
| Poisson Regression (e.g. claim counts) | Deviance, AIC/BIC, Gini for ranking | Standard in frequency modelling |
| Gamma Regression (e.g. claim severity) | Deviance, RMSE, Calibration plots | Standard in severity modelling |
| Combined GLM/GBM Models | Gini, Lift, Calibration, Profit Curves | Align with pricing/business KPIs |
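For example, two of the frequency metrics above can be computed as follows. The arrays are made up, and the simple Lorenz-curve Gini shown here is only one of several conventions in use:

```python
# Sketch: Poisson deviance via scikit-learn plus a simple ranking Gini.
import numpy as np
from sklearn.metrics import mean_poisson_deviance

def ranking_gini(y_true, y_pred):
    """Gini from a simple Lorenz curve: how well predictions rank actual outcomes."""
    order = np.argsort(y_pred)                        # sort by predicted value, ascending
    cum_share = np.cumsum(y_true[order]) / np.sum(y_true)
    return 1.0 - 2.0 * np.mean(cum_share)             # ~0 for random ranking, higher is better

# Hypothetical claim counts and predicted frequencies
y_true = np.array([0, 0, 1, 0, 2, 0, 1])
y_pred = np.array([0.10, 0.20, 0.80, 0.15, 1.50, 0.30, 0.60])
print("Poisson deviance:", mean_poisson_deviance(y_true, y_pred))
print("Ranking Gini:", ranking_gini(y_true, y_pred))
```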
Experiment Tracking
Experiment tracking means logging every model run with its:
- Dataset version
- Feature list
- Parameters & hyperparameters
- Metrics
- Plots & diagnostics
Tools like MLflow, Weights & Biases, or even structured spreadsheets can be used.
This enables:
- Full reproducibility
- Easy comparison across models
- Governance/audit trail for model approval
This is a common gap in pricing teams, where models are often trained ad hoc and are not easily traceable.
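A minimal MLflow sketch of this kind of logging; the experiment name, tags, and artifact paths are illustrative, and the same fields could equally go to Weights & Biases or a structured spreadsheet:

```python
# Sketch: log one model run (config, metrics, diagnostics) with MLflow.
import mlflow

def log_run(config, metrics, plot_paths, dataset_version):
    mlflow.set_experiment("pricing-frequency-models")          # illustrative experiment name
    with mlflow.start_run(run_name=config["model_name"]):
        mlflow.set_tag("dataset_version", dataset_version)
        mlflow.log_param("features", ",".join(config["features"]))
        mlflow.log_params(config["hyperparameters"])
        mlflow.log_metrics(metrics)                            # e.g. {"poisson_deviance": 0.31}
        for path in plot_paths:                                # diagnostic plots saved earlier
            mlflow.log_artifact(path)
```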
Model Metadata
Each model should carry metadata about:
- Which features were used
- Which dataset version it was trained on
- Dependencies on other models (e.g. conversion model referencing competitor models)
- Date built and by whom
- Software/library versions
Why it matters:
- Ensures that scoring code supplies exactly the features the model expects
- Supports model chaining (e.g. profitability models using outputs from conversion models)
- Simplifies monitoring and retraining processes
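One lightweight way to do this is to write a metadata file next to the model artefact. The fields below are illustrative rather than a fixed schema:

```python
# Sketch: model metadata saved alongside the trained model, so scoring,
# chaining and monitoring code can read it instead of guessing.
import json
import platform
from datetime import date

import lightgbm

metadata = {
    "model_name": "motor_frequency_gbm",
    "features": ["driver_age", "vehicle_type", "region"],
    "dataset_version": "quotes_2024_q2_v3",
    "upstream_models": ["competitor_price_model_v7"],          # model chaining
    "built_on": date.today().isoformat(),
    "built_by": "j.bloggs",
    "library_versions": {
        "python": platform.python_version(),
        "lightgbm": lightgbm.__version__,
    },
}

with open("motor_frequency_gbm.metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```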
Extensions for Pricing Teams
- Champion/Challenger Testing – run multiple models in parallel to evaluate uplift.
- Business KPIs – track metrics like combined ratio, loss ratio, or quote volume impact alongside statistical metrics.
- Scenario Testing – simulate how the model performs under market shocks (e.g. premium increase, competitor withdrawal).
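A simple champion/challenger comparison on a shared holdout might look like the sketch below; the models and the choice of Poisson deviance as the metric are placeholders:

```python
# Sketch: score champion and challenger on the same holdout set.
from sklearn.metrics import mean_poisson_deviance

def compare_models(champion, challenger, X_holdout, y_holdout):
    """Lower Poisson deviance is better; positive uplift means the challenger wins."""
    scores = {
        name: mean_poisson_deviance(y_holdout, model.predict(X_holdout))
        for name, model in {"champion": champion, "challenger": challenger}.items()
    }
    scores["uplift"] = scores["champion"] - scores["challenger"]
    return scores
```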