5.2. Model Registries

Model registries are centralized repositories that allow teams to store, version, and manage machine learning models throughout their lifecycle. Using a model registry ensures that models are traceable, auditable, and reproducible, which is critical in insurance pricing where models directly impact financial decisions.

Why Use a Model Registry?

Model registries provide several benefits:

Version Control – Every model version is tracked, so you can reproduce results from any point in time.
Lifecycle Management – Manage stages like Staging, Production, Archived to control which models are actively used.
Auditability & Governance – Maintain a full history of who created or approved a model and when.
Collaboration – Teams can share models easily without confusion over which version is current.
Integration – Models can be pulled directly into scoring pipelines or deployed APIs reliably.

In pricing teams, this prevents “key man” risk, ensures regulatory compliance, and allows smooth transitions between analysts or teams.

Registering a Model in MLflow

Train the Model – Run your training pipeline as usual (GBM, GLM, etc.).
Log the Model – Use mlflow.sklearn.log_model() (or the appropriate MLflow flavor) to save the model and its artifacts.
Register the Model – Push it into the MLflow Model Registry, assigning a name and optional description.

Example:

import mlflow
import mlflow.sklearn

with mlflow.start_run():
    model.fit(X_train, y_train)
    mlflow.sklearn.log_model(model, "insurance_model", registered_model_name="PricingModel")

This creates a versioned entry in the registry with metadata and artifacts attached.

Signatures & Input Schema

A model signature defines the expected inputs and outputs for a model.

Benefits:

Ensures that the model receives data in the correct format during production scoring.
Detects schema mismatches early, preventing runtime errors.

Example in MLflow:

from mlflow.models.signature import infer_signature

signature = infer_signature(X_train, model.predict(X_train))
mlflow.sklearn.log_model(model, "insurance_model", signature=signature)

Stages and Aliases

Models in the registry can be assigned stages:

Staging – for internal testing, validation, and user acceptance.
Production – the active model used in pricing decisions.
Archived – previous versions kept for historical reference.

Aliases (like 'Current', 'Latest') allow pipelines to always pull the correct version without hardcoding model versions, making deployment smoother.

Key Takeaways for Pricing Teams

Treat models like code artifacts with versioning and governance.
Always use signatures to enforce input/output contracts.
Use stages and aliases to safely manage production updates.
Combine with experiment tracking to link model versions to specific training runs, datasets, and configurations.