5.2. Model Registries
Model registries are centralized repositories that allow teams to store, version, and manage machine learning models throughout their lifecycle. Using a model registry ensures that models are traceable, auditable, and reproducible, which is critical in insurance pricing where models directly impact financial decisions.
Why Use a Model Registry?
Model registries provide several benefits:
- Version Control – Every model version is tracked, so you can reproduce results from any point in time.
- Lifecycle Management – Manage stages like Staging, Production, and Archived to control which models are actively used.
- Auditability & Governance – Maintain a full history of who created or approved a model and when.
- Collaboration – Teams can share models easily without confusion over which version is current.
- Integration – Models can be pulled directly into scoring pipelines or deployed APIs reliably.
In pricing teams, this reduces “key man” risk, supports regulatory compliance, and allows smooth handovers between analysts or teams.
Registering a Model in MLflow
- Train the Model – Run your training pipeline as usual (GBM, GLM, etc.).
- Log the Model – Use mlflow.sklearn.log_model() (or the appropriate MLflow flavor) to save the model and its artifacts.
- Register the Model – Push it into the MLflow Model Registry, assigning a name and optional description.
Example:
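import mlflow
import mlflow.sklearn

# `model`, `X_train`, and `y_train` are assumed to come from your training pipeline.
with mlflow.start_run():
    model.fit(X_train, y_train)
    # Log the fitted model and register it as a new version of "PricingModel".
    mlflow.sklearn.log_model(model, "insurance_model", registered_model_name="PricingModel")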
This creates a versioned entry in the registry with metadata and artifacts attached.
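When logging and registration happen at different times (for example, a model is registered only after review), an already-logged model can be registered from its run. The sketch below assumes the model was logged under the artifact path insurance_model, as in the example above. The helper names run_model_uri and register_from_run are hypothetical; mlflow.register_model and MlflowClient.update_model_version are the MLflow calls being illustrated.

```python
def run_model_uri(run_id: str, artifact_path: str = "insurance_model") -> str:
    """Build the URI that points at a model logged inside a given run."""
    return f"runs:/{run_id}/{artifact_path}"


def register_from_run(run_id: str, name: str = "PricingModel", description: str = ""):
    """Register an already-logged model as a new version of `name`.

    Hypothetical helper: assumes MLflow is installed and a tracking store
    with a model registry is configured.
    """
    import mlflow
    from mlflow.tracking import MlflowClient

    # Creates the registered model if needed, then adds a new version to it.
    version = mlflow.register_model(run_model_uri(run_id), name)
    if description:
        MlflowClient().update_model_version(
            name=name, version=version.version, description=description
        )
    return version
```

Separating the two steps lets a reviewer inspect the run before anything appears in the registry.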
Signatures & Input Schema
A model signature defines the schema (column names and data types) of a model's expected inputs and outputs.
Benefits:
- Ensures that the model receives data in the correct format during production scoring.
- Detects schema mismatches early, preventing runtime errors.
Example in MLflow:
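import mlflow.sklearn
from mlflow.models.signature import infer_signature

# Infer the input/output schema from the training data and the model's predictions.
signature = infer_signature(X_train, model.predict(X_train))
mlflow.sklearn.log_model(model, "insurance_model", signature=signature)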
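To make the contract concrete, here is a minimal pure-Python sketch of the kind of check a signature enables. It is an illustration only, not MLflow's implementation; the column names (vehicle_age, driver_age, region) and the helper find_schema_mismatches are hypothetical.

```python
# Hypothetical input schema for a pricing model: column name -> expected Python type.
EXPECTED_INPUTS = {"vehicle_age": float, "driver_age": int, "region": str}


def find_schema_mismatches(record: dict, expected: dict = EXPECTED_INPUTS) -> list:
    """Return human-readable problems found in a single scoring record."""
    problems = []
    for column, expected_type in expected.items():
        if column not in record:
            problems.append(f"missing column: {column}")
        elif not isinstance(record[column], expected_type):
            actual = type(record[column]).__name__
            problems.append(
                f"column {column}: expected {expected_type.__name__}, got {actual}"
            )
    return problems


good = {"vehicle_age": 3.5, "driver_age": 42, "region": "north"}
bad = {"vehicle_age": "3.5", "driver_age": 42}  # wrong type, and region is missing

print(find_schema_mismatches(good))
print(find_schema_mismatches(bad))
```

When a model logged with a signature is loaded through MLflow's generic pyfunc interface, comparable checks run when predict is called, so a malformed batch fails at the boundary rather than deep inside the model.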
Stages and Aliases
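Model versions in the registry can be assigned stages (recent MLflow releases deprecate stages in favor of aliases and tags, so check what your version supports):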
- Staging – for internal testing, validation, and user acceptance.
- Production – the active model used in pricing decisions.
- Archived – previous versions kept for historical reference.
Aliases (like 'Current') let pipelines always pull the intended version without hardcoding version numbers, which makes deployment smoother. In MLflow, the name 'latest' is reserved and cannot be assigned as a custom alias.
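A sketch of how an alias might be assigned and then used by a scoring pipeline. It assumes an MLflow version that supports aliases and a registered model named PricingModel; the alias name current and the helpers alias_uri, promote, and load_current_model are hypothetical.

```python
def alias_uri(name: str, alias: str) -> str:
    """Build a registry URI that resolves to whichever version holds `alias`."""
    return f"models:/{name}@{alias}"


def promote(client, name: str, version: str, alias: str = "current") -> str:
    """Point `alias` at `version` and return the URI pipelines should load.

    `client` is expected to be an mlflow.tracking.MlflowClient (or any object
    exposing set_registered_model_alias with the same arguments).
    """
    client.set_registered_model_alias(name, alias, version)
    return alias_uri(name, alias)


def load_current_model(name: str = "PricingModel", alias: str = "current"):
    """Load the aliased version as a generic pyfunc model (requires MLflow)."""
    import mlflow.pyfunc

    return mlflow.pyfunc.load_model(alias_uri(name, alias))
```

Because the scoring code references only the alias, promoting a new version becomes a registry update rather than a code change.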
Key Takeaways for Pricing Teams
- Treat models like code artifacts with versioning and governance.
- Always use signatures to enforce input/output contracts.
- Use stages and aliases to safely manage production updates.
- Combine with experiment tracking to link model versions to specific training runs, datasets, and configurations.
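The last point can be sketched as a small lookup that follows a registered version back to its training run. The helper describe_lineage is hypothetical; it assumes client behaves like mlflow.tracking.MlflowClient, whose get_model_version and get_run methods are used here, and that the version was registered from a run.

```python
def describe_lineage(client, name: str, version: str) -> dict:
    """Collect the run ID, parameters, and metrics behind a model version."""
    model_version = client.get_model_version(name, version)
    # The run holds the parameters and metrics logged during training.
    run = client.get_run(model_version.run_id)
    return {
        "model": name,
        "version": version,
        "run_id": model_version.run_id,
        "params": dict(run.data.params),
        "metrics": dict(run.data.metrics),
    }
```

Against a real registry this might be called as describe_lineage(MlflowClient(), "PricingModel", "3"), giving an auditor one record that ties a production model to how it was trained.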