
4.1. Tables

What is Great Tables?

great_tables is a Python library designed to make beautiful, well-formatted tables for your analysis outputs.
It’s especially useful when you want to share results with stakeholders - whether that’s a report, a presentation, or an email - without sending them raw Excel files or unformatted console output.

Think of it as a way to present your data like a polished dashboard table, but entirely in Python.


Why use it in pricing workflows?

In insurance pricing, results are often shared in tables - for example:

  • Premium summaries by segment
  • Loss ratio breakdowns by product or region
  • Model performance comparison tables

In Excel, these tables can be manually formatted, but that’s time-consuming and error-prone.
With great_tables, you can automatically generate attractive, consistent tables straight from your Python code, keeping the formatting reproducible and avoiding manual edits.


Data preparation

Before creating a table, we need to reshape our data into the form we want to present.

The example below compares predicted frequency against actual frequency for various features.

To do this, we need to aggregate the data to the feature level and calculate both actual and predicted frequency. To aggregate at the feature level, any continuous features first need banding into discrete intervals.

The function below performs the banding; it is called from the aggregation function that follows. For example, with a step size of 5, a value of 37 is floored to the 35 band (floor(37 / 5) * 5 = 35).

import polars as pl


def band_continuous_feature(
        df, 
        feature, 
        lower_bound, 
        upper_bound, 
        step_size
    ):
    """
    Band the continuous feature into discrete intervals.

    Args:
        df (pl.DataFrame): The input DataFrame.
        feature (str): The name of the continuous feature to be banded.
        lower_bound (float): The lower bound of the feature.
        upper_bound (float): The upper bound of the feature.
        step_size (float): The size of each band.

    Returns:
        pl.DataFrame: The DataFrame with the banded feature.
    """
    df = (
        df
        .with_columns(
            # Clip values to the configured bounds
            pl.col(feature).clip(lower_bound=lower_bound, upper_bound=upper_bound).alias(feature)
        )
        .with_columns(
            # Floor each value to the lower edge of its band (a multiple of step_size)
            ((pl.col(feature) / step_size).floor() * step_size).alias(feature)
        )
    )

    return df
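
As a quick sanity check, here is a minimal sketch of the banding applied to a toy DataFrame (the column name, values and bounds are made up for illustration):

toy_df = pl.DataFrame({'DrivAge': [17, 37, 52, 95]})

banded = band_continuous_feature(toy_df, 'DrivAge', lower_bound=18, upper_bound=80, step_size=5)
# Clipping to [18, 80] and then flooring to 5-year bands gives
# DrivAge = [15.0, 35.0, 50.0, 80.0] (the division casts to float)

Note that flooring maps each value to the lower edge of its band, so a banded value can sit below the clip bound (18 floors to 15).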

The next function performs the aggregation; it checks whether the feature is continuous and, if so, bands it first.

def aggregate_frequency_df(
        df, 
        feature, 
        continuous_feature_config, 
        claim_count = 'ClaimCount', 
        exposure = 'Exposure', 
        prediction = 'ClaimCountPrediction'
    ):
    """
    Aggregate frequency data by the specified feature.

    If continuous, the feature is banded using the provided configuration.

    Args:
        df (pl.DataFrame): The input DataFrame.
        feature (str): The name of the feature to aggregate by.
        continuous_feature_config (dict): Configuration for continuous features.
        claim_count (str): The name of the claim count column.
        exposure (str): The name of the exposure column.
        prediction (str): The name of the prediction column.

    Returns:
        pl.DataFrame: The aggregated frequency DataFrame.
    """
    if feature in continuous_feature_config:
        config = continuous_feature_config[feature]
        df = band_continuous_feature(
            df, feature, config['min'], config['max'], config['step']
        )

    aggregated_df = (
        df
        .group_by(feature)
        .agg(
            pl.col(exposure).sum(), 
            pl.col(claim_count).sum(), 
            pl.col(prediction).sum()
        )
        # Actual and predicted frequency per unit of exposure
        .with_columns(Frequency=pl.col(claim_count) / pl.col(exposure))
        .with_columns(FrequencyPrediction=pl.col(prediction) / pl.col(exposure))
        .sort(feature)
    )

    return aggregated_df
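
The configuration dictionary maps each continuous feature to its banding parameters, using the 'min', 'max' and 'step' keys the function looks up. A minimal sketch (the bounds shown are illustrative, not taken from the data):

continuous_feature_config = {
    'DrivAge': {'min': 18, 'max': 80, 'step': 5},  # illustrative bounds
}

Categorical features such as Area simply do not appear in the dictionary, so they skip the banding step.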

Creating a table

from great_tables import GT

def create_frequency_table(
        df, 
        feature, 
        frequency = 'Frequency', 
        exposure = 'Exposure', 
        prediction = 'FrequencyPrediction', 
        experiment = 'Not Logged'
    ):
    return (
        GT(df)
        .tab_header(
            title=f'Frequency - Actual vs Predicted - {feature}',
        )
        .tab_stub(rowname_col=feature)
        .tab_source_note(source_note=f'Trained on experiment: {experiment}')
        .tab_stubhead(label=feature)
        .fmt_integer(columns=exposure)
        .fmt_percent(columns=[frequency, prediction], decimals=1)
        .data_color(
            columns=[frequency, prediction],
            # Excel-style colour scale: green (low) through yellow to red (high)
            palette=["#63BE7B", "#FFEB84", "#F8696B"]
        )
        .cols_move(columns=prediction, after=frequency)
    )

With the functions above, a frequency table can be created in just a couple of lines.

feature = 'Area'
aggregated_data = aggregate_frequency_df(holdout, feature, continuous_feature_config)
create_frequency_table(aggregated_data, feature)

This can be easily applied to other features:

feature = 'DrivAge'
aggregated_data = aggregate_frequency_df(holdout, feature, continuous_feature_config)
create_frequency_table(aggregated_data, feature)

To create a table for every feature, the function calls can be placed in a loop over a list of features, collecting the resulting tables in a list.

tables = []
for feature in features:
    aggregated_data = aggregate_frequency_df(holdout, feature, continuous_feature_config)
    tables.append(create_frequency_table(aggregated_data, feature))
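
In a notebook each table renders when displayed, but in a plain script nothing is shown. One option, sketched below assuming the tables list built above, is to write each table to an HTML file via GT's as_raw_html() method (the output directory and file names are illustrative):

from pathlib import Path

output_dir = Path('tables')  # illustrative output location
output_dir.mkdir(exist_ok=True)

for feature, table in zip(features, tables):
    # as_raw_html() returns the rendered table as an HTML string
    (output_dir / f'{feature}.html').write_text(table.as_raw_html(), encoding='utf-8')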

This technique is utilised in the PDF section.