
Tuning Carryover and Saturation Hyperparameters
Source: Tuning-Carryover-and-Saturation-Hyperparameters.Rmd

Tunable Recipe
A big advantage of using tidymmm, and a major motivation for following
the tidymodels framework, is the set of tools for hyperparameter tuning.
Setting up a tunable model is easy with the tune()
function: for each parameter that we want to tune, instead of
hardcoding a value, we set it to tune("name_of_the_parameter").
Make sure that the names are distinct.
m_recipe <-
recipe(kpi_sales ~ ., data = mmm_imps) |>
add_role(c(mi_tv, mi_radio, mi_banners), new_role = "mi") |>
update_role(Date, new_role = "temp") %>%
update_role_requirements("temp", bake = FALSE) |>
# here we tune the decay rate for banners but hard-code max_carryover to 1, i.e. no carryover
step_geometric_adstock(mi_banners, decay = tune("banners_decay"), max_carryover = 1) |>
# we will also tune the shape parameter of the hill saturation function
step_hill_saturation(mi_banners, shape = tune("banners_shape"), max_ref = TRUE) |>
# for tv we will also tune the max carryover
step_geometric_adstock(mi_tv, decay = tune("tv_decay"), max_carryover = tune("tv_max_carryover")) |>
step_hill_saturation(mi_tv, shape = tune("tv_shape"), max_ref = TRUE) |>
step_geometric_adstock(mi_radio, decay = tune("radio_decay"), max_carryover = tune("radio_max_carryover")) |>
step_hill_saturation(mi_radio, shape = tune("radio_shape"), max_ref = TRUE)

As a modeler, and depending on contextual knowledge, you may tune some or all of the model's hyperparameters. In the example above we set a strong assumption that the impact of banner ads on sales has no carryover into the future. TV and radio carryover will be tuned.
We define the model and workflow with the recipe in the usual manner.
m_mod <-
linear_reg() |>
set_engine("lm")
m_wflow <-
workflow() |>
add_model(m_mod) |>
add_recipe(m_recipe)

Define a Tuning Grid
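Before building a grid, it can help to confirm which parameters the workflow has actually flagged for tuning. A minimal sketch using extract_parameter_set_dials(), the standard tidymodels helper for this, assuming the tidymmm steps register their tunable parameters with dials:

```r
# list the parameters marked with tune() in the workflow;
# each row is one tunable parameter with the id we gave it
# (e.g. "tv_decay", "banners_shape")
m_wflow |>
  extract_parameter_set_dials()
```

The ids in this set are the column names a tuning grid must use, which is why the names passed to tune() need to be distinct.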
This is where we begin to really tap into the tidymodels framework.
To define a grid of parameters over which we will search for an optimal
combination, we build a random grid over ranges of
hyperparameters. We'll use the grid_random() function and
the parameter functions provided by tidymmm: shape(),
max_carryover(), and decay(). These functions take
a range argument to set the lower and upper limits. Setting a range
like this is where your bias or business domain knowledge
comes into play. If a range is not provided, the whole
domain of the parameter will be searched.
set.seed(007)
rand_grid <-
grid_random(
decay(range = c(0.01, 0.5)),
tv_shape = shape(range = c(0.01, 0.9)),
tv_max_carryover = max_carryover(range = c(2 , 8)),
banners_decay = decay(range = c(0.01, 0.5)),
banners_shape = shape(range = c(1, 2)),
radio_decay = decay(range = c(0.01, 0.4)),
radio_shape = shape(range = c(1, 2)),
radio_max_carryover = max_carryover(range = c(1 , 3)),
size = 10
) %>%
rename(tv_decay = decay)

For each hyperparameter we create a column whose name matches the tune() label used in the recipe. To keep the search quick we set the grid size to 10.
rand_grid
#> # A tibble: 10 × 8
#> tv_decay tv_shape tv_max_carryover banners_decay banners_shape radio_decay
#> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.495 0.163 4 0.101 1.76 0.312
#> 2 0.205 0.216 5 0.101 1.44 0.255
#> 3 0.0667 0.698 3 0.196 1.90 0.292
#> 4 0.0442 0.0957 8 0.425 1.32 0.161
#> 5 0.129 0.414 8 0.254 1.08 0.0735
#> 6 0.398 0.0854 7 0.397 1.82 0.0830
#> 7 0.177 0.509 7 0.421 1.90 0.163
#> 8 0.486 0.0177 4 0.234 1.97 0.117
#> 9 0.0913 0.887 6 0.402 1.57 0.0848
#> 10 0.235 0.292 5 0.197 1.72 0.207
#> # ℹ 2 more variables: radio_shape <dbl>, radio_max_carryover <int>

Tune
Next we'll set up a cross-validation scheme with 5 folds and two repeats.

folds <- vfold_cv(mmm, v = 5, repeats = 2)

Now we are ready to tune! Pass the workflow to tune_grid()
with the resamples, the grid, and a set of metrics to optimize.
We will search for the combination that minimizes MAPE.
tuned <-
m_wflow %>%
tune_grid(
resamples = folds,
grid = rand_grid,
metrics = metric_set(mape)
)

To see the results we'll unnest the .metrics column of the tuning results and average each model's MAPE over the folds and repeats of our cross-validation scheme.
tn_summary <-
tuned %>%
unnest(.metrics) %>%
group_by(
tv_max_carryover,
tv_decay,
tv_shape,
banners_decay,
banners_shape,
radio_max_carryover,
radio_decay,
radio_shape
) %>%
summarise(mape = mean(.estimate)) %>%
arrange(mape)
tn_summary
#> # A tibble: 10 × 9
#> # Groups: tv_max_carryover, tv_decay, tv_shape, banners_decay, banners_shape,
#> # radio_max_carryover, radio_decay [10]
#> tv_max_carryover tv_decay tv_shape banners_decay banners_shape
#> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 6 0.0913 0.887 0.402 1.57
#> 2 3 0.0667 0.698 0.196 1.90
#> 3 7 0.177 0.509 0.421 1.90
#> 4 8 0.129 0.414 0.254 1.08
#> 5 5 0.235 0.292 0.197 1.72
#> 6 5 0.205 0.216 0.101 1.44
#> 7 8 0.0442 0.0957 0.425 1.32
#> 8 4 0.495 0.163 0.101 1.76
#> 9 4 0.486 0.0177 0.234 1.97
#> 10 7 0.398 0.0854 0.397 1.82
#> # ℹ 4 more variables: radio_max_carryover <int>, radio_decay <dbl>,
#> # radio_shape <dbl>, mape <dbl>

Plot average MAPE over a grid of two variables
tn_summary %>%
ggplot(aes(tv_decay, tv_shape)) +
geom_point(aes(color = mape, size = mape)) +
theme_minimal()
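From here, a natural next step is to pick the best grid point and refit on the full data. A sketch using the standard tune helpers; show_best() and select_best() read the metric directly from the tuning results, so the manual unnesting above is only needed for the custom summary and plot:

```r
# rank the grid points by mean MAPE across resamples
show_best(tuned, metric = "mape")

# take the single best combination and lock it into the workflow
best_params <- select_best(tuned, metric = "mape")
final_wflow <- finalize_workflow(m_wflow, best_params)

# refit the finalized workflow on the full data
final_fit <- fit(final_wflow, data = mmm)
```

finalize_workflow() replaces each tune() placeholder in the recipe with the corresponding value from best_params, so the fitted model uses the winning decay, shape, and max_carryover settings.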