# Performance

## What to do when `marginaleffects` is slow?

Some options:

1. Compute marginal effects and contrasts at the mean (or other representative value) instead of all observed rows of the original dataset: Use the `newdata` argument and the `datagrid()` function.
2. Compute marginal effects for a subset of variables, paying special attention to exclude factor variables which can be particularly costly to process: Use the `variables` argument.
3. Do not compute standard errors: Use the `vcov = FALSE` argument.
4. Use parallel processing to speed up the computation of standard errors. See next section.

This simulation illustrates how computation time varies for a model with 25 regressors and 100,000 observations:

``````library(marginaleffects)

## simulate data and fit a large model
N <- 1e5
dat <- data.frame(matrix(rnorm(N * 26), ncol = 26))
mod <- lm(X1 ~ ., dat)

results <- bench::mark(
# marginal effects at the mean; no standard error
slopes(mod, vcov = FALSE, newdata = "mean"),
# marginal effects at the mean
slopes(mod, newdata = "mean"),
# 1 variable; no standard error
slopes(mod, vcov = FALSE, variables = "X3"),
# 1 variable
slopes(mod, variables = "X3"),
# 26 variables; no standard error
slopes(mod, vcov = FALSE),
# 26 variables
slopes(mod),
iterations = 1, check = FALSE)

results[, c(1, 3, 5)]
# expression                                        median mem_alloc
# "slopes(mod, vcov = FALSE, newdata = \"mean\")" 194.98ms  306.19MB
# "slopes(mod, newdata = \"mean\")"               345.38ms  311.45MB
# "slopes(mod, vcov = FALSE, variables = \"X3\")" 197.51ms   649.6MB
# "slopes(mod, variables = \"X3\")"               742.05ms    1.27GB
# "slopes(mod, vcov = FALSE)"                        4.09s   13.87GB
# "slopes(mod)"                                     15.33s   26.83GB``````

The benchmarks above were conducted using the development version of `marginaleffects` on 2023-12-09.

## Parallel computation

As noted above, the most costly operation in `marginaleffects`, because that involves calling `predict()` at least twice for every coefficient in the model. This operation can be conducted in parallel to speed things up.

However, when the dataset is very large, there can be considerable cost to passing it between different cores or forked processes. Unfortunately, this means that the range of cases where parallelization is beneficial is pretty small, and that the gains will generally not be proportional to the number of cores used.

The class of models where parallelization is likely to yield the most gains is where:

1. The model includes many parameters (see `get_coef(model)`)
2. The data is not very large.

In this example, we use the `future` package to specify a parallization plan and compute standard errors in parallel. The key parts of that example are: (a) set a global option to tell `marginaleffects` that we want to compute standard errors in parallel, and (b) use `future` to specify the parallelization plan and number of workers.

``````library(mgcv)
library(tictoc)
library(future)
library(nycflights13)
library(marginaleffects)
data("flights")
packageVersion("marginaleffects")
#> [1] '0.22.0.1'

cores <- 8
plan(multicore, workers = cores, number_of_workers = 8)

flights <- flights |>
transform(date = as.Date(paste(year, month, day, sep = "/"))) |>
transform(date.num = as.numeric(date - min(date))) |>
transform(wday = as.POSIXlt(date)\$wday) |>
transform(time = as.POSIXct(paste(hour, minute, sep = ":"), format = "%H:%M")) |>
transform(time.dt = difftime(time, as.POSIXct('00:00', format = '%H:%M'), units = 'min')) |>
transform(time.num = as.numeric(time.dt)) |>
transform(dep_delay = ifelse(dep_delay < 0, 0, dep_delay)) |>
transform(dep_delay = ifelse(is.na(dep_delay), 0, dep_delay)) |>
transform(carrier = factor(carrier)) |>
transform(dest = factor(dest)) |>
transform(origin = factor(origin))

model <- bam(dep_delay ~ s(date.num, bs = "cr") +
s(wday, bs = "cc", k = 3) +
s(time.num, bs = "cr") +
s(carrier, bs = "re") +
origin +
s(distance, bs = "cr") +
s(dest, bs = "re"),
data = flights,
family = poisson,
discrete = TRUE,

Note that this is a good use-case, because the model in question has a lot of parameters:

``````length(coef(model))
#> [1] 153``````

No standard errors is very fast:

``````tic()
p1 <- predictions(model, vcov = FALSE)
toc()
#> 0.258 sec elapsed``````

With parallelization:

``````options("marginaleffects_parallel" = TRUE)

tic()
p1 <- predictions(model)
toc()
#> 7.722 sec elapsed``````

Without parallelization:

``````options("marginaleffects_parallel" = FALSE)

tic()
p2 <- predictions(model)
toc()
#> 24.799 sec elapsed``````

Now we make sure the results are equivalent:

``````cor(p1\$estimate, p2\$estimate)
#> [1] 1

cor(p1\$std.error, p2\$std.error)
#> [1] 1

#>
#>  Estimate Std. Error     z Pr(>|z|)   S 2.5 % 97.5 %
#>      2.48     0.0224 110.5   <0.001 Inf  2.43   2.52
#>      2.22     0.0176 125.8   <0.001 Inf  2.18   2.25
#>      2.16     0.0147 147.2   <0.001 Inf  2.13   2.19
#>      2.52     0.0272  92.7   <0.001 Inf  2.47   2.57
#>      2.37     0.0126 187.6   <0.001 Inf  2.35   2.40
#>      3.10     0.0170 182.1   <0.001 Inf  3.07   3.14
#>
#> Type:  response
#> Columns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, dep_delay, date.num, wday, time.num, carrier, origin, distance, dest

#>
#>  Estimate Std. Error     z Pr(>|z|)   S 2.5 % 97.5 %
#>      2.48     0.0224 110.5   <0.001 Inf  2.43   2.52
#>      2.22     0.0176 125.8   <0.001 Inf  2.18   2.25
#>      2.16     0.0147 147.2   <0.001 Inf  2.13   2.19
#>      2.52     0.0272  92.7   <0.001 Inf  2.47   2.57
#>      2.37     0.0126 187.6   <0.001 Inf  2.35   2.40
#>      3.10     0.0170 182.1   <0.001 Inf  3.07   3.14
#>
#> Type:  response
#> Columns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, dep_delay, date.num, wday, time.num, carrier, origin, distance, dest``````

The gains are interesting,

## Speed comparison

The `slopes` function is relatively fast. This simulation was conducted using the development version of the package on 2023-12-09:

``````library(margins)

N <- 1e3
dat <- data.frame(
y = sample(0:1, N, replace = TRUE),
x1 = rnorm(N),
x2 = rnorm(N),
x3 = rnorm(N),
x4 = factor(sample(letters[1:5], N, replace = TRUE)))
mod <- glm(y ~ x1 + x2 + x3 + x4, data = dat, family = binomial)``````

`marginaleffects` can be 3 times faster and use 3 times less memory than `margins` when unit-level standard errors are not computed:

``````results <- bench::mark(
slopes(mod, vcov = FALSE),
margins(mod, unit_ses = FALSE),
check = FALSE, relative = TRUE)
results[, c(1, 3, 5)]

# expression                     median mem_alloc
# <bch:expr>                      <dbl>     <dbl>
# slopes(mod, vcov = FALSE)        1         1
# margins(mod, unit_ses = FALSE)   3.21      2.83``````

`marginaleffects` can be up to 1000x times faster and use 32x less memory than `margins` when unit-level standard errors are computed:

``````results <- bench::mark(
slopes(mod, vcov = TRUE),
margins(mod, unit_ses = TRUE),
check = FALSE, relative = TRUE, iterations = 1)
results[, c(1, 3, 5)]
# expression                    median mem_alloc
#  <bch:expr>                     <dbl>     <dbl>
#  slopes(mod, vcov = TRUE)          1        1
#  margins(mod, unit_ses = TRUE)  1161.      32.9``````

Models estimated on larger datasets (> 1000 observations) can be impossible to process using the `margins` package, because of memory and time constraints. In contrast, `marginaleffects` can work well on much larger datasets.

Note that, in some specific cases, `marginaleffects` will be considerably slower than packages like `emmeans` or `modmarg`. This is because these packages make extensive use of hard-coded analytical derivatives, or reimplement their own fast prediction functions.