35  FAQ

35.1 Questions & Answers

35.2 Calling marginaleffects in functions, loops, environments, or after re-assigning variables

Functions from the marginaleffects package can sometimes fail when they are called inside a function, loop, or other environments. To see why, it is important to know that marginaleffects often needs to operate on the original data that was used to fit the model. To extract this original data, we use the get_data() function from the insight package.

In most cases, get_data() can extract the data which is stored inside the model object created by the modeling package. However, some modeling packages do not save the original data in the model object (in order to save memory). In those cases, get_data() will parse the call to find the name of the data object, and will search for that data object in the global environment. When users fit models in a different environment (e.g., function calls), get_data() may not be able to retrieve the original data.

A related problem can arise if users fit a model, but then assign a new value to the variable that used to store the dataset.

Recommendations:

  1. Supply your dataset explicitly to the newdata argument of slopes functions.
  2. Avoid assigning a new value to a variable that you use to store a dataset for model fitting.

Another alternative is to re-assign relevant variables explicitly inside the function, to ensure that marginaleffects gets access to them. For instance,

get_int_eff = function(formula, data, boot = FALSE){
  assign("formula", formula, envir = parent.frame())
  mod = glm(formula = formula, data = data)
  int_eff = avg_comparisons(
    model = mod,
    variables = "am",
    type = "response")
  if (boot){
    int_eff = inferences(int_eff, method = "boot", R = 100, conf_type = "perc")
  }
  return(int_eff)
}

35.3 Equivalence between avg_comparisons() and avg_predictions()

Say we estimate a simple logit regression and want to compute the average log-odds ratio associated with a change from 0 to 1 on the vs variable. We can proceed as follows using the comparisons() function:

library(marginaleffects)
mod <- glm(am ~ vs + scale(drat), family = binomial, mtcars)

avg_comparisons(mod,
    variables = "vs",
    comparison = "lnoravg",
    type = "response")
#> 
#>  Estimate Std. Error     z Pr(>|z|)   S 2.5 % 97.5 %
#>    -0.901      0.368 -2.45   0.0143 6.1 -1.62  -0.18
#> 
#> Term: vs
#> Type:  response 
#> Comparison: ln(odds(1) / odds(0))

We can specify the function to compare “hi” predictions to “lo” predictions manually and get the same results:

fun <- function(hi, lo) log((mean(hi) / (1 - mean(hi))) / (mean(lo) / (1 - mean(lo))))

comparisons(mod,
    variables = "vs",
    comparison = fun,
    type = "response")
#> 
#>  Estimate Std. Error     z Pr(>|z|)   S 2.5 % 97.5 %
#>    -0.901      0.368 -2.45   0.0143 6.1 -1.62  -0.18
#> 
#> Term: vs
#> Type:  response 
#> Comparison: 1, 0

The same results can be obtained using the predictions() function. First, we replicate the entire dataset, and substitute the values of vs. Then, we make predictions. Finally, we compute the log odds ratios:

dat0 <- transform(mtcars, vs = 0)
dat1 <- transform(mtcars, vs = 1)

p0 <- predictions(mod, newdata = dat0, type = "response")
p1 <- predictions(mod, newdata = dat1, type = "response")

fun(p1$estimate, p0$estimate)
#> [1] -0.9006949

Notice that the following command gives us a different result. This is because instead of duplicating the full dataset, and computing the “hi” and “lo” as the mean predictions over the full data, we only compute the averages on the subsets of rows where vs is 0 or 1, respectively:

p <- avg_predictions(mod, by = "vs", type = "response")
fun(p$estimate[2], p$estimate[1])
#> [1] 0.6931472

Which is equivalent to:

dat0 <- subset(mtcars, vs == 0)
dat1 <- subset(mtcars, vs == 1)

p0 <- predictions(mod, newdata = dat0, type = "response")
p1 <- predictions(mod, newdata = dat1, type = "response")

fun(p1$estimate, p0$estimate)
#> [1] 0.6931472

35.4 Dates

Date predictors are not supported by default in marginaleffects. There are two main reasons for that. First, many quantities of interest are only formally defined for numeric variables (ex: derivatives). Second, an object like “2024-11-04” does not have a natural arithmetic. What does it mean to take a sum like: \(\pi\) + “2024-11-04”? In that spirit, marginaleffects requires that the user transforms the date variable into a numeric variable before fitting the model. This makes operations more explicit.

In this example, we fit a model with a date and another with a integer version of the date. We call predict() to illustrate the fact that the two models yield exactly identical predictions:

library(lubridate)
library(marginaleffects)
set.seed(123)

N <- 100
Date <- ymd("2024-01-01") + days(sample(0:30, N, replace = TRUE))
Days <- as.integer(Date)
Response <- pi * as.integer(Date) + rnorm(N)

mod <- lm(Response ~ Date)
predict(mod) |> head()
#>        1        2        3        4        5        6 
#> 62055.86 62005.69 62018.23 62002.55 61968.06 61990.01
avg_slopes(mod)
#> Error: Class of the `Date` variable is class is not supported.

mod <- lm(Response ~ Days)
predict(mod) |> head()
#>        1        2        3        4        5        6 
#> 62055.86 62005.69 62018.23 62002.55 61968.06 61990.01
avg_slopes(mod)
#> 
#>  Estimate Std. Error   z Pr(>|z|)   S 2.5 % 97.5 %
#>      3.14     0.0152 207   <0.001 Inf  3.11   3.17
#> 
#> Term: Days
#> Type:  response 
#> Comparison: dY/dX

35.5 More plot customization

As described in the Plot vignette, there are currently three ways to customize plots in marginaleffects:

  1. Using the arguments provided by plot_*() functions in the marginaleffects package.
  2. Modifying the plot objects using ggplot2 or an extension package like ggExtra.
  3. Setting draw=FALSE to extract the plotting data, and then feeding that data to a standalone plotting software like ggplot2, plot, or lattice.

Options 2 and 3 are extremely flexible, and make it trivial to reproduce the default plots or to draw highly customized visualizations.

Of course, some users might still prefer to use function arguments directly available in marginaleffects::plot_*(), and the unavailability of some arguments can be frustrating. Nevertheless, a decision was made to keep the number of arguments in plot_*() functions quite low. This is because ggplot2 functions like geom_*(), facet_*(), and aes() each accept a very large number of options, and because users’ needs are incredibly varied. This makes it difficult to accomodate all needs and possibilities, without developing unwieldy functions with dozens of arguments.1

Thus, the marginaleffects package is designed with limited scope, focusing on computation while offering a few visualization options for quick exploration. Users who need highly customized or publication-ready figures are encouraged to use the draw=FALSE argument with standalone plotting software.

In an ideal world, someone else would build a richer visualization package on top of marginaleffects. I think this would be a very cool project, and I would be happy to collaborate, advise, and help if I can. In the meantime, feel free to keep the discussion going by opening an issue on Github. I will be happy to consider counter-arguments, suggestions, or pull requests.


  1. Note that we cannot pass visual arguments to ..., because that mechanism is already used to pass arguments to the predict() methods, which is required for some model types.↩︎