FAQ

Questions & Answers

Calling marginaleffects in functions, loops, environments, or after re-assigning variables

Functions from the marginaleffects package can sometimes fail when they are called inside a function, loop, or other environments. To see why, it is important to know that marginaleffects often needs to operate on the original data that was used to fit the model. To extract this original data, we use the get_data() function from the insight package.

In most cases, get_data() can extract the data which is stored inside the model object created by the modeling package. However, some modeling packages do not save the original data in the model object (in order to save memory). In those cases, get_data() will parse the call to find the name of the data object, and will search for that data object in the global environment. When users fit models in a different environment (e.g., function calls), get_data() may not be able to retrieve the original data.

A related problem can arise if users fit a model, but then assign a new value to the variable that used to store the dataset.

Recommendations:

  1. Supply your dataset explicitly to the newdata argument of slopes functions.
  2. Avoid assigning a new value to a variable that you use to store a dataset for model fitting.

Equivalence between avg_comparisons() and avg_predictions()

Say we estimate a simple logit regression and want to compute the average log-odds ratio associated with a change from 0 to 1 on the vs variable. We can proceed as follows using the comparisons() function:

library(marginaleffects)
mod <- glm(am ~ vs + scale(drat), family = binomial, mtcars)

avg_comparisons(mod,
    variables = "vs",
    comparison = "lnoravg",
    type = "response")
#> 
#>  Term              Contrast Estimate Std. Error     z Pr(>|z|)   S 2.5 % 97.5 %
#>    vs ln(odds(1) / odds(0))   -0.901      0.368 -2.45   0.0143 6.1 -1.62  -0.18
#> 
#> Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted 
#> Type:  response

We can specify the function to compare “hi” predictions to “lo” predictions manually and get the same results:

fun <- function(hi, lo) log((mean(hi) / (1 - mean(hi))) / (mean(lo) / (1 - mean(lo))))

comparisons(mod,
    variables = "vs",
    comparison = fun,
    type = "response")
#> 
#>  Term Contrast Estimate Std. Error     z Pr(>|z|)   S 2.5 % 97.5 %
#>    vs     1, 0   -0.901      0.368 -2.45   0.0143 6.1 -1.62  -0.18
#> 
#> Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted 
#> Type:  response

The same results can be obtained using the predictions() function. First, we replicate the entire dataset, and substitute the values of vs. Then, we make predictions. Finally, we compute the log odds ratios:

dat0 <- transform(mtcars, vs = 0)
dat1 <- transform(mtcars, vs = 1)

p0 <- predictions(mod, newdata = dat0, type = "response")
p1 <- predictions(mod, newdata = dat1, type = "response")

fun(p1$estimate, p0$estimate)
#> [1] -0.9006949

Notice that the following command gives us a different result. This is because instead of duplicating the full dataset, and computing the “hi” and “lo” as the mean predictions over the full data, we only compute the averages on the subsets of rows where vs is 0 or 1, respectively:

p <- avg_predictions(mod, by = "vs", type = "response")
fun(p$estimate[2], p$estimate[1])
#> [1] 0.6931472

Which is equivalent to:

dat0 <- subset(mtcars, vs == 0)
dat1 <- subset(mtcars, vs == 1)

p0 <- predictions(mod, newdata = dat0, type = "response")
p1 <- predictions(mod, newdata = dat1, type = "response")

fun(p1$estimate, p0$estimate)
#> [1] 0.6931472