library(marginaleffects)
mod <- glm(am ~ vs + scale(drat), family = binomial, mtcars)
avg_comparisons(mod,
variables = "vs",
comparison = "lnoravg",
type = "response")
#>
#> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> -0.901 0.368 -2.45 0.0143 6.1 -1.62 -0.18
#>
#> Term: vs
#> Type: response
#> Comparison: ln(odds(1) / odds(0))
35 FAQ
35.1 Questions & Answers
- Error: Matrix columns are not supported
- Wiggly (non-smooth) confidence intervals in plots.
plot_predictions()
over a range of unobserved values- Plot the marginal effects from a
plm
package model - Models with demeaned, polynomials, or transformed variables
nlme::lme
problem with character predictors- Controlling facet rows and columns in
plot_*()
functions. - Bayesian diagnostics (Rhat, etc.) with
brms
- Pooling predictions with multiple imputation
35.2 Calling marginaleffects
in functions, loops, environments, or after re-assigning variables
Functions from the marginaleffects
package can sometimes fail when they are called inside a function, loop, or other environments. To see why, it is important to know that marginaleffects
often needs to operate on the original data that was used to fit the model. To extract this original data, we use the get_data()
function from the insight
package.
In most cases, get_data()
can extract the data which is stored inside the model object created by the modeling package. However, some modeling packages do not save the original data in the model object (in order to save memory). In those cases, get_data()
will parse the call to find the name of the data object, and will search for that data object in the global environment. When users fit models in a different environment (e.g., function calls), get_data()
may not be able to retrieve the original data.
A related problem can arise if users fit a model, but then assign a new value to the variable that used to store the dataset.
Recommendations:
- Supply your dataset explicitly to the
newdata
argument ofslopes
functions. - Avoid assigning a new value to a variable that you use to store a dataset for model fitting.
Another alternative is to re-assign relevant variables explicitly inside the function, to ensure that marginaleffects
gets access to them. For instance,
get_int_eff = function(formula, data, boot = FALSE){
assign("formula", formula, envir = parent.frame())
mod = glm(formula = formula, data = data)
int_eff = avg_comparisons(
model = mod,
variables = "am",
type = "response")
if (boot){
int_eff = inferences(int_eff, method = "boot", R = 100, conf_type = "perc")
}
return(int_eff)
}
35.3 Equivalence between avg_comparisons()
and avg_predictions()
Say we estimate a simple logit regression and want to compute the average log-odds ratio associated with a change from 0 to 1 on the vs
variable. We can proceed as follows using the comparisons()
function:
We can specify the function to compare “hi” predictions to “lo” predictions manually and get the same results:
fun <- function(hi, lo) log((mean(hi) / (1 - mean(hi))) / (mean(lo) / (1 - mean(lo))))
comparisons(mod,
variables = "vs",
comparison = fun,
type = "response")
#>
#> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> -0.901 0.368 -2.45 0.0143 6.1 -1.62 -0.18
#>
#> Term: vs
#> Type: response
#> Comparison: 1, 0
The same results can be obtained using the predictions()
function. First, we replicate the entire dataset, and substitute the values of vs
. Then, we make predictions. Finally, we compute the log odds ratios:
dat0 <- transform(mtcars, vs = 0)
dat1 <- transform(mtcars, vs = 1)
p0 <- predictions(mod, newdata = dat0, type = "response")
p1 <- predictions(mod, newdata = dat1, type = "response")
fun(p1$estimate, p0$estimate)
#> [1] -0.9006949
Notice that the following command gives us a different result. This is because instead of duplicating the full dataset, and computing the “hi” and “lo” as the mean predictions over the full data, we only compute the averages on the subsets of rows where vs
is 0 or 1, respectively:
p <- avg_predictions(mod, by = "vs", type = "response")
fun(p$estimate[2], p$estimate[1])
#> [1] 0.6931472
Which is equivalent to:
dat0 <- subset(mtcars, vs == 0)
dat1 <- subset(mtcars, vs == 1)
p0 <- predictions(mod, newdata = dat0, type = "response")
p1 <- predictions(mod, newdata = dat1, type = "response")
fun(p1$estimate, p0$estimate)
#> [1] 0.6931472
35.4 Dates
Date predictors are not supported by default in marginaleffects
. There are two main reasons for that. First, many quantities of interest are only formally defined for numeric variables (ex: derivatives). Second, an object like “2024-11-04” does not have a natural arithmetic. What does it mean to take a sum like: \(\pi\) + “2024-11-04”? In that spirit, marginaleffects
requires that the user transforms the date variable into a numeric variable before fitting the model. This makes operations more explicit.
In this example, we fit a model with a date and another with a integer version of the date. We call predict()
to illustrate the fact that the two models yield exactly identical predictions:
library(lubridate)
library(marginaleffects)
set.seed(123)
N <- 100
Date <- ymd("2024-01-01") + days(sample(0:30, N, replace = TRUE))
Days <- as.integer(Date)
Response <- pi * as.integer(Date) + rnorm(N)
mod <- lm(Response ~ Date)
predict(mod) |> head()
#> 1 2 3 4 5 6
#> 62055.86 62005.69 62018.23 62002.55 61968.06 61990.01
avg_slopes(mod)
#> Error: Class of the `Date` variable is class is not supported.
mod <- lm(Response ~ Days)
predict(mod) |> head()
#> 1 2 3 4 5 6
#> 62055.86 62005.69 62018.23 62002.55 61968.06 61990.01
avg_slopes(mod)
#>
#> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> 3.14 0.0152 207 <0.001 Inf 3.11 3.17
#>
#> Term: Days
#> Type: response
#> Comparison: dY/dX
35.5 More plot customization
As described in the Plot vignette, there are currently three ways to customize plots in marginaleffects
:
- Using the arguments provided by
plot_*()
functions in themarginaleffects
package. - Modifying the plot objects using
ggplot2
or an extension package likeggExtra
. - Setting
draw=FALSE
to extract the plotting data, and then feeding that data to a standalone plotting software likeggplot2
,plot
, orlattice
.
Options 2 and 3 are extremely flexible, and make it trivial to reproduce the default plots or to draw highly customized visualizations.
Of course, some users might still prefer to use function arguments directly available in marginaleffects::plot_*()
, and the unavailability of some arguments can be frustrating. Nevertheless, a decision was made to keep the number of arguments in plot_*()
functions quite low. This is because ggplot2
functions like geom_*()
, facet_*()
, and aes()
each accept a very large number of options, and because users’ needs are incredibly varied. This makes it difficult to accomodate all needs and possibilities, without developing unwieldy functions with dozens of arguments.1
Thus, the marginaleffects
package is designed with limited scope, focusing on computation while offering a few visualization options for quick exploration. Users who need highly customized or publication-ready figures are encouraged to use the draw=FALSE
argument with standalone plotting software.
In an ideal world, someone else would build a richer visualization package on top of marginaleffects
. I think this would be a very cool project, and I would be happy to collaborate, advise, and help if I can. In the meantime, feel free to keep the discussion going by opening an issue on Github. I will be happy to consider counter-arguments, suggestions, or pull requests.