# Marginal Means

In the context of this package, “marginal means” refer to the values obtained by this three step process:

1. Construct a “grid” of predictor values with all combinations of categorical variables, and where numeric variables are held at their means.
2. Calculate adjusted predictions for each cell in that grid.
3. Take the average of those adjusted predictions across one dimension of the grid to obtain the marginal means.

For example, consider a model with a numeric, a factor, and a logical predictor:

``````library(marginaleffects)

dat <- mtcars
dat\$cyl <- as.factor(dat\$cyl)
dat\$am <- as.logical(dat\$am)
mod <- lm(mpg ~ hp + cyl + am, data = dat)``````

Using the `predictions` function, we set the `hp` variable at its mean and compute predictions for all combinations for `am` and `cyl`:

``````p <- predictions(
mod,
newdata = datagrid(am = unique, cyl = unique))``````

For illustration purposes, it is useful to reshape the above results:

am
cyl TRUE FALSE Marginal means by cyl
6 21.0 16.9 19.0
4 25.0 20.8 22.9
8 21.4 17.3 19.4
Marginal means by am 22.5 18.3

The marginal means by `am` and `cyl` are obtained by taking the mean of the adjusted predictions across cells. We can achieve the same results with the `predictions()` function, with `datagrid()` to specify a balanced grid, and the `by` argument to define the marginalization dimension:

``````predictions(
mod,
by = "am",
newdata = datagrid(am = unique, cyl = unique))
#>
#>     am Estimate Std. Error    z Pr(>|z|)     S 2.5 % 97.5 %
#>   TRUE     22.5      0.834 26.9   <0.001 528.6  20.8   24.1
#>  FALSE     18.3      0.785 23.3   <0.001 397.4  16.8   19.9
#>
#> Columns: am, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
#> Type:  response``````

Alternatively, we can use the convenient `newdata="balanced"` argument:

``````predictions(
mod,
by = "cyl",
newdata = "balanced")
#>
#>  cyl Estimate Std. Error    z Pr(>|z|)     S 2.5 % 97.5 %
#>    6     19.0       1.07 17.7   <0.001 229.7  16.9   21.1
#>    4     22.9       1.36 16.9   <0.001 209.7  20.2   25.5
#>    8     19.4       1.38 14.1   <0.001 146.6  16.7   22.1
#>
#> Columns: cyl, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
#> Type:  response``````

We can of course marginalize over the interaction between more variables, by changing the `by` argument:

``````predictions(
mod,
by = c("am", "cyl"),
newdata = "balanced")
#>
#>     am cyl Estimate Std. Error    z Pr(>|z|)     S 2.5 % 97.5 %  hp
#>   TRUE   6     21.0       1.21 17.3   <0.001 221.4  18.7   23.4 147
#>   TRUE   4     25.0       1.18 21.2   <0.001 329.3  22.7   27.3 147
#>   TRUE   8     21.4       1.83 11.7   <0.001 103.2  17.9   25.0 147
#>  FALSE   6     16.9       1.27 13.3   <0.001 130.9  14.4   19.4 147
#>  FALSE   4     20.8       1.76 11.8   <0.001 105.1  17.4   24.2 147
#>  FALSE   8     17.3       1.12 15.5   <0.001 176.8  15.1   19.5 147
#>
#> Columns: rowid, am, cyl, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, hp, mpg
#> Type:  response``````

The same results can be achieved using the powerful `emmeans` package:

``````library(emmeans)
emmeans(mod, specs = ~cyl)
#>  cyl emmean   SE df lower.CL upper.CL
#>  4     22.9 1.36 27     20.1     25.7
#>  6     19.0 1.07 27     16.8     21.2
#>  8     19.4 1.38 27     16.5     22.2
#>
#> Results are averaged over the levels of: am
#> Confidence level used: 0.95
emmeans(mod, specs = ~cyl + am)
#>  cyl am    emmean   SE df lower.CL upper.CL
#>  4   FALSE   20.8 1.76 27     17.2     24.4
#>  6   FALSE   16.9 1.27 27     14.3     19.5
#>  8   FALSE   17.3 1.12 27     15.0     19.6
#>  4    TRUE   25.0 1.18 27     22.5     27.4
#>  6    TRUE   21.0 1.21 27     18.6     23.5
#>  8    TRUE   21.4 1.83 27     17.7     25.2
#>
#> Confidence level used: 0.95``````

## Marginal Means vs. Average Predictions

What should scientists report? Marginal means or average predictions?

Many analysts ask this question, but unfortunately there isn’t a single answer. As explained above, marginal means are a special case of predictions, made on a perfectly balanced grid of categorical predictors, with numeric predictors held at their means, and marginalized with respect to some focal variables. Whether the analyst prefers to report this specific type of marginal means or another kind of average prediction will depend on the characteristics of the sample and the population to which they want to generalize.

After reading this vignette and the discussion of `emmeans` in the Alternative Software vignette, you may want to consult with a statistician to discuss your specific real-world problem and make an informed choice.

## Plot conditional marginal means

The `marginaleffects` package offers several functions to plot how some quantities vary as a function of others:

• `plot_predictions`: Conditional adjusted predictions – how does the predicted outcome change as a function of regressors?
• `plot_comparisons`: Conditional comparisons – how do contrasts change as a function of regressors?
• `plot_slopes`: Conditional marginal effects – how does the slope change as a function of regressors?

There is no analogous function for marginal means. However, it is very easy to achieve a similar effect using the `predictions()` function, its `by` argument, and standard plotting functions. In the example below, we take these steps:

1. Estimate a model with one continuous (`hp`) and one categorical regressor (`cyl`).
2. Create a perfectly “balanced” data grid for each combination of `hp` and `cyl`. This is specified by the user in the `datagrid()` call.
3. Compute fitted values (aka “adjusted predictions”) for each cell of the grid.
4. Use the `by` argument to take the average of predicted values for each value of `hp`, across margins of `cyl`.
5. Compute standard errors around the averaged predicted values (i.e., marginal means).
6. Create symmetric confidence intervals in the usual manner.
7. Plot the results.
``````library(ggplot2)

mod <- lm(mpg ~ hp + factor(cyl), data = mtcars)

p <- predictions(mod,
by = "hp",
newdata = datagrid(
model = mod,
hp = seq(100, 120, length.out = 10),
cyl = mtcars\$cyl))

ggplot(p) +
geom_ribbon(aes(hp, ymin = conf.low, ymax = conf.high), alpha = .2) +
geom_line(aes(hp, estimate))``````