```
library(marginaleffects)

dat <- mtcars
dat$cyl <- as.factor(dat$cyl)
dat$am <- as.logical(dat$am)
mod <- lm(mpg ~ hp + cyl + am, data = dat)
```

# Marginal Means

In the context of this package, “marginal means” refer to the values obtained by this three-step process:

- Construct a “grid” of predictor values with all combinations of categorical variables, and where numeric variables are held at their means.
- Calculate adjusted predictions for each cell in that grid.
- Take the average of those adjusted predictions across one dimension of the grid to obtain the marginal means.
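The three steps above can be sketched in base R alone, without any package functions (the vignette's model is refit here so the chunk stands on its own):

```
# Sketch of the three-step process using only base R.
mod <- lm(mpg ~ hp + factor(cyl) + am, data = mtcars)

# Step 1: a balanced grid of the categorical predictors,
# with the numeric predictor `hp` held at its mean.
grid <- expand.grid(
    hp  = mean(mtcars$hp),
    cyl = unique(mtcars$cyl),
    am  = unique(mtcars$am))

# Step 2: adjusted predictions for each cell of the grid.
grid$estimate <- predict(mod, newdata = grid)

# Step 3: average across one dimension of the grid
# to obtain marginal means by `cyl`.
aggregate(estimate ~ cyl, data = grid, FUN = mean)
```

This reproduces the point estimates shown below, but not their standard errors; the `predictions()` function computes both.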

For example, consider a model with a numeric, a factor, and a logical predictor:

Using the `predictions()` function, we set the `hp` variable at its mean and compute predictions for all combinations of `am` and `cyl`:

```
p <- predictions(
    mod,
    newdata = datagrid(am = unique, cyl = unique))
```

For illustration purposes, it is useful to reshape the above results:

| cyl | am = FALSE | am = TRUE | Marginal means by cyl |
|---|---|---|---|
| 4 | 20.8 | 25.0 | 22.9 |
| 6 | 16.9 | 21.0 | 19.0 |
| 8 | 17.3 | 21.4 | 19.4 |
| Marginal means by am | 18.3 | 22.5 | |

The marginal means by `am` and `cyl` are obtained by taking the mean of the adjusted predictions across cells. We can achieve the same results with the `predictions()` function, using `datagrid()` to specify a balanced grid and the `by` argument to define the marginalization dimension:

```
predictions(
    mod,
    by = "am",
    newdata = datagrid(am = unique, cyl = unique))
#>
#> am Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> FALSE 18.3 0.785 23.3 <0.001 397.4 16.8 19.9
#> TRUE 22.5 0.834 26.9 <0.001 528.6 20.8 24.1
#>
#> Columns: am, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
#> Type: response
```

Alternatively, we can use the convenient `grid_type = "balanced"` argument of `datagrid()`:

```
predictions(
    mod,
    by = "cyl",
    newdata = datagrid(grid_type = "balanced"))
#>
#> cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> 4 22.9 1.36 16.9 <0.001 209.7 20.2 25.5
#> 6 19.0 1.07 17.7 <0.001 229.7 16.9 21.1
#> 8 19.4 1.38 14.1 <0.001 146.6 16.7 22.1
#>
#> Columns: cyl, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
#> Type: response
```

We can of course marginalize over the interaction of multiple variables by changing the `by` argument:

```
predictions(
    mod,
    by = c("am", "cyl"),
    newdata = datagrid(grid_type = "balanced"))
#>
#> am cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % hp
#> FALSE 4 20.8 1.76 11.8 <0.001 105.1 17.4 24.2 147
#> FALSE 6 16.9 1.27 13.3 <0.001 130.9 14.4 19.4 147
#> FALSE 8 17.3 1.12 15.5 <0.001 176.8 15.1 19.5 147
#> TRUE 4 25.0 1.18 21.2 <0.001 329.3 22.7 27.3 147
#> TRUE 6 21.0 1.21 17.3 <0.001 221.4 18.7 23.4 147
#> TRUE 8 21.4 1.83 11.7 <0.001 103.2 17.9 25.0 147
#>
#> Columns: rowid, am, cyl, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, mpg, hp
#> Type: response
```

The same results can be achieved using the powerful `emmeans` package:

```
library(emmeans)
emmeans(mod, specs = ~cyl)
#> cyl emmean SE df lower.CL upper.CL
#> 4 22.9 1.36 27 20.1 25.7
#> 6 19.0 1.07 27 16.8 21.2
#> 8 19.4 1.38 27 16.5 22.2
#>
#> Results are averaged over the levels of: am
#> Confidence level used: 0.95
emmeans(mod, specs = ~cyl + am)
#> cyl am emmean SE df lower.CL upper.CL
#> 4 FALSE 20.8 1.76 27 17.2 24.4
#> 6 FALSE 16.9 1.27 27 14.3 19.5
#> 8 FALSE 17.3 1.12 27 15.0 19.6
#> 4 TRUE 25.0 1.18 27 22.5 27.4
#> 6 TRUE 21.0 1.21 27 18.6 23.5
#> 8 TRUE 21.4 1.83 27 17.7 25.2
#>
#> Confidence level used: 0.95
```

## Marginal Means vs. Average Predictions

What should scientists report? Marginal means or average predictions?

Many analysts ask this question, but unfortunately there isn’t a single answer. As explained above, marginal means are a special case of predictions, made on a perfectly balanced grid of categorical predictors, with numeric predictors held at their means, and marginalized with respect to some focal variables. Whether the analyst prefers to report this specific type of marginal means or another kind of average prediction will depend on the characteristics of the sample and the population to which they want to generalize.

After reading this vignette and the discussion of `emmeans` in the Alternative Software vignette, you may want to consult with a statistician to discuss your specific real-world problem and make an informed choice.
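One way to see the practical difference is to compute both quantities side by side. The sketch below contrasts balanced-grid marginal means with `avg_predictions()`, which averages fitted values over the observed rows; the two sets of estimates diverge whenever the sample is unbalanced across cells. (The model is refit here so the chunk stands on its own.)

```
library(marginaleffects)

# Refit the model from the top of this vignette.
dat <- transform(mtcars, cyl = as.factor(cyl), am = as.logical(am))
mod <- lm(mpg ~ hp + cyl + am, data = dat)

# Marginal means: predictions averaged over a perfectly balanced grid,
# where every (am, cyl) cell counts equally and hp is held at its mean.
predictions(mod, by = "cyl", newdata = datagrid(grid_type = "balanced"))

# Average predictions: fitted values averaged over the observed rows, so
# each cell is weighted by its empirical frequency in the sample.
avg_predictions(mod, by = "cyl")
```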

## Plot conditional marginal means

The `marginaleffects` package offers several functions to plot how some quantities vary as a function of others:

- `plot_predictions()`: conditional adjusted predictions – how does the predicted outcome change as a function of regressors?
- `plot_comparisons()`: conditional comparisons – how do contrasts change as a function of regressors?
- `plot_slopes()`: conditional marginal effects – how does the slope change as a function of regressors?

There is no analogous function for marginal means. However, it is very easy to achieve a similar effect using the `predictions()` function, its `by` argument, and standard plotting functions. In the example below, we take these steps:

- Estimate a model with one continuous (`hp`) and one categorical regressor (`cyl`).
- Create a perfectly “balanced” data grid for each combination of `hp` and `cyl`. This is specified by the user in the `datagrid()` call.
- Compute fitted values (aka “adjusted predictions”) for each cell of the grid.
- Use the `by` argument to take the average of predicted values for each value of `hp`, across margins of `cyl`.
- Compute standard errors around the averaged predicted values (i.e., marginal means).
- Create symmetric confidence intervals in the usual manner.
- Plot the results.

```
library(ggplot2)

mod <- lm(mpg ~ hp + factor(cyl), data = mtcars)

p <- predictions(mod,
    by = "hp",
    newdata = datagrid(
        model = mod,
        hp = seq(100, 120, length.out = 10),
        cyl = mtcars$cyl))

ggplot(p) +
    geom_ribbon(aes(hp, ymin = conf.low, ymax = conf.high), alpha = .2) +
    geom_line(aes(hp, estimate))
```