# Hypothesis Tests

The marginaleffects package can conduct linear or non-linear hypothesis tests on the coefficients of any supported model class, or on the quantities generated by any of the other functions of the package: predictions(), comparisons(), or slopes().

There are two main entry points for hypothesis tests:

1. marginaleffects functions: Use the hypothesis argument.
2. Other R objects and models: Use the hypotheses() function.

Both the hypothesis argument and the hypotheses() function accept several input types, which allow a lot of flexibility in the specification of hypothesis tests:

1. Numeric: Value of a null hypothesis.
2. Formula: Equation of a (non-)linear hypothesis test.
3. Function: Tests on arbitrary transformations or aggregations.
4. Matrix: Contrast matrices and vectors.

This vignette shows how to use each of these strategies to conduct hypothesis tests on model coefficients or on the quantities estimated by the marginaleffects package. After reading it, you will be able to specify custom hypothesis tests and contrasts to assess statements like:

• The coefficients $$\beta_1$$ and $$\beta_2$$ are equal.
• The slope of $$Y$$ with respect to $$X_1$$ is equal to the slope with respect to $$X_2$$.
• The average treatment effect of $$X$$ when $$W=0$$ is equal to the average treatment effect of $$X$$ when $$W=1$$.
• A non-linear function of predicted values is equal to 100.
• The marginal mean in the control group is equal to the average of marginal means in the other 3 treatment arms.
• Cross-level contrasts: In a multinomial model, the effect of $$X$$ on the 1st outcome level is equal to the effect of $$X$$ on the 2nd outcome level.

# Numeric (null hypothesis)

The simplest way to modify a hypothesis test is to change the null hypothesis. By default, all functions in the marginaleffects package assume that the null is 0. This can be changed with the hypothesis argument.

## Coefficients

Consider a simple logistic regression model:

library(marginaleffects)
mod <- glm(am ~ hp + drat, data = mtcars, family = binomial)

By default, the summary() function will report the results of hypothesis tests where the null is set to 0:

summary(mod)
#>
#> Call:
#> glm(formula = am ~ hp + drat, family = binomial, data = mtcars)
#>
#> Coefficients:
#>               Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -29.076080  12.416916  -2.342   0.0192 *
#> hp            0.010793   0.009328   1.157   0.2473
#> drat          7.309781   3.046597   2.399   0.0164 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#>     Null deviance: 43.230  on 31  degrees of freedom
#> Residual deviance: 20.144  on 29  degrees of freedom
#> AIC: 26.144
#>
#> Number of Fisher Scoring iterations: 7

Using hypotheses(), we can easily change the null hypothesis:

hypotheses(mod, hypothesis = 6)
tinytable_fmqhn6q0onnl23s1povz
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
(Intercept) -29.0761 12.41692 -2.82 0.00473 7.7 -53.41279 -4.7394
hp 0.0108 0.00933 -642.04 < 0.001 Inf -0.00749 0.0291
drat 7.3098 3.04660 0.43 0.66726 0.6 1.33856 13.2810

## Predictions

Changing the value of the null is particularly important in the context of predictions, where the 0 baseline may not be particularly meaningful. For example, here we compute the predicted outcome for a hypothetical unit where all regressors are fixed to their sample means:

predictions(mod, newdata = "mean")
tinytable_h838q7uqkhuu3k49dgsn
Estimate Pr(>|z|) S 2.5 % 97.5 % hp drat
Columns: rowid, estimate, p.value, s.value, conf.low, conf.high, hp, drat, am
0.231 0.135 2.9 0.0584 0.592 147 3.6

The Z statistic and p value reported above assume that the null hypothesis equals zero. We can change the null with the hypothesis argument:

predictions(mod, newdata = "mean", hypothesis = .5)
tinytable_e5r5n3r32nzbff5jei5h
Estimate Pr(>|z|) S 2.5 % 97.5 % hp drat
Columns: rowid, estimate, p.value, s.value, conf.low, conf.high, hp, drat, am
0.231 0.0343 4.9 0.0584 0.592 147 3.6

## Comparisons and slopes

When computing different quantities of interest like risk ratios, it can make sense to set the null hypothesis to 1 rather than 0:

avg_comparisons(
mod,
variables = "hp",
comparison = "ratio",
hypothesis = 1) |>
print(digits = 5)
Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Type: response
Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted
hp mean(+1) 1.0027 0.0021743 1.2515 0.21074 2.2 0.99846 1.007

# Equations

The hypotheses() function emulates the equation-based syntax of the well-established car::deltaMethod and car::linearHypothesis functions. However, marginaleffects supports more models, requires fewer dependencies, and offers some convenience features like the vcov argument for robust standard errors.

The syntax we illustrate in this section takes the form a string equation, ex: "am = 2 * vs"

## Coefficients

Let’s start by estimating a simple model:

library(marginaleffects)
mod <- lm(mpg ~ hp + wt + factor(cyl), data = mtcars)

When the FUN and hypothesis arguments of hypotheses() equal NULL (the default), the function returns a data.frame of raw estimates:

hypotheses(mod)
tinytable_nak8kt413g7infzu2my7
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
(Intercept) 35.8460 2.041 17.56 <0.001 227.0 31.8457 39.846319
hp -0.0231 0.012 -1.93 0.0531 4.2 -0.0465 0.000306
wt -3.1814 0.720 -4.42 <0.001 16.6 -4.5918 -1.771012
factor(cyl)6 -3.3590 1.402 -2.40 0.0166 5.9 -6.1062 -0.611803
factor(cyl)8 -3.1859 2.170 -1.47 0.1422 2.8 -7.4399 1.068169

Test of equality between coefficients:

hypotheses(mod, "hp = wt")
tinytable_0wrww86lq0bemqt5qdi9
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
hp = wt 3.16 0.72 4.39 <0.001 16.4 1.75 4.57

Non-linear function of coefficients

hypotheses(mod, "exp(hp + wt) = 0.1")
tinytable_rxoqcaedkru8hi7nudsj
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
exp(hp + wt) = 0.1 -0.0594 0.0292 -2.04 0.0418 4.6 -0.117 -0.0022

The vcov argument behaves in the same was as in all the other marginaleffects functions, allowing us to easily compute robust standard errors:

hypotheses(mod, "hp = wt", vcov = "HC3")
tinytable_g64d8fjiqts2wa6bp913
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
hp = wt 3.16 0.805 3.92 <0.001 13.5 1.58 4.74

We can use shortcuts like b1, b2, ... to identify the position of each parameter in the output of FUN. For example, b2=b3 is equivalent to hp=wt because those term names appear in the 2nd and 3rd row when we call hypotheses(mod).

hypotheses(mod, "b2 = b3")
tinytable_gziqu1umid48wc5qyw58
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
b2 = b3 3.16 0.72 4.39 <0.001 16.4 1.75 4.57
hypotheses(mod, hypothesis = "b* / b3 = 1")
tinytable_xgd9yb7t88s7fsd0j0rf
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
b1 / b3 = 1 -12.26735 2.07340 -5.9165 <0.001 28.2 -16.33 -8.204
b2 / b3 = 1 -0.99273 0.00413 -240.5537 <0.001 Inf -1.00 -0.985
b3 / b3 = 1 0.00000 NA NA NA NA NA NA
b4 / b3 = 1 0.05583 0.58287 0.0958 0.924 0.1 -1.09 1.198
b5 / b3 = 1 0.00141 0.82981 0.0017 0.999 0.0 -1.62 1.628

Term names with special characters must be enclosed in backticks:

hypotheses(mod, "factor(cyl)6 = factor(cyl)8")
tinytable_vhl51oh09yy2rzt056bw
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
factor(cyl)6 = factor(cyl)8 -0.173 1.65 -0.105 0.917 0.1 -3.41 3.07

## Predictions

Now consider the case of adjusted predictions:

mod <- lm(mpg ~ am + vs, data = mtcars)

p <- predictions(
mod,
newdata = datagrid(am = 0:1, vs = 0:1))
p
tinytable_2a055hx5st5rl6axlfze
am vs Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Type: response
Columns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, am, vs, mpg
0 0 14.6 0.926 15.8 <0.001 183.4 12.8 16.4
0 1 21.5 1.130 19.0 <0.001 266.3 19.3 23.7
1 0 20.7 1.183 17.5 <0.001 224.5 18.3 23.0
1 1 27.6 1.130 24.4 <0.001 435.0 25.4 29.8

Since there is no term column in the output of the predictions function, we must use parameter identifiers like b1, b2, etc. to determine which estimates we want to compare:

hypotheses(p, hypothesis = "b1 = b2")
tinytable_wal91xlgrqm3quzxkxt3
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Type: response
Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
b1=b2 -6.93 1.26 -5.49 <0.001 24.6 -9.4 -4.46

Or directly:

predictions(
mod,
hypothesis = "b1 = b2",
newdata = datagrid(am = 0:1, vs = 0:1))
tinytable_rr1x4qymtjmlch8xnckx
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Type: response
Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
b1=b2 -6.93 1.26 -5.49 <0.001 24.6 -9.4 -4.46

p$estimate[1] - p$estimate[2]
#> [1] -6.929365

There are many more possibilities:

predictions(
mod,
hypothesis = "b1 + b2 = 30",
newdata = datagrid(am = 0:1, vs = 0:1))
tinytable_g4lqwku340x133cd7rma
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Type: response
Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
b1+b2=30 6.12 1.64 3.74 <0.001 12.4 2.91 9.32

p$estimate[1] + p$estimate[2] - 30
#> [1] 6.118254

predictions(
mod,
hypothesis = "(b2 - b1) / (b3 - b2) = 0",
newdata = datagrid(am = 0:1, vs = 0:1))
tinytable_m66b380dswwj39p46wxb
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Type: response
Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
(b2-b1)/(b3-b2)=0 -8.03 17 -0.473 0.636 0.7 -41.3 25.2

## Comparisons and slopes

The avg_comparisons() function allows us to answer questions of this form: On average, how does the expected outcome change when I change one regressor by some amount? Consider this:

mod <- lm(mpg ~ am + vs, data = mtcars)

cmp <- avg_comparisons(mod)
cmp
tinytable_hay75ra2xk68vcn01ojv
Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Type: response
Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted
am mean(1) - mean(0) 6.07 1.27 4.76 <0.001 19.0 3.57 8.57
vs mean(1) - mean(0) 6.93 1.26 5.49 <0.001 24.6 4.46 9.40

This tells us that, on average, moving from 0 to 1 on the am changes the predicted outcome by about 6.1, and changing .vs from 0 to 1 changes the predicted outcome by about 6.9.

Is the difference between those two estimates statistically significant? In other words, Is the effect of am equal to the effect of vs? To answer this question, we use the hypothesis argument:

avg_comparisons(mod, hypothesis = "am = vs")
tinytable_9tyy8pv42m2my6lvu8tu
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Type: response
Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
am=vs -0.863 1.94 -0.445 0.656 0.6 -4.66 2.94

The hypothesis string can include any valid R expression, so we can run some silly non-linear tests:

avg_comparisons(mod, hypothesis = "exp(am) - 2 * vs = -400")
tinytable_r8inospwcgcvwhs2acdd
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Type: response
Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
exp(am)-2*vs=-400 817 550 1.49 0.137 2.9 -261 1896

Note that the p values and confidence intervals are calculated using the delta method and are thus based on the assumption that the hypotheses expression is approximately normally distributed. For (very) non-linear functions of the parameters, this is not realistic, and we get p values with incorrect error rates and confidence intervals with incorrect coverage probabilities. For such hypotheses, it’s better to calculate the confidence intervals using the bootstrap (see inferences for details):

While the confidence interval from the delta method is symmetric, equal to the estimate ± 1.96 times the standard error, the (perhaps) more reliable confidence interval from the bootstrap is highly skewed.

set.seed(1234)

avg_comparisons(mod, hypothesis = "exp(am) - 2 * vs = -400") |>
inferences(method = "boot")
tinytable_m6q0mk87hdl4txr658zj
Term Estimate Std. Error 2.5 % 97.5 %
Type: response
Columns: term, estimate, std.error, conf.low, conf.high
exp(am)-2*vs=-400 817 1854 414 6990

The same approach can be taken to compare slopes:

mod <- lm(mpg ~ qsec * hp, data = mtcars)

avg_slopes(mod, hypothesis = "10 * hp - qsec = 0")
tinytable_ptfxvn6rd0ob0k4aajeu
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Type: response
Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
10*hp-qsec=0 0.262 0.353 0.742 0.458 1.1 -0.43 0.954

# Formulas (Group-wise)

Since version 0.20.1.5 of marginaleffects, the hypothesis argument also accepts a formula to specify hypotheses to be tested on every row of the output, or within subsets. Consider this set of average predictions:

dat <- within(mtcars, {
gear = factor(gear)
cyl = factor(cyl)
})
dat <- sort_by(dat, ~ gear + cyl + am)
mod <- lm(mpg ~ am * wt * factor(cyl), data = dat)

avg_predictions(mod, by = "gear")
tinytable_yyrt7c52jtsfqowzarvu
gear Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Type: response
Columns: gear, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
3 16.1 0.609 26.5 <0.001 510.5 14.9 17.3
4 24.2 0.613 39.5 <0.001 Inf 23.0 25.4
5 22.2 0.837 26.5 <0.001 511.3 20.5 23.8

We can make “sequential”, “reference”, or “meandev” comparisons, either as differences (the default) or ratios:

avg_predictions(mod,
hypothesis = ~ sequential,
by = "gear")
tinytable_ihpu14o3f3riznjnkbxz
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Type: response
Columns: hypothesis, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
(gear[4]) - (gear[3]) 8.08 0.835 9.68 <0.001 71.2 6.45 9.721
(gear[5]) - (gear[4]) -2.01 0.829 -2.43 0.0152 6.0 -3.64 -0.387

avg_predictions(mod,
hypothesis = ~ meandev,
by = "gear")
tinytable_trivi3yo3btm1ks4ikdi
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Type: response
Columns: hypothesis, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
(gear[3]) - Avg -4.72 0.563 -8.38 <0.001 54.1 -5.822 -3.62
(gear[4]) - Avg 3.37 0.434 7.75 <0.001 46.6 2.514 4.22
(gear[5]) - Avg 1.35 0.560 2.42 0.0156 6.0 0.257 2.45

avg_predictions(mod,
hypothesis = ratio ~ sequential,
by = "gear")
tinytable_8wdypaafd6k3ow8aivqj
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Type: response
Columns: hypothesis, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
(gear[4]) / (gear[3]) 1.502 0.0662 22.7 <0.001 375.6 1.372 1.632
(gear[5]) / (gear[4]) 0.917 0.0336 27.3 <0.001 543.2 0.851 0.983

We can also test hypotheses by subgroup. For example, in this case, we want to compare every estimate to the reference category (first estimate) in each subset of am:

avg_predictions(mod,
by = c("am", "gear")
)
tinytable_8xa0uvo690x4x6utb0e1
am gear Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Type: response
Columns: am, gear, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
0 3 16.1 0.609 26.5 <0.001 510.5 14.9 17.3
0 4 21.0 1.091 19.3 <0.001 272.8 18.9 23.2
1 4 25.8 0.740 34.8 <0.001 880.0 24.3 27.2
1 5 22.2 0.837 26.5 <0.001 511.3 20.5 23.8

avg_predictions(mod,
hypothesis = ~ sequential | am,
by = c("am", "gear")
)
tinytable_rz2nmwljtk3uag9tt217
Hypothesis am Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Type: response
Columns: hypothesis, am, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
(gear[4]) - (gear[3]) 0 4.93 1.190 4.14 <0.001 14.8 2.59 7.26
(gear[5]) - (gear[4]) 1 -3.59 0.815 -4.41 <0.001 16.5 -5.19 -1.99

The grouping component of the formula is particularly useful when conducting tests on contrasts:

avg_comparisons(mod,
variables = "cyl",
by = "am")
tinytable_2og66tf1fizlljm2l6yp
Term Contrast am Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Type: response
Columns: term, contrast, am, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted
cyl mean(6) - mean(4) 0 -11.33 6.04 -1.875 0.0608 4.0 -23.2 0.512
cyl mean(6) - mean(4) 1 -4.39 4.94 -0.889 0.3739 1.4 -14.1 5.291
cyl mean(8) - mean(4) 0 -9.59 3.84 -2.495 0.0126 6.3 -17.1 -2.057
cyl mean(8) - mean(4) 1 -7.85 8.46 -0.928 0.3535 1.5 -24.4 8.730

avg_comparisons(mod,
variables = "cyl",
by = "am",
hypothesis = ~ sequential | contrast)
tinytable_acgnnnrk451ilhmgmbwg
Contrast Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Type: response
Columns: hypothesis, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
mean(6) - mean(4) (am[1]) - (am[0]) 6.93 7.80 0.889 0.374 1.4 -8.36 22.2
mean(8) - mean(4) (am[1]) - (am[0]) 1.74 9.29 0.188 0.851 0.2 -16.46 20.0

avg_comparisons(mod,
variables = "cyl",
by = "am",
hypothesis = ~ sequential | am)
tinytable_mw8fc8y7cbhi3farz49p
Hypothesis am Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Type: response
Columns: hypothesis, am, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
(contrast[mean(8) - mean(4)]) - (contrast[mean(6) - mean(4)]) 0 1.74 4.79 0.363 0.717 0.5 -7.64 11.1
(contrast[mean(8) - mean(4)]) - (contrast[mean(6) - mean(4)]) 1 -3.45 9.65 -0.358 0.720 0.5 -22.37 15.5

It is also a powerful tool in categorical outcome models, to compare estimates for different outcome levels:

library("MASS")

mod <- polr(gear ~ cyl + hp, dat)

avg_comparisons(mod, variables = "cyl")
tinytable_yunck0sdyzhjb9soaq2f
Group Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Type: probs
Columns: term, group, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted
3 cyl mean(6) - mean(4) 0.2679 0.0787 3.403 < 0.001 10.6 0.1136 0.4221
3 cyl mean(8) - mean(4) 0.8582 0.0432 19.886 < 0.001 289.9 0.7737 0.9428
4 cyl mean(6) - mean(4) 0.0108 0.0474 0.228 0.81962 0.3 -0.0822 0.1038
4 cyl mean(8) - mean(4) -0.1971 0.0651 -3.028 0.00246 8.7 -0.3246 -0.0695
5 cyl mean(6) - mean(4) -0.2787 0.0717 -3.887 < 0.001 13.3 -0.4192 -0.1382
5 cyl mean(8) - mean(4) -0.6612 0.0694 -9.533 < 0.001 69.2 -0.7971 -0.5253

Is the “8 vs. 4” contrast equal to the “6 vs. 4” contrast, within each outcome level?

avg_comparisons(mod,
variables = "cyl",
hypothesis = ~ sequential | group)
tinytable_sh637pqzfl176u4fqbwf
Group Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Type: probs
Columns: group, hypothesis, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
3 (contrast[mean(8) - mean(4)]) - (contrast[mean(6) - mean(4)]) 0.590 0.0815 7.24 < 0.001 41.0 0.431 0.7502
4 (contrast[mean(8) - mean(4)]) - (contrast[mean(6) - mean(4)]) -0.208 0.0717 -2.90 0.00376 8.1 -0.348 -0.0673
5 (contrast[mean(8) - mean(4)]) - (contrast[mean(6) - mean(4)]) -0.383 0.0575 -6.65 < 0.001 35.0 -0.495 -0.2698

Is the “8 vs. 4” contrast for outcome level 3 equal to the “8 vs. 4” contrast for outcome level 4?

avg_comparisons(mod,
variables = "cyl",
hypothesis = ~ sequential | contrast)
tinytable_fp6wvl1p1cppoigulehy
Contrast Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Type: probs
Columns: hypothesis, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
mean(6) - mean(4) (group[4]) - (group[3]) -0.257 0.1084 -2.37 0.01775 5.8 -0.470 -0.0445
mean(6) - mean(4) (group[5]) - (group[4]) -0.289 0.0927 -3.12 0.00178 9.1 -0.471 -0.1079
mean(8) - mean(4) (group[4]) - (group[3]) -1.055 0.0859 -12.28 < 0.001 112.7 -1.224 -0.8869
mean(8) - mean(4) (group[5]) - (group[4]) -0.464 0.1274 -3.64 < 0.001 11.9 -0.714 -0.2145

# Functions

Hypothesis tests can also be conducted using arbitrary R functions. This allows users to test hypotheses on complex aggregations or transformations. To achieve this, we define a custom function which accepts a fitted model or marginaleffects objects, and returns a data frame with at least two columns: term (or hypothesis) and estimate.

## Predictions

When supplying a function to the hypothesis argument, that function must accept an argument x which is a data frame with columns rowid and estimate (and optional columns for other elements of newdata). That function must return a data frame with columns term (or hypothesis) and estimate.

In this example, we test if the mean predicted value is different from 2:

dat <- transform(mtcars, gear = factor(gear), cyl = factor(cyl))

mod <- lm(wt ~ mpg * hp * cyl, data = dat)

hyp <- function(x) {
data.frame(
hypothesis = "Avg(Ŷ) = 2",
estimate = mean(x$estimate) - 2 ) } predictions(mod, hypothesis = hyp) tinytable_g30p6wihj4fhs8tuouvt Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % Type: response Columns: hypothesis, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high Avg(Ŷ) = 2 1.22 0.0914 13.3 <0.001 132.0 1.04 1.4 In this ordinal logit model, the predictions() function returns one row per observation and per level of the outcome variable: library(MASS) library(dplyr) mod <- polr(gear ~ cyl + hp, dat) avg_predictions(mod) tinytable_lfestum0eram5u0zpr7v Group Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % Type: probs Columns: group, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 3 0.471 0.0584 8.05 <0.001 50.2 0.3561 0.585 4 0.366 0.0715 5.12 <0.001 21.7 0.2263 0.507 5 0.163 0.0478 3.41 <0.001 10.6 0.0692 0.257 We can use a function in the hypothesis argument to collapse the rows, displaying the average predicted values in groups 3-4 vs. 5: fun <- function(x) { out <- x |> mutate(term = ifelse(group %in% 3:4, "3 & 4", "5")) |> summarize(estimate = mean(estimate), .by = term) return(out) } avg_predictions(mod, hypothesis = fun) tinytable_6yh5wzfobst3c0qifd0t Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % Type: probs Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 3 & 4 0.419 0.0239 17.51 <0.001 225.5 0.3717 0.465 5 0.163 0.0478 3.41 <0.001 10.6 0.0692 0.257 And we can compare the two categories by doing: fun <- function(x) { out <- x |> mutate(term = ifelse(group %in% 3:4, "3 & 4", "5")) |> summarize(estimate = mean(estimate), .by = term) |> summarize(estimate = diff(estimate), term = "5 - (3 & 4)") return(out) } avg_predictions(mod, hypothesis = fun) tinytable_xtknqzut4p0p174dik96 Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % Type: probs Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 5 - (3 & 4) -0.256 0.0717 -3.56 <0.001 11.4 -0.396 -0.115 ## Comparisons and slopes In the same ordinal logit model, we can estimate the average effect of an increase of 1 unit in hp on the expected probability of every level of the outcome: avg_comparisons(mod, variables = "hp") tinytable_z96z6wl1dm1b0gntrm7p Group Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % Type: probs Columns: term, group, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted 3 hp mean(+1) -0.00774 0.002294 -3.38 <0.001 10.4 -0.01224 -0.00325 4 hp mean(+1) 0.00258 0.002201 1.17 0.24 2.1 -0.00173 0.00690 5 hp mean(+1) 0.00516 0.000882 5.85 <0.001 27.6 0.00343 0.00689 Compare estimate for different outcome levels: fun <- function(x) { x |> mutate(estimate = (estimate - lag(estimate)), group = sprintf("%s - %s", group, lag(group))) |> filter(!is.na(estimate)) } avg_comparisons(mod, variables = "hp", hypothesis = fun) tinytable_215h5cdvym22le938d34 Group Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % Type: probs Columns: term, group, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted 4 - 3 hp mean(+1) 0.01033 0.00441 2.34 0.0191 5.7 0.00169 0.01897 5 - 4 hp mean(+1) 0.00258 0.00245 1.05 0.2922 1.8 -0.00222 0.00737 Now suppose we want to compare the effect of hp for different levels of the outcome, but this time we do the computation within levels of cyl: avg_comparisons(mod, variables = "hp", by = "cyl") tinytable_i6bmy3np17tzke7x7bqe Group Term Contrast cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % Type: probs Columns: term, group, contrast, cyl, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted 3 hp mean(+1) 4 -0.007862 0.003445 -2.282 0.02248 5.5 -1.46e-02 -0.00111 3 hp mean(+1) 6 -0.013282 0.005038 -2.637 0.00838 6.9 -2.32e-02 -0.00341 3 hp mean(+1) 8 -0.004882 0.001457 -3.350 < 0.001 10.3 -7.74e-03 -0.00203 4 hp mean(+1) 4 -0.003024 0.003714 -0.814 0.41560 1.3 -1.03e-02 0.00426 4 hp mean(+1) 6 0.008573 0.006431 1.333 0.18251 2.5 -4.03e-03 0.02118 4 hp mean(+1) 8 0.003995 0.001627 2.455 0.01409 6.1 8.06e-04 0.00719 5 hp mean(+1) 4 0.010886 0.002652 4.104 < 0.001 14.6 5.69e-03 0.01608 5 hp mean(+1) 6 0.004709 0.001929 2.442 0.01462 6.1 9.29e-04 0.00849 5 hp mean(+1) 8 0.000887 0.000431 2.059 0.03947 4.7 4.27e-05 0.00173 fun <- function(x) { x |> mutate(estimate = (estimate - lag(estimate)), group = sprintf("%s - %s", group, lag(group)), .by = "cyl") |> filter(!is.na(estimate)) } avg_comparisons(mod, variables = "hp", by = "cyl", hypothesis = fun) tinytable_om1f5xk9uupqx66zfn79 Group Term Contrast cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % Type: probs Columns: term, group, contrast, cyl, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted 4 - 3 hp mean(+1) 4 0.00484 0.00666 0.727 0.46719 1.1 -0.008205 0.017882 4 - 3 hp mean(+1) 6 0.02185 0.01139 1.919 0.05503 4.2 -0.000471 0.044180 4 - 3 hp mean(+1) 8 0.00888 0.00306 2.902 0.00371 8.1 0.002882 0.014874 5 - 4 hp mean(+1) 4 0.01391 0.00546 2.548 0.01082 6.5 0.003211 0.024607 5 - 4 hp mean(+1) 6 -0.00386 0.00805 -0.480 0.63117 0.7 -0.019638 0.011911 5 - 4 hp mean(+1) 8 -0.00311 0.00188 -1.651 0.09869 3.3 -0.006798 0.000581 ## Fitted models The hypothesis argument can be used to compute standard errors for arbitrary functions of model parameters. This user-supplied function must accept a single model object, and return a data.frame with two columns named term and estimate. Here, we test if the sum of the hp and mpg coefficients is equal to 2: mod <- glm(am ~ hp + mpg, data = mtcars, family = binomial) fun <- function(x) { b <- coef(x) out <- data.frame( term = "hp + mpg = 2", estimate = b["hp"] + b["mpg"] - 2, row.names = NULL ) return(out) } hypotheses(mod, hypothesis = fun) tinytable_ncwfvz73v6by7tncl85t Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high hp + mpg = 2 -0.685 0.593 -1.16 0.248 2.0 -1.85 0.476 Test of equality between two two predictions: fun <- function(x) { p <- predict(x, newdata = mtcars) out <- data.frame(term = "pred[2] = pred[3]", estimate = p[2] - p[3]) return(out) } hypotheses(mod, hypothesis = fun) tinytable_4ar803eaf3pvn8mgz4b2 Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high pred[2] = pred[3] -1.33 0.616 -2.16 0.0305 5.0 -2.54 -0.125 We can also use more complex aggregation patterns. In this ordinal logistic regression model, we model the number of gears for each ar. If we compute fitted values with the predictions() function, we obtain one predicted probability for each individual car and for each level of the response variable: library(MASS) library(dplyr) dat <- transform(mtcars, gear = factor(gear), cyl = factor(cyl)) mod <- polr(gear ~ cyl + hp, dat) predictions(mod) tinytable_bi4crwcoebdcflv2uny9 Group Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % Type: probs Columns: rowid, group, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, gear, cyl, hp 3 0.3931 0.19125 2.06 0.03982 4.7 0.0183 0.768 3 0.3931 0.19125 2.06 0.03982 4.7 0.0183 0.768 3 0.0440 0.04256 1.03 0.30081 1.7 -0.0394 0.127 3 0.3931 0.19125 2.06 0.03982 4.7 0.0183 0.768 3 0.9963 0.00721 138.17 < 0.001 Inf 0.9822 1.010 5 0.6969 0.18931 3.68 < 0.001 12.1 0.3258 1.068 5 0.0555 0.06851 0.81 0.41775 1.3 -0.0788 0.190 5 0.8115 0.20626 3.93 < 0.001 13.5 0.4073 1.216 5 0.9111 0.16818 5.42 < 0.001 24.0 0.5815 1.241 5 0.6322 0.19648 3.22 0.00129 9.6 0.2471 1.017 There are three levels to the outcome: 3, 4, and 5. Imagine that, for each car in the dataset, we want to collapse categories of the output variable into two categories (“3 & 4” and “5”) by taking sums of predicted probabilities. Then, we want to take the average of those predicted probabilities for each level of cyl. To do so, we define a custom function, and pass it to the hypothesis argument of the hypotheses() function: fun <- function(x) { predictions(x, vcov = FALSE) |> # label the new categories of outcome levels mutate(group = ifelse(group %in% c("3", "4"), "3 & 4", "5")) |> # sum of probabilities at the individual level summarize(estimate = sum(estimate), .by = c("rowid", "cyl", "group")) |> # average probabilities for each value of cyl summarize(estimate = mean(estimate), .by = c("cyl", "group")) |> # the FUN argument requires a term column rename(term = cyl) } hypotheses(mod, hypothesis = fun) tinytable_lwak4p74ff77scjuzca7 Group Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % Columns: term, group, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 3 & 4 6 0.8390 0.0651 12.89 <0.001 123.9 0.7115 0.967 3 & 4 4 0.7197 0.1099 6.55 <0.001 34.0 0.5044 0.935 3 & 4 8 0.9283 0.0174 53.45 <0.001 Inf 0.8943 0.962 5 6 0.1610 0.0651 2.47 0.0134 6.2 0.0334 0.289 5 4 0.2803 0.1099 2.55 0.0108 6.5 0.0649 0.496 5 8 0.0717 0.0174 4.13 <0.001 14.7 0.0377 0.106 Note that this workflow will not work for bayesian models or with bootstrap. However, with those models it is trivial to do the same kind of aggregation by calling posterior_draws() and operating directly on draws from the posterior distribution. See the vignette on bayesian analysis for examples with the posterior_draws() function. # Vectors and Matrices The predictions() function can estimate marginal means. The hypothesis argument of that function offers a powerful mechanism to estimate custom contrasts between marginal means, by way of linear combination. ## Simple contrast Consider a simple example: library(marginaleffects) library(emmeans) library(nnet) dat <- mtcars dat$carb <- factor(dat$carb) dat$cyl <- factor(dat$cyl) dat$am <- as.logical(dat$am) dat <- sort_by(dat, ~carb) mod <- lm(mpg ~ carb + cyl, dat) mm <- predictions(mod, by = "carb", newdata = "balanced") mm tinytable_psp8w9smoo5fbt6yo5qc carb Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % Type: response Columns: carb, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 1 21.7 1.44 15.06 <0.001 167.8 18.8 24.5 2 21.3 1.23 17.29 <0.001 220.0 18.9 23.8 3 21.4 2.19 9.77 <0.001 72.5 17.1 25.7 4 18.9 1.21 15.59 <0.001 179.7 16.5 21.3 6 19.8 3.55 5.56 <0.001 25.2 12.8 26.7 8 20.1 3.51 5.73 <0.001 26.6 13.2 27.0 The contrast between marginal means for carb==1 and carb==2 is: 21.66232 - 21.34058 #> [1] 0.32174 or 21.66232 + -(21.34058) #> [1] 0.32174 or sum(c(21.66232, 21.34058) * c(1, -1)) #> [1] 0.32174 or c(21.66232, 21.34058) %*% c(1, -1) #> [,1] #> [1,] 0.32174 The last two commands express the contrast of interest as a linear combination of marginal means. In the predictions() function, we can supply a hypothesis argument to compute linear combinations of marginal means. This argument must be a numeric vector of the same length as the number of rows in the output. For example, in the previous there were six rows, and the two marginal means we want to compare are at in the first two positions: lc <- c(1, -1, 0, 0, 0, 0) predictions(mod, by = "carb", newdata = "balanced", hypothesis = lc) tinytable_6335oueu8is0k91tfcfy Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % Type: response Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high custom 0.322 1.77 0.181 0.856 0.2 -3.15 3.8 ## Complex contrast Of course, we can also estimate more complex contrasts: lc <- c(0, -2, 1, 1, -1, 1) predictions(mod, by = "carb", newdata = "balanced", hypothesis = lc) tinytable_eynj0eft1c4wh7r95tn9 Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % Type: response Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high custom -2.02 6.32 -0.32 0.749 0.4 -14.4 10.4 emmeans produces similar results: library(emmeans) em <- emmeans(mod, "carb") lc <- data.frame(custom_contrast = c(0, -2, 1, 1, -1, 1)) contrast(em, method = lc) #> contrast estimate SE df t.ratio p.value #> custom_contrast -2.02 6.32 24 -0.320 0.7516 #> #> Results are averaged over the levels of: cyl ## Multiple contrasts Users can also compute multiple linear combinations simultaneously by supplying a numeric matrix to hypotheses. This matrix must have the same number of rows as the output of slopes(), and each column represents a distinct set of weights for different linear combinations. The column names of the matrix become labels in the output. For example: lc <- matrix(c( -2, 1, 1, 0, -1, 1, 1, -1, 0, 0, 0, 0 ), ncol = 2) colnames(lc) <- c("Contrast A", "Contrast B") lc #> Contrast A Contrast B #> [1,] -2 1 #> [2,] 1 -1 #> [3,] 1 0 #> [4,] 0 0 #> [5,] -1 0 #> [6,] 1 0 predictions(mod, by = "carb", newdata = "balanced", hypothesis = lc) tinytable_mv06xgqppdm6d828q8oj Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % Type: response Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high Contrast A -0.211 6.93 -0.0304 0.976 0.0 -13.79 13.4 Contrast B 0.322 1.77 0.1814 0.856 0.2 -3.15 3.8 # Arbitrary quantities marginaleffects can also compute uncertainty estimates for arbitrary quantities hosted in a data frame, as long as the user can supply a variance-covariance matrix. (Thanks to Kyle F Butts for this cool feature and example!) Say you run a monte-carlo simulation and you want to perform hypothesis of various quantities returned from each simulation. The quantities are correlated within each draw: # simulated means and medians draw <- function(i) { x <- rnorm(n = 10000, mean = 0, sd = 1) out <- data.frame(median = median(x), mean = mean(x)) return(out) } sims <- do.call("rbind", lapply(1:25, draw)) # average mean and average median coeftable <- data.frame( term = c("median", "mean"), estimate = c(mean(sims$median), mean(sims$mean)) ) # variance-covariance vcov <- cov(sims) # is the median equal to the mean? hypotheses( coeftable, vcov = vcov, hypothesis = "median = mean" ) tinytable_bkllmxl9jp6tb7x2di52 Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high median = mean -0.00083 0.00802 -0.103 0.918 0.1 -0.0166 0.0149 # Joint hypotheses tests The hypotheses() function can also test multiple hypotheses jointly. For example, consider this model: model <- lm(mpg ~ as.factor(cyl) * hp, data = mtcars) coef(model) #> (Intercept) as.factor(cyl)6 as.factor(cyl)8 hp as.factor(cyl)6:hp as.factor(cyl)8:hp #> 35.98302564 -15.30917451 -17.90295193 -0.11277589 0.10516262 0.09853177 We may want to test the null hypothesis that two of the coefficients are jointly (both) equal to zero. hypotheses(model, joint = c("as.factor(cyl)6:hp", "as.factor(cyl)8:hp")) tinytable_b324meyh3tzrlwu7y1mk F Pr(>|F|) Df 1 Df 2 Columns: statistic, p.value, df1, df2 2.11 0.142 2 26 The joint argument allows users to flexibly specify the parameters to be tested, using character vectors, integer indices, or Perl-compatible regular expressions. We can also specify the null hypothesis for each parameter individually using the hypothesis argument. Naturally, the hypotheses function also works with marginaleffects objects. # ## joint hypotheses: regular expression hypotheses(model, joint = "cyl") tinytable_bpotuyo37xe5gpvszpjy F Pr(>|F|) Df 1 Df 2 Columns: statistic, p.value, df1, df2 5.7 0.00197 4 26  # joint hypotheses: integer indices hypotheses(model, joint = 2:3) tinytable_d2asya3931xrd8sc66t3 F Pr(>|F|) Df 1 Df 2 Columns: statistic, p.value, df1, df2 6.12 0.00665 2 26  # joint hypotheses: different null hypotheses hypotheses(model, joint = 2:3, hypothesis = 1) tinytable_6vej5zmj78yh3revd2lx F Pr(>|F|) Df 1 Df 2 Columns: statistic, p.value, df1, df2 6.84 0.00411 2 26 hypotheses(model, joint = 2:3, hypothesis = 1:2) tinytable_kkzsuyxncp8fuv7oftl9 F Pr(>|F|) Df 1 Df 2 Columns: statistic, p.value, df1, df2 7.47 0.00273 2 26  # joint hypotheses: marginaleffects object cmp <- avg_comparisons(model) hypotheses(cmp, joint = "cyl") tinytable_qbt4e86b1dt5t06wkjgf F Pr(>|F|) Df 1 Df 2 Columns: statistic, p.value, df1, df2 1.6 0.221 2 26 We can also combine multiple calls to hypotheses to execute a joint test on linear combinations of coefficients: # fit model mod <- lm(mpg ~ factor(carb), mtcars) # hypothesis matrix for linear combinations H <- matrix(0, nrow = length(coef(mod)), ncol = 2) H[2:3, 1] <- H[4:6, 2] <- 1 # test individual linear combinations hyp <- hypotheses(mod, hypothesis = H) hyp tinytable_bt2eliuhp8vrx01si8ga Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high custom -12.0 4.92 -2.44 0.01477 6.1 -21.6 -2.35 custom -25.5 9.03 -2.83 0.00466 7.7 -43.2 -7.85  # test joint hypotheses #hypotheses(hyp, joint = TRUE, hypothesis = c(-10, -20)) # More ## Difference-in-Differences Now we illustrate how to use the machinery described above to do pairwise comparisons between contrasts, a type of analysis often associated with a “Difference-in-Differences” research design. First, we simulate data with two treatment groups and pre/post periods: library(data.table) N <- 1000 did <- data.table( id = 1:N, pre = rnorm(N), trt = sample(0:1, N, replace = TRUE)) did$post <- did$pre + did$trt * 0.3 + rnorm(N)
did <- melt(
did,
value.name = "y",
variable.name = "time",
id.vars = c("id", "trt"))
#>       id   trt   time          y
#>    <int> <int> <fctr>      <num>
#> 1:     1     1    pre  1.4837871
#> 2:     2     0    pre -1.3323563
#> 3:     3     1    pre  0.8758262
#> 4:     4     1    pre  0.8968294
#> 5:     5     0    pre -1.5205779
#> 6:     6     1    pre  1.8356310

Then, we estimate a linear model with a multiple interaction between the time and the treatment indicators. We also compute contrasts at the mean for each treatment level:

did_model <- lm(y ~ time * trt, data = did)

comparisons(
did_model,
newdata = datagrid(trt = 0:1),
variables = "time")
tinytable_99yrsyp7tcxv7txh7gs2
Term Contrast trt Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % time
Type: response
Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, trt, predicted_lo, predicted_hi, predicted, time, y
time post - pre 0 -0.000421 0.0795 -0.00529 0.996 0.0 -0.156 0.155 pre
time post - pre 1 0.287604 0.0790 3.63937 <0.001 11.8 0.133 0.442 pre

Finally, we compute pairwise differences between contrasts. This is the Diff-in-Diff estimate:

comparisons(
did_model,
variables = "time",
newdata = datagrid(trt = 0:1),
hypothesis = "pairwise")
tinytable_lmk8yx47sb6b6d3z2kbo
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Type: response
Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
Row 1 - Row 2 -0.288 0.112 -2.57 0.0102 6.6 -0.508 -0.0683