22 Hypothesis Tests (Bonus)
The marginaleffects
package can conduct linear or non-linear hypothesis tests on the coefficients of any supported model class, or on the quantities generated by any of the other functions of the package: predictions()
, comparisons()
, or slopes()
.
There are two main entry points for hypothesis tests:
-
marginaleffects
functions: Use thehypothesis
argument. - Other
R
objects and models: Use thehypotheses()
function.
Both the hypothesis
argument and the hypotheses()
function accept several input types, which allow a lot of flexibility in the specification of hypothesis tests:
- Numeric: Value of a null hypothesis.
- Formula: Equation of a (non-)linear hypothesis test.
- Function: Tests on arbitrary transformations or aggregations.
- Matrix: Contrast matrices and vectors.
This vignette shows how to use each of these strategies to conduct hypothesis tests on model coefficients or on the quantities estimated by the marginaleffects
package. After reading it, you will be able to specify custom hypothesis tests and contrasts to assess statements like:
- The coefficients \(\beta_1\) and \(\beta_2\) are equal.
- The slope of \(Y\) with respect to \(X_1\) is equal to the slope with respect to \(X_2\).
- The average treatment effect of \(X\) when \(W=0\) is equal to the average treatment effect of \(X\) when \(W=1\).
- A non-linear function of predicted values is equal to 100.
- The marginal mean in the control group is equal to the average of marginal means in the other 3 treatment arms.
- Cross-level contrasts: In a multinomial model, the effect of \(X\) on the 1st outcome level is equal to the effect of \(X\) on the 2nd outcome level.
23 Numeric (null hypothesis)
The simplest way to modify a hypothesis test is to change the null hypothesis. By default, all functions in the marginaleffects
package assume that the null is 0. This can be changed with the hypothesis
argument.
23.1 Coefficients
Consider a simple logistic regression model:
library(marginaleffects)
mod <- glm(am ~ hp + drat, data = mtcars, family = binomial)
By default, the summary()
function will report the results of hypothesis tests where the null is set to 0:
summary(mod)
Call:
glm(formula = am ~ hp + drat, family = binomial, data = mtcars)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -29.076080 12.416916 -2.342 0.0192 *
hp 0.010793 0.009328 1.157 0.2473
drat 7.309781 3.046597 2.399 0.0164 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 43.230 on 31 degrees of freedom
Residual deviance: 20.144 on 29 degrees of freedom
AIC: 26.144
Number of Fisher Scoring iterations: 7
Using hypotheses()
, we can easily change the null hypothesis:
hypotheses(mod, hypothesis = 6)
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
(Intercept) -29.0761 12.41692 -2.82 0.00473 7.7 -53.41279 -4.7394
hp 0.0108 0.00933 -642.04 < 0.001 Inf -0.00749 0.0291
drat 7.3098 3.04660 0.43 0.66726 0.6 1.33856 13.2810
23.2 Predictions
Changing the value of the null is particularly important in the context of predictions, where the 0 baseline may not be particularly meaningful. For example, here we compute the predicted outcome for a hypothetical unit where all regressors are fixed to their sample means:
predictions(mod, newdata = "mean")
Estimate Pr(>|z|) S 2.5 % 97.5 %
0.231 0.135 2.9 0.0584 0.592
Type: invlink(link)
The Z statistic and p value reported above assume that the null hypothesis equals zero. We can change the null with the hypothesis
argument:
predictions(mod, newdata = "mean", hypothesis = .5)
Estimate Pr(>|z|) S 2.5 % 97.5 %
0.231 0.0343 4.9 0.0584 0.592
Type: invlink(link)
23.3 Comparisons and slopes
When computing different quantities of interest like risk ratios, it can make sense to set the null hypothesis to 1 rather than 0:
avg_comparisons(
mod,
variables = "hp",
comparison = "ratio",
hypothesis = 1) |>
print(digits = 5)
Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
1.0027 0.0021743 1.2515 0.21074 2.2 0.99846 1.007
Term: hp
Type: response
Comparison: mean(+1)
24 Equations
The hypotheses()
function emulates the equation-based syntax of the well-established car::deltaMethod
and car::linearHypothesis
functions. However, marginaleffects
supports more models, requires fewer dependencies, and offers some convenience features like the vcov
argument for robust standard errors.
The syntax we illustrate in this section takes the form a string equation, ex: "am = 2 * vs"
24.1 Coefficients
Let’s start by estimating a simple model:
library(marginaleffects)
mod <- lm(mpg ~ hp + wt + factor(cyl), data = mtcars)
When the FUN
and hypothesis
arguments of hypotheses()
equal NULL
(the default), the function returns a data.frame of raw estimates:
hypotheses(mod)
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
(Intercept) 35.8460 2.041 17.56 <0.001 227.0 31.8457 39.846319
hp -0.0231 0.012 -1.93 0.0531 4.2 -0.0465 0.000306
wt -3.1814 0.720 -4.42 <0.001 16.6 -4.5918 -1.771012
factor(cyl)6 -3.3590 1.402 -2.40 0.0166 5.9 -6.1062 -0.611803
factor(cyl)8 -3.1859 2.170 -1.47 0.1422 2.8 -7.4399 1.068169
Test of equality between coefficients:
hypotheses(mod, "hp = wt")
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
hp=wt 3.16 0.72 4.39 <0.001 16.4 1.75 4.57
Non-linear function of coefficients
hypotheses(mod, "exp(hp + wt) = 0.1")
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
exp(hp+wt)=0.1 -0.0594 0.0292 -2.04 0.0418 4.6 -0.117 -0.0022
The vcov
argument behaves in the same was as in all the other marginaleffects
functions, allowing us to easily compute robust standard errors:
hypotheses(mod, "hp = wt", vcov = "HC3")
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
hp=wt 3.16 0.805 3.92 <0.001 13.5 1.58 4.74
We can use shortcuts like b1
, b2
, ...
to identify the position of each parameter in the output of FUN
. For example, b2=b3
is equivalent to hp=wt
because those term names appear in the 2nd and 3rd row when we call hypotheses(mod)
.
hypotheses(mod, "b2 = b3")
Warning:
It is essential to check the order of estimates when specifying hypothesis tests using positional indices like b1, b2, etc. The indices of estimates can change depending on the order of rows in the original dataset, user-supplied arguments, model-fitting package, and version of `marginaleffects`.
It is also good practice to use assertions that ensure the order of estimates is consistent across different runs of the same code. Example:
```r
mod <- lm(mpg ~ am * carb, data = mtcars)
# assertion for safety
p <- avg_predictions(mod, by = 'carb')
stopifnot(p$carb[1] != 1 || p$carb[2] != 2)
# hypothesis test
avg_predictions(mod, by = 'carb', hypothesis = 'b1 - b2 = 0')
```
Disable this warning with: `options(marginaleffects_safe = FALSE)`
This warning appears once per session.
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
b2=b3 3.16 0.72 4.39 <0.001 16.4 1.75 4.57
hypotheses(mod, hypothesis = "b* / b3 = 1")
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
b1/b3=1 -12.26735 2.07340 -5.9165 <0.001 28.2 -16.33 -8.204
b2/b3=1 -0.99273 0.00413 -240.5539 <0.001 Inf -1.00 -0.985
b3/b3=1 0.00000 NA NA NA NA NA NA
b4/b3=1 0.05583 0.58287 0.0958 0.924 0.1 -1.09 1.198
b5/b3=1 0.00141 0.82981 0.0017 0.999 0.0 -1.62 1.628
Term names with special characters must be enclosed in backticks:
hypotheses(mod, "`factor(cyl)6` = `factor(cyl)8`")
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 %
`factor(cyl)6`=`factor(cyl)8` -0.173 1.65 -0.105 0.917 0.1 -3.41
97.5 %
3.07
24.2 Predictions
Now consider the case of adjusted predictions:
mod <- lm(mpg ~ am + vs, data = mtcars)
p <- predictions(
mod,
newdata = datagrid(am = 0:1, vs = 0:1))
p
am vs Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
0 0 14.6 0.926 15.8 <0.001 183.4 12.8 16.4
0 1 21.5 1.130 19.0 <0.001 266.3 19.3 23.7
1 0 20.7 1.183 17.5 <0.001 224.5 18.3 23.0
1 1 27.6 1.130 24.4 <0.001 435.0 25.4 29.8
Type: response
Since there is no term
column in the output of the predictions
function, we must use parameter identifiers like b1
, b2
, etc. to determine which estimates we want to compare:
hypotheses(p, hypothesis = "b1 = b2")
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
b1=b2 -6.93 1.26 -5.49 <0.001 24.6 -9.4 -4.46
Or directly:
predictions(
mod,
hypothesis = "b1 = b2",
newdata = datagrid(am = 0:1, vs = 0:1))
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
b1=b2 -6.93 1.26 -5.49 <0.001 24.6 -9.4 -4.46
Type: response
p$estimate[1] - p$estimate[2]
[1] -6.929365
There are many more possibilities:
predictions(
mod,
hypothesis = "b1 + b2 = 30",
newdata = datagrid(am = 0:1, vs = 0:1))
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
b1+b2=30 6.12 1.64 3.74 <0.001 12.4 2.91 9.32
Type: response
p$estimate[1] + p$estimate[2] - 30
[1] 6.118254
predictions(
mod,
hypothesis = "(b2 - b1) / (b3 - b2) = 0",
newdata = datagrid(am = 0:1, vs = 0:1))
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
(b2-b1)/(b3-b2)=0 -8.03 17 -0.473 0.636 0.7 -41.3 25.2
Type: response
24.3 Comparisons and slopes
The avg_comparisons()
function allows us to answer questions of this form: On average, how does the expected outcome change when I change one regressor by some amount? Consider this:
mod <- lm(mpg ~ am + vs, data = mtcars)
cmp <- avg_comparisons(mod)
cmp
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
am 6.07 1.27 4.76 <0.001 19.0 3.57 8.57
vs 6.93 1.26 5.49 <0.001 24.6 4.46 9.40
Type: response
Comparison: 1 - 0
This tells us that, on average, moving from 0 to 1 on the am
changes the predicted outcome by about 6.1, and changing .vs
from 0 to 1 changes the predicted outcome by about 6.9.
Is the difference between those two estimates statistically significant? In other words, Is the effect of am
equal to the effect of vs
? To answer this question, we use the hypothesis
argument:
avg_comparisons(mod, hypothesis = "am = vs")
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
am=vs -0.863 1.94 -0.445 0.656 0.6 -4.66 2.94
Type: response
The hypothesis
string can include any valid R
expression, so we can run some silly non-linear tests:
avg_comparisons(mod, hypothesis = "exp(am) - 2 * vs = -400")
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
exp(am)-2*vs=-400 817 550 1.49 0.137 2.9 -261 1896
Type: response
Note that the p values and confidence intervals are calculated using the delta method and are thus based on the assumption that the hypotheses
expression is approximately normally distributed. For (very) non-linear functions of the parameters, this is not realistic, and we get p values with incorrect error rates and confidence intervals with incorrect coverage probabilities. For such hypotheses, it’s better to calculate the confidence intervals using the bootstrap (see inferences
for details):
While the confidence interval from the delta method is symmetric, equal to the estimate ± 1.96 times the standard error, the (perhaps) more reliable confidence interval from the bootstrap is highly skewed.
set.seed(1234)
avg_comparisons(mod, hypothesis = "exp(am) - 2 * vs = -400") |>
inferences(method = "boot")
Hypothesis Estimate Std. Error 2.5 % 97.5 %
exp(am)-2*vs=-400 817 1854 414 6990
Type: response
The same approach can be taken to compare slopes:
mod <- lm(mpg ~ qsec * hp, data = mtcars)
avg_slopes(mod, hypothesis = "10 * hp - qsec = 0")
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
10*hp-qsec=0 0.262 0.353 0.742 0.458 1.1 -0.43 0.954
Type: response
25 Formulas (Group-wise)
Since version 0.20.1.5 of marginaleffects
, the hypothesis
argument also accepts a formula to specify hypotheses to be tested on every row of the output, or within subsets. Consider this set of average predictions:
dat <- within(mtcars, {
gear = factor(gear)
cyl = factor(cyl)
})
dat <- sort_by(dat, ~ gear + cyl + am)
mod <- lm(mpg ~ am * wt * factor(cyl), data = dat)
avg_predictions(mod, by = "gear")
gear Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
3 16.1 0.609 26.5 <0.001 510.5 14.9 17.3
4 24.2 0.613 39.5 <0.001 Inf 23.0 25.4
5 22.2 0.837 26.5 <0.001 511.3 20.5 23.8
Type: response
We can make “sequential”, “reference”, or “meandev” comparisons, either as differences (the default) or ratios:
avg_predictions(mod,
hypothesis = ~ sequential,
by = "gear")
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
(4) - (3) 8.08 0.835 9.68 <0.001 71.2 6.45 9.721
(5) - (4) -2.01 0.829 -2.43 0.0152 6.0 -3.64 -0.387
Type: response
avg_predictions(mod,
hypothesis = ~ meandev,
by = "gear")
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
(3) - Mean -4.72 0.563 -8.38 <0.001 54.1 -5.822 -3.62
(4) - Mean 3.37 0.434 7.75 <0.001 46.6 2.514 4.22
(5) - Mean 1.35 0.560 2.42 0.0156 6.0 0.257 2.45
Type: response
avg_predictions(mod,
hypothesis = ratio ~ sequential,
by = "gear")
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
(4) / (3) 1.502 0.0662 22.7 <0.001 375.6 1.372 1.632
(5) / (4) 0.917 0.0336 27.3 <0.001 543.2 0.851 0.983
Type: response
We can also test hypotheses by subgroup. For example, in this case, we want to compare every estimate to the reference category (first estimate) in each subset of am
:
avg_predictions(mod,
by = c("am", "gear")
)
am gear Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
0 3 16.1 0.609 26.5 <0.001 510.5 14.9 17.3
0 4 21.0 1.091 19.3 <0.001 272.8 18.9 23.2
1 4 25.8 0.740 34.8 <0.001 880.0 24.3 27.2
1 5 22.2 0.837 26.5 <0.001 511.3 20.5 23.8
Type: response
avg_predictions(mod,
hypothesis = ~ sequential | am,
by = c("am", "gear")
)
am Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
0 (4) - (3) 4.93 1.190 4.14 <0.001 14.8 2.59 7.26
1 (5) - (4) -3.59 0.815 -4.41 <0.001 16.5 -5.19 -1.99
Type: response
The grouping component of the formula is particularly useful when conducting tests on contrasts:
avg_comparisons(mod,
variables = "cyl",
by = "am")
Contrast am Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
6 - 4 0 -11.33 6.04 -1.875 0.0608 4.0 -23.2 0.512
8 - 4 0 -9.59 3.84 -2.495 0.0126 6.3 -17.1 -2.057
6 - 4 1 -4.39 4.94 -0.889 0.3739 1.4 -14.1 5.291
8 - 4 1 -7.85 8.46 -0.928 0.3535 1.5 -24.4 8.730
Term: cyl
Type: response
avg_comparisons(mod,
variables = "cyl",
by = "am",
hypothesis = ~ sequential | contrast)
Contrast contrast.1 Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 %
6 - 4 6 - 4 (1) - (0) 6.93 7.80 0.889 0.374 1.4 -8.36
8 - 4 8 - 4 (1) - (0) 1.74 9.29 0.188 0.851 0.2 -16.46
97.5 %
22.2
20.0
Type: response
avg_comparisons(mod,
variables = "cyl",
by = "am",
hypothesis = ~ sequential | am)
am Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
0 (8 - 4) - (6 - 4) 1.74 4.79 0.363 0.717 0.5 -7.64 11.1
1 (8 - 4) - (6 - 4) -3.45 9.65 -0.358 0.720 0.5 -22.37 15.5
Type: response
We can also pass custom expressions to manipulate the vector of estimates by using I()
in the right hand side of the formula. For example, if we want the mean slope for cyl = c(6, 8)
we can express this as mean(x[2:3])
(with x
representing the vector of estimates):
avg_slopes(mod,
variables = "wt",
by = "cyl",
hypothesis = ~ I(mean(x[2:3])))
Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
-5.07 4.53 -1.12 0.263 1.9 -13.9 3.8
Type: response
(Note that we ordered dat
so we know the which level of cyl
each estimate corresponds to):
These expressions can return multiple, named, estiamtes. Often is more conviniant to wrap these in a function that will then be called in the I()
expression. For example we can compare the estimated effect of wt
for cyl = 8
to the average estimated effect for cyl = c(4, 6)
. We want both the difference and the ratio.
less_vs_8 <- function(x) {
c("{8} - {4, 6}" = x[3] - mean(x[1:2]),
"{8} / {4, 6}" = x[3] / mean(x[1:2]))
}
avg_slopes(mod,
variables = "wt",
by = "cyl",
hypothesis = ~ I(less_vs_8(x)))
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
{8} - {4, 6} 3.957 4.805 0.823 0.410 1.3 -5.461 13.37
{8} / {4, 6} 0.375 0.357 1.052 0.293 1.8 -0.324 1.07
Type: response
Here we combine these expressions with a grouping component:
avg_slopes(mod,
variables = "wt",
by = c("cyl", "am"),
hypothesis = ~ I(less_vs_8(x)) | am)
am Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
0 {8} - {4, 6} 2.597 6.43 0.404 0.686 0.5 -10.01 15.20
0 {8} / {4, 6} 0.484 0.64 0.757 0.449 1.2 -0.77 1.74
1 {8} - {4, 6} 2.243 10.90 0.206 0.837 0.3 -19.12 23.61
1 {8} / {4, 6} 0.471 2.15 0.219 0.826 0.3 -3.74 4.69
Type: response
It is also a powerful tool in categorical outcome models, to compare estimates for different outcome levels:
library("MASS")
mod <- polr(gear ~ cyl + hp, dat)
avg_comparisons(mod, variables = "cyl")
Re-fitting to get Hessian
Group Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
3 6 - 4 0.2679 0.0787 3.403 < 0.001 10.6 0.1136 0.4221
3 8 - 4 0.8582 0.0432 19.886 < 0.001 289.9 0.7737 0.9428
4 6 - 4 0.0108 0.0474 0.228 0.81962 0.3 -0.0822 0.1038
4 8 - 4 -0.1971 0.0651 -3.028 0.00246 8.7 -0.3246 -0.0695
5 6 - 4 -0.2787 0.0717 -3.887 < 0.001 13.3 -0.4192 -0.1382
5 8 - 4 -0.6612 0.0694 -9.533 < 0.001 69.2 -0.7971 -0.5253
Term: cyl
Type: probs
Is the “8 vs. 4” contrast equal to the “6 vs. 4” contrast, within each outcome level?
avg_comparisons(mod,
variables = "cyl",
hypothesis = ~ sequential | group)
Re-fitting to get Hessian
Group group.1 Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 %
3 3 (8 - 4) - (6 - 4) 0.590 0.0815 7.24 < 0.001 41.0 0.431
4 4 (8 - 4) - (6 - 4) -0.208 0.0717 -2.90 0.00376 8.1 -0.348
5 5 (8 - 4) - (6 - 4) -0.383 0.0575 -6.65 < 0.001 35.0 -0.495
97.5 %
0.7502
-0.0673
-0.2698
Type: probs
Is the “8 vs. 4” contrast for outcome level 3 equal to the “8 vs. 4” contrast for outcome level 4?
avg_comparisons(mod,
variables = "cyl",
hypothesis = ~ sequential | contrast)
Re-fitting to get Hessian
Contrast contrast.1 Hypothesis Estimate Std. Error z Pr(>|z|) S
6 - 4 6 - 4 (4) - (3) -0.257 0.1084 -2.37 0.01775 5.8
6 - 4 6 - 4 (5) - (4) -0.289 0.0927 -3.12 0.00178 9.1
8 - 4 8 - 4 (4) - (3) -1.055 0.0859 -12.28 < 0.001 112.7
8 - 4 8 - 4 (5) - (4) -0.464 0.1274 -3.64 < 0.001 11.9
2.5 % 97.5 %
-0.470 -0.0445
-0.471 -0.1079
-1.224 -0.8869
-0.714 -0.2145
Type: probs
Here too we can a custom function within an I()
expression. For example, we can conditionally estimate the average rank by treating our estimate as the relative “weight” of each level:
mean_rank <- function(x) {
weighted.mean(c(3, 4, 5), w = x)
}
ranks <- avg_predictions(mod,
variables = "cyl",
hypothesis = ~ I(mean_rank(x)) | cyl)
Re-fitting to get Hessian
ranks
cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
4 4.64 0.0885 52.4 <0.001 Inf 4.46 4.81
6 4.09 0.1156 35.4 <0.001 908.0 3.86 4.32
8 3.12 0.0375 83.2 <0.001 Inf 3.04 3.19
Type: probs
These can then be further compared using the hypotheses()
function:
hypotheses(ranks, ratio ~ pairwise)
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
(b2) / (b1) 0.882 0.0296 29.8 <0.001 647.9 0.824 0.940
(b3) / (b1) 0.672 0.0150 44.7 <0.001 Inf 0.643 0.702
(b3) / (b2) 0.762 0.0234 32.6 <0.001 770.5 0.716 0.808
26 Functions
Hypothesis tests can also be conducted using arbitrary R
functions. This allows users to test hypotheses on complex aggregations or transformations. To achieve this, we define a custom function which accepts a fitted model or marginaleffects
objects, and returns a data frame with at least two columns: term
(or hypothesis
) and estimate
.
26.1 Predictions
When supplying a function to the hypothesis
argument, that function must accept an argument x
which is a data frame with columns rowid
and estimate
(and optional columns for other elements of newdata
). That function must return a data frame with columns term
(or hypothesis
) and estimate
.
In this example, we test if the mean predicted value is different from 2:
dat <- transform(mtcars, gear = factor(gear), cyl = factor(cyl))
mod <- lm(wt ~ mpg * hp * cyl, data = dat)
hyp <- function(x) {
data.frame(
hypothesis = "Avg(Ŷ) = 2",
estimate = mean(x$estimate) - 2
)
}
predictions(mod, hypothesis = hyp)
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Avg(Ŷ) = 2 1.22 0.0914 13.3 <0.001 132.0 1.04 1.4
Type: response
In this ordinal logit model, the predictions()
function returns one row per observation and per level of the outcome variable:
Re-fitting to get Hessian
Group Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
3 0.471 0.0584 8.05 <0.001 50.2 0.3561 0.585
4 0.366 0.0715 5.12 <0.001 21.7 0.2263 0.507
5 0.163 0.0478 3.41 <0.001 10.6 0.0692 0.257
Type: probs
We can use a function in the hypothesis
argument to collapse the rows, displaying the average predicted values in groups 3-4 vs. 5:
fun <- function(x) {
out <- x |>
mutate(term = ifelse(group %in% 3:4, "3 & 4", "5")) |>
summarize(estimate = mean(estimate), .by = term)
return(out)
}
avg_predictions(mod, hypothesis = fun)
Re-fitting to get Hessian
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
3 & 4 0.419 0.0239 17.51 <0.001 225.5 0.3717 0.465
5 0.163 0.0478 3.41 <0.001 10.6 0.0692 0.257
Type: probs
And we can compare the two categories by doing:
fun <- function(x) {
out <- x |>
mutate(term = ifelse(group %in% 3:4, "3 & 4", "5")) |>
summarize(estimate = mean(estimate), .by = term) |>
summarize(estimate = diff(estimate), term = "5 - (3 & 4)")
return(out)
}
avg_predictions(mod, hypothesis = fun)
Re-fitting to get Hessian
Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
-0.256 0.0717 -3.56 <0.001 11.4 -0.396 -0.115
Term: 5 - (3 & 4)
Type: probs
26.2 Comparisons and slopes
In the same ordinal logit model, we can estimate the average effect of an increase of 1 unit in hp
on the expected probability of every level of the outcome:
avg_comparisons(mod, variables = "hp")
Re-fitting to get Hessian
Group Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
3 -0.00774 0.002294 -3.38 <0.001 10.4 -0.01224 -0.00325
4 0.00258 0.002201 1.17 0.24 2.1 -0.00173 0.00690
5 0.00516 0.000882 5.85 <0.001 27.6 0.00343 0.00689
Term: hp
Type: probs
Comparison: +1
Compare estimate for different outcome levels:
fun <- function(x) {
x |>
mutate(estimate = (estimate - lag(estimate)),
group = sprintf("%s - %s", group, lag(group))) |>
filter(!is.na(estimate))
}
avg_comparisons(mod, variables = "hp", hypothesis = fun)
Re-fitting to get Hessian
Group Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
4 - 3 0.01033 0.00441 2.34 0.0191 5.7 0.00169 0.01897
5 - 4 0.00258 0.00245 1.05 0.2922 1.8 -0.00222 0.00737
Term: hp
Type: probs
Comparison: +1
Now suppose we want to compare the effect of hp
for different levels of the outcome, but this time we do the computation within levels of cyl
:
avg_comparisons(mod, variables = "hp", by = "cyl")
Re-fitting to get Hessian
Group cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
3 4 -0.007862 0.003445 -2.282 0.02248 5.5 -1.46e-02 -0.00111
4 4 -0.003024 0.003714 -0.814 0.41560 1.3 -1.03e-02 0.00426
5 4 0.010886 0.002652 4.104 < 0.001 14.6 5.69e-03 0.01608
3 6 -0.013282 0.005038 -2.637 0.00838 6.9 -2.32e-02 -0.00341
4 6 0.008573 0.006431 1.333 0.18252 2.5 -4.03e-03 0.02118
5 6 0.004709 0.001929 2.442 0.01462 6.1 9.29e-04 0.00849
3 8 -0.004882 0.001457 -3.350 < 0.001 10.3 -7.74e-03 -0.00203
4 8 0.003995 0.001627 2.455 0.01409 6.1 8.06e-04 0.00719
5 8 0.000887 0.000431 2.059 0.03947 4.7 4.27e-05 0.00173
Term: hp
Type: probs
Comparison: +1
fun <- function(x) {
x |>
mutate(estimate = (estimate - lag(estimate)),
group = sprintf("%s - %s", group, lag(group)),
.by = "cyl") |>
filter(!is.na(estimate))
}
avg_comparisons(mod, variables = "hp", by = "cyl", hypothesis = fun)
Re-fitting to get Hessian
Group cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
4 - 3 4 0.00484 0.00666 0.727 0.46719 1.1 -0.008205 0.017882
5 - 4 4 0.01391 0.00546 2.548 0.01082 6.5 0.003211 0.024607
4 - 3 6 0.02185 0.01139 1.919 0.05503 4.2 -0.000471 0.044180
5 - 4 6 -0.00386 0.00805 -0.480 0.63117 0.7 -0.019638 0.011911
4 - 3 8 0.00888 0.00306 2.902 0.00371 8.1 0.002882 0.014874
5 - 4 8 -0.00311 0.00188 -1.651 0.09869 3.3 -0.006798 0.000581
Term: hp
Type: probs
Comparison: +1
26.3 Fitted models
The hypothesis
argument can be used to compute standard errors for arbitrary functions of model parameters. This user-supplied function must accept a single model object, and return a data.frame with two columns named term
and estimate
.
Here, we test if the sum of the hp
and mpg
coefficients is equal to 2:
mod <- glm(am ~ hp + mpg, data = mtcars, family = binomial)
fun <- function(x) {
b <- coef(x)
out <- data.frame(
term = "hp + mpg = 2",
estimate = b["hp"] + b["mpg"] - 2,
row.names = NULL
)
return(out)
}
hypotheses(mod, hypothesis = fun)
Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
-0.685 0.593 -1.16 0.248 2.0 -1.85 0.476
Term: hp + mpg = 2
Test of equality between two two predictions:
fun <- function(x) {
p <- predict(x, newdata = mtcars)
out <- data.frame(term = "pred[2] = pred[3]", estimate = p[2] - p[3])
return(out)
}
hypotheses(mod, hypothesis = fun)
Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
-1.33 0.616 -2.16 0.0305 5.0 -2.54 -0.125
Term: pred[2] = pred[3]
We can also use more complex aggregation patterns. In this ordinal logistic regression model, we model the number of gears for each ar. If we compute fitted values with the predictions()
function, we obtain one predicted probability for each individual car and for each level of the response variable:
library(MASS)
library(dplyr)
dat <- transform(mtcars,
gear = factor(gear),
cyl = factor(cyl))
mod <- polr(gear ~ cyl + hp, dat)
predictions(mod)
Re-fitting to get Hessian
Group Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
3 0.3931 0.19125 2.06 0.03982 4.7 0.0183 0.768
3 0.3931 0.19125 2.06 0.03982 4.7 0.0183 0.768
3 0.0440 0.04256 1.03 0.30081 1.7 -0.0394 0.127
3 0.3931 0.19125 2.06 0.03982 4.7 0.0183 0.768
3 0.9963 0.00721 138.17 < 0.001 Inf 0.9822 1.010
--- 86 rows omitted. See ?print.marginaleffects ---
5 0.6969 0.18931 3.68 < 0.001 12.1 0.3258 1.068
5 0.0555 0.06851 0.81 0.41775 1.3 -0.0788 0.190
5 0.8115 0.20626 3.93 < 0.001 13.5 0.4073 1.216
5 0.9111 0.16818 5.42 < 0.001 24.0 0.5815 1.241
5 0.6322 0.19648 3.22 0.00129 9.6 0.2471 1.017
Type: probs
There are three levels to the outcome: 3, 4, and 5. Imagine that, for each car in the dataset, we want to collapse categories of the output variable into two categories (“3 & 4” and “5”) by taking sums of predicted probabilities. Then, we want to take the average of those predicted probabilities for each level of cyl
. To do so, we define a custom function, and pass it to the hypothesis
argument of the hypotheses()
function:
fun <- function(x) {
predictions(x, vcov = FALSE) |>
# label the new categories of outcome levels
mutate(group = ifelse(group %in% c("3", "4"), "3 & 4", "5")) |>
# sum of probabilities at the individual level
summarize(estimate = sum(estimate), .by = c("rowid", "cyl", "group")) |>
# average probabilities for each value of `cyl`
summarize(estimate = mean(estimate), .by = c("cyl", "group")) |>
# the `FUN` argument requires a `term` column
rename(term = cyl)
}
hypotheses(mod, hypothesis = fun)
Re-fitting to get Hessian
Term Group Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
6 3 & 4 0.8390 0.0651 12.89 <0.001 123.9 0.7115 0.967
4 3 & 4 0.7197 0.1099 6.55 <0.001 34.0 0.5044 0.935
8 3 & 4 0.9283 0.0174 53.45 <0.001 Inf 0.8943 0.962
6 5 0.1610 0.0651 2.47 0.0134 6.2 0.0334 0.289
4 5 0.2803 0.1099 2.55 0.0108 6.5 0.0649 0.496
8 5 0.0717 0.0174 4.13 <0.001 14.7 0.0377 0.106
Note that this workflow will not work for bayesian models or with bootstrap. However, with those models it is trivial to do the same kind of aggregation by calling get_draws()
and operating directly on draws from the posterior distribution. See the vignette on bayesian analysis for examples with the get_draws()
function.
27 Vectors and Matrices
The predictions()
function can estimate marginal means. The hypothesis
argument of that function offers a powerful mechanism to estimate custom contrasts between marginal means, by way of linear combination.
27.1 Simple contrast
Consider a simple example:
library(marginaleffects)
library(emmeans)
library(nnet)
dat <- mtcars
dat$carb <- factor(dat$carb)
dat$cyl <- factor(dat$cyl)
dat$am <- as.logical(dat$am)
dat <- sort_by(dat, ~carb)
mod <- lm(mpg ~ carb + cyl, dat)
mm <- predictions(mod,
by = "carb",
newdata = "balanced")
mm
carb Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
1 21.7 1.44 15.06 <0.001 167.8 18.8 24.5
2 21.3 1.23 17.29 <0.001 220.0 18.9 23.8
3 21.4 2.19 9.77 <0.001 72.5 17.1 25.7
4 18.9 1.21 15.59 <0.001 179.7 16.5 21.3
6 19.8 3.55 5.56 <0.001 25.2 12.8 26.7
8 20.1 3.51 5.73 <0.001 26.6 13.2 27.0
Type: response
The contrast between marginal means for carb==1
and carb==2
is:
21.66232 - 21.34058
[1] 0.32174
or
21.66232 + -(21.34058)
[1] 0.32174
or
or
The last two commands express the contrast of interest as a linear combination of marginal means.
In the predictions()
function, we can supply a hypothesis
argument to compute linear combinations of marginal means. This argument must be a numeric vector of the same length as the number of rows in the output. For example, in the previous there were six rows, and the two marginal means we want to compare are at in the first two positions:
lc <- c(1, -1, 0, 0, 0, 0)
predictions(mod,
by = "carb",
newdata = "balanced",
hypothesis = lc)
Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
0.322 1.77 0.181 0.856 0.2 -3.15 3.8
Term: custom
Type: response
27.2 Complex contrast
Of course, we can also estimate more complex contrasts:
lc <- c(0, -2, 1, 1, -1, 1)
predictions(mod,
by = "carb",
newdata = "balanced",
hypothesis = lc)
Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
-2.02 6.32 -0.32 0.749 0.4 -14.4 10.4
Term: custom
Type: response
emmeans
produces similar results:
27.3 Multiple contrasts
Users can also compute multiple linear combinations simultaneously by supplying a numeric matrix to hypotheses
. This matrix must have the same number of rows as the output of slopes()
, and each column represents a distinct set of weights for different linear combinations. The column names of the matrix become labels in the output. For example:
lc <- matrix(c(
-2, 1, 1, 0, -1, 1,
1, -1, 0, 0, 0, 0
), ncol = 2)
colnames(lc) <- c("Contrast A", "Contrast B")
lc
Contrast A Contrast B
[1,] -2 1
[2,] 1 -1
[3,] 1 0
[4,] 0 0
[5,] -1 0
[6,] 1 0
predictions(mod,
by = "carb",
newdata = "balanced",
hypothesis = lc)
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Contrast A -0.211 6.93 -0.0304 0.976 0.0 -13.79 13.4
Contrast B 0.322 1.77 0.1814 0.856 0.2 -3.15 3.8
Type: response
28 Arbitrary quantities
marginaleffects
can also compute uncertainty estimates for arbitrary quantities hosted in a data frame, as long as the user can supply a variance-covariance matrix. (Thanks to Kyle F Butts for this cool feature and example!)
Say you run a monte-carlo simulation and you want to perform hypothesis of various quantities returned from each simulation. The quantities are correlated within each draw:
# simulated means and medians
draw <- function(i) {
x <- rnorm(n = 10000, mean = 0, sd = 1)
out <- data.frame(median = median(x), mean = mean(x))
return(out)
}
sims <- do.call("rbind", lapply(1:25, draw))
# average mean and average median
coeftable <- data.frame(
term = c("median", "mean"),
estimate = c(mean(sims$median), mean(sims$mean))
)
# variance-covariance
vcov <- cov(sims)
# is the median equal to the mean?
hypotheses(
coeftable,
vcov = vcov,
hypothesis = "median = mean"
)
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
median=mean -0.000858 0.00915 -0.0938 0.925 0.1 -0.0188 0.0171
29 Joint hypotheses tests
The hypotheses()
function can also test multiple hypotheses jointly. For example, consider this model:
(Intercept) as.factor(cyl)6 as.factor(cyl)8 hp
35.98302564 -15.30917451 -17.90295193 -0.11277589
as.factor(cyl)6:hp as.factor(cyl)8:hp
0.10516262 0.09853177
We may want to test the null hypothesis that two of the coefficients are jointly (both) equal to zero.
hypotheses(model, joint = c("as.factor(cyl)6:hp", "as.factor(cyl)8:hp"))
Joint hypothesis test:
as.factor(cyl)6:hp = 0
as.factor(cyl)8:hp = 0
F Pr(>|F|) Df 1 Df 2
2.11 0.142 2 26
The joint
argument allows users to flexibly specify the parameters to be tested, using character vectors, integer indices, or Perl-compatible regular expressions. We can also specify the null hypothesis for each parameter individually using the hypothesis
argument.
Naturally, the hypotheses
function also works with marginaleffects
objects.
# ## joint hypotheses: regular expression
hypotheses(model, joint = "cyl")
Joint hypothesis test:
as.factor(cyl)6 = 0
as.factor(cyl)8 = 0
as.factor(cyl)6:hp = 0
as.factor(cyl)8:hp = 0
F Pr(>|F|) Df 1 Df 2
5.7 0.00197 4 26
# joint hypotheses: integer indices
hypotheses(model, joint = 2:3)
Joint hypothesis test:
as.factor(cyl)6 = 0
as.factor(cyl)8 = 0
F Pr(>|F|) Df 1 Df 2
6.12 0.00665 2 26
# joint hypotheses: different null hypotheses
hypotheses(model, joint = 2:3, hypothesis = 1)
Joint hypothesis test:
as.factor(cyl)6 = 1
as.factor(cyl)8 = 1
F Pr(>|F|) Df 1 Df 2
6.84 0.00411 2 26
hypotheses(model, joint = 2:3, hypothesis = 1:2)
Joint hypothesis test:
as.factor(cyl)6 = 1
as.factor(cyl)8 = 2
F Pr(>|F|) Df 1 Df 2
7.47 0.00273 2 26
# joint hypotheses: marginaleffects object
cmp <- avg_comparisons(model)
Warning: The `cyl` variable is treated as a categorical (factor) variable, but
the original data is of class numeric. It is safer and faster to convert such
variables to factor before fitting the model and calling a `marginaleffects`
function. This warning appears once per session.
hypotheses(cmp, joint = "cyl")
Joint hypothesis test:
cyl 6 - 4 = 0
cyl 8 - 4 = 0
F Pr(>|F|) Df 1 Df 2
1.6 0.221 2 26
We can also combine multiple calls to hypotheses
to execute a joint test on linear combinations of coefficients:
# fit model
mod <- lm(mpg ~ factor(carb), mtcars)
# hypothesis matrix for linear combinations
H <- matrix(0, nrow = length(coef(mod)), ncol = 2)
H[2:3, 1] <- H[4:6, 2] <- 1
# test individual linear combinations
hyp <- hypotheses(mod, hypothesis = H)
hyp
Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
-12.0 4.92 -2.44 0.01477 6.1 -21.6 -2.35
-25.5 9.03 -2.83 0.00466 7.7 -43.2 -7.85
Term: custom
# test joint hypotheses
#hypotheses(hyp, joint = TRUE, hypothesis = c(-10, -20))
30 More
30.1 Difference-in-Differences
Now we illustrate how to use the machinery described above to do pairwise comparisons between contrasts, a type of analysis often associated with a “Difference-in-Differences” research design.
First, we simulate data with two treatment groups and pre/post periods:
library(data.table)
N <- 1000
did <- data.table(
id = 1:N,
pre = rnorm(N),
trt = sample(0:1, N, replace = TRUE))
did$post <- did$pre + did$trt * 0.3 + rnorm(N)
did <- melt(
did,
value.name = "y",
variable.name = "time",
id.vars = c("id", "trt"))
head(did)
id trt time y
<int> <int> <fctr> <num>
1: 1 1 pre -1.7451614
2: 2 0 pre -0.6572841
3: 3 0 pre 0.5989490
4: 4 0 pre -0.6343567
5: 5 0 pre -0.4826380
6: 6 0 pre 1.2042403
Then, we estimate a linear model with a multiple interaction between the time and the treatment indicators. We also compute contrasts at the mean for each treatment level:
did_model <- lm(y ~ time * trt, data = did)
comparisons(
did_model,
newdata = datagrid(trt = 0:1),
variables = "time")
trt Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
0 -0.0638 0.0790 -0.807 0.42 1.3 -0.219 0.0911
1 0.3270 0.0806 4.055 <0.001 14.3 0.169 0.4850
Term: time
Type: response
Comparison: post - pre
Finally, we compute pairwise differences between contrasts. This is the Diff-in-Diff estimate:
comparisons(
did_model,
variables = "time",
newdata = datagrid(trt = 0:1),
hypothesis = ~pairwise)
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
(b2) - (b1) 0.391 0.113 3.46 <0.001 10.9 0.169 0.612
Type: response