Experimental analysis is a common use case for the marginaleffects package, which provides researchers with a powerful toolkit for estimating treatment effects, interpreting interactions, and visualizing results across a variety of experimental designs.
This chapter illustrates the application of marginaleffects through two widely used contexts: estimating the average treatment effect of a single randomized treatment, and a 2-by-2 factorial experiment. The marginaleffects.com website hosts more tutorials to guide the analysis and interpretation of a variety of research designs, including forced-choice conjoint experiments.
In this section, we explain regression adjustment, a method that helps improve the precision of treatment effect estimates in experiments. We will use the same HIV experiment data from Thornton (2008) as in previous chapters. In this experiment, the outcome is a binary variable that indicates whether individuals travelled to the clinic to learn their HIV status. The treatment is a financial incentive, randomly assigned to participants, with the goal of encouraging testing.
Since the treatment was randomized, we can estimate the average treatment effect (ATE) using a linear regression on the binary outcome. This is the so-called “linear probability model.”
library(marginaleffects)
dat <- get_dataset("thornton")
mod <- lm(outcome ~ incentive, data = dat)
coef(mod)
(Intercept) incentive
0.3397746 0.4510603
The coefficient on incentive tells us the difference in the probability of seeing one's test result between those who received the incentive and those who did not. This is the same result you would get from the G-computation estimate using the avg_comparisons() function. The benefit of this function is that it allows us to easily report heteroskedasticity-consistent (or other robust) standard errors:
avg_comparisons(mod,
    variables = "incentive",
    vcov = "HC2")
| Estimate | Std. Error |    z | Pr(>|z|) | 2.5 % | 97.5 % |
|----------|------------|------|----------|-------|--------|
|    0.451 |     0.0209 | 21.6 |   <0.001 |  0.41 |  0.492 |
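The vcov argument accepts other variance estimators as well. As a rough sketch, one could request HC3 standard errors, or cluster-robust standard errors with a one-sided formula; note that the village cluster variable below is an assumption about the dataset's columns, not something shown above:

# HC3 heteroskedasticity-consistent standard errors
avg_comparisons(mod, variables = "incentive", vcov = "HC3")

# Cluster-robust standard errors (assumes the data include a `village` column)
avg_comparisons(mod, variables = "incentive", vcov = ~village)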
When the treatment is randomized, it is not strictly necessary to include control variables in the model to eliminate bias. However, including covariates can sometimes improve the precision of our estimates by reducing the unexplained variance in the outcome (TODO: cite).
As Freedman (2008) points out, however, naively adding covariates to the regression model can introduce small-sample bias and degrade asymptotic precision. To avoid this problem, Lin (2013) recommends a simple solution: interacting all covariates with the treatment indicator, and reporting heteroskedasticity-robust standard errors.
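# Lin (2013) estimator: interact the treatment indicator with every covariate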
mod <- lm(outcome ~ incentive * (age + distance + hiv2004), data = dat)
Interpreting the results of this model is not as easy as in the previous example, since there are multiple coefficients and several multiplicative interactions.
coef(mod)
(Intercept) incentive age distance
0.309537645 0.496482649 0.003495124 -0.042323583
hiv2004 incentive:age incentive:distance incentive:hiv2004
0.013562711 -0.002177778 0.014571883 -0.070766068
Thankfully, we can simply call avg_comparisons() again to get the regression-adjusted estimate, along with robust standard errors.
avg_comparisons(mod,
    variables = "incentive",
    vcov = "HC2")
| Estimate | Std. Error |    z | Pr(>|z|) | 2.5 % | 97.5 % |
|----------|------------|------|----------|-------|--------|
|    0.449 |     0.0209 | 21.5 |   <0.001 | 0.408 |   0.49 |
This result suggests that receiving an incentive increases the probability of seeking one's test result by about 45 percentage points.
A factorial experiment is a type of study design that allows researchers to assess the effects of two or more randomized treatments simultaneously, with each treatment having multiple levels or conditions. A common example is the 2-by-2 design, where two variables with two levels are randomized simultaneously and independently. This setup enables the researcher not only to quantify the effect of each treatment, but also to examine whether the treatments interact with one another.
In medicine, factorial experiments are widely used to explore complex phenomena like drug interactions, where researchers test different combinations of treatments to see how they affect patient outcomes. In plant physiology, they could be used to see how combinations of temperature and humidity affect photosynthesis. In a business context, a company could use a factorial design to test whether different advertising strategies and price points drive sales.
To illustrate, let’s consider a simple data set with 32 observations, a numeric outcome \(Y\), and two treatments with two levels: \(T_a=\{0,1\}\) and \(T_b=\{0,1\}\). We fit a linear regression model with each treatment entered individually, along with their multiplicative interaction (see Chapter 10 for more on interactions).
library(marginaleffects)
dat <- arrow::read_parquet("https://marginaleffects.com/data/factorial.parquet")
mod <- lm(Y ~ Ta + Tb + Ta:Tb, data = dat)
coef(mod)
(Intercept) Ta Tb Ta:Tb
15.050000 5.692857 4.700000 2.928571
We can plot the model’s predictions for each combination of \(T_a\) and \(T_b\) using the plot_predictions() function.
plot_predictions(mod, by = c("Ta", "Tb"))
Now, imagine we want to estimate the effect of changing \(T_a\) from 0 to 1 on \(Y\), while holding \(T_b=0\). We define the two counterfactual values of \(Y\) and take their difference:
\[\begin{align*} \hat{Y} &= \hat{\beta}_1 + \hat{\beta}_2 \cdot T_a + \hat{\beta}_3 \cdot T_b + \hat{\beta}_4 \cdot T_a \cdot T_b\\ \hat{Y}_{T_a=0,T_b=0} &= \hat{\beta}_1 + \hat{\beta}_2 \cdot 0 + \hat{\beta}_3 \cdot 0 + \hat{\beta}_4 \cdot 0 \cdot 0\\ \hat{Y}_{T_a=1,T_b=0} &= \hat{\beta}_1 + \hat{\beta}_2 \cdot 1 + \hat{\beta}_3 \cdot 0 + \hat{\beta}_4 \cdot 1 \cdot 0\\ \hat{Y}_{T_a=1,T_b=0} - \hat{Y}_{T_a=0,T_b=0} &= \hat{\beta}_2 = 5.693 \end{align*}\]
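We can confirm this arithmetic directly from the fitted coefficients:

b <- coef(mod)
# Effect of moving Ta from 0 to 1 while holding Tb = 0: the Ta coefficient alone
unname(b["Ta"])
# 5.692857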
We can use a similar approach to estimate a cross-contrast, that is, the effect of changing both \(T_a\) and \(T_b\) from 0 to 1 simultaneously.
\[\begin{align*} \hat{Y}_{T_a=0,T_b=0} &= \hat{\beta}_1 + \hat{\beta}_2 \cdot 0 + \hat{\beta}_3 \cdot 0 + \hat{\beta}_4 \cdot 0 \cdot 0\\ \hat{Y}_{T_a=1,T_b=1} &= \hat{\beta}_1 + \hat{\beta}_2 \cdot 1 + \hat{\beta}_3 \cdot 1 + \hat{\beta}_4 \cdot 1 \cdot 1\\ \hat{Y}_{T_a=1,T_b=1} - \hat{Y}_{T_a=0,T_b=0} &= \hat{\beta}_2 + \hat{\beta}_3 + \hat{\beta}_4 = 13.321 \end{align*}\]
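The same check works for the cross-contrast:

b <- coef(mod)
# Effect of moving Ta and Tb from 0 to 1 simultaneously
unname(b["Ta"] + b["Tb"] + b["Ta:Tb"])
# 13.32143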
In a simple 2-by-2 experiment, this arithmetic is fairly simple. However, things can quickly get complicated when more treatment conditions are introduced, or when regression models include non-linear components. Moreover, even if we can add coefficient estimates to obtain the quantity of interest, computing standard errors (classical or robust) is not straightforward. Thus, it is often convenient to rely on software like marginaleffects to do the hard work for us.
To estimate the effect of \(T_a\) on \(Y\), holding \(T_b=0\) constant, we use the avg_comparisons() function. The variables argument specifies the treatment of interest, and the newdata argument specifies the levels of the other treatment(s) at which we want to evaluate the contrast. In this case, we use newdata = subset(Tb == 0) to compute the quantity of interest only in the subset of observations where \(T_b=0\).
avg_comparisons(mod,
    variables = "Ta",
    newdata = subset(Tb == 0))
| Estimate | Std. Error |    z | Pr(>|z|) | 2.5 % | 97.5 % |
|----------|------------|------|----------|-------|--------|
|     5.69 |       1.65 | 3.45 |   <0.001 |  2.46 |   8.93 |
To estimate a cross-contrast, we specify each treatment of interest in the variables argument, and set cross=TRUE. This gives us an estimate of the effect of simultaneously moving \(T_a\) and \(T_b\) from 0 to 1.
avg_comparisons(mod,
    variables = c("Ta", "Tb"),
    cross = TRUE)
| C: Ta | C: Tb | Estimate | Std. Error |    z | Pr(>|z|) | 2.5 % | 97.5 % |
|-------|-------|----------|------------|------|----------|-------|--------|
| 1 - 0 | 1 - 0 |     13.3 |       1.65 | 8.07 |   <0.001 |  10.1 |   16.6 |
Figure 9.1 shows six contrasts of interest in a 2-by-2 experimental design. The vertical axis shows the value of the outcome variable in each treatment group. The horizontal axis shows the value of the first treatment \(T_a\). The shape of the points (0 or 1) represents the value of the \(T_b\) treatment. For example, the “1” symbol in the top left shows the average value of \(Y\) in the \(T_a=1,T_b=1\) group.
In each panel, the arrow links two treatment groups that we want to compare. For example, in the top-left panel we compare \(\hat{Y}_{T_a=0,T_b=0}\) to \(\hat{Y}_{T_a=1,T_b=0}\). In the bottom-right panel, we compare \(\hat{Y}_{T_a=0,T_b=1}\) to \(\hat{Y}_{T_a=1,T_b=0}\). The avg_comparisons() call at the top of each panel shows how to use marginaleffects to compute the difference in \(Y\) that results from the illustrated change in treatment conditions.
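For panels that compare two arbitrary cells, such as the bottom-right one, a convenient approach is to test a hypothesis about the cell-level predictions. Here is a minimal sketch; the b indices refer to rows of the prediction grid, so one should print the unrestricted call first to confirm which row corresponds to which cell:

# Average predicted outcome in each of the four cells
avg_predictions(mod, by = c("Ta", "Tb"))

# Difference between two cells, identified by their row indices in the output above
avg_predictions(mod, by = c("Ta", "Tb"), hypothesis = "b3 - b2 = 0")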
One reason factorial experiments are so popular is that they allow researchers to assess the possibility that treatments interact with one another. Indeed, in some cases, the effect of a treatment \(T_a\) will be stronger (or weaker) at different values of treatment \(T_b\). To test this possibility, we use the strategies described in Chapter 10 on interactions.
First, we estimate the effect of a change from 0 to 1 in \(T_a\), in the subsets of observations with different values of \(T_b\).
avg_comparisons(mod, variables = "Ta", by = "Tb")
| Tb | Estimate | Std. Error |    z | Pr(>|z|) | 2.5 % | 97.5 % |
|----|----------|------------|------|----------|-------|--------|
|  0 |     5.69 |       1.65 | 3.45 |   <0.001 |  2.46 |   8.93 |
|  1 |     8.62 |       1.93 | 4.46 |   <0.001 |  4.84 |  12.41 |
The estimated effect of \(T_a\) on \(Y\) is equal to 5.69 when \(T_b=0\), but equal to 8.62 when \(T_b=1\). Is the difference between those estimates statistically significant? Does \(T_b\) moderate the effect of \(T_a\)? Is there treatment heterogeneity or interaction between \(T_a\) and \(T_b\)? To answer these questions, we use the hypothesis argument and check if the difference between subgroup estimates is significant:
avg_comparisons(mod, variables = "Ta", by = "Tb", hypothesis = "b2 - b1 = 0")
| Hypothesis | Estimate | Std. Error |    z | Pr(>|z|) | 2.5 % | 97.5 % |
|------------|----------|------------|------|----------|-------|--------|
| b2-b1=0    |     2.93 |       2.54 | 1.15 |    0.249 | -2.05 |   7.91 |
While there is a difference between our two estimates, the \(z\) statistic and \(p\) value indicate that this difference is not statistically significant. Therefore, we cannot reject the null hypothesis that the effect of \(T_a\) is the same, regardless of the value of \(T_b\).
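In fact, in this simple linear model, the difference between the two subgroup estimates is exactly the coefficient on the interaction term, which we can confirm directly:

# The difference between subgroup contrasts equals the Ta:Tb coefficient
coef(mod)["Ta:Tb"]
# 2.928571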