     treat age educ   race married nodegree re74 re75       re78
NSW1     1  37   11  black       1        1    0    0  9930.0460
NSW2     1  22    9 hispan       0        1    0    0  3595.8940
NSW3     1  30   12  black       0        0    0    0 24909.4500
NSW4     1  27   11  black       0        1    0    0  7506.1460
NSW5     1  33    8  black       0        1    0    0   289.7899
NSW6     1  22    9  black       0        1    0    0  4056.4940
Matching
author: “Vincent Arel-Bundock”
This chapter shows how to use marginaleffects to estimate treatment effects after pre-processing a dataset to achieve better covariate balance. The presentation is brief. Readers who want a more comprehensive treatment of these methods and their application should refer to Noah Greifer’s excellent and detailed work on the topic, and to the MatchIt package vignettes and website.
The procedure we highlight can be broken down into three steps:
- Use MatchIt to pre-process the data and achieve better covariate balance
- Fit a regression model to the outcome of interest
- Use marginaleffects and G-Computation to estimate a quantity of interest, such as the average treatment effect on the treated (ATT)
To begin, we load libraries and the data from the classic Lalonde experiment:
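The loading code itself is not reproduced here. A minimal sketch of what it might look like, assuming the lalonde data are the copy shipped with the MatchIt package (which has the treat, race, and re78 columns used in this chapter):

```r
# Assumption: both packages are installed from CRAN
library(MatchIt)
library(marginaleffects)

# Assumption: we use the lalonde dataset bundled with MatchIt
data("lalonde", package = "MatchIt")

# Preview the first few rows
head(lalonde)
```

Other packages distribute differently coded versions of the Lalonde data, so it is worth checking the column names before moving on.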
We are interested in the treatment effect of the treat variable on the re78 outcome. The treat variable is a binary variable indicating whether the individual received job training. The re78 variable is the individual’s earnings in 1978.
Matching
The first step is to pre-process the dataset to achieve better covariate balance. To do this, we use the MatchIt::matchit() function to perform 1-to-1 nearest neighbor matching without replacement (replace = FALSE) on the Mahalanobis distance. This function supports many other matching methods; see ?matchit for details.
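# 1-to-1 nearest neighbor matching (the default method) on the Mahalanobis distance
dat <- matchit(
    treat ~ age + educ + race + married + nodegree + re74 + re75,
    data = lalonde, distance = "mahalanobis",
    replace = FALSE)
# Extract the matched sample, including the weights and subclass (pair) columns
dat <- match.data(dat)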
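Before fitting the outcome model, it can be useful to check whether matching actually improved balance. A minimal sketch, assuming the matchit object is kept under a separate, hypothetical name (m) instead of being overwritten, so that MatchIt's summary() method can be called on it:

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Same matching specification as in the text; `m` is an illustrative name
m <- matchit(
    treat ~ age + educ + race + married + nodegree + re74 + re75,
    data = lalonde, distance = "mahalanobis",
    replace = FALSE)

# Balance statistics for the full sample and for the matched sample
bal <- summary(m)
print(bal)

# Matched dataset used in the rest of the chapter
dat <- match.data(m)
```

Keeping the matchit object around is a design choice for diagnostics only; the matched data passed to the later steps are the same.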
Fitting
Now, we estimate a linear regression model with interactions between the treatment and covariates. Note that we pass the weights produced by our matching method through the weights argument:
fit <- lm(
    re78 ~ treat * (age + educ + race + married + nodegree),
    data = dat,
    weights = weights)
Quantity of interest
Finally, we use the avg_comparisons() function of the marginaleffects package to estimate the ATT and its standard error. In effect, this function applies G-Computation to estimate the quantity of interest. We use the following arguments:
- variables="treat" indicates that we are interested in the effect of the treat variable.
- newdata=subset(dat, treat == 1) indicates that we want to estimate the effect for the treated individuals only (i.e., the ATT).
- vcov=~subclass requests standard errors clustered on the subclass column, which identifies the matched pairs.
- wts="weights" indicates that we want to use the weights supplied by the matching method.
avg_comparisons(
    fit,
    variables = "treat",
    newdata = subset(dat, treat == 1),
    vcov = ~subclass,
    wts = "weights")
 Estimate Std. Error    z Pr(>|z|)   S 2.5 % 97.5 %
     1221        850 1.44    0.151 2.7  -445   2888
Term: treat
Type: response
Comparison: mean(1) - mean(0)
Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted
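To make the G-Computation step more concrete, the point estimate can be sketched by hand: predict each treated unit's outcome with treat set to 1 and to 0, then take the weighted average of the differences. The block below rebuilds the matched data and model so that it is self-contained; the names m, treated, p1, p0, and att are illustrative and do not come from either package. It produces only a point estimate, not the clustered standard error.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Matching and outcome model, as specified in the text
m <- matchit(
    treat ~ age + educ + race + married + nodegree + re74 + re75,
    data = lalonde, distance = "mahalanobis",
    replace = FALSE)
dat <- match.data(m)
fit <- lm(
    re78 ~ treat * (age + educ + race + married + nodegree),
    data = dat,
    weights = weights)

# Restrict to treated units, since the estimand is the ATT
treated <- subset(dat, treat == 1)

# Counterfactual predictions: everyone treated vs. everyone untreated
p1 <- predict(fit, newdata = transform(treated, treat = 1))
p0 <- predict(fit, newdata = transform(treated, treat = 0))

# Weighted average of unit-level differences
att <- weighted.mean(p1 - p0, w = treated$weights)
att
```

This value should agree with the estimate reported by avg_comparisons(); the function additionally handles the uncertainty calculations.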
Learn more
The MatchIt vignette titled “Estimating Effects After Matching” describes many more options, including different measures of uncertainty (bootstrap, clustering, etc.), different estimands (ATE, etc.), and different strategies for adjustment.