Inverse Probability Weighting (IPW) is a popular technique to remove confounding in statistical modeling. It essentially involves re-weighting your sample so that it represents the population you're interested in. Typically, we begin by estimating the probability that each unit receives treatment. Then, we use the inverse of these probabilities as weights in model fitting and in the computation of marginal effects, contrasts, risk differences, ratios, etc.
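The examples below use the classic lalonde data. A minimal setup sketch, assuming the dataset is the one shipped with the `MatchIt` package (the row names `NSW1`, `NSW2`, etc. match that version):

```r
library(marginaleffects)

# Load the lalonde data from the MatchIt package and peek at the first rows
data("lalonde", package = "MatchIt")
head(lalonde)
```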
```r
     treat age educ   race married nodegree re74 re75       re78
NSW1     1  37   11  black       1        1    0    0  9930.0460
NSW2     1  22    9 hispan       0        1    0    0  3595.8940
NSW3     1  30   12  black       0        0    0    0 24909.4500
NSW4     1  27   11  black       0        1    0    0  7506.1460
NSW5     1  33    8  black       0        1    0    0   289.7899
NSW6     1  22    9  black       0        1    0    0  4056.4940
```
To begin, we use a logistic regression model to estimate the probability that each unit is treated:
```r
m <- glm(treat ~ age + educ + race + re74, data = lalonde, family = binomial)
```
Then, we call predictions() to extract predicted probabilities. Note that we supply the original lalonde data explicitly to the newdata argument. This ensures that all the original columns are carried over to the new dataset: dat. We also create a new column called wts that contains the inverse probability weights:
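The code described in the paragraph above is missing here; a minimal sketch follows. The `estimate` column name and the `ifelse()` weighting scheme are assumptions: for the average treatment effect, treated units are typically weighted by 1/p and controls by 1/(1-p), where p is the predicted probability of treatment.

```r
# Predicted probabilities of treatment; newdata = lalonde carries over
# all original columns into the result
dat <- predictions(m, newdata = lalonde)

# Inverse probability weights: 1/p for treated units, 1/(1-p) for controls
dat$wts <- ifelse(dat$treat == 1, 1 / dat$estimate, 1 / (1 - dat$estimate))
```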
Now, we use linear regression to model the outcome of interest: personal income in 1978 (re78). Note that we use the inverse probability weights (wts) in the model fitting process.
```r
mod <- lm(re78 ~ treat * (age + educ + race + re74), data = dat, weights = wts)
```
Finally, we call avg_comparisons() to compute the average treatment effect. Note that we use the wts argument to specify the weights to be used in the computation.
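The call described above might look like this; the wts argument of avg_comparisons() accepts the name of a column in the data:

```r
# Average treatment effect of `treat` on re78, using the IPW weights
avg_comparisons(mod, variables = "treat", wts = "wts")
```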
By default, avg_comparisons() uses the Hajek estimator: the weighted values are divided by the sum of the weights before computation. If a user wants to use the Horvitz-Thompson estimator instead, which divides by the sample size rather than by the sum of the weights, they can easily define a custom comparison function like this one:
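The example promised above is missing here. A sketch, assuming the comparison argument accepts a function receiving vectors of counterfactual predictions (hi, lo) and the weights (w); check the package documentation for the exact signature supported by your version:

```r
# Horvitz-Thompson style average: divide by the number of observations
# rather than by the sum of the weights (as the Hajek estimator does)
ht <- function(hi, lo, w) sum((hi - lo) * w) / length(w)

avg_comparisons(mod, variables = "treat", wts = "wts", comparison = ht)
```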