Mize and Han (2025) recommend several strategies to summarize the effects of different levels of a categorical independent variable, or the effect of an independent variable across different levels of a categorical dependent variable. This notebook implements some of their strategies in marginaleffects. Here, the quantities are only defined and computed; please refer to the original article in Sociological Science for motivation and interpretation.
38.1 Data and model
To begin, we load the required libraries and data from the General Social Survey. We are interested in the effect of the variables race4 and woman on the variable conserv. Since conserv is a binary variable, we fit a logit model.
Mize and Han (2025) define the “average marginal effect inequality” as the average of the absolute values of the marginal effects. Here, the expression “marginal effect,” for a categorical predictor, refers to what Chapter 6 referred to as “counterfactual comparisons.” First, we compute the marginal effects for every level of the race4 predictor.
Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
0.0838 0.013 6.45 <0.001 33.1 0.0583 0.109
Type: response
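To make the definition concrete, the quantity can also be sketched by hand in base R. This is a minimal illustration, not the chapter's own code: the data are simulated (the group labels A to D and all coefficients are made up, and the GSS is not used), the model omits other covariates, and the sketch assumes that all pairwise contrasts between levels enter the average.

```r
# Simulated stand-in for the survey data (hypothetical values, not the GSS)
set.seed(1)
n <- 1000
dat <- data.frame(
  race4 = factor(sample(c("A", "B", "C", "D"), n, replace = TRUE,
                        prob = c(0.6, 0.2, 0.15, 0.05))),
  woman = factor(sample(c("Men", "Women"), n, replace = TRUE))
)
lp <- -0.5 + 0.4 * (dat$race4 == "B") - 0.3 * (dat$race4 == "C") +
  0.2 * (dat$woman == "Women")
dat$conserv <- rbinom(n, 1, plogis(lp))

mod <- glm(conserv ~ race4 + woman, data = dat, family = binomial(link = "logit"))

# Average predicted probability when every respondent is assigned to one level
avg_pred <- sapply(levels(dat$race4), function(lv) {
  cf <- dat
  cf$race4 <- factor(lv, levels = levels(dat$race4))
  mean(predict(mod, newdata = cf, type = "response"))
})

# All pairwise contrasts between levels (assumption: pairwise, not reference-based)
pairs <- combn(levels(dat$race4), 2)
contrasts <- avg_pred[pairs[2, ]] - avg_pred[pairs[1, ]]
names(contrasts) <- paste(pairs[2, ], "-", pairs[1, ])

# Average marginal effect inequality: mean of the absolute contrasts
ame_inequality <- mean(abs(contrasts))
ame_inequality
```

The sketch returns a point estimate only. Standard errors, as in the output above, require the delta method or a bootstrap.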
38.2.2 Weighted Average Marginal Effect Inequality
Mize and Han (2025) argue that the quantity computed in the previous section may be inappropriate when there are strong imbalances between the prevalence of different classes in the sample. Their proposed remedy is to weight each of the contrasts by their relative proportions in the sample. First, we create a vector of weights based on sample proportions.
Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
0.0838 0.013 6.45 <0.001 33.1 0.0583 0.109
Type: response
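The weighting step can be sketched in base R as follows. The data are simulated and the group labels are hypothetical. The weighting scheme is also an assumption made for illustration: each pairwise contrast is weighted by the combined sample share of the two groups it compares. Consult the article for the exact definition.

```r
# Simulated stand-in for the survey data (hypothetical values, not the GSS)
set.seed(2)
n <- 1000
dat <- data.frame(
  race4 = factor(sample(c("A", "B", "C", "D"), n, replace = TRUE,
                        prob = c(0.6, 0.2, 0.15, 0.05)))
)
lp <- -0.5 + 0.4 * (dat$race4 == "B") - 0.3 * (dat$race4 == "C")
dat$conserv <- rbinom(n, 1, plogis(lp))
mod <- glm(conserv ~ race4, data = dat, family = binomial(link = "logit"))

# Average predicted probability under each counterfactual level
avg_pred <- sapply(levels(dat$race4), function(lv) {
  cf <- dat
  cf$race4 <- factor(lv, levels = levels(dat$race4))
  mean(predict(mod, newdata = cf, type = "response"))
})

# Pairwise contrasts between levels
pairs <- combn(levels(dat$race4), 2)
contrasts <- unname(avg_pred[pairs[2, ]] - avg_pred[pairs[1, ]])

# Sample proportion of each group
p <- prop.table(table(dat$race4))

# Assumed weights: combined share of the two groups in each contrast
w <- as.numeric(p[pairs[1, ]] + p[pairs[2, ]])

# weighted.mean() normalizes the weights so that they sum to one
weighted_inequality <- weighted.mean(abs(contrasts), w = w)
weighted_inequality
```

Contrasts that involve small groups receive less weight than in the unweighted average, so the two summaries can diverge when group sizes are very unequal.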
38.2.3 Comparing within a model
Mize and Han (2025) also propose a strategy to compare the marginal effects of different categorical predictors. The idea is simple. First, we compute the average marginal effect inequality for each predictor. We do this by computing contrasts for the two predictors of interest (race4 and woman) using the variables argument. Then, we compute the quantity of interest using the hypothesis argument, taking care to include | term on the right-hand side of the formula, so that the quantity is computed separately for each of the two terms.
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
race4 0.0838 0.0130 6.45 <0.001 33.1 0.0583 0.109
woman 0.0699 0.0156 4.48 <0.001 17.0 0.0393 0.100
Type: response
We find that the estimate is larger for race4 than for woman. Is this difference statistically significant? Are the two estimates distinguishable from one another? To check this, we pipe the previous command to the hypotheses() function.
No. While the two estimates seem different at first glance, we cannot reject the null hypothesis that they are equal.
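The two steps described in this section could look roughly like the following with marginaleffects. This is a sketch on simulated data (made-up labels and coefficients, not the GSS), and the exact calls used to produce the output above may differ. It assumes pairwise contrasts for both predictors and a recent version of marginaleffects that supports formula-based hypotheses. The final test uses the b1 and b2 labels to refer to the first and second rows of the previous result.

```r
library(marginaleffects)

# Simulated stand-in for the survey data (hypothetical values, not the GSS)
set.seed(3)
n <- 1000
dat <- data.frame(
  race4 = factor(sample(c("A", "B", "C", "D"), n, replace = TRUE,
                        prob = c(0.6, 0.2, 0.15, 0.05))),
  woman = factor(sample(c("Men", "Women"), n, replace = TRUE))
)
lp <- -0.5 + 0.4 * (dat$race4 == "B") - 0.3 * (dat$race4 == "C") +
  0.2 * (dat$woman == "Women")
dat$conserv <- rbinom(n, 1, plogis(lp))
mod <- glm(conserv ~ race4 + woman, data = dat, family = binomial(link = "logit"))

# Step 1: average marginal effect inequality, computed separately for each term
ineq <- avg_comparisons(
  mod,
  variables = list(race4 = "pairwise", woman = "pairwise"),
  hypothesis = ~ I(mean(abs(x))) | term
)
ineq

# Step 2: test whether the two summaries are equal (row 1 minus row 2)
diff_test <- hypotheses(ineq, hypothesis = "b1 - b2 = 0")
diff_test
```

A large p-value in the second table means that the data cannot distinguish the two inequality measures, which is the conclusion reported above for race4 and woman.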
38.2.4 Comparing across models
Imagine that we wish to compare the marginal effect of the woman predictor in a model that controls for race to the marginal effect of woman in a model that does not control for race. We can do this by fitting two models and then comparing the marginal effects. First, we define an estimator function that fits the models and returns a marginaleffects object with the difference of interest.[1]
estimator <- function(data) {
  # fit the two models we wish to compare
  mod1 <- glm(conserv ~ woman + class + race4, data = data, family = binomial(link = "logit"))
  mod2 <- glm(conserv ~ woman + class, data = data, family = binomial(link = "logit"))
  # compute the marginal effects in both models
  mfx1 <- avg_comparisons(mod1, variables = "woman", vcov = FALSE)
  mfx2 <- avg_comparisons(mod2, variables = "woman", vcov = FALSE)
  # compare the two models and store the difference in the estimate column
  # estimator() must return a marginaleffects object
  mfx1$estimate <- mfx1$estimate - mfx2$estimate
  return(mfx1)
}
estimator(gss)
Estimate
0.00636
Term: woman
Type: response
Comparison: Women - Men
Finally, we use the inferences() function to bootstrap the whole procedure and obtain confidence intervals.
Estimate 2.5 % 97.5 %
0.00636 0.002 0.0108
Term: woman
Type: response
Comparison: Women - Men
The number reported in the Estimate column represents the difference between the marginal effect of woman in the first model and the marginal effect of woman in the second model. The confidence interval represents the uncertainty around this difference.
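The logic of bootstrapping the whole procedure can be written out by hand in base R. The sketch below is only an illustration of the idea, not what inferences() does internally: the data are simulated (not the GSS), the class covariate is omitted, and a simple percentile interval over 200 resamples is assumed. The helper names ame_woman() and ame_diff() are hypothetical.

```r
# Simulated stand-in for the survey data (hypothetical values, not the GSS)
set.seed(4)
n <- 500
dat <- data.frame(
  woman = rbinom(n, 1, 0.5),
  race4 = factor(sample(c("A", "B", "C", "D"), n, replace = TRUE))
)
lp <- -0.5 + 0.3 * dat$woman + 0.5 * (dat$race4 == "B")
dat$conserv <- rbinom(n, 1, plogis(lp))

# Average marginal effect of woman: mean change in predicted probability
# when woman is switched from 0 to 1 for every row
ame_woman <- function(mod, data) {
  d1 <- transform(data, woman = 1)
  d0 <- transform(data, woman = 0)
  mean(predict(mod, newdata = d1, type = "response") -
         predict(mod, newdata = d0, type = "response"))
}

# Fit both models on one data set and return the difference between the effects
ame_diff <- function(data) {
  mod1 <- glm(conserv ~ woman + race4, data = data, family = binomial(link = "logit"))
  mod2 <- glm(conserv ~ woman, data = data, family = binomial(link = "logit"))
  ame_woman(mod1, data) - ame_woman(mod2, data)
}

# Point estimate on the full sample
point <- ame_diff(dat)

# Resample rows with replacement and repeat the whole procedure each time
boot <- replicate(200, ame_diff(dat[sample(nrow(dat), replace = TRUE), ]))

# Percentile confidence interval for the difference
ci <- quantile(boot, c(0.025, 0.975))
point
ci
```

Because both models are refit inside every resample, the interval reflects the dependence between the two estimates, which a naive comparison of two separate standard errors would ignore.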
38.3 Categorical Dependent Variable
To show how the concepts developed in the previous sections can be applied to a categorical dependent variable, we fit a multinomial logit model to the health variable. Our focal predictor is a binary variable called college.
excellent good fair poor
No Col Degree 287 1068 498 108
College Degree 457 967 191 24
mod <- multinom(health ~ college + race4 + class + woman, data = gss, trace = FALSE)
avg_comparisons(mod, variables = c("college", "woman"))
Term Group Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
college excellent College Degree - No Col Degree 0.08666 0.01504 5.761 <0.001 26.8 0.05718 0.11614
college good College Degree - No Col Degree 0.02378 0.01833 1.298 0.1945 2.4 -0.01214 0.05971
college fair College Degree - No Col Degree -0.08359 0.01425 -5.865 <0.001 27.7 -0.11152 -0.05565
college poor College Degree - No Col Degree -0.02686 0.00664 -4.043 <0.001 14.2 -0.03987 -0.01384
woman excellent Women - Men 0.00129 0.01330 0.097 0.9227 0.1 -0.02478 0.02736
woman good Women - Men 0.01181 0.01656 0.713 0.4758 1.1 -0.02065 0.04427
woman fair Women - Men -0.02372 0.01301 -1.824 0.0682 3.9 -0.04922 0.00177
woman poor Women - Men 0.01063 0.00610 1.743 0.0813 3.6 -0.00132 0.02257
Type: probs
The results in the Estimate column show the estimated marginal effect of getting a college degree on the probability of belonging to each of the health categories. A college degree is associated with a higher probability of reporting excellent health and a lower probability of reporting fair or poor health. The estimate for good health is positive, but it is not statistically distinguishable from zero.
We can compute the average marginal effect inequality for the college and woman predictors as before, using the hypothesis argument.
It seems that college has a larger effect on health—on average across outcome levels—than woman.
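For a categorical outcome, the same summary averages the absolute effects across outcome levels. The base R sketch below illustrates this by hand. It assumes simulated data with made-up outcome probabilities (not the GSS), a model with only two binary predictors, and a hypothetical helper called ame_by_outcome().

```r
library(nnet)

# Simulated stand-in for the survey data (hypothetical values, not the GSS)
set.seed(5)
n <- 1000
lv <- c("excellent", "good", "fair", "poor")
college <- rbinom(n, 1, 0.4)
woman <- rbinom(n, 1, 0.5)
p1 <- c(0.30, 0.55, 0.12, 0.03)  # made-up outcome probabilities, college = 1
p0 <- c(0.15, 0.55, 0.25, 0.05)  # made-up outcome probabilities, college = 0
health <- sapply(college, function(cl) sample(lv, 1, prob = if (cl == 1) p1 else p0))
dat <- data.frame(health = factor(health, levels = lv), college, woman)

mod <- multinom(health ~ college + woman, data = dat, trace = FALSE)

# Average effect of switching a binary predictor from 0 to 1
# on the predicted probability of each outcome level
ame_by_outcome <- function(var) {
  d1 <- dat
  d1[[var]] <- 1
  d0 <- dat
  d0[[var]] <- 0
  colMeans(predict(mod, newdata = d1, type = "probs") -
             predict(mod, newdata = d0, type = "probs"))
}

effects_college <- ame_by_outcome("college")
effects_woman <- ame_by_outcome("woman")

# Average marginal effect inequality: mean absolute effect across outcome levels
ineq <- c(college = mean(abs(effects_college)), woman = mean(abs(effects_woman)))
effects_college
ineq
```

The effects across outcome levels sum to zero, because predicted probabilities always sum to one. Taking absolute values before averaging is what prevents the summary from cancelling out.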
Mize, Trenton D., and Bing Han. 2025. “Inequality and Total Effect Summary Measures for Nominal and Ordinal Variables.” Sociological Science 12 (7): 115–57. https://doi.org/10.15195/v12.a7.
[1] We set vcov=FALSE for efficiency because we will bootstrap the full procedure.