10  Categorical and ordinal outcomes

This chapter shows how the framework and tools introduced in Chapter 4 can help us give meaningful interpretation to the estimates obtained by fitting a categorical or ordinal outcome model.

We say that the dependent variable of a regression model is categorical when it is discrete, involves a finite number of categories, but does not have a natural ordering. Categorical outcomes are very common in data analysis, for example, when one wishes to model the choice of a mode of transportation (e.g., car, bus, bike, walk). A popular way to approach such data is to specify and fit a multinomial logit model. In such a model, the probability of each category \(j\) of the outcome variable is modeled as:
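\[P(Y = j) = \frac{\exp(\mathbf{X}\beta_j)}{\sum_{k=1}^{J} \exp(\mathbf{X}\beta_k)}\]

where \(\mathbf{X}\) is the vector of predictors, \(\beta_j\) is a category-specific vector of coefficients (with one category's coefficients typically normalized to zero for identification), and \(J\) is the total number of categories.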

We say that the dependent variable of a regression model is ordinal when it is discrete, involves a finite number of categories, and has a natural ordering. Ordinal outcomes are very common in data analysis, for example, when one wishes to model levels of satisfaction (e.g., very dissatisfied, dissatisfied, neutral, satisfied, very satisfied). A popular way to approach such data is to specify and fit an ordered probit model. In such a model, the probability of each category \(j\) of the outcome variable is modeled as (Cameron and Trivedi 2005):

\[P(Y = j) = \Phi(\theta_j - \mathbf{X}\beta) - \Phi(\theta_{j-1} - \mathbf{X}\beta)\]

where \(\mathbf{X}\) represents the vector of predictors, \(\beta\) the vector of coefficients, \(\theta_j\) and \(\theta_{j-1}\) the threshold parameters defining the boundaries between categories \(j\) and \(j-1\), and \(\Phi\) the standard normal cumulative distribution function.

The distinguishing feature of models like the multinomial logit or ordered probit is that they estimate different parameters for each level of the outcome variable. This allows the analyst to derive different estimates of the quantities of interest (predictions, counterfactual comparisons, and slopes) for each level of the dependent variable. Since many categorical and ordered outcome models share this property, the framework introduced in this book applies in roughly the same way. Therefore, this chapter only considers the ordered probit case, but the same workflow applies to a wide class of models, including other multinomial and ordered approaches.

Let’s consider the Extramarital Affairs dataset, collected through a survey by Psychology Today in 1969 and analyzed in Fair (1978). These data include 601 observations on various demographic, marital, and personal characteristics: gender, age, years married, presence of children, religiousness, education, occupation, self-rated happiness in marriage, and the frequency of extramarital sexual encounters in the past year. The ‘affairs’ variable records this self-reported frequency: 0, 1, 2, 3, 4-10, or >10.

To start, we load the data and plot the distribution of the outcome:

library(MASS)
library(marginaleffects)
dat <- read.csv("data/affairs.csv")

# The outcome variable must be a factor with levels in increasing order
dat$affairs <- factor(dat$affairs,
  levels = c("0", "1", "2", "3", "4-10", ">10"))

barplot(table(dat$affairs), ylab = "N", xlab = "Affairs")
Figure 10.1: Distribution of the self-reported number of extra-marital affairs during 12 months preceding the survey (Fair, 1978).

Now, we fit an ordered probit regression model and summarize the results:

# Fit and summarize the model
mod <- polr(
  affairs ~ children + yearsmarried + gender,
  method = "probit", data = dat, Hess = TRUE)
summary(mod)

Coefficients:
               Value Std. Error t value
childrenyes  0.17290     0.1506  1.1477
yearsmarried 0.03457     0.0116  2.9793
gendermale   0.08852     0.1075  0.8234

Intercepts:
         Value   Std. Error t value
0|1       1.1438  0.1313     8.7115
1|2       1.3409  0.1336    10.0358
2|3       1.4532  0.1352    10.7500
3|4-10    1.5942  0.1373    11.6074
4-10|>10  2.0248  0.1472    13.7524
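
To see how this output maps onto the formula above, note that the “Intercepts” are the threshold parameters \(\theta_j\) and the “Coefficients” form \(\beta\). As a minimal sketch, using a hypothetical respondent profile of our own choosing, we can reconstruct predicted probabilities by hand and compare them to the model’s own predictions:

# Hypothetical respondent: has children, married 10 years, male
X <- c(childrenyes = 1, yearsmarried = 10, gendermale = 1)
eta <- sum(coef(mod) * X)         # linear predictor X*beta
theta <- c(-Inf, mod$zeta, Inf)   # estimated thresholds, padded with -Inf and Inf
# P(Y = j) = pnorm(theta_j - eta) - pnorm(theta_{j-1} - eta)
pnorm(theta[-1] - eta) - pnorm(theta[-length(theta)] - eta)

# These values should match the built-in predictions for the same profile
predict(mod,
  newdata = data.frame(children = "yes", yearsmarried = 10, gender = "male"),
  type = "probs")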

The coefficients in this model can be viewed as measuring changes in a latent variable, but building intuition about our model is not straightforward simply by looking at coefficient estimates. Rather than interpret the coefficients directly, we focus on predictions and counterfactual comparisons, as we have done throughout this book.

10.1 Predictions

The first step in our post-hoc workflow is to compute average predictions, that is, the expected probabilities of each category of the dependent variable, averaged over the observed data. This provides an interpretable summary of the model’s overall behavior.

p <- avg_predictions(mod)
p
 Group Estimate Std. Error     z Pr(>|z|)  2.5 % 97.5 %
     0   0.7497    0.01739 43.12   <0.001 0.7157 0.7838
     1   0.0567    0.00944  6.01   <0.001 0.0382 0.0752
     2   0.0285    0.00680  4.19   <0.001 0.0152 0.0418
     3   0.0317    0.00715  4.44   <0.001 0.0177 0.0457
  4-10   0.0702    0.01039  6.76   <0.001 0.0498 0.0906
   >10   0.0631    0.00986  6.40   <0.001 0.0438 0.0824

Since we are working with an ordered categorical outcome model, the avg_predictions() function automatically returns one average predicted probability per outcome level. The group identifiers are available in the group column of the output data frame:

[1] "group"     "estimate"  "std.error" "statistic" "p.value"   "s.value"  
[7] "conf.low"  "conf.high"
p$group
[1] 0    1    2    3    4-10 >10 
Levels: 0 1 2 3 4-10 >10

Unsurprisingly, given Figure 10.1, the expected probability that affairs=0 is much higher than that of any of the other outcome levels.
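
Under the hood, avg_predictions() computes one predicted probability per respondent and outcome level, and then averages within each level. We can verify this with a quick sketch in base R; the result should match the estimates above:

# Average the unit-level predicted probabilities for each outcome level
colMeans(predict(mod, type = "probs"))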

As we saw in Chapter 6, it is easy to compute average predictions for different subgroups. For instance, if we want to know the expected probability of each outcome level, for survey respondents with and without children, we simply call:

avg_predictions(mod, by = "children")
 Group children Estimate Std. Error     z Pr(>|z|)   2.5 % 97.5 %
     0       no   0.8388    0.02748 30.53   <0.001 0.78496 0.8927
     0      yes   0.7143    0.02135 33.45   <0.001 0.67245 0.7562
     1       no   0.0431    0.00856  5.03   <0.001 0.02632 0.0599
     1      yes   0.0622    0.01039  5.98   <0.001 0.04180 0.0825
     2       no   0.0205    0.00554  3.71   <0.001 0.00967 0.0314
     2      yes   0.0316    0.00758  4.18   <0.001 0.01679 0.0465
     3       no   0.0219    0.00581  3.77   <0.001 0.01052 0.0333
     3      yes   0.0356    0.00807  4.42   <0.001 0.01982 0.0514
  4-10       no   0.0442    0.00992  4.46   <0.001 0.02479 0.0637
  4-10      yes   0.0806    0.01212  6.65   <0.001 0.05680 0.1043
   >10       no   0.0314    0.00907  3.46   <0.001 0.01362 0.0492
   >10      yes   0.0757    0.01210  6.26   <0.001 0.05199 0.0994
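
As before, these subgroup averages are simply group-wise means of the unit-level predicted probabilities. A rough base R equivalent (a sketch) would be:

# Mean predicted probability for each outcome level, by children status
probs <- predict(mod, type = "probs")
apply(probs, 2, function(p) tapply(p, dat$children, mean))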

We can plot these estimates with an analogous call, using the by argument to marginalize across combinations of the outcome level (group) and the children predictor:

plot_predictions(mod, by = c("group", "children"))
Figure 10.2: Average predicted probabilities for each outcome level, for the subpopulations with and without children.

Figure 10.2 shows again that the most likely answer—by far—is zero. Moreover, the graph suggests that the differences in predicted probabilities between the “yes” and “no” children groups are quite small. This impression will be confirmed in the next section, where we discuss counterfactual comparisons.

In some cases, it may be useful to aggregate or combine different outcome level categories. For example, we may want to compute the sum of all predicted probabilities for group categories above 0.

One powerful strategy is to define a custom function to use in the hypothesis argument of avg_predictions(). According to the function manual (?avg_predictions), a custom hypothesis function must accept the same data frame that we obtain when calling avg_predictions() without the hypothesis argument, that is, a data frame with group and estimate columns (among others). The custom function can then process this input and return a data frame with a term and an estimate column.

For example, this custom function accepts a data frame x, collapses the predicted probabilities by group categories, and returns a data frame of new estimates:

hyp <- function(x) {
  x$term <- ifelse(x$group == "0", "0", ">1")
  aggregate(estimate ~ term + children, FUN = sum, data = x)
}
avg_predictions(mod, by = "children", hypothesis = hyp)
 Term children Estimate Std. Error     z Pr(>|z|) 2.5 % 97.5 %
   >1       no    0.161     0.0275  5.87   <0.001 0.107  0.215
    0       no    0.839     0.0275 30.53   <0.001 0.785  0.893
   >1      yes    0.286     0.0214 13.38   <0.001 0.244  0.328
    0      yes    0.714     0.0214 33.45   <0.001 0.672  0.756

The custom function above applied a sum by subgroup using the aggregate() function, which has been available in base R since the 20th century. An equivalent tidyverse version of this custom function would be:

library(tidyverse)
hyp <- function(x) {
  x %>% 
    mutate(term = if_else(group == "0", "0", ">1")) %>%
    summarize(estimate = sum(estimate), .by = c("term", "children"))
}
avg_predictions(mod, by = "children", hypothesis = hyp)

10.2 Counterfactual comparisons

After looking at predictions, we can now move on to counterfactual comparisons. Our first objective here is to answer this question: What is the effect of increasing the yearsmarried variable by 5 years on the average predicted probability of each level of the outcome variable?

avg_comparisons(mod, variables = list(yearsmarried = 5))
 Group Estimate Std. Error     z Pr(>|z|)     2.5 %   97.5 %
     0 -0.05617    0.01940 -2.89  0.00379 -0.094199 -0.01814
     1  0.00687    0.00236  2.91  0.00363  0.002241  0.01150
     2  0.00427    0.00168  2.54  0.01094  0.000981  0.00756
     3  0.00551    0.00215  2.56  0.01040  0.001296  0.00973
  4-10  0.01590    0.00580  2.74  0.00607  0.004543  0.02726
   >10  0.02362    0.00923  2.56  0.01053  0.005521  0.04171

On average, increasing the yearsmarried variable by 5 reduces the predicted probability that affairs=0 by 5.6 percentage points. On average, increasing yearsmarried by 5 increases the predicted probability that affairs>10 by 2.4 percentage points. These counterfactual comparisons are associated with small \(p\) values, so we can reject the null hypothesis that yearsmarried has no effect on predicted affairs.
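
To build intuition about what this comparison computes, here is a rough sketch: predict each respondent's probabilities at their observed values, predict them again after adding 5 to yearsmarried, and average the differences for each outcome level. The results should closely match the avg_comparisons() estimates above, although the package also handles details such as delta method standard errors.

# Counterfactual comparison by hand (a sketch): observed data vs. yearsmarried + 5
lo <- hi <- dat
hi$yearsmarried <- hi$yearsmarried + 5
colMeans(predict(mod, newdata = hi, type = "probs") -
         predict(mod, newdata = lo, type = "probs"))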

In contrast, if we estimate the effect of manipulating the gender variable, we find that differences in predicted probabilities are small and statistically insignificant. We cannot reject the null hypothesis that gender has no effect on affairs.

avg_comparisons(mod, variables = "gender")
 Group Estimate Std. Error      z Pr(>|z|)    2.5 % 97.5 %
     0 -0.02736    0.03324 -0.823    0.411 -0.09251 0.0378
     1  0.00373    0.00458  0.815    0.415 -0.00524 0.0127
     2  0.00225    0.00278  0.808    0.419 -0.00320 0.0077
     3  0.00284    0.00350  0.811    0.417 -0.00402 0.0097
  4-10  0.00786    0.00959  0.819    0.413 -0.01095 0.0267
   >10  0.01068    0.01305  0.818    0.413 -0.01491 0.0363