10 Categorical outcomes
Several packages in the R ecosystem allow users to estimate models for ordered or discrete choice, such as ordered probit or multinomial logit. This case study illustrates the use of marginaleffects with the MASS, nnet, and mlogit packages.
We begin by loading two libraries:
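A minimal sketch of this step, assuming the two libraries are marginaleffects and MASS (whose polr() function is used in the next section):

library(marginaleffects)
library(MASS)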
10.1 MASS::polr function
Consider a simple ordered logit model in which we predict the number of gears of a car based on its miles per gallon and horsepower:
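The model-fitting call is sketched below, assuming an ordered logit estimated with MASS::polr() on the mtcars data, with gear coerced to a factor; the exact specification behind the output that follows may differ:

# sketch: ordered logit of gear on mpg and hp
# Hess = TRUE stores the Hessian, which is needed for standard errors
mod <- MASS::polr(factor(gear) ~ mpg + hp, data = mtcars, Hess = TRUE)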
Now, consider a car with 25 miles per gallon and 110 horsepower. The predicted probability of each outcome level (gear) for this car is:
predictions(mod, newdata = datagrid(mpg = 25, hp = 110))
#>
#> Group mpg hp Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> 3 25 110 0.203 0.0959 2.12 0.0339 4.9 0.0155 0.391
#> 4 25 110 0.578 0.1229 4.70 <0.001 18.6 0.3373 0.819
#> 5 25 110 0.218 0.1007 2.17 0.0302 5.1 0.0209 0.416
#>
#> Columns: rowid, group, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, gear, mpg, hp
#> Type: probs
Since the gear variable is categorical, we make one prediction for each level of the outcome.
Now consider the marginal effects (aka slopes or partial derivatives) for the same car:
slopes(mod, variables = "mpg", newdata = datagrid(mpg = 25, hp = 110))
#>
#> Group Term mpg hp Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> 3 mpg 25 110 -0.06041 0.0169 -3.5813 <0.001 11.5 -0.09347 -0.0273
#> 4 mpg 25 110 -0.00321 0.0335 -0.0958 0.9237 0.1 -0.06896 0.0625
#> 5 mpg 25 110 0.06362 0.0301 2.1132 0.0346 4.9 0.00461 0.1226
#>
#> Columns: rowid, term, group, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, mpg, hp, predicted_lo, predicted_hi, predicted, gear
#> Type: probs
Again, marginaleffects produces one estimate of the slope for each outcome level. For a small step size \(\varepsilon\), the printed quantities are estimated as:
\[\frac{P(gear=3|mpg=25+\varepsilon, hp=110)-P(gear=3|mpg=25-\varepsilon, hp=110)}{2 \cdot \varepsilon}\] \[\frac{P(gear=4|mpg=25+\varepsilon, hp=110)-P(gear=4|mpg=25-\varepsilon, hp=110)}{2 \cdot \varepsilon}\] \[\frac{P(gear=5|mpg=25+\varepsilon, hp=110)-P(gear=5|mpg=25-\varepsilon, hp=110)}{2 \cdot \varepsilon}\]
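As an illustration, the same finite differences can be approximated manually with two calls to predictions(); the step size eps below is an arbitrary value chosen for this sketch, not the one marginaleffects uses internally:

eps <- 1e-4
p_hi <- predictions(mod, newdata = datagrid(mpg = 25 + eps, hp = 110))
p_lo <- predictions(mod, newdata = datagrid(mpg = 25 - eps, hp = 110))
# one finite-difference slope per outcome level (gear = 3, 4, 5)
(p_hi$estimate - p_lo$estimate) / (2 * eps)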
When we call avg_slopes(), marginaleffects will repeat the same computation for every row of the original dataset, and then report the average slope for each level of the outcome:
avg_slopes(mod)
#>
#> Group Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> 3 hp -0.00377 0.001514 -2.49 0.01284 6.3 -0.006735 -0.00080
#> 3 mpg -0.07014 0.015485 -4.53 < 0.001 17.4 -0.100490 -0.03979
#> 4 hp 0.00201 0.000957 2.10 0.03553 4.8 0.000136 0.00389
#> 4 mpg 0.03747 0.013861 2.70 0.00687 7.2 0.010303 0.06464
#> 5 hp 0.00175 0.000833 2.11 0.03519 4.8 0.000122 0.00339
#> 5 mpg 0.03267 0.009572 3.41 < 0.001 10.6 0.013907 0.05143
#>
#> Columns: term, group, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
#> Type: probs
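To see where these averages come from, we can compute the unit-level slopes and average them ourselves; a sketch using base R aggregation (the point estimates should match avg_slopes() above, although the standard errors are not reproduced by this simple average):

# unit-level slopes for every row of the original data, averaged within
# each combination of outcome level (group) and regressor (term)
s <- slopes(mod)
aggregate(estimate ~ group + term, data = s, FUN = mean)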
10.2 nnet package
The multinom function of the nnet package allows users to fit log-linear models via neural networks. The data used for this function is a data frame with one observation per row, and the response variable is coded as a factor. All the marginaleffects package functions work seamlessly with this model. For example, we can estimate a model and compute average marginal effects as follows:
library(nnet)
head(mtcars)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
mod <- multinom(factor(gear) ~ hp + mpg, data = mtcars, trace = FALSE)
avg_slopes(mod, type = "probs")
#>
#> Group Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> 3 hp -3.44e-05 0.00225 -0.0153 0.98780 0.0 -0.00444 0.004372
#> 3 mpg -7.13e-02 0.02645 -2.6959 0.00702 7.2 -0.12316 -0.019467
#> 4 hp -4.67e-03 0.00221 -2.1123 0.03466 4.9 -0.00900 -0.000337
#> 4 mpg 1.59e-02 0.02010 0.7915 0.42865 1.2 -0.02348 0.055298
#> 5 hp 4.70e-03 0.00130 3.6172 < 0.001 11.7 0.00215 0.007249
#> 5 mpg 5.54e-02 0.01642 3.3739 < 0.001 10.4 0.02322 0.087590
#>
#> Columns: term, group, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
#> Type: probs
Notice that in such models, we get one marginal effect for each term, for each level of the response variable. For this reason, we should use "group" in the condition argument (or in a facet_*() function) when calling one of the plotting functions:
library(ggplot2)
plot_predictions(mod, condition = c("mpg", "group"), type = "probs")
plot_predictions(mod, condition = "mpg", type = "probs") + facet_wrap(~group)
plot_comparisons(
    mod,
    variables = list(mpg = c(15, 30)),
    condition = "group",
    type = "probs")
10.3 mlogit package
The mlogit package uses data in a slightly different structure, with one row per observation-choice combination. For example, this data on choice of travel mode includes 4 rows per individual, one for each mode of transportation:
library("AER")
library("mlogit")
library("tidyverse")
data("TravelMode", package = "AER")
head(TravelMode)
#> individual mode choice wait vcost travel gcost income size
#> 1 1 air no 69 59 100 70 35 1
#> 2 1 train no 34 31 372 71 35 1
#> 3 1 bus no 35 25 417 70 35 1
#> 4 1 car yes 0 10 180 30 35 1
#> 5 2 air no 64 58 68 68 30 2
#> 6 2 train no 44 31 354 84 30 2
mod <- mlogit(choice ~ wait + gcost | income + size, TravelMode)
avg_slopes(mod, variables = c("income", "size"))
#>
#> Group Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> air income 0.002786 0.00122 2.288 0.02213 5.5 0.000399 0.00517
#> air size -0.126465 0.02892 -4.374 < 0.001 16.3 -0.183139 -0.06979
#> bus income -0.000372 0.00110 -0.338 0.73548 0.4 -0.002531 0.00179
#> bus size 0.011345 0.02587 0.439 0.66099 0.6 -0.039358 0.06205
#> car income 0.003373 0.00137 2.455 0.01408 6.1 0.000680 0.00607
#> car size 0.045880 0.02476 1.853 0.06385 4.0 -0.002642 0.09440
#> train income -0.005787 0.00132 -4.389 < 0.001 16.4 -0.008370 -0.00320
#> train size 0.069240 0.02478 2.794 0.00521 7.6 0.020662 0.11782
#>
#> Columns: term, group, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
#> Type: response
Note that the slopes() function will always return estimates of zero for regressors before the vertical bar in the formula. This is because marginaleffects increments all rows of the prediction dataset in the same way to compute slopes and contrasts. Because mlogit data are in "long" format, this means that all alternatives are incremented in the same way, which does not produce alternative-specific changes in the predictors.
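For example, since wait appears before the vertical bar, a call like the following sketch returns zero estimates for every mode:

# wait is incremented identically in all rows, so the predicted
# probabilities do not change and the estimated slopes are zero
avg_slopes(mod, variables = "wait")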
One strategy to circumvent this problem is to supply a data frame of numeric values to compare, with alternative-specific changes. In this example, we test what happens to the probability of selecting each mode of transportation if we only increase the wait time of air travel:
altspec <- data.frame(
    low = TravelMode$wait,
    high = ifelse(TravelMode$mode == "air", TravelMode$wait + 15, TravelMode$wait)
)
avg_comparisons(mod, variables = list(wait = altspec))
#>
#> Group Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> air wait manual -0.1321 0.01070 -12.35 <0.001 114.0 -0.1531 -0.1111
#> bus wait manual 0.0251 0.00460 5.45 <0.001 24.2 0.0160 0.0341
#> car wait manual 0.0701 0.00834 8.41 <0.001 54.5 0.0538 0.0865
#> train wait manual 0.0369 0.00528 6.99 <0.001 38.4 0.0266 0.0473
#>
#> Columns: term, group, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
#> Type: response
To compute yet other kinds of marginal effects, we can construct customized data frames and feed them to the newdata argument of the slopes() function.
If we want to compute the slope of the response function (marginal effects) when each of the predictors is fixed to its global mean, we can do:
nd <- TravelMode |>
    summarize(across(c("wait", "gcost", "income", "size"),
                     function(x) rep(mean(x), 4)))
nd
#> wait gcost income size
#> 1 34.58929 110.8798 34.54762 1.742857
#> 2 34.58929 110.8798 34.54762 1.742857
#> 3 34.58929 110.8798 34.54762 1.742857
#> 4 34.58929 110.8798 34.54762 1.742857
avg_slopes(mod, newdata = nd, variables = c("income", "size"))
#>
#> Group Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> air income 6.66e-03 2.42e-03 2.75 0.00599 7.4 1.91e-03 1.14e-02
#> air size -1.69e-01 5.88e-02 -2.88 0.00394 8.0 -2.85e-01 -5.42e-02
#> bus income -1.14e-03 9.44e-04 -1.21 0.22657 2.1 -2.99e-03 7.09e-04
#> bus size 4.67e-02 2.72e-02 1.72 0.08624 3.5 -6.66e-03 1.00e-01
#> car income 6.48e-06 2.02e-05 0.32 0.74894 0.4 -3.32e-05 4.62e-05
#> car size 1.36e-03 8.81e-04 1.54 0.12305 3.0 -3.68e-04 3.08e-03
#> train income -5.52e-03 1.91e-03 -2.89 0.00383 8.0 -9.26e-03 -1.78e-03
#> train size 1.21e-01 4.45e-02 2.73 0.00634 7.3 3.42e-02 2.08e-01
#>
#> Columns: term, group, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
#> Type: response
If we want to compute marginal effects with gcost and wait fixed at their mean values, conditional on the choice of transportation mode:
nd <- TravelMode |>
    group_by(mode) |>
    summarize(across(c("wait", "gcost", "income", "size"), mean))
nd
#> # A tibble: 4 × 5
#> mode wait gcost income size
#> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 air 61.0 103. 34.5 1.74
#> 2 train 35.7 130. 34.5 1.74
#> 3 bus 41.7 115. 34.5 1.74
#> 4 car 0 95.4 34.5 1.74
avg_slopes(mod, newdata = nd, variables = c("income", "size"))
#>
#> Group Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> air income 0.006015 0.00233 2.583 0.00979 6.7 0.001451 0.01058
#> air size -0.232927 0.05659 -4.116 < 0.001 14.7 -0.343837 -0.12202
#> bus income -0.000713 0.00146 -0.489 0.62487 0.7 -0.003570 0.00214
#> bus size 0.020440 0.03436 0.595 0.55191 0.9 -0.046901 0.08778
#> car income 0.005445 0.00228 2.384 0.01711 5.9 0.000969 0.00992
#> car size 0.067820 0.04124 1.645 0.10003 3.3 -0.012999 0.14864
#> train income -0.010747 0.00256 -4.202 < 0.001 15.2 -0.015760 -0.00573
#> train size 0.144668 0.04773 3.031 0.00244 8.7 0.051111 0.23822
#>
#> Columns: term, group, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
#> Type: response
We can also explore more complex alternatives. Here, for example, only one alternative at a time is affected by a cost reduction:
nd <- datagrid(mode = TravelMode$mode, newdata = TravelMode)
nd <- lapply(1:4, function(i) mutate(nd, gcost = ifelse(1:4 == i, 30, gcost)))
nd <- bind_rows(nd)
nd
#> individual choice wait vcost travel gcost income size mode
#> 1 1 no 35 48 486 30 35 2 air
#> 2 1 no 35 48 486 111 35 2 train
#> 3 1 no 35 48 486 111 35 2 bus
#> 4 1 no 35 48 486 111 35 2 car
#> 5 1 no 35 48 486 111 35 2 air
#> 6 1 no 35 48 486 30 35 2 train
#> 7 1 no 35 48 486 111 35 2 bus
#> 8 1 no 35 48 486 111 35 2 car
#> 9 1 no 35 48 486 111 35 2 air
#> 10 1 no 35 48 486 111 35 2 train
#> 11 1 no 35 48 486 30 35 2 bus
#> 12 1 no 35 48 486 111 35 2 car
#> 13 1 no 35 48 486 111 35 2 air
#> 14 1 no 35 48 486 111 35 2 train
#> 15 1 no 35 48 486 111 35 2 bus
#> 16 1 no 35 48 486 30 35 2 car
avg_slopes(mod, newdata = nd, variables = c("income", "size"))
#>
#> Group Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> air income 8.24e-03 2.46e-03 3.352 <0.001 10.3 0.003422 0.013057
#> air size -2.12e-01 6.02e-02 -3.526 <0.001 11.2 -0.330512 -0.094367
#> bus income -1.33e-03 1.30e-03 -1.021 0.307 1.7 -0.003877 0.001222
#> bus size 6.06e-02 3.79e-02 1.600 0.110 3.2 -0.013662 0.134911
#> car income 2.66e-05 4.32e-05 0.616 0.538 0.9 -0.000058 0.000111
#> car size 2.38e-03 1.57e-03 1.512 0.131 2.9 -0.000704 0.005459
#> train income -6.94e-03 1.86e-03 -3.735 <0.001 12.4 -0.010580 -0.003297
#> train size 1.49e-01 4.28e-02 3.488 <0.001 11.0 0.065477 0.233397
#>
#> Columns: term, group, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
#> Type: response
Important: The newdata argument for mlogit models must be a "balanced" data frame, that is, it must have a number of rows that is a multiple of the number of choices.
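A quick sanity check for this requirement, as a sketch: the number of rows in the data frame passed to newdata should be divisible by the number of alternatives (four travel modes in this example).

# TRUE when nd contains complete sets of the four alternatives
nrow(nd) %% nlevels(TravelMode$mode) == 0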