7 Slopes
A slope measures how an outcome \(Y\) responds to changes in a predictor \(X\), when we hold other covariates at fixed values. It is often the main quantity of interest in an empirical analysis, when a researcher wants to estimate an effect, or the strength of association between two variables. Slopes belong to the same toolbox as the counterfactual comparisons explored in Chapter 6. Both quantities help us answer a counterfactual query: What would happen to a predicted outcome if one of the predictors were slightly different?
Unfortunately, the terminology that researchers use when discussing slopes is woefully inconsistent. In some fields, like economics or political science, they are called “marginal effects.” In that context, the word “marginal” refers to the effect of an infinitesimal change in one of the predictors. In other disciplines, a slope may be called “trend,” “velocity,” or “partial effect.” In this chapter, we use the expressions “slopes” and “marginal effects” interchangeably to mean:
Partial derivative of the regression equation with respect to a predictor of interest.
Consider a simple linear model with outcome \(Y\) and predictor \(X\):
\[ Y = \beta_0 + \beta_1 X + \varepsilon \tag{7.1}\]
To find the slope of \(Y\) with respect to \(X\), we take the partial derivative.
\[\frac{\partial Y}{\partial X} = \beta_1\]
In Equation 7.1, the slope is thus equivalent to a single coefficient. This identity explains why some analysts refer to regression coefficients as slopes. It also gives us a clue for interpretation: in simple linear models, a one-unit increase in \(X\) is associated with a \(\frac{\partial Y}{\partial X}\) change in \(Y\), holding other modeled covariates constant. This interpretation is a useful first cut, but it comes with caveats.
First, given that slopes are defined as derivatives, that is, in terms of an infinitesimal change in \(X\), they can only be constructed for continuous numeric predictors. Analysts who are interested in the effect of a change in a categorical predictor should turn to counterfactual comparisons (Chapter 6).1
Second, the partial derivative measures the slope of \(Y\)’s tangent at a specific point in the predictor space. Thus, the interpretation of a slope as the effect of a one-unit change in \(X\) must be understood as an approximation, valid in a small neighborhood of the predictors, for a small change in \(X\). This approximation may not be particularly good when the regression function is non-linear, or when the scale of \(X\) is very small.
Finally, the foregoing discussion implies that slopes are conditional quantities, in the sense that they typically depend on the value of \(X\), but also on the values of all the other predictors in a model. Every row of a dataset or grid has its own slope. In that respect, slopes are like the predictions and counterfactual comparisons from Chapters 5 and 6. Thus, the current chapter proceeds as before, by answering the five questions of our conceptual framework: (1) Quantity, (2) Grid, (3) Aggregation, (4) Uncertainty, and (5) Test.
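To see why the second caveat matters, consider the function \(Y = X^2\): the derivative at \(X=1\) is 2, but the actual change in \(Y\) when \(X\) moves from 1 to 2 is 3. A minimal sketch in R:
f <- function(x) x^2
2 * 1          # slope of the tangent line (2X) evaluated at X = 1
f(2) - f(1)    # actual change in Y for a one-unit increase in X: 3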
7.1 Quantity
As noted above, the slope characterizes the strength and direction of association between two variables. To get an intuitive feel for what this means in practice, it is useful to consider a few graphical examples. In this section, we will see that the slope of a curve tells us how it behaves as we move from left to right along the \(X\)-axis, that is, it tells us whether the curve is increasing, decreasing, or flat. This, in turn, has important implications for how we understand the relationships between variables of interest in a regression model.
7.1.1 Slopes of simple functions
The top row of Figure 7.1 shows three functions of \(X\). The bottom row traces the derivatives of each of those functions. By examining the values of derivatives at any given point, we can determine precisely where the corresponding functions are rising, falling, or flat.
First, consider the top left panel of Figure 7.1, which displays the function \(Y=-1+0.5X\), a straight line with a positive slope. As we move from left to right, the line steadily rises. The effect of \(X\) is constant across the full range of \(X\) values. For every one-unit increase in \(X\), \(Y\) increases by 0.5 units. Since the slope does not change, the derivative of this function is also constant (bottom left panel): \(\frac{\partial Y}{\partial X} = 0.5.\) This means that the association between \(X\) and \(Y\) is always positive, and that the strength of association between \(X\) and \(Y\) is of constant magnitude, regardless of the baseline value of \(X\).
Now, consider the top middle panel of Figure 7.1, which displays the function \(Y = X^2\), a symmetric parabola that opens upward. As we move from left to right, the function initially decreases, until it reaches its minimum at \(X=0\). In the left part of the plot, the relationship between \(X\) and \(Y\) is negative: an increase in \(X\) is associated with a decrease in \(Y\). When \(X\) reaches 0, the curve becomes flat: the slope of its tangent line is zero. This means that an infinitesimal change in \(X\) would have (essentially) no effect on \(Y\). As \(X\) continues to increase beyond zero, the function starts increasing, with a positive slope. The derivative of this function captures this behavior. When \(X<0\), the derivative is \(\frac{\partial Y}{\partial X}=2X<0\), which indicates that the relationship between \(X\) and \(Y\) is negative. When \(X=0\), the derivative is \(2X=0\), which means that the association between \(X\) and \(Y\) is null. When \(X>0\), the derivative is positive, which implies that an increase in \(X\) is associated with an increase in \(Y\). In short, the sign and strength of the relationship between \(X\) and \(Y\) are heterogeneous; they depend on the baseline value of \(X\). This will be an important insight to keep in mind for the case study in Chapter 8, where we explore the role of polynomial functions in regression modeling.
The third function in Figure 7.1, \(Y = \cos(X)\), oscillates between positive and negative values, forming a wave-like curve. Focusing on the first section of this curve, we see that \(\cos(X)\) decreases as we move from left to right, reaches a minimum at \(X=-\pi\), and then increases until \(X=0\). The derivative plotted in the bottom right panel captures this trajectory well. Indeed, whenever \(-\sin(X)\) is negative, the \(\cos(X)\) curve slopes downward. When \(-\sin(X)=0\), the \(\cos(X)\) curve is flat and reaches a minimum or a maximum. When \(-\sin(X)>0\), the \(\cos(X)\) function slopes upward.
These three examples show that the derivative of a function precisely characterizes both the strength and direction of the association between a predictor and an outcome. It tells us, at any given point in the predictor space, whether increasing \(X\) should result in a decrease, an increase, or no change in \(Y\).
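Readers who wish to reproduce curves like those in Figure 7.1 can sketch them directly in R. The following code is a minimal illustration using base graphics, not the original plotting code for the figure:
# top row: three functions; bottom row: their derivatives
par(mfrow = c(2, 3))
curve(-1 + 0.5 * x, from = -5, to = 5, ylab = "Y", main = "Y = -1 + 0.5X")
curve(x^2, from = -5, to = 5, ylab = "Y", main = "Y = X^2")
curve(cos(x), from = -2 * pi, to = 2 * pi, ylab = "Y", main = "Y = cos(X)")
curve(0 * x + 0.5, from = -5, to = 5, ylab = "dY/dX", main = "Derivative: 0.5")
curve(2 * x, from = -5, to = 5, ylab = "dY/dX", main = "Derivative: 2X")
curve(-sin(x), from = -2 * pi, to = 2 * pi, ylab = "dY/dX", main = "Derivative: -sin(X)")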
7.1.2 Slope of a logistic function
Let us now consider a more realistic example, similar to the curves we are likely to fit in an applied regression context.
\[ Pr(Y=1) = \Phi \left (\beta_1 + \beta_2 X \right), \tag{7.2}\]
where \(\Phi\) is the logistic cumulative distribution function, \(\beta_1\) is an intercept, \(\beta_2\) is a regression coefficient, and \(X\) is a numeric predictor. Imagine that the true parameters are \(\beta_1=-1\) and \(\beta_2=0.5\). We can simulate a dataset with a million observations that follow this data generating process.
library(marginaleffects)
set.seed(48103)
N <- 1e6                     # number of simulated observations
X <- rnorm(N, sd = 2)        # numeric predictor
p <- plogis(-1 + 0.5 * X)    # true probability that Y = 1
Y <- rbinom(N, 1, p)         # binary outcome
dat <- data.frame(Y, X)
Now, we fit a logistic regression model with the glm() function, store the coefficient estimates in a vector b (which we reuse below), and print them.
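mod <- glm(Y ~ X, data = dat, family = binomial)  # model call implied by the text
b <- coef(mod)                                    # coefficient estimates, reused below
b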
(Intercept) X
-0.9981282 0.4993960
To visualize the outcome function and its derivative, we use the plot_predictions() and plot_slopes() functions. The latter is very similar to the plot_comparisons() function introduced in Section 6.6. The variables argument identifies the predictor with respect to which we are computing the slope, and the condition argument specifies which variable should be displayed on the x-axis. To display plots on top of one another, we use the patchwork package and its / operator.
library(patchwork)
p_function <- plot_predictions(mod, condition = "X")
p_derivative <- plot_slopes(mod, variables = "X", condition = "X")
p_function / p_derivative
The key insight from Figure 7.2 is that the slope of the logistic function is far from constant. Increasing \(X\)—moving from left to right on the graph—has a different effect on the estimated probability that \(Y=1\), depending on our position on the horizontal axis. When the initial value of \(X\) is small (left side of the figure), an increase in \(X\) results in a small change in \(Pr(Y=1)\): the prediction curve is relatively flat and the derivative is close to zero. At intermediate values of \(X\) (middle of the figure), a change in \(X\) results in a larger shift in \(Pr(Y=1)\): the curve is steep and the derivative is positive and large. When the initial value of \(X\) is large (right side of the figure), a change in \(X\) does not change \(Pr(Y=1)\) much: the curve is quite flat again.
In sum, the slope characterizes the strength of association between \(X\) and \(Y\) for different baseline values of the predictors. To estimate the slope, we first proceed analytically, using the chain rule of differentiation to compute the derivative of Equation 7.2 with respect to \(X\).
\[ \frac{\partial Pr(Y=1)}{\partial X} = \beta_2 \cdot \phi(\beta_1 + \beta_2 X), \tag{7.3}\]
where \(\phi\) is the logistic density function. \(\phi\) is a relatively complex function, but thankfully, it is implemented in R via the dlogis() function. We can thus plug the coefficient estimates that we obtained by fitting our model into Equation 7.3, to obtain an estimate of the slope. Importantly, whenever we evaluate a slope, we must explicitly state the values of the predictors of interest. For instance, the estimated values of \(\frac{\partial Pr(Y=1)}{\partial X}\) for \(X\in \{-5, 0, 10\}\) are:
b[2] * dlogis(b[1] + b[2] * -5)
[1] 0.0142749
b[2] * dlogis(b[1] + b[2] * 0)
[1] 0.09827211
b[2] * dlogis(b[1] + b[2] * 10)
[1] 0.008856198
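As a quick sanity check on Equation 7.3, we can approximate the slope at \(X=0\) numerically with a finite difference. This is a minimal sketch; eps is an arbitrarily small step:
f <- function(x) plogis(b[1] + b[2] * x)   # fitted Pr(Y = 1)
eps <- 1e-6
(f(0 + eps) - f(0)) / eps                  # numerical derivative at X = 0
b[2] * dlogis(b[1] + b[2] * 0)             # analytic derivative from Equation 7.3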
Alternatively, we could compute the same estimates with the slopes() function from the marginaleffects package, using a call along these lines:
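# slopes of Pr(Y = 1) with respect to X, evaluated at X = -5, 0, and 10
slopes(mod,
  variables = "X",
  newdata = datagrid(X = c(-5, 0, 10)))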
Term | X | Estimate | Std. Error | z | Pr(>|z|) | 2.5 % | 97.5 % |
---|---|---|---|---|---|---|---|
X | -5 | 0.01427 | 7.25e-05 | 197.0 | <0.001 | 0.01413 | 0.01442 |
X | 0 | 0.09827 | 2.61e-04 | 376.6 | <0.001 | 0.09776 | 0.09878 |
X | 10 | 0.00886 | 8.99e-05 | 98.6 | <0.001 | 0.00868 | 0.00903 |
slopes() is very similar to the comparisons() function introduced in Chapter 6. The variables argument specifies the focal predictor, and newdata specifies the values of the other predictors at which we want to evaluate the slope. In the rest of this chapter, we explore how to use slopes() and its sister functions, avg_slopes() and plot_slopes(), to estimate and visualize a variety of slope-related quantities of interest.
For illustration, we consider a variation on a model considered in previous chapters, using data from Thornton (2008). As before, the dependent variable is a binary variable called outcome, equal to 1 if a study participant sought to learn their HIV status at a test center, and 0 otherwise. The focal predictor is incentive, a binary variable indicating whether the participant was randomly assigned to the treatment group, in which participants received a monetary incentive to go to the clinic, or to a control condition without incentive. The model also includes adjustment (or “control”) variables: distance and its square. Finally, note that we include multiplicative interactions between the incentive variable and the distance terms. As we will see, this poses no problem whatsoever for the slopes() function, and changes nothing in the post-estimation workflow.
dat <- readRDS("data/hiv.rds")
mod <- glm(outcome ~ incentive * (distance + I(distance^2)),
data = dat, family = binomial)
summary(mod)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.04450 0.26238 -0.170 0.865
incentive 1.95742 0.31708 6.173 <0.001
distance -0.50546 0.25899 -1.952 0.051
I(distance^2) 0.06758 0.05167 1.308 0.191
incentive:distance 0.08132 0.30391 0.268 0.789
incentive:I(distance^2) -0.01655 0.05956 -0.278 0.781
7.2 Predictors
Slopes can help us answer questions such as:
How does the predicted outcome \(\hat{Y}\) change when the focal variable \(X\) increases by a small amount, and the adjustment variables \(Z_1, Z_2, \ldots, Z_n\) are held at specific values?
7.2.1 Focal variable
The focal variable is the predictor of interest. It is the variable whose association with (or effect on) \(Y\) we wish to estimate. In terms of notation, the focal variable is the predictor in the denominator of the partial derivative: \(\frac{\partial Y}{\partial X}\).
As in the comparisons() function, the slopes() function accepts a variables argument, which serves to specify the focal predictor. For example, this code will estimate the partial derivative of the outcome equation in the model object, with respect to the \(X\) predictor:
slopes(model, variables = "X")
7.2.2 Adjustment variables
Slopes are conditional quantities, in the sense that their values will typically depend on the values of all the variables on the right-hand side of a regression equation. Thus, every predictor profile—or combination of predictor values—will be associated with its own slope. In marginaleffects, we specify the values of the adjustment (or control) variables using the newdata argument.
Section 3.2 described different useful “grids” of predictor values, that is, different combinations of unit characteristics that could be of interest. One common type of grid was the “interesting” or “user-specified” grid, which is a combination of predictor values that hold particular scientific or domain-specific interest. For example, imagine that we are specifically interested in measuring the association between distance and outcome, for an individual who is part of the treatment group (incentive is 1), and who lives at a distance of 1 from the test center. We can use the datagrid() helper function to create a data frame with the desired values of the predictors, and then pass this data frame to the newdata argument of the slopes() function. The function will then return a “slope at user-specified values” (aka “marginal effect at interesting values”).
library(marginaleffects)
slopes(mod,
variables = "distance",
newdata = datagrid(incentive = 1, distance = 1)
)
Term | incentive | distance | Estimate | Std. Error | z | Pr(>|z|) | 2.5 % | 97.5 % |
---|---|---|---|---|---|---|---|---|
distance | 1 | 1 | -0.0468 | 0.0139 | -3.36 | <0.001 | -0.0741 | -0.0195 |
Alternatively, we can set newdata="mean" to compute a “marginal effect at the mean” or a “marginal effect at representative values.” This is the slope of the regression equation with respect to the focal predictor, for an individual whose characteristics are exactly average (or modal) on all adjustment variables. As noted in previous chapters, computing estimates at the mean is computationally efficient, but it may not be particularly informative, especially when the perfectly average individual is not realistic or substantively interesting.
slopes(mod, variables = "distance", newdata = "mean")
Estimate | Std. Error | z | Pr(>|z|) | 2.5 % | 97.5 % | incentive |
---|---|---|---|---|---|---|
-0.0375 | 0.00971 | -3.86 | <0.001 | -0.0565 | -0.0185 | 1 |
Finally, we can compute slopes for every observation in the original sample. These “unit-level marginal effects” or “partial effects” are the default output of the slopes() function.
slopes(mod, variables = "distance")
Estimate | Std. Error | z | Pr(>|z|) | 2.5 % | 97.5 % |
---|---|---|---|---|---|
-0.1054 | 0.05195 | -2.029 | 0.04245 | -0.2072 | -0.00359 |
-0.0937 | 0.04362 | -2.149 | 0.03162 | -0.1792 | -0.00826 |
-0.0107 | 0.02357 | -0.452 | 0.65103 | -0.0569 | 0.03554 |
-0.0401 | 0.01429 | -2.807 | 0.00501 | -0.0681 | -0.01210 |
-0.0724 | 0.02881 | -2.514 | 0.01192 | -0.1289 | -0.01598 |
-0.0452 | 0.01327 | -3.407 | < 0.001 | -0.0712 | -0.01920 |
-0.0470 | 0.01397 | -3.365 | < 0.001 | -0.0744 | -0.01962 |
-0.0452 | 0.01329 | -3.402 | < 0.001 | -0.0713 | -0.01917 |
-0.0431 | 0.01232 | -3.501 | < 0.001 | -0.0673 | -0.01899 |
-0.0223 | 0.00887 | -2.514 | 0.01193 | -0.0397 | -0.00492 |
7.3 Aggregation
A dataset with one marginal effect estimate per unit of observation is a bit unwieldy and difficult to interpret. Many analysts like to report the “average marginal effect” (or “average slope”), that is, the average of all the unit-level estimates. The average slope can be obtained in two steps, by computing unit-level estimates and then taking their average. However, it is more convenient to use the avg_slopes() function.
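For reference, here is what the two-step computation looks like. This is a minimal sketch; it assumes, as in current versions of marginaleffects, that the unit-level output stores point estimates in an estimate column:
# step 1: unit-level slopes; step 2: their average
s <- slopes(mod, variables = "distance")
mean(s$estimate)
The avg_slopes() call below returns the same number, along with a standard error and a confidence interval.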
avg_slopes(mod, variables = "distance")
Estimate | Std. Error | z | Pr(>|z|) | 2.5 % | 97.5 % |
---|---|---|---|---|---|
-0.0379 | 0.0071 | -5.35 | <0.001 | -0.0519 | -0.024 |
Note that there is a nuanced distinction to be drawn between the “marginal effect at the mean” shown in the previous section, and the “average marginal effect” presented here. The former is calculated based on a single individual whose characteristics are exactly average or modal in every dimension. The latter is calculated by examining the slopes for all the observed data points in our dataset and then taking the average of those slopes. These two options will not always be equivalent; they can yield both numerically and substantively different results. In general, the marginal effect at the mean might be useful when there are computational constraints. The average marginal effect will be useful when the dataset adequately represents the distribution of predictors in the population, in which case marginalizing across the distribution can give us a good summary of the slopes at different points.
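To see the distinction concretely, we can place the two quantities side by side (a sketch using the estimate column of each output):
# marginal effect at the mean: slope for a single, perfectly average profile
slopes(mod, variables = "distance", newdata = "mean")$estimate
# average marginal effect: mean of the unit-level slopes
avg_slopes(mod, variables = "distance")$estimate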
The following results, for instance, show the average slope (strength of association) between distance and the probability that a study participant will travel to the clinic to learn their test result, computed separately for the control and treatment groups.
avg_slopes(mod, variables = "distance", by = "incentive")
Term | incentive | Estimate | Std. Error | z | Pr(>|z|) | 2.5 % | 97.5 % |
---|---|---|---|---|---|---|---|
distance | 0 | -0.0558 | 0.02004 | -2.78 | 0.00539 | -0.0950 | -0.0165 |
distance | 1 | -0.0329 | 0.00713 | -4.62 | < 0.001 | -0.0469 | -0.0190 |
These estimates can be called “group-average marginal effects” or “group-average slopes.” They are equivalent to manually taking the mean of the unit-level slopes within each incentive subgroup of the original data.
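Here is a minimal sketch of that manual computation. It assumes, as is typically the case, that the unit-level output returned by slopes() carries the incentive column from the original data:
# group-average slopes by hand: average unit-level estimates within each group
s <- slopes(mod, variables = "distance")
aggregate(estimate ~ incentive, data = s, FUN = mean)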
7.4 Uncertainty
As with all other quantities estimated thus far, we can compute estimates of uncertainty around slopes using various strategies. For instance, we may employ robust standard errors, clustered standard errors, or bootstrapping, through either the vcov argument or the inferences() function. Here are two simple examples: the first clusters standard errors by village, while the second command applies a simple non-parametric bootstrap.
# clustered standard errors
avg_slopes(mod, variables = "distance", vcov = ~village)
Estimate | Std. Error | z | Pr(>|z|) | 2.5 % | 97.5 % |
---|---|---|---|---|---|
-0.0379 | 0.00921 | -4.12 | <0.001 | -0.056 | -0.0199 |
# bootstrap
avg_slopes(mod, variables = "distance") |> inferences(method = "rsample")
Estimate | 2.5 % | 97.5 % |
---|---|---|
-0.0379 | -0.0515 | -0.0231 |
7.5 Test
Chapter 4 showed an easy strategy to conduct complex null hypothesis tests on any quantity estimated using the marginaleffects package. This approach applies to slopes just as it did to predictions and counterfactual comparisons.
Recall the example from Section 7.3, where we computed the average slopes of outcome with respect to distance, for each incentive subgroup.
avg_slopes(mod, variables = "distance", by = "incentive")
Term | incentive | Estimate | Std. Error | z | Pr(>|z|) | 2.5 % | 97.5 % |
---|---|---|---|---|---|---|---|
distance | 0 | -0.0558 | 0.02004 | -2.78 | 0.00539 | -0.0950 | -0.0165 |
distance | 1 | -0.0329 | 0.00713 | -4.62 | < 0.001 | -0.0469 | -0.0190 |
At first glance, these two estimates appear different. Distance seems to have a more pronounced negative effect on the probability of traveling to the test center in the control condition. Consequently, one might infer that a monetary incentive mitigates the disincentive that distance poses to traveling to the center.
To check that this observation is not simply the product of sampling variation, we must formally determine if the two quantities are statistically distinguishable. To achieve this, we execute the same command as before, but include an additional argument called hypothesis. In that argument, we insert an equation-like string to test the null hypothesis that the first estimate (b1) is equal to the second estimate (b2).
avg_slopes(mod, variables = "distance", by = "incentive",
hypothesis = "b1 - b2 = 0")
Warning:
It is essential to check the order of estimates when specifying hypothesis tests using positional indices like b1, b2, etc. The indices of estimates can change depending on the order of rows in the original dataset, user-supplied arguments, model-fitting package, and version of `marginaleffects`.
It is also good practice to use assertions that ensure the order of estimates is consistent across different runs of the same code. Example:
```r
mod <- lm(mpg ~ am * carb, data = mtcars)
# assertion for safety
p <- avg_predictions(mod, by = 'carb')
stopifnot(p$carb[1] != 1 || p$carb[2] != 2)
# hypothesis test
avg_predictions(mod, by = 'carb', hypothesis = 'b1 - b2 = 0')
```
Disable this warning with: `options(marginaleffects_safe = FALSE)`
This warning appears once per session.
Term | Estimate | Std. Error | z | Pr(>|z|) | 2.5 % | 97.5 % |
---|---|---|---|---|---|---|
b1-b2=0 | -0.0228 | 0.0213 | -1.07 | 0.283 | -0.0645 | 0.0189 |
The difference between those two estimates is about -0.023, which suggests that the slope is flatter in the treatment group. In other words, the association between distance and outcome is weaker when incentive=1 than when incentive=0. However, this difference is not statistically significant (p=0.283). Therefore, we cannot reject the null hypothesis that there is no heterogeneity in the effect of distance on the probability of seeking one’s test result.
7.6 Visualization
In interpreting model results, it is useful to distinguish between slopes and predictions. Slopes capture the relationship between independent and dependent variables by showing the rate of change, while predictions estimate the expected outcome for given values of the independent variables. In Chapter 5, we saw that the plot_predictions() function visualizes predictions. Now, we will see how the plot_slopes() function can be used to display slopes (aka marginal effects or rates of change).
Figure 7.3 illustrates these concepts in two panels. The top panel shows model-based predictions for the outcome at different values of distance, holding other covariates at their mean or mode. This plot suggests that the probability that an individual will seek to learn their HIV status decreases as the distance to the clinic increases. The bottom panel shows the corresponding slopes: the estimated rate of change in that probability with respect to distance, at each value of distance.
# Predictions (top) and slopes (bottom) across the range of distance
library(ggplot2)
library(patchwork)
p1 <- plot_predictions(mod, condition = "distance")
p2 <- plot_slopes(mod, variables = "distance", condition = "distance") +
  geom_hline(yintercept = 0, linetype = "dotted")
p1 / p2
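The condition argument also accepts more than one variable. For instance, the following sketch would display the slope of distance separately for the control and treatment groups, with incentive mapped to color:
plot_slopes(mod, variables = "distance",
  condition = c("distance", "incentive")) +
  geom_hline(yintercept = 0, linetype = "dotted")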
TODO:
- Substantive interpretation.
- Examples with continuous outcomes.
- Reference to the interactions chapter instead of showing all possibilities.