7  Slopes

A slope measures how an outcome \(Y\) responds to changes in a predictor \(X\), when we hold other covariates at fixed values. It is often the main quantity of interest in an empirical analysis, when a researcher wants to estimate an effect, or the strength of association between two variables. Slopes belong to the same toolbox as the counterfactual comparisons explored in Chapter 6. Both quantities help us answer a counterfactual query: What would happen to a predicted outcome if one of the predictors were slightly different?

One hurdle to understanding slopes is that they are formally defined in the language of calculus. This chapter focuses on intuition first, eschews most technical details, and provides several graphical examples to make abstract ideas more tangible. Nevertheless, readers who are not comfortable with the basics of multivariable calculus may want to jump ahead to the case studies in the next part of this book.

Beyond these technical barriers, another problem that hinders our understanding is that the terminology used when discussing slopes is woefully inconsistent. In some fields, like economics or political science, they are called “marginal effects.” In that context, the word “marginal” refers to the effect of an infinitesimal change in one of the predictors. In other disciplines, a slope may be called “trend,” “velocity,” or “partial effect.” In this chapter, we use the expressions “slopes” and “marginal effects” interchangeably to mean:

Partial derivative of the regression equation with respect to a predictor of interest.

Consider a simple linear model with outcome \(Y\) and predictor \(X\):

\[ Y = \beta_0 + \beta_1 X +\varepsilon \\ \tag{7.1}\]

To find the slope of \(Y\) with respect to \(X\), we take the partial derivative.

\[\frac{\partial Y}{\partial X} = \beta_1\]

In Equation 7.1, the slope is thus equivalent to a single coefficient. This identity explains why some analysts refer to regression coefficients as slopes. It also gives us a clue for interpretation: in simple linear models, a one-unit increase in \(X\) is associated with a \(\frac{\partial Y}{\partial X}\) change in \(Y\), holding other modeled covariates constant. This interpretation is a useful first cut, but it comes with caveats.

First, given that slopes are defined as derivatives, that is, in terms of an infinitesimal change in \(X\), they can only be constructed for continuous numeric predictors. Analysts who are interested in the effect of a change in a categorical predictor should turn to counterfactual comparisons (Chapter 6).1

Second, the partial derivative measures the slope of \(Y\)’s tangent at a specific point in the predictor space. Thus, the interpretation of a slope as the effect of a one-unit change in \(X\) must be understood as an approximation, valid in a small neighborhood of the predictors, for a small change in \(X\). This approximation may not be particularly good when the regression function is non-linear, or when the scale of \(X\) is very small.

Finally, the foregoing discussion implies that slopes are conditional quantities, in the sense that they typically depend on the value of \(X\), but also on the values of all the other predictors in a model. Every row of a dataset or grid has its own slope. In that respect, slopes are like the predictions and counterfactual comparisons from chapters 5 and 6. Thus, the current chapter proceed as before, by answering the five questions of our conceptual framework: (1) Quantity, (2) Grid, (3) Aggregation, (4) Uncertainty, and (5) Test.

7.1 Quantity

As noted above, the slope characterizes the strength and direction of association between two variables. To get an intuitive feel for what this means in practice, it is useful to consider a few graphical examples. In this section, we will see that the slope of a curve tells us how it behaves as we move from left to right along the \(X\)-axis, that is, it tells us whether the curve is increasing, decreasing, or flat. This, in turn, has important implications for how we understand the relationships between variables of interest in a regression model.

7.1.1 Slopes of simple functions

The top row of Figure 7.1 shows three functions of \(X\). The bottom row traces the derivatives of each of those functions. By examining the values of derivatives at any given point, we can determine precisely where the corresponding functions are rising, falling, or flat.

Figure 7.1: Three functions and their derivatives.

First, consider the top left panel of Figure 7.1, which displays the function \(Y=-1+0.5X\), a straight line with a positive slope. As we move from left to right, the line steadily rises. The effect of \(X\) is constant across the full range of \(X\) values. For every one-unit increase in \(X\), \(Y\) increases by 0.5 units. Since the slope does not change, the derivative of this function is also constant (bottom left panel): \(\frac{\partial Y}{\partial X} = 0.5.\) This means that the association between \(X\) and \(Y\) is always positive, and that the strength of association between \(X\) and \(Y\) is of constant magnitude, regardless of the baseline value of \(X\).

Now, consider the top middle panel of Figure 7.1, which displays the function \(Y = X^2\), a symmetric parabola that opens upward. As we move from left to right, the function initially decreases, until it reaches its minimum at \(X=0\). In the left part of the plot, the relationship between \(X\) and \(Y\) is negative: an increase in \(X\) is associated to a decrease in \(Y\). When \(X\) reaches 0, the curve becomes flat: the slope of its tangent line is zero. This means that an infinitesimal change in \(X\) would have (essentially) no effect on \(Y\). As \(X\) continues to increase beyond zero, the function starts increasing, with a positive slope. The derivative of this function captures this behavior. When \(X<0\), the derivative is \(\frac{\partial Y}{\partial X}=2X<0\), which indicates that the relationship between \(X\) and \(Y\) is negative. When \(X=0\), the derivative is \(2X=0\), which means that the association between \(X\) and \(Y\) is null. When the \(X>0\), the derivative is positive, which implies that an increase in \(X\) is associated with an increase in \(Y\). In short, the sign and strength of the relationship between \(X\) and \(Y\) are heterogeneous; they depend on the baseline value of \(X\). This will be an important insight to keep in mind for the case study in Chapter 8, where we explore the role polynomial functions in regression modeling.

The third function in Figure 7.1, \(Y = \cos(X)\), oscillates between positive and negative values, forming a wave-like curve. Focusing on the first section of this curve, we see that \(\cos(X)\) decreases as we move from left to right, reaches a minimum at \(-\pi\), and then increases until \(X=0\). The derivative plotted in the bottom right panel captures this trajectory well. Indeed, whenever \(-\sin(X)\) is negative, we see that \(\cos(X)\) points downward. When \(-\sin(X)=0\), the \(\cos(X)\) curve is flat and reaches a minimum or a maximum. When \(-\sin(X)>0\), the \(\cos(X)\) function points upward.

These three examples show that the derivative of a function precisely characterizes both the strength and direction of the association between a predictor and an outcome. It tells us, at any given point in the predictor space, if increasing \(X\) should result in a decrease, an increase, or in no change in \(Y\).

7.1.2 Slope of a logistic function

Let us now consider a more realistic example, similar to the curves we are likely to fit in an applied regression context.

\[ Pr(Y=1) = \Phi \left (\beta_1 + \beta_2 X \right), \tag{7.2}\]

where \(\Phi\) is the logistic function, \(\beta_1\) is an intercept, \(\beta_2\) a regression coefficient, and \(X\) is a numeric predictor. Imagine that the true parameters are \(\beta_1=-1\) and \(\beta_2=0.5\). We can simulate a dataset with a million observations that follow this data generating process.

library(marginaleffects)
set.seed(48103)
N <- 1e6
X <- rnorm(N, sd = 2)
p <- plogis(-1 + 0.5 * X)
Y <- rbinom(N, 1, p)
dat <- data.frame(Y, X)

Now, we fit a logistic regression model with the glm() function and print the coefficient estimates.

mod <- glm(Y ~ X, data = dat, family = binomial)
b <- coef(mod)
b
(Intercept)           X 
 -0.9981282   0.4993960 

To visualize the outcome function and its derivative, we use the plot_predictions() and plot_slopes() functions. The latter is very similar to the plot_comparisons() function introduced in Section 6.6. The variables argument identifies the predictor with respect to which we are computing the slope, and the condition argument specifies which variable should be displayed on the x-axis. To display plots on top of one another, we use the patchwork package and its / operator.

library(patchwork)
p_function <- plot_predictions(mod, condition = "X")
p_derivative <- plot_slopes(mod, variables = "X", condition = "X")
p_function / p_derivative
Figure 7.2: Logistic function and its derivative.

The key insight from Figure 7.2 is that the slope of the logistic function is far from constant. Increasing \(X\)—moving from left to right on the graph—has a different effect on the estimated probability that \(Y=1\), depending on our position on the horizontal axis. When the initial value of \(X\) is small (left side of the figure), an increase in \(X\) results in a small change in \(Pr(Y=1)\): the prediction curve is relatively flat and the derivative is close to zero. At intermediate values of \(X\) (middle of the figure), a change in \(X\) results in a larger shift in \(Pr(Y=1)\): the curve is steep and the derivative is positive and large. When the initial value of \(X\) is large (right side of the figure), a change in \(X\) does not change \(Pr(Y=1)\) much: the curve is quite flat again.

In sum, the slope characterizes the strength of association between \(X\) and \(Y\) for different baseline values of the predictors. To estimate the slope, we first proceed analytically, using the chain rule of differentiation to compute the derivative of Equation 7.2 with respect to \(X\).

\[ \frac{\partial Pr(Y=1)}{\partial X} = \beta_2 \cdot \phi(\beta_1 + \beta_2 X), \tag{7.3}\]

where \(\phi\) is the logistic density function. \(\phi\) is a relatively complex function, but thankfully, it is implemented in R via the dlogis() function. We can thus plug-in the coefficient estimates that we obtained by fitting our model into Equation 7.3, to obtain an estimate of the slope. Importantly, whenever we evaluate a slope, we must explicitly state the values of the predictors of interest. For instance, the estimated values of \(\frac{\partial Pr(Y=1)}{\partial X}\) for \(X\in \{-5, 0, 10\}\) are:

b[2] * dlogis(b[1] + b[2] * -5)

[1] 0.0142749

b[2] * dlogis(b[1] + b[2] * 0)

[1] 0.09827211

b[2] * dlogis(b[1] + b[2] * 10)

[1] 0.008856198

Alternatively, we could compute the same estimates with the slopes() function from marginaleffects.

slopes(mod, 
       variables = "X",
       newdata = datagrid(X = c(-5, 0, 10)))
Term X Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
X -5 0.01427 7.25e-05 197.0 <0.001 0.01413 0.01442
X 0 0.09827 2.61e-04 376.6 <0.001 0.09776 0.09878
X 10 0.00886 8.99e-05 98.6 <0.001 0.00868 0.00903

slopes() is very similar to the comparisons() function introduced in Chapter 6. The variables argument specifies the focal predictor, and newdata specifies the values of the other predictors at which we want to evaluate the slope. In the rest of this chapter, we explore how to use slopes() and its sister functions, avg_slopes() and plot_slopes(), to estimate and visualize a variety of slope-related quantities of interest.

For illustration, we consider a variation on a model considered in previous chapters, using data from Thornton (2008). As before, the dependent variable is a binary variable called outcome, equal to 1 if a study participant sought to learn their HIV status at a test center, and 0 otherwise. The focal predictor is incentive, a binary variable indicating whether the participant were randomly assigned to the treatment group in which participants received a monetary incentive to go to the clinic, or if they were assigned to a control condition without incentive. The model also includes adjustement (or “control”) variables: distance and its square. Finally, note that we include multiplicative interactions between the incentive variable and the distance variable. As we will see, this poses no problem for the slopes() function whatsoever, and changes nothing to the post-estimation workflow.

dat <- get_dataset("thornton")
mod <- glm(outcome ~ incentive * (distance + I(distance^2)),
    data = dat, family = binomial)
summary(mod)

Coefficients:
                        Estimate Std. Error z value Pr(>|z|)
(Intercept)             -0.04450    0.26238  -0.170    0.865
incentive                1.95742    0.31708   6.173   <0.001
distance                -0.50546    0.25899  -1.952    0.051
I(distance^2)            0.06758    0.05167   1.308    0.191
incentive:distance       0.08132    0.30391   0.268    0.789
incentive:I(distance^2) -0.01655    0.05956  -0.278    0.781

7.2 Predictors

Slopes can help us answer questions such as:

How does the predicted outcome \(\hat{Y}\) change when the focal variable \(X\) increases by a small amount, and the adjustment variables \(Z_1, Z_2, \ldots, Z_n\) are held at specific values?

7.2.1 Focal variable

The focal variable is the predictor of interest. It is the variable whose association with (or effect on) \(Y\) we wish to estimate. In terms of notation, the focal variable is the predictor in the denominator of the partial derivative: \(\frac{\partial Y}{\partial X}\).

As in the comparisons() function, the slopes() function accepts a variables argument, which serves to specify the focal predictor. For example, this code will estimate the partial derivative of the outcome equation in the model object, with respect to the \(X\) predictor:

slopes(model, variables = "X")

7.2.2 Adjustment variables

Slopes are conditional quantities, in the sense that their values will typically depend on the values of all the variables on the right-hand side of a regression equation. Thus, every predictor profile—or combination of predictor values—will be associated with its own slope. In marginaleffects, we specify the values of the adjustment (or control) variables using the newdata argument.

Section 3.2 described different useful “grids” of predictor values, that is, different combinations of unit characteristics that could be of interest. One common type of grid was the “interesting” or “user-specified” grid, which is a combination of predictor values that hold particular scientific or domain-specific interest. For example, imagine that we are specifically interested in measuring the association between distance and outcome, for an individual who is part of the treatment group (incentive is 1), and who lives at a distance of 1 from the test center. We can use the datagrid() helper function to create a data frame with the desired values of the predictors, and then pass this data frame to the newdata argument of the slopes() function. The function will then return a “slope at user-specified values” (aka “marginal effect at interesting values”).

library(marginaleffects)
slopes(mod,
  variables = "distance",
  newdata = datagrid(incentive = 1, distance = 1)
)
Term incentive distance Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
distance 1 1 -0.0468 0.0139 -3.36 <0.001 -0.0741 -0.0195

Alternatively, we can set newdata="mean" to compute a “marginal effect at the mean” or a “marginal effect at representative value.” This is the slope of the regression equation with respect to the focal predictor, for an individual whose characteristics are exactly average (or modal) on all adjustment variable. As noted in previous chapters, computing estimates at the mean is computationally efficient, but it may not be particularly informative, especially when the perfectly average individual is not realistic or substantively interesting.

slopes(mod, variables = "distance", newdata = "mean")
Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 % incentive
-0.0375 0.00971 -3.86 <0.001 -0.0565 -0.0185 1

Finally, we can compute slopes for every observation in the original sample. These “unit-level marginal effects” or “partial effects” are the default output of the slopes() function.

slopes(mod, variables = "distance")
Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
-0.1054 0.05195 -2.029 0.04245 -0.2072 -0.00359
-0.0937 0.04362 -2.149 0.03162 -0.1792 -0.00826
-0.0107 0.02357 -0.452 0.65103 -0.0569 0.03554
-0.0401 0.01429 -2.807 0.00501 -0.0681 -0.01210
-0.0724 0.02881 -2.514 0.01192 -0.1289 -0.01598
-0.0452 0.01327 -3.407 < 0.001 -0.0712 -0.01920
-0.0470 0.01397 -3.365 < 0.001 -0.0744 -0.01962
-0.0452 0.01329 -3.402 < 0.001 -0.0713 -0.01917
-0.0431 0.01232 -3.501 < 0.001 -0.0673 -0.01899
-0.0223 0.00887 -2.514 0.01193 -0.0397 -0.00492

7.3 Aggregation

A dataset with one marginal effect estimate per unit of observation is a bit unwieldy and difficult to interpret. Many analysts like to report the “average marginal effet” (or “average slope”), that is, the average of all the unit-level estimates. The average slope can be obtained in two steps, by computing unit-level estimates and then their average. However, it is more convenient to use the avg_slopes() function.

avg_slopes(mod, variables = "distance")
Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
-0.0379 0.0071 -5.35 <0.001 -0.0519 -0.024

Note that there is a nuanced distinction to be drawn between the “marginal effect at the mean” shown in the previous section, and the “average marginal effect” presented here. The former is calculated based on a single individual whose characteristics are exactly average or modal in every dimension. The latter is calculated by examining the slopes for all the observed data points in our dataset and then taking the average of those slopes. These two options will not always be equivalent; they can yield both numerically and substantively different results. In general, the marginal effect at the mean might be useful when there are computational constraints. The average marginal effect will be useful when the dataset adequately represents the distribution of predictors in the population, in which case marginalizing across the distribution can give us a good summary of the slopes at different points.

The following results, for instance, show the average slope (strength of association) between distance and the probability that a study participant will travel to the clinic to learn their test result.

avg_slopes(mod, variables = "distance", by = "incentive")
Term incentive Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
distance 0 -0.0558 0.02004 -2.78 0.00539 -0.0950 -0.0165
distance 1 -0.0329 0.00713 -4.62 < 0.001 -0.0469 -0.0190

These estimates can be called “group-average marginaleffects” or “group-average slopes.” They are equivalent to manually taking the mean of the unit-level marginal effect for each species sub-groupof the original data.

7.4 Uncertainty

As with all other quantities estimated thus far, we can compute estimates of uncertainty around slopes using various strategies. For instance, we may employ robust standard errors, clustered standard errors, or bootstrapping through either the vcov argument or the inferences() function. Here are two simple examples: the first clusters standard errors by village, while the second command applies a simple non-parametric bootstrap.

# clustered standard errors
avg_slopes(mod, variables = "distance", vcov = ~village)
Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
-0.0379 0.00921 -4.12 <0.001 -0.056 -0.0199
# bootstrap
avg_slopes(mod, variables = "distance") |> inferences(method = "rsample")
Estimate 2.5 % 97.5 %
-0.0379 -0.0515 -0.0231

7.5 Test

Chapter 4 showed an easy strategy to conduct complex null hypothesis tests on any quantity estimated using the marginaleffects package. This approach is equally applicable to slopes, as it was to predictions and counterfactual comparisons.

Recall the example from above, where we computed the average slopes of outcome with respect to distance, for each incentive subgroup.

avg_slopes(mod, variables = "distance", by = "incentive")
Term incentive Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
distance 0 -0.0558 0.02004 -2.78 0.00539 -0.0950 -0.0165
distance 1 -0.0329 0.00713 -4.62 < 0.001 -0.0469 -0.0190

At first glance, these two estimates appear different. It seems that distance had a more pronounced negative effect on the probability of traveling to the test center in the control condition. Consequently, one might infer that a monetary incentive mitigates the disincentive posed by distance from moving to the center.

To ensure that this observation may not simply be the product of sampling variation, we must formally to determine if the two quantities are statistically distinguishable. To achieve this, we execute the same command as before, but include an additional argument called hypothesis. In that argument, we insert an equation-like string, to test the null hypothesis that the first estimate (b1) is different from the second estimate (b2).

avg_slopes(mod, variables = "distance", by = "incentive",
    hypothesis = "b1 - b2 = 0")
Term Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
b1-b2=0 -0.0228 0.0213 -1.07 0.283 -0.0645 0.0189

The difference between those two estimates is about -0.023, which suggests that the slope is flatter in the treatment group. In other words, the association between distance and outcome is weaker when incentive=1 than when incentive=0. However, this difference is not statistically significant (p={r} h$p.value). Therefore, we cannot reject the null hypothesis that there is no heterogeneity in the effect of distance on the probability of seeking one’s test result.

7.6 Visualization

In interpreting model results, it is useful to distinguish between slopes and predictions. Slopes capture the relationship between independent and dependent variables by showing the rate of change, while predictions estimate the expected outcome for given values of independent variables. In Chapter 5, we saw that the plot_predictions() function visualizes predictions. Now, we will see how the plot_slopes() function can be used to display slopes (aka marginal effects or rates of change).

Figure 7.3 illustrates these concepts in two panels. The top panel shows model-based predictions for the outcome at different values of distance, holding other covariates at their mean or mode. This plot suggests that the probability that an individual will see to learn their HIV status decreases as the distance to the clinic increases.

# Slopes evaluated at every point in the sample
library(ggplot2)
library(patchwork)

p1 <- plot_predictions(mod, condition = "distance")

p2 <- plot_slopes(mod, variables = "distance", condition = "distance") +
    geom_hline(yintercept = 0, linetype = "dotted")

p1 / p2
Figure 7.3: Model-based predictions vs. estimated slopes.

TODO:

  • Substantive interpreation.
  • Examples with continuous outcomes.
  • Reference to the interactions chapter instead of showing all possibilities.

The file appears to contain a chapter on “Slopes” in a statistical or methodological context, likely related to marginal effects and their estimation using R code. The content includes:

  • R code chunks for data manipulation, modeling, and visualization.
  • Conceptual discussions on slopes and marginal effects, possibly in the context of statistical modeling.
  • Visualizations or methods to illustrate slopes and their implications.

I will now create a structured summary in the format you specified based on the file content.

7.7 Summary

This chapter defined a “slope” as the partial derivative of the regression equation with respect to a focal predictor of interest. It is a measure of association between two variables, or of the effect of one variable on another, holding other predictors constant.

The slopes() function from the marginaleffects package computes slopes for a wide range of models. avg_slopes() aggregates slopes across units or groups. plot_slopes() displays slopes visually.

Table 7.1: Main arguments of the slopes(), avg_slopes(), and plot_slopes() functions.
Argument
model Fitted model used to make counterfactual slopes.
variables Focal predictor whose association with or effect on the outcome we are interested in.
newdata Grid of predictor values.
slope Choice between derivative or (semi-)elasticity.
vcov Standard errors: Classical, robust, clustered, etc.
conf_level Size of the confidence intervals.
type Type of predictions to compare: response, link, etc.
by Grouping variable for average predictons.
wts Weights used to compute average slopes.
transform Post-hoc transformations.
hypothesis Compare different slopes to one another, conduct linear or non-linear hypothesis tests, or specify null hypotheses.
equivalence Equivalence tests
df Degrees of freedom for hypothesis tests.
numderiv Algorithm used to compute numerical derivatives.
... Additional arguments are passed to the predict() method supplied by the modelling package.

To clearly define slopes and attendant tests, analysts must make five decisions.

First, the Quantity.

  • A slope is always computed with respect to a focal predictor, whose effect on (or association with) the outcome we wish to estimate. In marginaleffects functions, the focal variable is specified using the variables argument.
  • A slope can roughly be interpreted as the effect of a one-unit change in the focal predictor on the outcome. However, this interpretation is an approximation, valid only in a small neighborhood of the predictors.

Second, the Predictors.

  • Slopes are conditional quantities, meaning that they will typically vary based on the values of all predictors in a model. Every row of a dataset has its own slope.
  • Analysts can make slopes for different combinations of predictor values—or grids: empirical, interesting, representative, balanced, or counterfactual.
  • The predictor grid is defined by the newdata argument and the datagrid() function.

Third, the Aggregation.

  • To simplify the presentation of results, analysts can report average slopes. Different aggregation schemes are available:
    • Unit-level slopes (no aggregation)
    • Average slopes
    • Average slopes by subgroup
    • Weighted average of slopes
  • Slopes can be aggregated using the avg_slopes() function and the by argument.

Fourth, the Uncertainty.

  • In marginaleffects, the vcov argument allows analysts to report classical, robust, or clustered standard errors for slopes.
  • The inferences() function can compute uncertainty intervals via bootstrapping or simulation-based inference.

Fifth, the Test.

  • A null hypothesis test evaluates whether a slope (or a function of slopes) is significantly different from a null value. For example, an analyst may wish to check if the effect of a treatment is stronger in one subgroup of the population or another. Null hypothesis tests are conducted using the hypothesis argument.
  • An equivalence test evaluates whether a slope (or a function of slopes) is similar to a reference value. Equivalence tests are conducted using the equivalence argument.


  1. When the analyst calls the slopes() function, the marginaleffects package will automatically return counterfactual comparisons rather for categorical predictors.↩︎