2  Tools and data

The key idea that underpins this book is that raw parameter estimates obtained by fitting a model can often be transformed into more interpretable quantities. Presenting results in a way that resonates with the audience enhances clarity, communication, and impact.

Unfortunately, computing intuitive statistical quantities, along with their standard errors, can be tedious and error-prone. Furthermore, although many excellent software packages exist to fit statistical and machine learning models, they often behave in idiosyncratic ways: they produce objects with incompatible structures, content, and behavior, which makes it difficult for analysts to maintain a consistent workflow across projects.

2.1 marginaleffects for R and Python

To address this challenge, this book introduces the free and open source software package marginaleffects, which provides a single point of entry for interpreting results from over 100 different model classes in R and Python. This package simplifies the interpretation process by offering a consistent and powerful user interface, reducing the need for customized code, and minimizing the risk of errors.

Table 2.1 lists the main functions of the marginaleffects package. These functions allow analysts to compute a wide range of quantities, grouped into three categories: predictions(), comparisons(), and slopes().

  1. predictions: This family of functions computes and plots predictions on different scales, at different levels of aggregation (Chapter 5).
  2. comparisons: This family of functions computes and plots counterfactual comparisons, which can characterize the relationships between two or more variables (Chapter 6). This broad class of estimands includes contrasts, differences, risk ratios, odds ratios, lift, and even user-defined functions.
  3. slopes: This family of functions computes and plots partial derivatives of the outcome equation, commonly called “marginal effects” in econometrics or “trends” in other disciplines.

Table 2.1: Main functions of the marginaleffects package.

Goal                          Functions
----------------------------  ----------------------------------------------------
Predictions                   predictions(), avg_predictions(), plot_predictions()
Comparisons                   comparisons(), avg_comparisons(), plot_comparisons()
Slopes                        slopes(), avg_slopes(), plot_slopes()
Grids                         datagrid()
Hypotheses and Equivalence    hypotheses()
Bayes, Bootstrap, Simulation  posterior_draws(), inferences()
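
The three categories of quantities can be illustrated by computing one of each by hand. The following is a minimal sketch in plain Python, not a use of marginaleffects itself: the logistic regression coefficients in `BETA` and the helper names `predict`, `comparison`, and `slope` are hypothetical and chosen only for illustration. The package computes the same kinds of quantities from a fitted model object and also reports their uncertainty.

```python
import math

# Hypothetical logistic regression coefficients (assumed, not estimated from data).
BETA = {"intercept": -1.0, "x": 0.5, "treat": 0.8}


def predict(x, treat):
    """Prediction: fitted probability for one unit on the response scale."""
    eta = BETA["intercept"] + BETA["x"] * x + BETA["treat"] * treat
    return 1.0 / (1.0 + math.exp(-eta))


def comparison(x):
    """Counterfactual comparison: change in prediction when treat goes from 0 to 1."""
    return predict(x, 1) - predict(x, 0)


def slope(x, treat, eps=1e-6):
    """Slope: partial derivative of the prediction with respect to x,
    approximated by a central finite difference."""
    return (predict(x + eps, treat) - predict(x - eps, treat)) / (2 * eps)


p = predict(x=2.0, treat=0)   # linear predictor is 0, so the probability is 0.5
d = comparison(x=2.0)         # difference between two predicted probabilities
s = slope(x=2.0, treat=0)     # close to the analytic value 0.5 * p * (1 - p)
print(p, d, s)
```

Note that the comparison and the slope both depend on where they are evaluated (here, `x = 2.0`), which is why the choice of predictor values matters when interpreting a non-linear model.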

Because computing average predictions, comparisons, and slopes is common practice, the marginaleffects package offers shortcut functions: avg_predictions(), avg_comparisons(), and avg_slopes(). These functions are simple wrappers around the main workhorse functions, returning averages over the whole dataset or by subgroup. Their purpose is to save keystrokes and improve code readability.[1]
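
The idea of averaging unit-level quantities over the whole dataset or by subgroup can be sketched in a few lines. This is only an illustration under stated assumptions: the rows below are made-up unit-level predictions, and the `average` helper with its `by` parameter is hypothetical, not the package's implementation (which also computes standard errors for the averages).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical unit-level predictions, one row per observation.
rows = [
    {"group": "a", "prediction": 0.2},
    {"group": "a", "prediction": 0.4},
    {"group": "b", "prediction": 0.6},
    {"group": "b", "prediction": 0.9},
]


def average(rows, by=None):
    """Average the unit-level predictions over all rows, or within each
    level of the column named by `by`."""
    if by is None:
        return mean(r["prediction"] for r in rows)
    groups = defaultdict(list)
    for r in rows:
        groups[r[by]].append(r["prediction"])
    return {level: mean(values) for level, values in groups.items()}


overall = average(rows)               # one number for the whole dataset
by_group = average(rows, by="group")  # one number per subgroup
print(overall, by_group)
```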

The marginaleffects package includes many more powerful utilities. The hypotheses() function and the hypothesis argument allow analysts to conduct linear and non-linear hypothesis tests on parameter estimates, or on any of the other quantities produced by the package. This makes it easy to make cross-group comparisons, compare different effect sizes, and more. The datagrid() function conveniently creates grids of predictor values; inferences() implements alternative inferential strategies like the bootstrap; and posterior_draws() makes it easy to extract draws from posterior distributions in Bayesian analyses.
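
To give a sense of what a non-linear hypothesis test involves, here is a minimal sketch of one standard approach, the delta method, written in plain Python. It is not a description of the package's internals. The estimates `b`, the covariance matrix `V`, the function `g`, and the null value of 1 are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical parameter estimates and their covariance matrix (assumed values).
b = [2.0, 4.0]
V = [[0.04, 0.01],
     [0.01, 0.09]]


def g(params):
    """Non-linear function of the parameters: the ratio of the two estimates."""
    return params[0] / params[1]


def gradient(f, params, eps=1e-6):
    """Numerical gradient of f at params using central finite differences."""
    grad = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def delta_method_se(f, params, vcov):
    """Delta method standard error: sqrt(J V J'), where J is the gradient of f."""
    J = gradient(f, params)
    k = len(params)
    variance = sum(J[i] * vcov[i][j] * J[j] for i in range(k) for j in range(k))
    return math.sqrt(variance)


estimate = g(b)
se = delta_method_se(g, b, V)
z = (estimate - 1.0) / se                     # test of H0: b[0] / b[1] = 1
p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
print(estimate, se, z, p_value)
```

The same recipe applies to any smooth function of the estimates, which is what makes a single interface for linear and non-linear tests possible.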

The functions in marginaleffects greatly simplify the analysis of randomized experiments and play a central role in analyzing observational data, such as in matching, inverse probability weighting, G-computation, multi-level regression with post-stratification (MRP), conjoint experiments, and multiple imputation for missing data.

All these features are consolidated into a single software package, available in two programming languages, and compatible with over 100 types of models—more than any comparable package. Supported models include linear, generalized linear (GLM), generalized additive (GAM), mixed-effects, fixed-effects, Bayesian models, and more.

2.2 Data

The datasets used in the Model to Meaning book can be downloaded here:

https://marginaleffects.com/assets/model_to_meaning_data.zip
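
For readers who prefer to fetch the archive programmatically, the following sketch uses only the Python standard library. The helper names `download` and `unpack` and the local paths in the usage comment are hypothetical; the names of the files inside the archive are not documented here, so the sketch simply lists whatever the archive contains.

```python
import urllib.request
import zipfile
from pathlib import Path

DATA_URL = "https://marginaleffects.com/assets/model_to_meaning_data.zip"


def download(url, dest):
    """Download `url` to the local path `dest` and return that path."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, dest)
    return dest


def unpack(archive, out_dir):
    """Extract a zip archive into `out_dir` and return its sorted member names."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(out_dir)
        return sorted(zf.namelist())


# Usage (requires network access; the local paths are arbitrary choices):
# archive = download(DATA_URL, "data/model_to_meaning_data.zip")
# print(unpack(archive, "data"))
```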


  1. For instance, avg_predictions(fit) and predictions(fit, by=TRUE) yield identical results, but the former is more concise and clear.