Our world is complex. To make sense of it, data analysts routinely fit sophisticated statistical or machine learning models. Interpreting the results produced by such models can be challenging, and researchers often struggle to communicate their findings to colleagues and stakeholders. Model to Meaning is a book designed to bridge that gap. It is a practical guide for anyone who needs to translate model outputs into accurate insights that are accessible to a wide audience.
Model to Meaning introduces a conceptual framework to help you describe the statistical quantities that can shed light on your research questions, use models to estimate those quantities, and communicate the results clearly and rigorously. Based on this framework, the book proposes an analysis workflow that can be applied consistently to (almost) any model you need to fit.
The Model to Meaning project was conceived to empower a broad range of people, including data scientists, researchers, and students, who want to improve their ability to interpret and communicate the results produced by statistical or machine learning models. It is a book for the novice who seeks new practical skills and understanding, but also for the seasoned researcher who is ready to unlearn old patterns and embrace new tools that can improve their productivity and impact.
Part I of the book lays the groundwork by encouraging analysts to clearly define their goals, and by introducing a simple conceptual framework to guide model interpretation. The key idea that underpins this framework is that we can often transform the raw parameter estimates obtained by fitting a model into quantities that are much easier to interpret. Converting results to a scale that feels natural to our audience can improve transparency, communication, and impact.
Part II explains how the conceptual framework can be operationalized through quantities of interest and tests, using concrete examples and real-world datasets. It describes three broad classes of quantities of interest—predictions, counterfactual comparisons, and slopes—shows how to estimate them, and explains how to design appropriate hypothesis tests to answer our research questions.
Part III of the book presents detailed case studies that demonstrate how a consistent workflow can be applied, in model-agnostic fashion, to a variety of settings: causal inference, experiments, interactions and polynomials, mixed effects models, machine learning, and more. These case studies do not exhaust the range of contexts where marginaleffects can play an integral role. The website that accompanies this book, marginaleffects.com, includes over 40 free chapters with detailed tutorials and notebooks.
The level of technical sophistication required to follow the presentation is modest. Readers familiar with concepts like logistic regression and \(p\) values should feel comfortable with most of the material. Some of the case studies in Part III cover more advanced modeling approaches, and extra reading materials are cited when appropriate.
Throughout, explanations are accompanied by detailed code examples in R, with Python translations collected in Chapter 41 (Python). The main software library used to interpret models, marginaleffects, is free, open source, and well documented (Arel-Bundock, Greifer, and Heiss 2024). Readers who are not yet familiar with basic data manipulation commands in R or Python may want to consult an additional reference, such as Telling Stories with Data (Alexander 2023), R for Data Science (Wickham, Çetinkaya-Rundel, and Grolemund 2023), or Python for Data Analysis (McKinney 2022).
Writing this book would not have been possible without the help and feedback of many marginaleffects users, readers, and contributors. I warmly thank Arthur Albuquerque, Rohan Alexander, Marco Mendoza Aviña, Etienne Bacher, Tyson Barrett, Daniel K. Berry, Mattan S. Ben-Shachar, Nicholas J Clark, Mark Clements, Simon P. Couch, Sam Crawley, Maël Coursonnais, Marcio Augusto Diniz, Michael Donnelly, Brett Gall, Isabella R. Ghement, Nadjim Fréchet, Alexander Fischer, Stefan Hansen, Alex Hayes, Karl Ove Hufthammer, Philippe Joly, Adrien Lamarche, Florence Laflamme, Daniel Lüdecke, Grant McDermott, Artiom Matvei, A. Jordan Nafa, Reiko Okamoto, Demetri Pananos, Julia M. Rohrer, Resul Umit, Roel Verbelen, Matt Warkentin, Brenton Wiernik, Stephen Wild, and Aaron Zipp. I also thank the Université de Montréal for funding part of the development of marginaleffects.
Parts of this text were adapted from an article by Arel-Bundock, Greifer, and Heiss (2024) published in the Journal of Statistical Software. I thank my co-authors Noah Greifer and Andrew Heiss for their contributions to that article, the marginaleffects documentation, and code. I acknowledge the use of large language models as writing and coding aids.
The key idea that underpins this book is that raw parameter estimates obtained by fitting a model can often be transformed into more interpretable quantities. Presenting results in a way that resonates with the audience enhances clarity, communication, and impact.
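To make this concrete, here is a minimal sketch in Python, assuming a hypothetical logistic regression whose coefficients are made up for illustration. A coefficient of 0.8 describes a change in log-odds, which few audiences find intuitive. Converting it to the probability scale shows that the same coefficient implies different changes depending on the starting point. The helper names are illustrative, and the code does not use marginaleffects.

```python
import math


def inv_logit(eta):
    """Map a value on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical logistic regression estimates (made up for illustration).
b0, b1 = -2.0, 0.8


def probability_change(x):
    """Change in P(y = 1) when x increases by one unit, starting from x."""
    return inv_logit(b0 + b1 * (x + 1)) - inv_logit(b0 + b1 * x)


# On the log-odds scale the effect of x is always b1, and the odds ratio
# exp(b1) is constant. On the probability scale, the implied change
# depends on where we start.
for x in (0, 2, 4):
    print(f"x = {x}: change in probability = {probability_change(x):+.3f}")
print(f"odds ratio = {math.exp(b1):.3f}")
```

With these made-up values, moving x from 2 to 3 raises the probability by about 0.20, while moving from 0 to 1, or from 4 to 5, raises it by only about 0.11.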
Unfortunately, computing intuitive statistical quantities, along with their standard errors, can be a tedious and error-prone process. Moreover, although many excellent software packages exist to fit statistical and machine learning models, these packages often behave in idiosyncratic ways: they produce objects with incompatible structures, contents, and behaviors, which makes it difficult for analysts to maintain a consistent workflow across projects.
Software
To address this challenge, this book introduces the free and open source software package marginaleffects, which provides a single point of entry for interpreting results from over 100 different model classes in R and Python. This package simplifies the interpretation process by offering a consistent and powerful user interface, reducing the need for customized code, and minimizing the risk of errors.
Table 1 lists the main functions of the marginaleffects package. These functions allow analysts to compute a wide range of quantities, grouped into three categories: predictions(), comparisons(), and slopes().
predictions: This family of functions computes and plots predictions on different scales and at different levels of aggregation (Chapter 4, Predictions).
comparisons: This family of functions computes and plots counterfactual comparisons, which can characterize the relationships between two or more variables (Chapter 5, Counterfactual comparisons). This broad class of estimands includes contrasts, differences, risk ratios, odds ratios, lift, and even user-defined functions.
slopes: This family of functions computes and plots partial derivatives of the outcome equation, commonly called “marginal effects” in econometrics or “trends” in other disciplines.
Table 1: Main functions of the marginaleffects package.
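The three families of quantities can be illustrated from first principles. The following is a minimal Python sketch, assuming a hypothetical logistic regression with a single predictor and made-up coefficients. The functions predict(), compare(), and slope() are illustrative helpers, not the marginaleffects functions. The slope is approximated with a central finite difference, which is one common numerical approach; the package's own internals may differ.

```python
import math


def inv_logit(eta):
    """Map a value on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical fitted logistic model: P(y = 1 | x) = inv_logit(B0 + B1 * x)
B0, B1 = -1.0, 0.5


def predict(x):
    """Prediction: fitted probability at a given value of x."""
    return inv_logit(B0 + B1 * x)


def compare(x, step=1.0, fn="difference"):
    """Counterfactual comparison: prediction at x + step versus prediction at x."""
    high, low = predict(x + step), predict(x)
    if fn == "difference":
        return high - low
    if fn == "ratio":
        return high / low
    raise ValueError(f"unknown comparison function: {fn}")


def slope(x, eps=1e-6):
    """Slope: dP/dx at x, approximated by a central finite difference."""
    return (predict(x + eps) - predict(x - eps)) / (2 * eps)


x0 = 2.0
print("prediction:", predict(x0))
print("difference:", compare(x0))
print("ratio:     ", compare(x0, fn="ratio"))
print("slope:     ", slope(x0))
```

At x = 2 the linear predictor is zero, so the prediction is 0.5 and the analytic slope B1 × p × (1 − p) equals 0.5 × 0.5 × 0.5 = 0.125, which the finite-difference approximation reproduces closely.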
Because computing average predictions, comparisons, and slopes is common practice, the marginaleffects package offers shortcut functions: avg_predictions(), avg_comparisons(), and avg_slopes(). These functions are simple wrappers around the main workhorse functions, returning averages over the whole dataset or by subgroup. Their purpose is to save keystrokes and improve code readability.
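The averaging step itself is simple arithmetic. Here is a minimal Python sketch, assuming a hypothetical fitted model and a made-up four-row dataset: a unit-level prediction is computed for every row, then averaged either over the whole dataset or within subgroups. avg_prediction() is an illustrative helper, not the package's implementation.

```python
import math
from collections import defaultdict
from statistics import mean


def inv_logit(eta):
    """Map a value on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical fitted model: P(y = 1) = inv_logit(B0 + B1 * x + B2 * (group == "B"))
B0, B1, B2 = -1.0, 0.5, 0.3

# A small made-up dataset; each dict is one observed unit.
rows = [
    {"x": 0.0, "group": "A"},
    {"x": 1.0, "group": "B"},
    {"x": 2.0, "group": "A"},
    {"x": 3.0, "group": "B"},
]


def predict(row):
    """Unit-level prediction for one row of the dataset."""
    return inv_logit(B0 + B1 * row["x"] + B2 * (row["group"] == "B"))


def avg_prediction(data, by=None):
    """Average of unit-level predictions, overall or within each subgroup."""
    if by is None:
        return mean(predict(r) for r in data)
    groups = defaultdict(list)
    for r in data:
        groups[r[by]].append(predict(r))
    return {key: mean(values) for key, values in groups.items()}


print("overall:", avg_prediction(rows))
print("by group:", avg_prediction(rows, by="group"))
```

Because the inverse-logit function is nonlinear, the average of unit-level predictions generally differs from a single prediction made at the average values of the predictors.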
The marginaleffects package includes many more powerful utilities. The hypotheses() function and the hypothesis argument allow analysts to conduct linear and non-linear hypothesis tests on parameter estimates, or on any of the other quantities produced by the package. This makes it easy to compare estimates across groups, compare different effect sizes, and more. The datagrid() function creates grids of predictor values; inferences() implements alternative inferential strategies like the bootstrap; and get_draws() extracts draws from posterior distributions in Bayesian analyses.
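Tests like these need a standard error for a possibly non-linear function of the estimates. A standard tool is the delta method, which marginaleffects uses by default. The sketch below, in Python with a made-up pair of estimates and covariance matrix, tests the null hypothesis that the ratio of two effects equals 1, differentiating the function numerically with forward differences. The helper names are hypothetical, and the package's own implementation may differ in detail.

```python
import math
from statistics import NormalDist


def numerical_gradient(fn, theta, eps=1e-7):
    """Forward-difference gradient of a scalar function of a parameter vector."""
    base = fn(theta)
    grad = []
    for i in range(len(theta)):
        shifted = list(theta)
        shifted[i] += eps
        grad.append((fn(shifted) - base) / eps)
    return grad


def delta_method_test(fn, theta, vcov):
    """Estimate, delta-method standard error, z statistic, and two-sided p value."""
    estimate = fn(theta)
    g = numerical_gradient(fn, theta)
    k = len(theta)
    # Var[fn(theta)] is approximately g' V g
    variance = sum(g[i] * vcov[i][j] * g[j] for i in range(k) for j in range(k))
    se = math.sqrt(variance)
    z = estimate / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return estimate, se, z, p


# Hypothetical estimates of two effects and their variance-covariance matrix.
theta = [0.6, 0.4]
vcov = [[0.04, 0.01],
        [0.01, 0.09]]


def ratio_minus_one(t):
    """Null hypothesis: the ratio of the two effects equals 1."""
    return t[0] / t[1] - 1


print(delta_method_test(ratio_minus_one, theta, vcov))
```

With these made-up numbers, the estimated ratio is 1.5 (so the tested quantity is 0.5), its standard error is about 1.15, and the two-sided p value is about 0.66, so the data would not reject the null hypothesis that the two effects are equal.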
The functions in marginaleffects greatly simplify the analysis of randomized experiments, and they play a central role in a wide range of other applications, from observational studies (matching, inverse probability weighting, G-computation) to multilevel regression with post-stratification (MRP), conjoint experiments, and multiple imputation for missing data.
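G-computation, for instance, amounts to averaging counterfactual comparisons. Here is a minimal Python sketch, assuming a hypothetical fitted outcome model with a binary treatment and one covariate, and a made-up sample. Each unit's outcome is predicted twice, once with the treatment set to 1 and once with it set to 0, holding its covariate fixed, and the unit-level differences are averaged. In marginaleffects, this kind of estimate is typically obtained with avg_comparisons(). The helper below is illustrative only and omits standard errors.

```python
import math
from statistics import mean


def inv_logit(eta):
    """Map a value on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical fitted outcome model: P(y = 1) = inv_logit(B0 + BT * treat + BX * x)
B0, BT, BX = -0.5, 0.7, 0.4

# Made-up covariate values for the observed sample.
xs = [-1.0, 0.0, 0.5, 1.0, 2.0]


def predict(treat, x):
    """Predicted probability for one unit under a given treatment value."""
    return inv_logit(B0 + BT * treat + BX * x)


# Predict each unit under treatment and under control, holding x fixed,
# then average the unit-level differences.
effects = [predict(1, x) - predict(0, x) for x in xs]
ate = mean(effects)
print(f"average treatment effect (risk difference): {ate:.4f}")
```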
All these features are consolidated into a single software package, available in two programming languages, and compatible with over 100 types of models—more than any comparable package. Supported models include linear, generalized linear (GLM), generalized additive (GAM), mixed-effects, fixed-effects, Bayesian models, and more.
Documentation
The marginaleffects package includes extensive documentation that is accessible in manual pages and online. In an R session, you can access the manual pages for any function using the standard help syntax.
?predictions
?comparisons
?slopes
In Python, documentation is available through the built-in help system.
from marginaleffects import predictions, comparisons, slopes
help(predictions)
help(comparisons)
help(slopes)
Comprehensive documentation is also hosted on the package website marginaleffects.com, which includes detailed function references, tutorials, and examples for both R and Python users. This website is regularly updated with new features and use cases.
Data
All datasets used in this book can be accessed using the get_dataset() function from the marginaleffects package. This function can download datasets from two sources: the marginaleffects archive, which contains datasets specifically curated for this book, and the Rdatasets archive, a collection of over 2300 datasets that are commonly used in R packages and statistical education.
For example, to download and read data about the Titanic that was originally distributed by the Stat2Data package, we call get_dataset().
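get_dataset() takes care of downloading and reading the file. For readers curious about what the Rdatasets source involves, the sketch below builds a download URL and parses the CSV with the Python standard library. It assumes the archive's public layout, https://vincentarelbundock.github.io/Rdatasets/csv/<package>/<dataset>.csv, and the dataset name Titanic for the Stat2Data version. rdatasets_url() and fetch_rdataset() are hypothetical helpers, not part of marginaleffects, and the download step requires an internet connection.

```python
import csv
import io
import urllib.request

# Assumed public layout of the Rdatasets archive: csv/<package>/<dataset>.csv
RDATASETS_BASE = "https://vincentarelbundock.github.io/Rdatasets/csv"


def rdatasets_url(package, dataset):
    """Build the (assumed) URL of a CSV file in the Rdatasets archive."""
    return f"{RDATASETS_BASE}/{package}/{dataset}.csv"


def fetch_rdataset(package, dataset):
    """Download one CSV file and return its rows as a list of dicts (needs network)."""
    with urllib.request.urlopen(rdatasets_url(package, dataset)) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# Example (requires an internet connection):
# rows = fetch_rdataset("Stat2Data", "Titanic")
```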
Alexander, Rohan. 2023. Telling Stories with Data: With Applications in R. Chapman and Hall/CRC.
Arel-Bundock, Vincent, Noah Greifer, and Andrew Heiss. 2024. “How to Interpret Statistical Models Using marginaleffects for R and Python.” Journal of Statistical Software 111 (9): 1–32. https://doi.org/10.18637/jss.v111.i09.
McKinney, Wes. 2022. Python for Data Analysis. O’Reilly Media.
Wickham, Hadley, Mine Çetinkaya-Rundel, and Garrett Grolemund. 2023. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. 2nd ed. Sebastopol, CA: O’Reilly Media.