1 Who is this book for?

1.1 The Big Picture

Our world is complex. To make sense of it, data analysts routinely fit sophisticated statistical or machine learning models. Interpreting the results produced by such models can be challenging, and researchers often struggle to communicate their findings to colleagues and stakeholders. Model to Meaning is a book designed to bridge that gap. It is a practical guide for anyone who needs to translate model outputs into accurate insights that are accessible to a wide audience.

Model to Meaning introduces a conceptual framework to help you describe the statistical quantities that can shed light on your research questions, use models to estimate those quantities, and communicate the results clearly and rigorously. Based on this conceptual framework, the book proposes an analysis workflow which can be applied in consistent fashion to (almost) any model you need to fit.

The Model to Meaning project was conceived to empower a broad range of people, including data scientists, researchers, and students, who want to improve their ability to interpret and communicate the results produced by statistical or machine learning models. It is a book for the novice who seeks new practical skills and understanding, but also for the seasoned researcher who is ready to unlearn some old patterns and embrace new tools that can improve their productivity and impact.

Part I of the book lays the groundwork by encouraging analysts to clearly define their goals, and by introducing a simple conceptual framework to guide model interpretation. The key idea that underpins this framework is that we can often transform the raw parameter estimates obtained by fitting a model into quantities that are much easier to interpret. Converting results to a scale that feels natural to our audience can improve transparency and communication.

Part II explains how the conceptual framework can be operationalized through quantities of interest and tests, using concrete examples and real-world datasets. It describes three broad classes of quantities of interest—predictions, counterfactual comparisons, and slopes—shows how to estimate them, and explains how to design appropriate hypothesis tests to answer our research questions.

Part III of the book presents detailed case studies to demonstrate how a consistent workflow can be applied in model-agnostic fashion to various settings: causal inference; experiments; interactions and polynomials; mixed effects models; weighting; categorical outcomes; machine learning; and more. These case studies do not exhaust the range of contexts where the tools and ideas in this book can play an integral role. The website that accompanies this book includes over 30 free chapters with detailed tutorials and notebooks.¹

The level of technical sophistication required to follow the presentation is modest. Readers familiar with concepts like logistic regression and \(p\) values should feel comfortable with most of the material. Some of the case studies in Part III cover more advanced modelling approaches, and extra reading materials are cited when appropriate.

Throughout, explanations are accompanied by detailed code examples in R, with Python translations collected in Appendix II. Readers who are not yet familiar with basic data manipulation commands in R or Python may want to consult an additional reference, such as Telling Stories with Data (Alexander 2023), R for Data Science (Wickham, Çetinkaya-Rundel, and Grolemund 2023), or Python for Data Analysis (McKinney 2022).

Parts of this text were adapted from an article by Arel-Bundock, Greifer, and Heiss (2024) published in the Journal of Statistical Software.² I thank my co-authors Noah Greifer and Andrew Heiss for their contributions to that article, the marginaleffects software documentation, and code. I acknowledge the use of large language models as writing and coding aids, and thank the Université de Montréal for funding some software development.

Writing this book would not have been possible without the help and feedback of many friends, marginaleffects users, readers, and contributors. I warmly thank Arthur Albuquerque, Rohan Alexander, Marco Mendoza Aviña, Etienne Bacher, Tyson Barrett, Daniel K Berry, Mattan S Ben-Shachar, Timothy Chisamore, Nicholas J Clark, Mark Clements, Simon P Couch, Sam Crawley, Maël Coursonnais, Marcio Augusto Diniz, Michael Donnelly, Brett Gall, Isabella R Ghement, Nadjim Fréchet, Alexander Fischer, Stefan Hansen, Alex Hayes, Karl Ove Hufthammer, Philippe Joly, Adrien Lamarche, Florence Laflamme, Daniel Lüdecke, Grant McDermott, Artiom Matvei, A Jordan Nafa, Reiko Okamoto, Demetri Pananos, Julia M Rohrer, Resul Umit, Roel Verbelen, Matt Warkentin, Johannes Weytjens, Brenton Wiernik, Stephen Wild, Aaron Zipp. I thank Maria G and Dante G for the art.

Thank you to Sari, Mailis, and Béa, the very best in the world.

1.2 Software

The key idea that underpins this book is that the raw parameter estimates obtained by fitting a model can often be transformed into more interpretable quantities. Presenting results in a way that resonates with the audience enhances clarity, communication, and impact.

Unfortunately, computing intuitive statistical quantities, along with standard errors, can be a tedious and error-prone process. Furthermore, whereas many excellent packages exist to fit models, software often behaves in idiosyncratic ways, producing objects with incompatible structures or inconsistent behavior. This makes it difficult for analysts to maintain a consistent workflow across projects.
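To give a sense of why hand-computing standard errors is tedious, here is a minimal sketch of one common approach, the delta method, applied to a predicted probability from a logit model. The coefficient values and the variance-covariance matrix are hypothetical numbers chosen for illustration, and the helper names are not part of any package. The sketch uses only the Python standard library.

```python
import math

# Hypothetical estimates for a logit model with one predictor (illustration only).
beta = [-1.0, 0.5]        # intercept, slope
vcov = [[0.04, -0.01],    # variance-covariance matrix of the two estimates
        [-0.01, 0.01]]


def predicted_probability(b, x):
    """Inverse-logit transformation of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-(b[0] + b[1] * x)))


def delta_method_se(b, V, x):
    """Standard error of the predicted probability via the delta method."""
    p = predicted_probability(b, x)
    # Gradient of p with respect to (intercept, slope): p(1 - p) * (1, x)
    grad = [p * (1 - p), p * (1 - p) * x]
    # Quadratic form: grad' V grad
    variance = sum(grad[i] * V[i][j] * grad[j] for i in range(2) for j in range(2))
    return math.sqrt(variance)


x = 2.0
p = predicted_probability(beta, x)
se = delta_method_se(beta, vcov, x)
print(f"probability = {p:.3f}, standard error = {se:.3f}")
```

Even in this two-parameter case, the analyst must derive a gradient by hand and keep track of the covariance terms. The bookkeeping grows quickly with more predictors or more complex quantities.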

To address this challenge, this book introduces a free and open source software package—marginaleffects—which provides a single point of entry to interpret results from over 100 classes of models in R and Python. This package simplifies the interpretation process by offering a consistent and powerful user interface, reducing the need for customized code, and minimizing the risk of errors.

Table 1.1 lists the main functions of the marginaleffects package. These functions allow analysts to compute a wide range of quantities, grouped into three categories: predictions(), comparisons(), and slopes().

predictions (Chapter 5): This family of functions computes and plots predictions on different scales, at different levels of aggregation.
comparisons (Chapter 6): This family of functions computes and plots counterfactual comparisons which can characterize the relationships between two or more variables. This broad class of estimands includes contrasts, differences, risk ratios, odds ratios, lift, etc.
slopes (Chapter 7): This family of functions computes and plots partial derivatives of the outcome equation, commonly called “marginal effects” in econometrics or “trends” in other disciplines.
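The three quantities described above can be sketched by hand for a simple logit model. This is an illustration of the concepts only, not the marginaleffects API: the coefficients are hypothetical values, the function names are made up for this example, and the code relies solely on the Python standard library.

```python
import math

# Hypothetical logit coefficients, chosen only for illustration:
# log-odds(survival) = INTERCEPT + B_AGE * age + B_WOMAN * woman
INTERCEPT, B_AGE, B_WOMAN = -1.0, -0.02, 2.5


def predict(age, woman):
    """Prediction: fitted probability on the response scale."""
    log_odds = INTERCEPT + B_AGE * age + B_WOMAN * woman
    return 1.0 / (1.0 + math.exp(-log_odds))


def comparison(age):
    """Counterfactual comparison: change in probability when `woman` goes from 0 to 1."""
    return predict(age, 1) - predict(age, 0)


def slope(age, woman, eps=1e-6):
    """Slope: centered finite-difference derivative of the probability with respect to age."""
    return (predict(age + eps, woman) - predict(age - eps, woman)) / (2 * eps)


print(f"prediction: {predict(30, 1):.3f}")
print(f"comparison: {comparison(30):.3f}")
print(f"slope:      {slope(30, 1):.5f}")
```

All three quantities are expressed on the probability scale, which is usually easier to communicate than raw log-odds coefficients.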

Table 1.1: Main functions of the marginaleffects package.

Goal	Function
Predictions	`predictions()`
	`avg_predictions()`
	`plot_predictions()`
Comparisons	`comparisons()`
	`avg_comparisons()`
	`plot_comparisons()`
Slopes	`slopes()`
	`avg_slopes()`
	`plot_slopes()`
Grids	`datagrid()`
Hypotheses and Equivalence	`hypotheses()`
Bayes, Bootstrap, Simulation	`get_draws()`
	`inferences()`

The marginaleffects package includes many more powerful utilities. For example, the hypotheses() function allows analysts to conduct hypothesis or equivalence tests on parameter estimates, or on any of the other quantities produced by the package. This makes it easy to make cross-group comparisons, compare different effect sizes, and more. The datagrid() function offers a convenient way to create grids of predictor values; inferences() implements alternative inferential strategies like the bootstrap; and get_draws() makes it easy to extract draws from posterior distributions in Bayesian analyses.
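To make the idea of a "grid of predictor values" concrete, here is a minimal sketch in plain Python. It is not how datagrid() is implemented and does not reproduce its interface. The variable names and values are hypothetical, and the grid is simply every combination of the supplied values.

```python
from itertools import product

# Hypothetical predictor values (illustration only).
values = {
    "age": [20, 40, 60],
    "woman": [0, 1],
}

# Build one row per combination of predictor values.
names = list(values)
grid = [dict(zip(names, combo)) for combo in product(*values.values())]

for row in grid:
    print(row)
```

Each row of such a grid describes one hypothetical individual for whom a prediction, comparison, or slope could be computed.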

The functions in marginaleffects greatly simplify the analysis of randomized experiments, and can play a key role in analyzing observational data. They are available in two programming languages, and compatible with over 100 classes of models—more than any comparable package. Supported models include linear, generalized linear (GLM), generalized additive (GAM), mixed-effects, fixed-effects, Bayesian models, and more.

Writing this book was only possible thanks to the work of many developers, including R Core Team (2022), Bürkner (2017), Dowle and Srinivasan (2022), Kay (2023), Wickham (2016), Brooks et al. (2017), Arel-Bundock (2022), Pedersen (2024), Henry and Wickham (2023), Lang (2017), Allaire et al. (2022), Zeileis, Köll, and Graham (2020), and Seabold and Perktold (2010), as well as the developers of the arrow, estimatr, fixest, Formula, here, knitr, MASS, mvtnorm, nanoparquet, nnet, Rcpp, RcppEigen, tidymodels, and tinytable packages.

1.3 Documentation

The marginaleffects package is accompanied by extensive documentation, available both online and in manual pages. In an R session, users can access the manual pages for any function using the standard help syntax.

?predictions
?comparisons
?slopes

In Python, documentation is available through the built-in help system.

from marginaleffects import predictions, comparisons, slopes

help(predictions)
help(comparisons)
help(slopes)

Comprehensive documentation is also hosted on the package website, which includes detailed function references, tutorials, and examples for both R and Python users. This website is regularly updated with new features and use cases.

https://marginaleffects.com

1.4 Data

All datasets used in this book can be accessed with the get_dataset() function from the marginaleffects package. This function can download data from two sources: the marginaleffects archive contains datasets specifically curated for this book, and the Rdatasets archive is a collection of over 2500 datasets which are commonly used in R packages and statistical education.

For example, the code that follows downloads data about the Titanic and displays the first few rows of three columns.

library(marginaleffects)
dat = get_dataset("Titanic", "Stat2Data")
dat[1:5, c("Name", "Survived", "Age")]

                                           Name Survived   Age
1                  Allen, Miss Elisabeth Walton        1 29.00
2                   Allison, Miss Helen Loraine        0  2.00
3           Allison, Mr Hudson Joshua Creighton        0 30.00
4 Allison, Mrs Hudson JC (Bessie Waldo Daniels)        0 25.00
5                 Allison, Master Hudson Trevor        1  0.92

We can also search through available datasets using plain strings or regular expressions. To find all datasets related to the Titanic, we use the search argument.

get_dataset(search = "Titanic")

Each dataset comes with detailed documentation that you can view in your browser.

get_dataset("Titanic", "Stat2Data", docs = TRUE)

The datasets used in the Model to Meaning book can also be downloaded in CSV and Parquet formats at this URL:

https://marginaleffects.com/data/model_to_meaning.zip

Alexander, Rohan. 2023. Telling Stories with Data: With Applications in R. Chapman and Hall/CRC.

Allaire, JJ, Yihui Xie, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, Hadley Wickham, Joe Cheng, Winston Chang, and Richard Iannone. 2022. rmarkdown: Dynamic Documents for R. https://cran.r-project.org/package=rmarkdown.

Arel-Bundock, Vincent. 2022. “modelsummary: Data and Model Summaries in R.” Journal of Statistical Software 103 (1): 1–23. https://doi.org/10.18637/jss.v103.i01.

Arel-Bundock, Vincent, Noah Greifer, and Andrew Heiss. 2024. “How to Interpret Statistical Models Using marginaleffects for R and Python.” Journal of Statistical Software 111 (9): 1–32. https://doi.org/10.18637/jss.v111.i09.

Brooks, Mollie E., Kasper Kristensen, Koen J. van Benthem, Arni Magnusson, Casper W. Berg, Anders Nielsen, Hans J. Skaug, Martin Maechler, and Benjamin M. Bolker. 2017. “glmmTMB Balances Speed and Flexibility Among Packages for Zero-Inflated Generalized Linear Mixed Modeling.” The R Journal 9 (2): 378–400. https://doi.org/10.32614/RJ-2017-066.

Bürkner, Paul-Christian. 2017. “brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80 (1): 1–28. https://doi.org/10.18637/jss.v080.i01.

Dowle, Matt, and Arun Srinivasan. 2022. data.table: Extension of data.frame. https://r-datatable.com.

Henry, Lionel, and Hadley Wickham. 2023. rlang: Functions for Base Types and Core R and tidyverse Features. https://rlang.r-lib.org.

Kay, Matthew. 2023. ggdist: Visualizations of Distributions and Uncertainty. https://doi.org/10.5281/zenodo.3879620.

Lang, Michel. 2017. “checkmate: Fast Argument Checks for Defensive R Programming.” The R Journal 9 (1): 437–45. https://doi.org/10.32614/RJ-2017-028.

McKinney, Wes. 2022. Python for Data Analysis. O’Reilly Media, Inc.

Pedersen, Thomas Lin. 2024. patchwork: The Composer of Plots. https://patchwork.data-imaginist.com.

R Core Team. 2022. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Seabold, Skipper, and Josef Perktold. 2010. “statsmodels: Econometric and Statistical Modeling with Python.” In Proceedings of the 9th Python in Science Conference. Austin, TX.

Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.

Wickham, Hadley, Mine Çetinkaya-Rundel, and Garrett Grolemund. 2023. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. 2nd ed. Sebastopol, CA: O’Reilly Media. https://www.amazon.ca/dp/1492097403.

Zeileis, Achim, Susanne Köll, and Nathaniel Graham. 2020. “Various Versatile Variances: An Object-Oriented Implementation of Clustered Covariances in R.” Journal of Statistical Software 95 (1): 1–36. https://doi.org/10.18637/jss.v095.i01.

¹ https://marginaleffects.com
² Like all articles in the JSS, the text is published under a permissive Creative Commons 3.0 license. https://creativecommons.org/licenses/by/3.0/