library(marginaleffects)
library(tinyplot)
tinytheme("ipsum")
set.seed(48103)
# number of days
n <- 365
# intervention at day
intervention <- 200
# time index from 1 to 365
time <- 1:n
# treatment variable: 0 before the intervention and 1 after
treatment <- c(rep(0, intervention), rep(1, n - intervention))
# outcome equation
outcome <-
10 + # pre-intervention intercept
15 * time + # pre-intervention slope (trend)
20 * treatment + # post-intervention intercept (shift of 20)
5 * treatment * time + # steeper slope after the intervention
rnorm(n, mean = 0, sd = 10) # noise
dat <- data.frame(outcome, time, treatment)
22 Interrupted Time Series
Interrupted Time Series (ITS) analysis is a widely used research design that aims to infer the impact of a discrete intervention or event by examining changes in an outcome measure over time. It finds applications in various fields, including public health policies—such as evaluating the effect of vaccination campaigns—and educational settings—like assessing curriculum reform. The key motivation behind ITS is to leverage data already being collected, identify a clear “interruption” at a known point in time (e.g., when a policy was enacted), and then compare the behavior of the outcome measure before and after this interruption.
Although ITS can be extremely useful, it also poses substantial methodological and practical challenges. Unlike experimental or quasi-experimental approaches, ITS relies on very strong assumptions about the continuity or predictability of the outcome’s pre-intervention trajectory. Unobserved confounding events and gradual policy rollouts can violate these assumptions, complicating efforts to attribute changes solely to the interruption.
ITS shares some similarities with the popular difference-in-differences (DiD) approach, in which outcomes in a treated group are contrasted with those in an untreated comparison group. Meeting the assumptions required for credible inference with DiD is extremely challenging, as attested by recent methodological developments in this field (Roth et al. 2023). ITS is arguably even more difficult to get right, since it typically relies on only a single time series, and thus lacks a true control group.
This article shows how to use the marginaleffects package to analyze a simple ITS setup. To illustrate, we consider a dataset simulated by the code shown at the top of this section.
The data look like this:
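A plot helps here. One way to draw it with the tinyplot package loaded at the top of this section (a sketch, assuming tinyplot's lattice-style formula interface with `|` grouping):

```r
# Sketch: visualize the simulated series, colored by treatment status.
library(tinyplot)

set.seed(48103)
n <- 365
intervention <- 200
time <- 1:n
treatment <- c(rep(0, intervention), rep(1, n - intervention))
outcome <- 10 + 15 * time + 20 * treatment + 5 * treatment * time +
  rnorm(n, mean = 0, sd = 10)
dat <- data.frame(outcome, time, treatment)

# scatter plot of outcome over time, grouped by treatment status
tinyplot(outcome ~ time | factor(treatment), data = dat)
```

With this simulation, the jump in level and the steeper slope after day 200 are clearly visible.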
Often, analysts model such data using a linear regression model with a multiplicative interaction and a specially constructed “time since intervention” variable:
\[ \text{Outcome} = \beta_0 + \beta_1 \text{Time} + \beta_2 \text{Treatment} + \beta_3 \text{Time since intervention} + \varepsilon \]
With marginaleffects, we do not even need to construct this special variable.1 All we have to do is fit this mathematically equivalent—and simpler—model:
\[ \text{Outcome} = \alpha_0 + \alpha_1 \text{Time} + \alpha_2 \text{Treatment} + \alpha_3 \text{Time} \times \text{Treatment} + \varepsilon \]
mod <- lm(outcome ~ time * treatment, data = dat)
summary(mod)
Call:
lm(formula = outcome ~ time * treatment, data = dat)
Residuals:
Min 1Q Median 3Q Max
-26.1551 -6.9865 0.0908 6.6543 30.8257
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.44518 1.36298 6.93 1.95e-11 ***
time 15.00023 0.01176 1275.56 < 2e-16 ***
treatment 16.23570 4.70542 3.45 0.000626 ***
time:treatment 5.01581 0.01961 255.77 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9.602 on 361 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 6.778e+06 on 3 and 361 DF, p-value: < 2.2e-16
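As an aside (my own check, not part of the original text), base R confirms that the two specifications above are reparameterizations of the same model: a fit with a hand-coded time_since regressor and the interaction fit produce identical fitted values.

```r
# Check: the hand-coded "time since intervention" model and the simpler
# interaction model span the same column space, so their fits coincide.
set.seed(48103)
n <- 365
intervention <- 200
time <- 1:n
treatment <- c(rep(0, intervention), rep(1, n - intervention))
outcome <- 10 + 15 * time + 20 * treatment + 5 * treatment * time +
  rnorm(n, mean = 0, sd = 10)
dat <- data.frame(outcome, time, treatment)

# hand-coded special variable: days elapsed since the intervention (0 before)
dat$time_since <- ifelse(dat$treatment == 1, dat$time - intervention, 0)

m_special  <- lm(outcome ~ time + treatment + time_since, data = dat)
m_interact <- lm(outcome ~ time * treatment, data = dat)

all.equal(unname(fitted(m_special)), unname(fitted(m_interact)))
# TRUE
```

The equivalence is algebraic: time_since equals time × treatment minus a constant multiple of treatment, so the two design matrices span the same space.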
Then, we can use the marginaleffects package to answer a series of scientific questions. For example, what is the predicted outcome just before the intervention (time 199, untreated) and just after it (time 200, treated)?
p0 <- predictions(mod, newdata = datagrid(time = 199, treatment = 0))
p1 <- predictions(mod, newdata = datagrid(time = 200, treatment = 1))
p0
time treatment Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
199 0 2994 1.34 2230 <0.001 Inf 2992 2997
Type: response
p1
time treatment Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
200 1 4029 1.5 2683 <0.001 Inf 4026 4032
Type: response
Comparing these two predictions suggests that the outcome rose by about 1034 units between day 199 and day 200, a jump that combines one day of the secular trend with the effect of the intervention itself. Is the effect of the intervention statistically significant at time 200?
comparisons(mod, variables = "treatment", newdata = datagrid(time = 200))
time Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
200 1019 2.02 504 <0.001 Inf 1015 1023
Term: treatment
Type: response
Comparison: 1 - 0
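The point estimate in the comparison above can be reproduced by hand from the regression coefficients: at time 200, the treatment contrast is alpha_2 + 200 * alpha_3. A base-R sketch (re-simulating the data from above; note that marginaleffects additionally supplies the delta-method standard error, which this shortcut does not):

```r
# Sketch: treatment contrast at time 200, computed from the coefficients.
set.seed(48103)
n <- 365
intervention <- 200
time <- 1:n
treatment <- c(rep(0, intervention), rep(1, n - intervention))
outcome <- 10 + 15 * time + 20 * treatment + 5 * treatment * time +
  rnorm(n, mean = 0, sd = 10)

mod <- lm(outcome ~ time * treatment)
b <- coef(mod)

# alpha_2 + 200 * alpha_3
unname(b["treatment"] + 200 * b["time:treatment"])
# roughly 1019, matching the comparisons() output above
```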
What is the average slope of the outcome equation with respect to time, before and after the intervention?
avg_slopes(mod, variables = "time", by = "treatment")
Term treatment Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
time 0 15 0.0118 1276 <0.001 Inf 15 15
time 1 20 0.0157 1273 <0.001 Inf 20 20
Type: response
Comparison: dY/dX
This shows that the outcome increases more rapidly after the intervention.
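Because the model is linear, these two slopes can also be read directly off the coefficients: alpha_1 before the intervention, and alpha_1 + alpha_3 after it. A base-R check (re-simulating the data from above):

```r
# Sketch: pre- and post-intervention slopes, from the coefficients.
set.seed(48103)
n <- 365
intervention <- 200
time <- 1:n
treatment <- c(rep(0, intervention), rep(1, n - intervention))
outcome <- 10 + 15 * time + 20 * treatment + 5 * treatment * time +
  rnorm(n, mean = 0, sd = 10)

mod <- lm(outcome ~ time * treatment)
b <- coef(mod)

c(before = unname(b["time"]),
  after  = unname(b["time"] + b["time:treatment"]))
# approximately 15 and 20, matching avg_slopes()
```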
In fact, analysts should not construct this special variable. The problem is that the time and time_since variables are related, but when we hard-code them, R does not know the mathematical relationship between them.