Introduction to the simITS package

Introduction

This vignette quickly outlines the primary method calls for conducting an analysis of an Interrupted Time Series using the simulation approach proposed in the companion paper.

We first cover a simple regression model, then show how to do smoothing, then seasonality. We also make a brief note about generating fake data for the purposes of conducting simulation studies.

Basic ITS analysis

We use the raw Mecklenberg data to illustrate the simITS package.

data(mecklenberg)
head( mecklenberg )
#> # A tibble: 6 × 10
#>   month  karr pbail pptrel  pror  pb4c avg_days_initial avg_t2d pstint7 pstint30
#>   <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>            <dbl>   <dbl>   <dbl>    <dbl>
#> 1   -29  2450 0.661 0.0347 0.304 0.634            11.4     193.   0.204   0.0820
#> 2   -28  2518 0.668 0.0365 0.295 0.636             9.96    188.   0.156   0.0564
#> 3   -27  2501 0.641 0.0424 0.317 0.628             9.68    191.   0.158   0.0636
#> 4   -26  2472 0.646 0.0376 0.316 0.649            13.9     183.   0.174   0.0781
#> 5   -25  2628 0.664 0.0438 0.293 0.655            12.8     184.   0.211   0.0799
#> 6   -24  2629 0.647 0.0312 0.322 0.697            10.1     193.   0.169   0.0677
meck = mutate( mecklenberg, pbail = 100 * pbail )
ggplot( meck, aes( x=month, y=pbail)) +
  geom_rect(aes( ymin=-Inf, ymax=Inf, xmin=0.5, xmax=25, fill="lightgray"), col = "lightgray", alpha=0.25) +
  scale_fill_identity(name = "", guide = "none", labels = c('Post Policy era')) +
  geom_hline( yintercept = 0, col="black") +
  geom_line( col="black", lty=1, lwd=0.5) +
  geom_point() +
  scale_x_continuous( breaks = c(-29,-24,-18,-12,-6,1,6,12,18,24)) +    
  coord_cartesian(xlim=c(-29.5,24.5), ylim=c(0,100), expand=FALSE) +
  labs( title = " ", y = "Percent cases assigned bail", x = " " )

To have autoregressive errors we use lagged outcomes. We can add lagged outcomes (and covariates) as so:

meck = add_lagged_covariates( meck, outcomename = "pbail", covariates=NULL )
sample_n( meck, 5 ) %>% arrange( month )
#> # A tibble: 5 × 11
#>   month  karr pbail pptrel  pror  pb4c avg_days_initial avg_t2d pstint7 pstint30
#>   <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>            <dbl>   <dbl>   <dbl>    <dbl>
#> 1   -21  2239  63.8 0.0393 0.323 0.725            13.9     179.   0.196   0.0840
#> 2   -12  2018  60.5 0.0253 0.370 0.651            13.0     194.   0.206   0.0788
#> 3    -2  2062  60.6 0.0267 0.368 0.708            10.4     192.   0.164   0.0616
#> 4     1  2281  60.7 0.0237 0.369 0.718            11.9     196.   0.195   0.0688
#> 5     7  2061  50.1 0.0359 0.463 0.585             7.83    208.   0.154   0.0456
#> # ℹ 1 more variable: lag.outcome <dbl>

This package passes functions for fitting the model, and then uses these functions for doing the extrapolation. For the default, we use the package’s fit_model_default() which is a simple line (with lagged outcome as a covariate):

meck.pre = filter( meck, month <= 0 )    
mod = fit_model_default( meck.pre, "pbail", "month" )
summary( mod )
#> 
#> Call:
#> stats::lm(formula = stats::as.formula(form), data = dat[-c(1), 
#>     ])
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -3.7374 -1.6059 -0.2528  1.3190  4.6280 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 44.99557   11.73346   3.835 0.000718 ***
#> month       -0.12320    0.05526  -2.229 0.034642 *  
#> lag.outcome  0.26357    0.19033   1.385 0.177875    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2.094 on 26 degrees of freedom
#> Multiple R-squared:  0.3575, Adjusted R-squared:  0.3081 
#> F-statistic: 7.234 on 2 and 26 DF,  p-value: 0.003178

To run the entire simulation and extrapolation as a call, we can directly do:

t0 = 0
envelope = process_outcome_model( meck, "pbail", "month", 
                                  t0=t0, R = 100, 
                                  summarize = TRUE, smooth=FALSE )
sample_n( envelope, 5 ) %>% arrange( month )
#> # A tibble: 5 × 10
#>   month  Ymin  Ymax range    SE Ystar     Y Ysmooth Ysmooth1  Ybar
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>   <lgl>    <dbl>
#> 1   -29  66.1  66.1   0    0     66.1  66.1 NA      NA        66.0
#> 2   -14  59.9  59.9   0    0     59.9  59.9 NA      NA        63.5
#> 3    -5  63.4  63.4   0    0     63.4  63.4 NA      NA        62.0
#> 4     8  53.2  64.7  11.5  2.91  60.1  49.2 NA      NA        59.9
#> 5    13  53.3  64.2  10.9  2.81  59.2  51.5 NA      NA        59.1

And plotting our results:

ggplot( envelope, aes( month ) ) +
  geom_line( aes( y=Y ), alpha = 0.6 ) +  # original data
  geom_point( aes( y=Y ) ) +              # original data
  geom_ribbon( aes( ymin=Ymin, ymax=Ymax ), alpha=0.2 ) +
  geom_line( aes( y = Ystar ), col="darkgrey" ) +
  geom_vline( xintercept = t0+0.5)

We provide a nice utility function to generate these graphs:

make_envelope_graph(envelope, t0=t0)

Testing and Impact Intervals

We can aggregate impacts for several time points as follows. First call process_outcome_model() without summarizing:

predictions = process_outcome_model( meck, "pbail", "month",
                                     t0=t0, R = 100,
                                     summarize = FALSE, smooth=FALSE )

Then use aggregate_simulation_results():

sstat = aggregate_simulation_results( orig.data = meck, 
                                      outcomename = "pbail", timename = "month",
                                      predictions = predictions, months = 1:18 )

quantile( sstat$t, c( 0.025, 0.975 ))
#>    2.5%   97.5% 
#> 54.7327 63.7711
sstat$t.obs
#> [1] 51.57867
sstat$t.obs - quantile( sstat$t, c( 0.025, 0.975 ))
#>      2.5%     97.5% 
#>  -3.15403 -12.19242

Generating fake data

For simulation we also offer a fake data generator. It works like this:

dat = generate_fake_data( t_min=-60, t_max=18, t0 = 0 )
ggplot( dat, aes( month, Y ) ) +
  geom_point() + geom_line()

Smoothing

Here we demonstrate summarizing and smoothing, using the fake data we just generated.

envelope = process_outcome_model( dat, "Y", "month", t0=t0, R = 100, 
                                  summarize = TRUE, smooth=TRUE )
make_envelope_graph(envelope, t0 )

We can smooth to different degrees using the smooth_k parameter:

alphas = c( 6, 11, 20, 100 )
preds = purrr::map( alphas, function( alpha ) {
  pds = process_outcome_model( dat, "Y", "month",
                               t0=t0, R = 20,
                               summarize = FALSE, smooth=TRUE,
                               smooth_k = alpha )
  pds
} )
names( preds ) = alphas
preds = bind_rows( preds, .id="alpha_k" )
ggplot( filter( preds, month >= t0 ), aes( month, Ysmooth ) ) +
  facet_wrap( ~ alpha_k ) +
  geom_line( aes( group=Run, col=alpha_k ), alpha=0.5, na.rm=TRUE) +
  geom_line( data=dat, aes( month, Y ), col="black", alpha=0.5 ) +
  geom_vline( xintercept=t0, col="red" ) +
  labs( x="month", y="proportion given bail")

Seasonality and covariates

A seasonality model on some fake data with a strong seasonality component is easy to fit. You just construct some code to fit the seasonality model via the make_fit_season_model() factory (you need to have the covariates pre-constructed in your data):

data( newjersey )
fit_season_model_qtemp =  make_fit_season_model( ~ temperature + Q2 + Q3 + Q4 )

envelope = process_outcome_model( newjersey, "n.warrant", "month", t0=-7, R = 100, 
                                  summarize = TRUE, smooth=TRUE, 
                                  fit_model = fit_season_model_qtemp )
make_envelope_graph( envelope, t0=-7 )

Note how it will construct the lagged covariates automatically. The make_fit_season_model() method records what covariates are needed from the passed formula.

Smoothing and seasonality

We can smooth around a seasonality model either with a default smoother made from the specified seasonality model (as was done above) or, like the following, with a specified one of your choice:

smoother = make_model_smoother( fit_model = fit_season_model_sin, covariates = newjersey )
envelope_sin = process_outcome_model( newjersey, "n.warrant", "month", t0=-7, R = 100,
                                  summarize = TRUE, smooth=TRUE, smoother = smoother, smooth_k = 11,
                                  fit_model = fit_season_model_qtemp )
envelope_sin$Ysmooth.base = envelope$Ysmooth
envelope_sin$Ysmooth1.base = envelope$Ysmooth1
make_envelope_graph( filter( envelope_sin, month > -30 ), t0=-7 ) +
  geom_line( aes( y=Ysmooth.base ), col="blue", na.rm=TRUE ) +
  geom_line( aes( y=Ysmooth1.base ), col="blue", lty=2, na.rm=TRUE )