Package 'simITS' reference manual

Title:	Analysis via Simulation of Interrupted Time Series (ITS) Data
Description:	Uses simulation to create prediction intervals for post-policy outcomes in interrupted time series (ITS) designs, following Miratrix (2020) <arXiv:2002.05746>. This package provides methods for fitting ITS models with lagged outcomes and variables to account for temporal dependencies. It then conducts inference via simulation, simulating a set of plausible counterfactual post-policy series to compare to the observed post-policy series. This package also provides methods to visualize such data, and also to incorporate seasonality models and smoothing and aggregation/summarization. This work partially funded by Arnold Ventures in collaboration with MDRC.
Authors:	Luke Miratrix [aut, cre], Brit Henderson [ctb], Chloe Anderson [ctb], Arnold Ventures [fnd], MDRC [fnd]
Maintainer:	Luke Miratrix <[email protected]>
License:	GPL-3
Version:	0.1.1
Built:	2025-02-23 03:31:24 UTC
Source:	https://github.com/lmiratrix/simits

Augment dataframe with lagged covariates

Description

Take outcome and a list of covariates and add new columns with lagged versions. Assumes rows of dataframe are in time ascending order. Lagged outcome canonically called 'lag.outcome'. Covariates 'lag.XXX'.

Usage

add_lagged_covariates(dat, outcomename, covariates = NULL)
add_lagged_covariates(dat, outcomename, covariates = NULL)

Arguments

`dat`	The dataframe
`outcomename`	The outcome of interest (string)
`covariates`	The covariates to lag along with the outcome. This can be either of two things. First, it can be a list of string names. Covariates can also be a function with a "lags" attribute with the listed covariates (as returned by, e.g., make_fit_season_model) (which is a list of string names). NULL if no covariates other than outcome should be lagged.

Value

Augmented dataframe with lagged covariates as new columns. Will clobber old columns if the names (of form "lag.XXXX") conflict.

Examples

data( "newjersey" )
newjersey = add_lagged_covariates(newjersey, "n.warrant", c("sin.m","cos.m" ) )
head( newjersey[ c( "n.warrant", "sin.m", "lag.outcome", "lag.sin.m" ) ] )
data( "newjersey" )
newjersey = add_lagged_covariates(newjersey, "n.warrant", c("sin.m","cos.m" ) )
head( newjersey[ c( "n.warrant", "sin.m", "lag.outcome", "lag.sin.m" ) ] )

Adjust an outcome time series based on the group weights.

Description

Reweight the components of a series to match target weights for several categories. This is a good preprocessing step to adjust for time-varying covariates such as changing mix of case types.

Usage

adjust_data(
  dat,
  outcomename = "Y",
  timename = "time",
  groupname = "G",
  Nname = "N",
  pi_star,
  is_count = FALSE,
  include_aggregate = FALSE,
  covariates = NULL
)
adjust_data(
  dat,
  outcomename = "Y",
  timename = "time",
  groupname = "G",
  Nname = "N",
  pi_star,
  is_count = FALSE,
  include_aggregate = FALSE,
  covariates = NULL
)

Arguments

`dat`	Dataframe of data. Requires an N column of total cases represented in each row.
`outcomename`	Name of column that has the outcome to calculated adjusted values for.
`groupname`	Name of categorical covariate that determines the groups.
`Nname`	Name of column in dat that contains total cases (this is the name of the variable used to generate the weights in pi_star).
`pi_star`	The target weights. Each time point will have its groups re-weighted to match these target weights.
`is_count`	Indicator of whether outcome is count data or a continuous measure (this impacts how aggregation is done).
`include_aggregate`	Include aggregated (unadjusted) totals in the output as well.
`covariates`	Covariates to be passed to aggregation (list of string variable names).

Value

Dataframe of adjusted data.

Examples

data( "meck_subgroup" )
head( meck_subgroup )
pis = calculate_group_weights( "category", Nname="n.cases", 
                               meck_subgroup, t_min=0, t_max= max( meck_subgroup$month ) )
pis

agg = aggregate_data( meck_subgroup, 
                      outcomename="pbail", groupname="category", Nname="n.cases", 
                      is_count=FALSE,
                      rich = TRUE, covariates = NULL )
head( agg )  


adjdat = adjust_data( meck_subgroup, "pbail", "category", "n.cases", pis, include_aggregate=TRUE )
head( adjdat )
data( "meck_subgroup" )
head( meck_subgroup )
pis = calculate_group_weights( "category", Nname="n.cases", 
                               meck_subgroup, t_min=0, t_max= max( meck_subgroup$month ) )
pis

agg = aggregate_data( meck_subgroup, 
                      outcomename="pbail", groupname="category", Nname="n.cases", 
                      is_count=FALSE,
                      rich = TRUE, covariates = NULL )
head( agg )  


adjdat = adjust_data( meck_subgroup, "pbail", "category", "n.cases", pis, include_aggregate=TRUE )
head( adjdat )

Aggregate grouped data

Description

This will take a dataframe with each row being the outcomes, etc., for a given group for a given time point and aggregate those groups for each time point.

Usage

aggregate_data(
  dat,
  outcomename = "Y",
  timename = "time",
  groupname = "G",
  Nname = "N",
  is_count = FALSE,
  rich = TRUE,
  covariates = NULL
)
aggregate_data(
  dat,
  outcomename = "Y",
  timename = "time",
  groupname = "G",
  Nname = "N",
  is_count = FALSE,
  rich = TRUE,
  covariates = NULL
)

Arguments

`dat`	Dataframe with one row for each time point and group that we are going to post stratify on. This dataframe should also have an column with passed name "Nname" indicating the number of cases that make up each given row. It should have a column "timename" for the time.
`outcomename`	String name of the outcome variable in dat.
`groupname`	Name of the column that has the grouping categorical variable
`Nname`	Name of variable holding the counts (weight) in each group.
`is_count`	If TRUE the data are counts, and should be aggregated by sum rather than by mean.
`rich`	If TRUE, add a bunch of extra columns with proportions of the time point that are each group and so forth.
`covariates`	group-invariant covariates to preserve in the augmented rich dataframe. These are not used in this method for any calculations. Pass as list of column names of dat

Value

Dataframe of aggregated data, one row per time point If rich=TRUE many extra columns with further information.

Examples

data( "meck_subgroup" )
head( meck_subgroup )
pis = calculate_group_weights( "category", Nname="n.cases", 
                               meck_subgroup, t_min=0, t_max= max( meck_subgroup$month ) )
pis

agg = aggregate_data( meck_subgroup, 
                      outcomename="pbail", groupname="category", Nname="n.cases", 
                      is_count=FALSE,
                      rich = TRUE, covariates = NULL )
head( agg )  


adjdat = adjust_data( meck_subgroup, "pbail", "category", "n.cases", pis, include_aggregate=TRUE )
head( adjdat )
data( "meck_subgroup" )
head( meck_subgroup )
pis = calculate_group_weights( "category", Nname="n.cases", 
                               meck_subgroup, t_min=0, t_max= max( meck_subgroup$month ) )
pis

agg = aggregate_data( meck_subgroup, 
                      outcomename="pbail", groupname="category", Nname="n.cases", 
                      is_count=FALSE,
                      rich = TRUE, covariates = NULL )
head( agg )  


adjdat = adjust_data( meck_subgroup, "pbail", "category", "n.cases", pis, include_aggregate=TRUE )
head( adjdat )

Test a passed test statistic on the simulated data

Description

This method is used to look at summary statistics such as average impact post-policy, and see how the predictive distribution compares to the observed.

Usage

aggregate_simulation_results(
  orig.data,
  predictions,
  outcomename = "Y",
  timename = "time",
  summarizer = calculate_average_outcome,
  ...
)
aggregate_simulation_results(
  orig.data,
  predictions,
  outcomename = "Y",
  timename = "time",
  summarizer = calculate_average_outcome,
  ...
)

Arguments

`orig.data`	The raw data (dataframe)
`predictions`	The results from process_outcome_model.
`outcomename`	Outcome to use.
`summarizer`	A function to calculate some summary quantity, Default: calculate_average_outcome
`...`	Extra arguments passed to the summarizer function.

Value

List of length two, with first item being the observed value of the test statistic and the second being a numeric vector representing the emperical reference distribution.

Examples

predictions = process_outcome_model( "pbail", mecklenberg,
                                    t0=0, R = 5,
                                    summarize = FALSE, smooth=FALSE )
sstat = aggregate_simulation_results( orig.data = mecklenberg, outcomename = "pbail",
                                     predictions = predictions, time_points = 1:18 )
sstat$t
sstat$t.obs

predictions = process_outcome_model( "pbail", mecklenberg,
                                    t0=0, R = 5,
                                    summarize = FALSE, smooth=FALSE )
sstat = aggregate_simulation_results( orig.data = mecklenberg, outcomename = "pbail",
                                     predictions = predictions, time_points = 1:18 )
sstat$t
sstat$t.obs

Summary function for summarize.simulation.results

Description

Given a set of simulation runs, estimate average impact over range of time points.

Usage

calculate_average_outcome(res, outcomename, timename, time_points = 1:54, ...)
calculate_average_outcome(res, outcomename, timename, time_points = 1:54, ...)

Arguments

`res`	Dataframe of a single series (simulated or otherwise)
`outcomename`	Name of outcome in res
`time_points`	Which time points to average over, Default: 1:18
`...`	Other parameters (ignored)

Value

Single number (in this case mean of given time points)

Examples

data( mecklenberg )
calculate_average_outcome( mecklenberg, "pbail", time_points=1:24 )
calculate_average_outcome( mecklenberg, "pbail", time_points = 1:18 )
data( mecklenberg )
calculate_average_outcome( mecklenberg, "pbail", time_points=1:24 )
calculate_average_outcome( mecklenberg, "pbail", time_points = 1:18 )

Calculate proportion of subgroups across time

Description

Calculate overall proportion of cases in each group that lie within a given interval of time defined by t_min and t_max.

Usage

calculate_group_weights(groupname, dat, t_min, t_max = Inf, Nname = "N")
calculate_group_weights(groupname, dat, t_min, t_max = Inf, Nname = "N")

Arguments

`groupname`	Name of the column that has the grouping categorical variable
`dat`	Dataframe with one row for each time point and group that we are going to post stratify on. This dataframe should also have an column with passed name "Nname" indicating the number of cases that make up each given row. It should have a column "timename" for the time.
`t_min`	The start time point to aggregate cases over.
`t_max`	The final time point (default is last time point).
`Nname`	Name of variable holding the counts (weight) in each group.

Value

Dataframe of each group along with overall average group weight in the specified timespan.

Examples

data( "meck_subgroup" )
head( meck_subgroup )
pis = calculate_group_weights( "category", Nname="n.cases", 
                               meck_subgroup, t_min=0, t_max= max( meck_subgroup$month ) )
pis

agg = aggregate_data( meck_subgroup, 
                      outcomename="pbail", groupname="category", Nname="n.cases", 
                      is_count=FALSE,
                      rich = TRUE, covariates = NULL )
head( agg )  


adjdat = adjust_data( meck_subgroup, "pbail", "category", "n.cases", pis, include_aggregate=TRUE )
head( adjdat )
data( "meck_subgroup" )
head( meck_subgroup )
pis = calculate_group_weights( "category", Nname="n.cases", 
                               meck_subgroup, t_min=0, t_max= max( meck_subgroup$month ) )
pis

agg = aggregate_data( meck_subgroup, 
                      outcomename="pbail", groupname="category", Nname="n.cases", 
                      is_count=FALSE,
                      rich = TRUE, covariates = NULL )
head( agg )  


adjdat = adjust_data( meck_subgroup, "pbail", "category", "n.cases", pis, include_aggregate=TRUE )
head( adjdat )

Extrapolate pre-policy data to post-policy era

Description

This function takes a fitted model and uses it to make post-policy predictions by simulating data.

Usage

extrapolate_model(
  M0,
  dat,
  outcomename = "Y",
  timename = "time",
  t0 = 0,
  R = 400,
  summarize = FALSE,
  smooth = FALSE,
  smoother = smooth_series,
  full_output = FALSE,
  fix_parameters = FALSE,
  ...
)
extrapolate_model(
  M0,
  dat,
  outcomename = "Y",
  timename = "time",
  t0 = 0,
  R = 400,
  summarize = FALSE,
  smooth = FALSE,
  smoother = smooth_series,
  full_output = FALSE,
  fix_parameters = FALSE,
  ...
)

Arguments

`M0`	The fit model
`dat`	Dataframe with data being analyzed.
`outcomename`	Outcome of interest (name of column).
`timename`	Name of the time variable (name of column).
`t0`	Last pre-policy timepoint
`R`	Number of replications
`summarize`	Boolean, TRUE means collapse all simulated trajectories into single aggregate. FALSE means return all paths.
`smooth`	Boolean. TRUE means fit a smoother to the trajectories and look at distribution of smoothed trajectories. FALSE means look at raw data treajectories.
`smoother`	Function to do smoothing, if smoothing set to TRUE. Default is smooth_series()
`full_output`	TRUE means smoother returns residuals as well as smoothed series.
`fix_parameters`	Keep the parameters in the model M0 as fixed; do not add parameter uncertainty.
`...`	Extra arguments to be passed to smoother (e.g, bandwidth).

Value

Dataframe with columns corresponding to the simulations. If summarize=TRUE, one row per time point in original data. If FALSE, all the details of all the runs are returned.

Examples

data("mecklenberg" )
mecklenberg = add_lagged_covariates( mecklenberg, "pbail"  )
mecklenberg.pre = dplyr::filter( mecklenberg, month <= 0 )
M0 = fit_model_default( mecklenberg.pre, "pbail" )
res = extrapolate_model( M0, "pbail", mecklenberg, 0, 1,
                         smooth=TRUE)
tail( res )
data("mecklenberg" )
mecklenberg = add_lagged_covariates( mecklenberg, "pbail"  )
mecklenberg.pre = dplyr::filter( mecklenberg, month <= 0 )
M0 = fit_model_default( mecklenberg.pre, "pbail" )
res = extrapolate_model( M0, "pbail", mecklenberg, 0, 1,
                         smooth=TRUE)
tail( res )

Default ITS model

Description

This fits the model 'outcomename ~ lag.outcome + time', with no covariates.

Usage

fit_model_default(dat, outcomename, timename = "time", lagless = FALSE, ...)
fit_model_default(dat, outcomename, timename = "time", lagless = FALSE, ...)

Arguments

`dat`	Dataframe of pre-policy data to fit model to. Needs a "time" column
`outcomename`	Outcome of interest
`lagless`	Boolean, include the lagged outcome, or not?
`...`	Extra arguments passed to the lm() call.

Value

A fit model (a 'lm' object from a 'lm()' call) from fitting a simple regression of outcome onto time and lagged time.

Examples

mecklenberg = add_lagged_covariates(mecklenberg, "pbail")
meck.pre = filter( mecklenberg, month <= 0 )
mod = fit_model_default( meck.pre, "pbail", "month", lagless = TRUE )
summary( mod )
mod = fit_model_default( meck.pre, "pbail", "month", lagless = FALSE )
summary( mod )
mecklenberg = add_lagged_covariates(mecklenberg, "pbail")
meck.pre = filter( mecklenberg, month <= 0 )
mod = fit_model_default( meck.pre, "pbail", "month", lagless = TRUE )
summary( mod )
mod = fit_model_default( meck.pre, "pbail", "month", lagless = FALSE )
summary( mod )

Make fake data for testing purposes.

Description

Defaults have heavy seasonality, and an extra bump in impact kicks in at 12 months post-policy.

Usage

generate_fake_data(
  t_min = -40,
  t_max = 9,
  t0 = 0,
  rho = 0.5,
  sd.omega = 1,
  coef_line = c(20, 0.05),
  coef_q = c(1, 0, -1, 0),
  coef_temp = 0.1,
  coef_sin = c(0, 0),
  coef_tx = c(0, 0.25, 5)
)
generate_fake_data(
  t_min = -40,
  t_max = 9,
  t0 = 0,
  rho = 0.5,
  sd.omega = 1,
  coef_line = c(20, 0.05),
  coef_q = c(1, 0, -1, 0),
  coef_temp = 0.1,
  coef_sin = c(0, 0),
  coef_tx = c(0, 0.25, 5)
)

Arguments

`t_min`	Index of first month
`t_max`	Index of last month
`t0`	Last pre-policy time point
`rho`	Autocorrelation
`sd.omega`	Standard deviation of the true residual
`coef_line`	Intercept and slope of the main trendline (list of 2).
`coef_q`	Coefficients for the four quarters (list of 4).
`coef_temp`	Coefficient for temperature.
`coef_sin`	Coefficents for sin and cos features (list of 2)
`coef_tx`	Coefficient for treatment post-policy (list of 3, initial offset, initial slope, additional slope past 12 months). Treatment is a piecewise linear function.

Value

A tibble having month , temperature , sin.m , cos.m , Q1, Q2 , Q3, Q4, post , Ystr0 , Ystr , Y

Examples

fdat = generate_fake_data(-100,100, rho = 0.95, coef_q=c(0,0,0,0), coef_temp = 0)
plot( fdat$month, fdat$Y, type="l" )
fdat2 = generate_fake_data(-100, 100, rho = 0.0, coef_q=c(0,0,0,0), coef_temp = 0)
plot( fdat$month, fdat2$Y, type="l" )
fdat = generate_fake_data(-100,100, rho = 0.95, coef_q=c(0,0,0,0), coef_temp = 0)
plot( fdat$month, fdat$Y, type="l" )
fdat2 = generate_fake_data(-100, 100, rho = 0.0, coef_q=c(0,0,0,0), coef_temp = 0)
plot( fdat$month, fdat2$Y, type="l" )

A fake DGP with time varying categorical covariate for illustrating the code.

Description

This code makes synthetic grouped data that can be used to illustrate benefits of post stratification.

Usage

generate_fake_grouped_data(
  t_min,
  t0,
  t_max,
  method = c("complex", "linear", "jersey")
)
generate_fake_grouped_data(
  t_min,
  t0,
  t_max,
  method = c("complex", "linear", "jersey")
)

Arguments

`t_min`	Index of first time point
`t0`	last pre-policy timepoint
`t_max`	Index of last time point
`method`	Type of post-stratification structure to generate (three designs of 'complex', 'linear' and 'jersey' were originally concieved of when designing simulation studies with different types of structure).

Value

Dataframe of fake data, with one row per group per time period.

Examples

fdat = generate_fake_grouped_data(t_min=-5,t_max=10, t0 = 0)
table( fdat$time )
table( fdat$type )
fdat = generate_fake_grouped_data(t_min=-5,t_max=10, t0 = 0)
table( fdat$time )
table( fdat$type )

Make envelope style graph with associated smoothed trendlines

Description

This method builds a ggplot object with the trendline and prediction envelope. It can be customized after the fact by adding more ggplot layers via normal ggplot "+" syntax.

Usage

make_envelope_graph(envelope, t0, ylab = "Y", xlab = "month")
make_envelope_graph(envelope, t0, ylab = "Y", xlab = "month")

Arguments

`envelope`	The result of a 'process_outcome_model()' call, i.e. dataframe with columns of original data, imputed data and, potentially, smoothed data.
`t0`	Last pre-policy timepoint. Will draw vertical line here.
`ylab`	Y label of plot
`xlab`	X label of plot

Value

Returns (does not yet display) a ggplot plot object containing the time series along with extrapolation and prediction envelope. This plot can be augmented and changed via standard ggplot commands.

Examples

data( "mecklenberg" )
t0 = 0
envelope = process_outcome_model( "pbail", mecklenberg, 
                                  t0=t0, R = 10, 
                                  summarize = TRUE, smooth=FALSE )
make_envelope_graph(envelope, t0=t0, ylab = "Proportion given bail") +
  ggplot2::labs( title="Sample ITS plot")
data( "mecklenberg" )
t0 = 0
envelope = process_outcome_model( "pbail", mecklenberg, 
                                  t0=t0, R = 10, 
                                  summarize = TRUE, smooth=FALSE )
make_envelope_graph(envelope, t0=t0, ylab = "Proportion given bail") +
  ggplot2::labs( title="Sample ITS plot")
data( "mecklenberg" )
t0 = 0
envelope = process_outcome_model( "pbail", mecklenberg, 
                                  t0=t0, R = 10, 
                                  summarize = TRUE, smooth=FALSE )
make_envelope_graph(envelope, t0=t0, ylab = "Proportion given bail") +
  ggplot2::labs( title="Sample ITS plot")
data( "mecklenberg" )
t0 = 0
envelope = process_outcome_model( "pbail", mecklenberg, 
                                  t0=t0, R = 10, 
                                  summarize = TRUE, smooth=FALSE )
make_envelope_graph(envelope, t0=t0, ylab = "Proportion given bail") +
  ggplot2::labs( title="Sample ITS plot")

Make a fit_model function that takes a seasonality component

Description

This method returns a function that will fit a model both with and without lagged outcomes.

Usage

make_fit_season_model(formula, no_lag = NULL)
make_fit_season_model(formula, no_lag = NULL)

Arguments

`formula`	Formula specifying seasonality. No outcome or time needed.
`no_lag`	Formula specifying additional covariates that should be included, but without lag (usually used due to colinearity of lagged outcomes, such as with a sin and cos component).

Details

You hand it a formula object specifying the seasonality, e.g., " ~ Q2 + Q3 + Q4", if you have quarterly season effects. This method assumes you want models with a linear time component as well, and will add that and an intercept in automatically.

It gives you a function back, that you can use to analyze data.

Value

Callable function. See make_fit_model.

Examples

data( "newjersey")
modF = make_fit_season_model( ~ temperature, timename = "month" )
newjersey = add_lagged_covariates( newjersey, "n.warrant", covariates = c("temperature") )
modF( newjersey, "n.warrant" )
data( "newjersey")
modF = make_fit_season_model( ~ temperature, timename = "month" )
newjersey = add_lagged_covariates( newjersey, "n.warrant", covariates = c("temperature") )
modF( newjersey, "n.warrant" )

Generate a collection of raw counterfactual trajectories

Description

Given a fit linear model 'fit0', generate R prediction series starting at t0. This takes model uncertainty into account by pulling from the pseudo-posterior of the model parameters (from Gelman and Hill arm package).

Usage

make_many_predictions(fit0, dat, R, outcomename, timename, t0)

make_many_predictions_plug(fit0, dat, R, outcomename, timename, t0)
make_many_predictions(fit0, dat, R, outcomename, timename, t0)

make_many_predictions_plug(fit0, dat, R, outcomename, timename, t0)

Arguments

`fit0`	The fit linear model to simulate from.
`dat`	A dataframe with the covariates needed by the model fit0 for both pre and post-policy time points
`R`	Number of series to generate.
`outcomename`	The name of the column in dat which is our outcome.
`t0`	Last time point of pre-policy. Will start predicting at t0+1.

Value

A data.frame with the collection of predicted series, one row per time point per replicate (so will have R*nrow(dat) rows).

Functions

make_many_predictions_plug(): This version makes multiple predictions using estimated parameters without additional uncertainty. This takes point estimates from the fit model as fixed parameters. WARNING: This method will not capture true uncertainty as it is not taking parameter uncertainty into account.

References

The 'arm' package, see https://cran.r-project.org/package=arm

Also see Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevelhierarchical models (Vol. 1). New York, NY, USA: Cambridge University Press.

Examples

data("mecklenberg" )
mecklenberg = add_lagged_covariates( mecklenberg, "pbail"  )
mecklenberg.pre = dplyr::filter( mecklenberg, month <= 0 )
M0 = fit_model_default( mecklenberg.pre, "pbail" )
res = make_many_predictions( M0, dat=mecklenberg, outcome="pbail", t0=0, R=2 )
tail( res )
data("mecklenberg" )
mecklenberg = add_lagged_covariates( mecklenberg, "pbail"  )
mecklenberg.pre = dplyr::filter( mecklenberg, month <= 0 )
M0 = fit_model_default( mecklenberg.pre, "pbail" )
res = make_many_predictions( M0, dat=mecklenberg, outcome="pbail", t0=0, R=2 )
tail( res )

Make a smoother that fits a model and then smooths residuals

Description

This helper function gives back a function that takes the resulting simulation data from a single iteration of the simulation, and fits 'fit_model' to it, smoothes the residuals, and puts the predictions from 'fit_model' back.

Usage

make_model_smoother(fit_model, covariates)
make_model_smoother(fit_model, covariates)

Arguments

`fit_model`	A function that takes data, fits a linear model, and returns the fit model. This function needs an option to include (or not) lagged covariates.
`covariates`	A dataframe with all covariates needed in the model fitting defined by fit_model.

Details

This can be used to build smoothers that smooth using models other than the model being used for extrapolation (e.g., a model without temperature).

Resulting functions have the following parameters: 'res' (the data), 't0' (start time), 'outcomename', 'post.only' flag (for smoothing only post data or not), and 'smooth_k', a tuning parameter for degree of smoothing.

Value

A smoother function that can be passed to the smoothing routines. This function is of the form listed above.

Examples

data( "newjersey")
modA = make_fit_season_model( ~ temperature )
modB = make_fit_season_model( ~ sin.m + cos.m )
newjersey = add_lagged_covariates( newjersey, "n.warrant", 
                                   covariates = c("sin.m", "cos.m", "temperature") )
smoother = make_model_smoother( fit_model = modA, covariates = newjersey )
class(smoother)

# Pass made function to process_outcome_model()
envelope = process_outcome_model( "n.warrant", newjersey, t0=-8, R = 1,
                                  summarize = TRUE, smooth=TRUE, 
                                  smoother = smoother, smooth_k = 11,
                                  fit.model = modB )

data( "newjersey")
modA = make_fit_season_model( ~ temperature )
modB = make_fit_season_model( ~ sin.m + cos.m )
newjersey = add_lagged_covariates( newjersey, "n.warrant", 
                                   covariates = c("sin.m", "cos.m", "temperature") )
smoother = make_model_smoother( fit_model = modA, covariates = newjersey )
class(smoother)

# Pass made function to process_outcome_model()
envelope = process_outcome_model( "n.warrant", newjersey, t0=-8, R = 1,
                                  summarize = TRUE, smooth=TRUE, 
                                  smoother = smoother, smooth_k = 11,
                                  fit.model = modB )

Mecklenberg data by subgroup of charge type

Description

Mecklenberg data that gives proportion of different charge categories of cases given bail (by month).

Usage

meck_subgroup
meck_subgroup

Format

A data frame with 144 rows and 5 variables:

month: integer Month, with 0 being month of policy implementation.
n.cases: integer Number of cases of that subgroup for that month
n.bail: interger Total number of cases given bail for that subgroup for that month
pbail: double Proportion of new cases in given subgroup in that month assigned bail
category: character Category of group (charge type).

Mecklenberg PSA Reform Data

Description

Monthly aggregate outcomes of various measures of interest from Mecklenberg. See MDRC Report.

Usage

mecklenberg
mecklenberg

Format

A data frame with 54 rows and 10 variables:

month: integer Month, with 0 being month of policy implementation.
karr: integer Total count of arrests.
pbail: double Proportion of cases in a given month assigned bail (or outright detention).
pptrel: double Proportion of cases assigned to pretrial supervised release.
pror: double Proportion of cases released on own recognizance.
pb4c: double Proportion of cases assigned to money bail (alternate coding from pbail, above).
avg_days_initial: double Average number of days spent detained before release due to bail, case resolution, etc.
avg_t2d: double Average time to case resolution (in days).
pstint7: double Proportion detained longer than 7 days.
pstint30: double Proportion detained longer than 30 days.

New Jersey PSA Reform aggregate data

Description

Montly aggregate counts of arrests of different types in New Jersey.

Usage

newjersey
newjersey

Format

A data frame with 106 rows and 11 variables:

month: integer Index of month.
sin.m: double cos of month number
cos.m: double sin of month number
M12: integer Month number
Q1: integer Indicator of 1st quarter.
Q2: integer Indicator of 2nd quarter.
Q3: integer Indicator of 3rd quarter.
Q4: integer Indicator of 4th quarter.
n.warrant: double Number of warrant arrests
n.summons: double Number of summons arrests
n: double Total number of arrests
temperature: double Average temperature in New Jersey that month.

Generate an ITS extrapolation simulation.

Description

This is the primary function to use to use this approach on a given dataset.

Usage

process_outcome_model(
  dat,
  outcomename = "Y",
  timename = "time",
  t0 = 0,
  R = 400,
  summarize = FALSE,
  smooth = FALSE,
  smoother = NULL,
  fit_model = fit_model_default,
  covariates = NULL,
  plug_in = FALSE,
  ...
)
process_outcome_model(
  dat,
  outcomename = "Y",
  timename = "time",
  t0 = 0,
  R = 400,
  summarize = FALSE,
  smooth = FALSE,
  smoother = NULL,
  fit_model = fit_model_default,
  covariates = NULL,
  plug_in = FALSE,
  ...
)

Arguments

`dat`	Dataframe with a column for time. The time column is assumed to be a sequence of integer values.
`outcomename`	Name of column in dat containing the observed outcomes.
`timename`	Name of column in dat containing the time points.
`t0`	Last pre-policy timepoint
`R`	Number of simulated pre-policy extrapolations to generate.
`summarize`	Summarise the series? (TRUE/FALSE)
`smooth`	Smooth the series? (TRUE/FALSE)
`smoother`	Function to smooth residuals, if smoothing set to TRUE. If NULL, will dynamically make a model smoother based on the fit_model method if covariates are passed. Otherwise it will use the simple smoother on the outcomes.
`fit_model`	The function used to fit the model to simulate from. (This model could be a seasonality model. Default is simple linear model with no covariates.)
`covariates`	Vector of covariate names of all covariates used in the passed model function fit_model. If null, will attempt to get list of covariates form the "lags" attribute of the passed 'fit_model'.
`plug_in`	Use the estimated parameters as fixed and do not include extra uncertainty of parameter estimation in the simulation. (Not recommended as it destroys inference.)
`...`	Extra arguments to be passed to 'extrapolate_model()'

Details

Take a given outcome variable, fit an ITS model, use it to extrapolate R plusible trajectories, and then using these trajectories, generate final impact results by averaging (if summarize is set to TRUE).

This function is basically a wrapper for 'extrapolate_model()' with some extra calls to 'make_model_smoother()' to prepare, in the case of smoothing, and adding on a summary trend via 'generate_Ybars()' in the case of summarizing.

Value

If summarize=TRUE, A dataframe with several columns of interest and one row per time point of data. The columns are Ymin and Ymax, the limits of the envelope, 'range', the range of the envelope, 'SE', the standard deviation of the trajectories at that time point, 'Ysmooth' the median smoothed value at that time point (if smoothing), 'Ystar' the median unsmoothed value at that time point (regardless of smooth flag), 'Y', the observed outcome, 'Ysmooth1', the smoothed observed outcomes, and 'Ybar' the predicted outcome given the model with no autoregressive aspect.

If summarize=FALSE, a dataframe of all the raw series generated.

Examples

data( "mecklenberg" )
t0 = 0
envelope = process_outcome_model( "pbail", mecklenberg, 
                                  t0=t0, R = 10, 
                                  summarize = TRUE, smooth=FALSE )
make_envelope_graph(envelope, t0=t0, ylab = "Proportion given bail") +
  ggplot2::labs( title="Sample ITS plot")
data( "mecklenberg" )
t0 = 0
envelope = process_outcome_model( "pbail", mecklenberg, 
                                  t0=t0, R = 10, 
                                  summarize = TRUE, smooth=FALSE )
make_envelope_graph(envelope, t0=t0, ylab = "Proportion given bail") +
  ggplot2::labs( title="Sample ITS plot")

`simITS` package overview

Description

Analysis via Simulation of Interrupted Time Series

Details

This package is based on the backbone analytic code for the analyses in, e.g., Redcross et al. (2019) or Golub et al. (2019). See companion paper Miratrix (2020) for technical discussion of the overall approach.

Broadly, this package provides methods for fitting Interrupted Time Series models with lagged outcomes and variables to account for temporal dependencies. It then conducts inference via simulation, simulating a set of plausible counterfactual post-policy series to compare to the observed post-policy series. This package provides methods to visualize such data, and also to incorporate seasonality models and smoothing and aggregation/summarization. See the vignette for a guide of how to conduct such analyses.

References

Redcross, C., Henderson, B., Valentine, E. & Miratrix, L. (2019). Evaluation of pretrial justice system reforms that use the public safety assessment: Effects in Mecklenburg County, North Carolina. Technical report, MDRC (link)

Golub, C. A., Redcross, C., Valentine, E., & Miratrix, L. (2019). Evaluation of pretrial justice system reforms that use the public safety assessment: Effects of New Jersey’s criminal justice reform. Technical report, MDRC. (link)

Miratrix, L. (2020). Using Simulation to Analyze Interrupted Time Series Designs (link)

Smooth residuals after model fit

Description

Smooth a series by fitting the model to the data, smoothing the residuals, and then putting the model predictions back.

Usage

smooth_residuals(
  res,
  t0 = 0,
  outcomename = "Y",
  timename = "time",
  post.only = TRUE,
  smooth_k = SMOOTH_K,
  fit_model = fit_model_default,
  covariates = res,
  full_output = FALSE
)
smooth_residuals(
  res,
  t0 = 0,
  outcomename = "Y",
  timename = "time",
  post.only = TRUE,
  smooth_k = SMOOTH_K,
  fit_model = fit_model_default,
  covariates = res,
  full_output = FALSE
)

Arguments

`res`	A dataframe with a 'timename' column and an 'outcomename' column (which is the column that will be smoothed).
`t0`	last pre-policy timepoint
`outcomename`	String name of the outcome variable in dat. Defaut is "Y".
`timename`	Name of the time column. Default is "time".
`post.only`	If TRUE fit model and smooth post-policy only. WHY fit model on post-policy data only? Because this will make sure the fixed pre-policy does not dominate too much? We are focusing on post-policy so we want a good fitting model for that so we can get our residuals as "white noise" as possible before smoothing.
`smooth_k`	A rough proxy for the number of observations to primarily consider to kernal weight in the neighborhood of each timepoint (this is a bandwidth, and the loess smoother gets smooth_k / n as a span value). We want to smooth with an absolute bandwidth, not one as function of how long the time series is.
`fit_model`	A function that takes data, fits a linear model, and returns the fit model. This function needs an option to include (or not) lagged covariates.
`covariates`	A dataframe with all covariates needed in the model fitting defined by fit_model.
`full_output`	If TRUE give back pieces for diagnostics of smoothing process.

Details

Use loess smoother on complete series of residuals including original data pre-policy and synthetic data post policy (i.e., smooth the entire plausible series).

Value

A numeric vector of the smoothed residuals. If full_output=TRUE return a dataframe with several other columns: 'resid', the residuals based on Ystar and the model, ‘residStar' the smoothed residuals, ’Ybar.sm' the structural predictions of the model used for smoothing. Here the smoothed values will be 'Ysmooth'.

Examples


data( "newjersey" )
smooth = smooth_series( newjersey, outcomename = "n.warrant", t0= -8,
                        smooth_k = 30,
                        post.only = FALSE)
plot( newjersey$month, newjersey$n.warrant )
lines( newjersey$month, smooth, col="red" )  

mod =  make_fit_season_model( ~ temperature )
newjersey = add_lagged_covariates( newjersey, outcomename = "n.warrant", 
                                   covariates = c("temperature") )

smooth = smooth_residuals( newjersey, outcomename = "n.warrant", t0=-8,
                           smooth_k = 30,
                           post.only = FALSE,
                           fit_model = mod )
plot( newjersey$month, newjersey$n.warrant )
lines( newjersey$month, smooth, col="red" )  
data( "newjersey" )
smooth = smooth_series( newjersey, outcomename = "n.warrant", t0= -8,
                        smooth_k = 30,
                        post.only = FALSE)
plot( newjersey$month, newjersey$n.warrant )
lines( newjersey$month, smooth, col="red" )  

mod =  make_fit_season_model( ~ temperature )
newjersey = add_lagged_covariates( newjersey, outcomename = "n.warrant", 
                                   covariates = c("temperature") )

smooth = smooth_residuals( newjersey, outcomename = "n.warrant", t0=-8,
                           smooth_k = 30,
                           post.only = FALSE,
                           fit_model = mod )
plot( newjersey$month, newjersey$n.warrant )
lines( newjersey$month, smooth, col="red" )

Smooth a series using a static loess smoother

Description

Use loess smoother on complete series of residuals including original data pre-policy and synthetic data post policy (i.e., smooth the entire plausible series).

Usage

smooth_series(
  res,
  outcomename = "Y",
  timename = "time",
  t0 = 0,
  smooth_k = SMOOTH_K,
  post.only = TRUE,
  ...
)
smooth_series(
  res,
  outcomename = "Y",
  timename = "time",
  t0 = 0,
  smooth_k = SMOOTH_K,
  post.only = TRUE,
  ...
)

Arguments

`res`	A dataframe with a 'timename' column and an 'outcomename' column (which is the column that will be smoothed).
`outcomename`	String name of the outcome variable in dat. Defaut is "Y".
`timename`	Name of the time column. Default is "time".
`t0`	last pre-policy timepoint
`smooth_k`	A rough proxy for the number of observations to primarily consider to kernal weight in the neighborhood of each timepoint (this is a bandwidth, and the loess smoother gets smooth_k / n as a span value). We want to smooth with an absolute bandwidth, not one as function of how long the time series is.
`post.only`	If TRUE fit model and smooth post-policy only. WHY fit model on post-policy data only? Because this will make sure the fixed pre-policy does not dominate too much? We are focusing on post-policy so we want a good fitting model for that so we can get our residuals as "white noise" as possible before smoothing.
`...`	Extra arguments (not used in this function).

Details

This method takes several parameters it does not use, to maintain compatability with smooth_residuals.

Value

An updated version of the 'res' dataframe with 'Ysmooth', the smoothed predictions of the original Ystar outcome. Also includes 'Ystar' the original sequence to be smoothed.

Examples


data( "newjersey" )
smooth = smooth_series( newjersey, outcomename = "n.warrant", t0= -8,
                        smooth_k = 30,
                        post.only = FALSE)
plot( newjersey$month, newjersey$n.warrant )
lines( newjersey$month, smooth, col="red" )  

mod =  make_fit_season_model( ~ temperature )
newjersey = add_lagged_covariates( newjersey, outcomename = "n.warrant", 
                                   covariates = c("temperature") )

smooth = smooth_residuals( newjersey, outcomename = "n.warrant", t0=-8,
                           smooth_k = 30,
                           post.only = FALSE,
                           fit_model = mod )
plot( newjersey$month, newjersey$n.warrant )
lines( newjersey$month, smooth, col="red" )  
data( "newjersey" )
smooth = smooth_series( newjersey, outcomename = "n.warrant", t0= -8,
                        smooth_k = 30,
                        post.only = FALSE)
plot( newjersey$month, newjersey$n.warrant )
lines( newjersey$month, smooth, col="red" )  

mod =  make_fit_season_model( ~ temperature )
newjersey = add_lagged_covariates( newjersey, outcomename = "n.warrant", 
                                   covariates = c("temperature") )

smooth = smooth_residuals( newjersey, outcomename = "n.warrant", t0=-8,
                           smooth_k = 30,
                           post.only = FALSE,
                           fit_model = mod )
plot( newjersey$month, newjersey$n.warrant )
lines( newjersey$month, smooth, col="red" )

Package 'simITS'

Help Index

Augment dataframe with lagged covariates

Description

Usage

Arguments

Value

Examples

Adjust an outcome time series based on the group weights.

Description

Usage

Arguments

Value

Examples

Aggregate grouped data

Description

Usage

Arguments

Value

Examples

Test a passed test statistic on the simulated data

Description

Usage

Arguments

Value

Examples

Summary function for summarize.simulation.results

Description

Usage

Arguments

Value

See Also

Examples

Calculate proportion of subgroups across time

Description

Usage

Arguments

Value

Examples

Extrapolate pre-policy data to post-policy era

Description

Usage

Arguments

Value

See Also

Examples

Default ITS model

Description

Usage

Arguments

Value

Examples

Make fake data for testing purposes.

Description

Usage

Arguments

Value

Examples

A fake DGP with time varying categorical covariate for illustrating the code.

Description

Usage

Arguments

Value

Examples

Make envelope style graph with associated smoothed trendlines

Description

Usage

Arguments

Value

See Also

Examples

Make a fit_model function that takes a seasonality component

Description

Usage

Arguments

Details

Value

See Also

Examples

Generate a collection of raw counterfactual trajectories

`simITS` package overview