Experiments TIG Week: Keith Zvoch on Strong Program Evaluation Design Alternatives

Keith Zvoch here. I am an Associate Professor at the University of Oregon. In this post, I would like to discuss regression discontinuity (RD) and interrupted time series (ITS) designs, two strong and practical alternatives to the randomized controlled trial (RCT).

Cool Trick: Take Advantage of Naturally Occurring Design Contexts

Evaluators are often charged with investigating programs or policies in situations where need-based assignment to conditions is required. In these contexts, separating program effects from the background, maturational, and motivational characteristics of program participants is challenging. However, if a score on a preprogram measure determines who receives program services (RD), or if repeated measurements of an outcome exist prior to and contiguous with the intervention (ITS), then evaluators can draw on the associated design frameworks to strengthen inference regarding program impact.

Lessons Learned: Design Practicality and Strength vs. Analytic Rigor

RD designs derive strength from knowledge of the selection mechanism used to assign individuals to treatment, whereas ITS designs leverage the timing of a change in policy or practice to facilitate rigorous comparison of adjacent developmental trends. Although procedurally distinct, the designs are conceptually similar in that a specific point along a continuum serves as a basis for the counterfactual. In RD designs, effects are revealed when the regression line relating the assignment score to the outcome is discontinuous at the cutpoint used to allocate program services. In ITS designs, effects are identified when the level or slope of the post-intervention time series deviates from the pre-intervention time series.
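To make the two estimands concrete, here is a minimal Python sketch using only NumPy. It is an illustration under stated assumptions, not a recommended analysis: the function names (`rd_estimate`, `its_estimate`), the simulated data, and the rule that scores below the cutpoint receive services are all hypothetical, and the ITS model ignores the autocorrelation that real time series usually show.

```python
import numpy as np

def rd_estimate(score, outcome, cutpoint):
    """Sharp RD sketch: jump in the outcome at the cutpoint.

    Assumption: units scoring below the cutpoint receive services
    (need-based assignment). Fits separate straight lines on each side.
    """
    centered = np.asarray(score, dtype=float) - cutpoint
    y = np.asarray(outcome, dtype=float)
    treated = (centered < 0).astype(float)
    X = np.column_stack([np.ones_like(centered), treated,
                         centered, treated * centered])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]  # discontinuity at the cutpoint

def its_estimate(time, outcome, start):
    """Segmented regression sketch: change in level and slope at `start`.

    Assumption: independent errors (no autocorrelation adjustment).
    """
    t = np.asarray(time, dtype=float)
    y = np.asarray(outcome, dtype=float)
    post = (t >= start).astype(float)
    since = post * (t - start)  # time elapsed since the intervention began
    X = np.column_stack([np.ones_like(t), t, post, since])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[2], coef[3]  # (level change, slope change)

# Hypothetical simulated data, for illustration only.
rng = np.random.default_rng(0)

score = rng.uniform(0, 100, 500)
served = (score < 40).astype(float)
post_test = 20 + 0.5 * score + 5 * served + rng.normal(0, 1, 500)
print("RD jump at cutpoint:", rd_estimate(score, post_test, 40.0))

months = np.arange(48)
after = (months >= 24).astype(float)
series = (10 + 0.2 * months + 3 * after
          + 0.5 * after * (months - 24) + rng.normal(0, 0.5, 48))
print("ITS level and slope change:", its_estimate(months, series, 24))
```

In both functions the effect is read off at a single point on a continuum (the cutpoint or the intervention date), which is the conceptual similarity described above.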

RD and ITS designs are particularly appropriate for program evaluators as they are minimally intrusive and are consistent with the need-based provisioning of limited resources often found in applied service contexts. Nonetheless, it should be noted that strength of inference in both designs depends on treatment compliance, the absence of a spurious relationship coincidental with the cutpoint, and statistical conclusion validity in modeling the functional form of pre-post relationships. In many cases, inferential strength can be further enhanced by incorporating other naturally occurring design elements (e.g., multiple cutpoints, multiple treatment replications) or by drawing on administrative datasets to construct additional comparison or control groups.
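The dependence on functional form can be illustrated with a small sensitivity check. The sketch below is hypothetical (the function name `rd_jump`, the polynomial orders, and the bandwidth are assumptions chosen for illustration): it refits an RD model under different specifications on simulated data that are curved but contain no true discontinuity.

```python
import numpy as np

def rd_jump(score, outcome, cutpoint, order=1, bandwidth=None):
    """Estimated discontinuity at the cutpoint for a given specification.

    Assumptions: scores below the cutpoint are treated; separate
    polynomials of degree `order` are fit on each side; if `bandwidth`
    is given, only scores within that distance of the cutpoint are used.
    """
    x = np.asarray(score, dtype=float) - cutpoint
    y = np.asarray(outcome, dtype=float)
    if bandwidth is not None:
        keep = np.abs(x) <= bandwidth
        x, y = x[keep], y[keep]
    treated = (x < 0).astype(float)
    cols = [np.ones_like(x), treated]
    for p in range(1, order + 1):
        cols += [x ** p, treated * x ** p]
    coef, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)
    return coef[1]

# Hypothetical data: a smooth curve with NO jump at the cutpoint of 40.
score = np.linspace(0, 100, 401)
outcome = 0.02 * (score - 40) ** 2

for label, kwargs in [("linear, all data", {"order": 1}),
                      ("quadratic, all data", {"order": 2}),
                      ("linear, bandwidth 5", {"order": 1, "bandwidth": 5})]:
    print(label, rd_jump(score, outcome, 40.0, **kwargs))
```

Here a straight-line fit over the full score range reports a sizable jump that is purely an artifact of misspecification, while the quadratic fit and the narrow-bandwidth fit do not. Estimates that hold up across reasonable specifications are more credible than ones that appear under only a single model.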

The need for more extensive data collection and sensitivity analyses may of course present a nontrivial challenge in some evaluation contexts, but when considered relative to the practical and ethical difficulties that often surround implementation of a field-based RCT, an increase in analytic rigor will often prove an acceptable trade-off.

Hot Tip: Follow within-study design (WSD) research! WSD studies make head-to-head comparisons of experimental and quasi-experimental designs and identify the conditions under which selected quasi-experimental evaluation designs can replicate experimental results.

Rad Resource: This article explores how to use additional design elements to improve inference from RD designs.

The American Evaluation Association is celebrating the Design & Analysis of Experiments TIG Week. The contributions all week come from Experiments TIG members. Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to aea365@eval.org. aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.
