AEA365 | A Tip-a-Day by and for Evaluators


Welcome to the final installment of the Design & Analysis of Experiments TIG-sponsored week of AEA365.  It’s Laura Peck of Abt Associates, here again to address some complaints about experiments.

Experiments have limited external validity

Experimental evaluation designs are often thought to trade off internal validity (the ability to claim cause-and-effect between program and impact) against external validity (the ability to generalize results).  Although plenty of experiments do limit generalizing to their sample, there is good news from the field. Recent scholarship reveals techniques—retrospective analyses and prospective planning—that can improve generalizability. You can read more about these advances in recent articles, here, here, and here.

Experiments take too long

Experimental evaluations have a bad reputation for taking too long.  Certainly there are some evaluations that track long-term outcomes and, by definition, must take a long time. That may be a criticism of any evaluation charged with considering long-term effects.  A recent push within the government is challenging the view that experiments take too long: the White House Social and Behavioral Sciences Team is helping government identify “nudge” experiments that involve tweaking processes and influencing small behaviors to affect short-term outcomes.  It is my hope that these efforts will improve our collective ability to carry out faster experimental research and extend the method to other processes and outcomes of interest.

Another reason experiments may take a long time is that enrolling a study sample takes time.  This depends on specific program circumstances, and it does not necessarily need to be the case. For example, the first round of the Benefit Offset National Demonstration enrolled about 80,000 treatment individuals into its evaluation at one time, with the treatment group receiving a notification letter about the new program rules.  Such a change can produce a large sample build-up in a very short time.

Experiments cost too much

A rule of thumb is that evaluation should comprise one-tenth of a program budget. So, for a program that costs $3 million per year, $300,000 should be invested in its evaluation.  If the evaluation shows that the program is ineffective, then society will have spent $300,000 to save $3 million per year in perpetuity.  Efforts are underway to ensure that low-cost experiments become feasible in many fields, such as by using administrative data, including data integrated across agencies’ systems.

The Bottom Line

Experimental evaluations need not be more time-consuming or costly than other kinds of impact evaluation, and the future is bright for experimental evaluations to meet high standards regarding external validity.

This week’s worth of posts shows that the many critiques of experiments are not damning when carefully scrutinized, thanks to recent methodological advances in the evaluation field.

Rad Resource:

For additional detail on today’s criticisms of experiments and others that this week-long blog considers, please read On the Feasibility of Extending Social Experiments to Wider Applications.

The American Evaluation Association is celebrating the Design & Analysis of Experiments TIG Week. The contributions all week come from Experiments TIG members. Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to aea365@eval.org . aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.


Hello, again!  It’s Steve Bell here, that evaluator with Abt Associates who is eager to share some insights regarding the learning potential of social experiments. In a week-long blog series, we are examining concerns about social experiments to offer tips for how to avoid common pitfalls and to support the extension of this powerful research method to wider applications.

Today we turn to three apparent limits on what experiments can teach us.  Perhaps you’ve heard these concerns:

  • “You can’t randomize an intervention that seeks to change a whole community and its social systems.”
  • “If you put some people into an experiment it will affect other people you’ve left out of the study.”
  • “The impacts of individual program components are lost in the overall ‘with/without’ comparison provided by a social experiment.”

Examining these three concerns suggests that none of them should deter the use of randomized experiments.

First, evaluations of community-wide interventions are prime candidates for application of the experimental method if the policy questions to be addressed are sufficiently important to justify the resources required.  The U.S. is a very large nation, with tens of thousands of local communities or neighborhoods that could be randomly assigned into or out of a particular community-level policy or intervention.  There is no feasibility constraint to randomizing many places, only a willingness constraint.  And sure, community saturation interventions make data collection more difficult and expensive, and any impacts that do occur are harder to find because they tend to be diffused across many people in the community.  However, these drawbacks afflict any impact evaluation of a saturation intervention, not just randomized experiments.

Second, in an interconnected world, some consequences of social policies inevitably spill over to individuals not directly engaged in the program or services offered. This is a measurement challenge. All research studies, including experimental studies, that are based exclusively on data for individuals participating in an intervention and a sample of unaffected non-participants will miss some of the intervention’s effects.  Randomization does not make spillover effects more difficult to measure.

Third, the up/down nature of experimental findings is thought to limit the usefulness of social experiments as a way to discover how a program can be made more effective or less costly through changes in its intervention components.  One response is obvious: randomize more things, including components.  Multi-stage random assignment can also be used to answer questions about the effects of different treatment components when program activities naturally occur in sequence.

The bottom line:  Don’t let naysayers turn society away from experimental designs without first thinking through what is achievable.

Up for our final discussion tomorrow: The “biggest complaints” about experiments debunked.



Hello!  I am Lisette Nieves, founder of Year Up NY, a service organization that has happily and successfully used an experimental evaluation to assess program effectiveness.  Today’s blogpost reflects on administrative challenges that need not get in the way of using experiments in practice.  I strongly believe in the nonprofit sector and what it does to support individuals in overcoming obstacles and building competencies to be successful. I also know that people in this sector want to know the impact of their efforts. With this understanding in mind, choosing to use an experimental evaluation at Year Up NY was not difficult, and the journey offered three key lessons. 

Lesson #1: Evaluation involves change, and change poses challenges.

Although everyone on the team agreed to support evaluation, the frontline team members—those who worked closest with our young adults—found it difficult to deny access to those seeking program enrollment. Team members’ buy-in was especially challenging once the names of prospective participants were attached to an experimental pool, personalizing the imminent selection process into treatment and control groups.  As a committed practitioner and program founder, I found it important to surface questions, ask for deeper discussions around the purpose and power of our evaluation, and create the space for team members to express concerns. Buy-in is a process with individualized timetables; staff may need multiple opportunities to commit to the evaluation effort.

Lesson #2: Program leaders tend to under-communicate when change is happening.

Leading a site where an experimental evaluation was taking place forced me to use language that shepherded staff through a high-stakes change effort.  Team members worried whether the results would surprise us (although prior monitoring implied we were on track). The evaluation became central to weekly meetings where staff engaged in a healthy discussion about our services and how we were doing. With information on attrition patterns, even the most cautious staff members began to buy in fully to the experimental evaluation. In the end, evaluation was about making us stronger and demonstrating impact—two key values that we as a team were wedded to with or without an experimental evaluation.

Lesson #3: Experimental evaluation is high stakes, but it can be hugely informative.

An experimental evaluation has many requirements, and some of them are challenging (but not insurmountable) to implement among social service providers nationwide. But I have no regrets about engaging in an experimental evaluation: we learned more about our organization and systems than we would have otherwise.  Experimental evaluation made us a true learning organization, and for that reason I encourage other organizations to consider taking their evaluation efforts further.

Up for discussion tomorrow:  more things you thought you couldn’t learn from an experiment but can!



Hello.  I am Steve Bell, Research Fellow at Abt Associates specializing in rigorous impact evaluations, here to share some thoughts about experimental evaluations in practice.  In this week-long blog series, we are examining concerns about social experiments to offer tips for how to avoid common pitfalls and to support the extension of this powerful research method to wider applications.

Today, we ask whether randomization necessarily distorts the intervention that an experiment sets out to evaluate. A potential treatment group distortion occurs when the experiment excludes a portion of a program’s normally-served population to form a research “control” group. As a result, either (1) the program serves fewer people than usual, operating below normal capacity, or (2) it serves people who ordinarily would not be served.  The first scenario can be problematic if the slack capacity allows programs to offer participants more services than usual, artificially enhancing the intervention when compared to its normal state. The second scenario can be problematic if the people who are now being served are different from those ordinarily served.  If a program changes its eligibility criteria—for example, lowering educational requirements—then a different group of people is served, and this might lead to larger or smaller program impacts than would be the case for the standard program targets.  Fortunately, Olsen, Bell and Nichols (2016) have proposed a way to identify which individuals would ordinarily have been served so that impact results can be produced for just that subset.

The problem of a different-than-usual participant population diminishes in degree as the control group shrinks in size relative to the studied program’s capacity.  With few control group members in any site, the broadening of the pool of people served by the program is less substantial.  This supports another solution: where feasible, an evaluation should spread a fixed number of control group members across many local programs, creating only a few individual control group cases in any one community.  This is a desirable option as well for program staff who are often hesitant to turn away many applicants to form a control group.

In sum, social experiments need not distort the programs they set out to study.

Up for discussion tomorrow: Practitioner insights on how to overcome some common administrative challenges to running an experiment.

Rad Resource:

For additional detail on this issue of the fidelity of policy comparisons, as well as other issues that this week-long blog considers, please read On the Feasibility of Extending Social Experiments to Wider Applications.



Hello AEA365 readers!  I am Laura Peck, founder and co-chair of the AEA’s recently-established (and growing) Design & Analysis of Experiments TIG.  I work at Abt Associates as an evaluator in the Social & Economic Policy Division and director of Abt’s Research & Evaluation Expertise Center.  Today’s AEA365 blogpost recaps what experimental evaluations typically tell us and highlights recent research that helps tell us more.

As noted yesterday, dividing eligible program participants randomly into groups—a “treatment group” that gets the intervention and a “control group” that does not—means the difference in the groups’ outcomes is the intervention’s “impact.”  This is the “average treatment effect” of the “intent to treat” (ITT).  The ITT is the effect of the offer of treatment, regardless of whether those offered “take up” the offer.  There can also be interest in (a) the effect of taking up the offer; and (b) the impact of other, post-randomization milestone events within the overall treatment, two areas where pushing experimental evaluation data can tell us more.

The ITT effect is commonly considered to be the most policy relevant:  in a world where program sponsors don’t mandate participation but instead make services available, the ITT captures the average effect of making the offer.

Fortunately, a widely-accepted approach exists for converting the ITT into the effect of the treatment-on-the-treated (TOT).  The ITT can be rescaled by the participation rate—under the assumption that members of the treatment group who do not participate (“no-shows”) experience none of the program’s impact.  For example, if the ITT estimate shows a $1,000 improvement in earnings in a study where 80% of the treatment group took up the training, then the TOT effect would be $1,250 ($1,000 divided by 0.80) for the average participant.
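To make the arithmetic concrete, here is a minimal sketch of that ITT-to-TOT rescaling (sometimes called the Bloom adjustment), using made-up group means and an assumed 80% take-up rate. The numbers and variable names are illustrative only, not drawn from any particular study.

```python
# Illustrative, made-up numbers matching the example in the text
mean_outcome_treatment_group = 21_000  # average earnings of those offered training
mean_outcome_control_group   = 20_000  # average earnings of those not offered

# Intent-to-treat (ITT): the effect of the offer, a simple difference in means
itt = mean_outcome_treatment_group - mean_outcome_control_group   # $1,000

# Treatment-on-the-treated (TOT): rescale by the take-up rate, assuming that
# "no-shows" in the treatment group experience none of the program's impact
take_up_rate = 0.80
tot = itt / take_up_rate                                          # $1,250

print(f"ITT = ${itt:,.0f}, TOT = ${tot:,.0f}")
```

With a $1,000 ITT and 80% take-up, the sketch reproduces the $1,250 TOT from the example above.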

In addition, an active body of research advances methods for understanding mediators—those things that happen after the point of randomization that subsequently influence program impact. For example, although improving earnings may be a job training program’s ultimate goal, we might want to know whether earning a credential generates additional earnings gains.  Techniques that leverage the experimental design to produce strong estimates of the effect of some mediator include capitalizing on cross-site and cross-participant variation, instrumental variables (including principal stratification), propensity score matching, and analysis of symmetrically-predicted endogenous subgroups (ASPES).  These techniques use existing experimental data and are increasingly being planned into evaluations.

From this examination of the challenge of the day, we conclude that social experiments can provide useful information on the effects of participation and the effects of post-randomization events in addition to the standard (ITT) average treatment effect.

Up for discussion tomorrow:  are the counterfactual conditions that experiments create the right ones for policy comparisons?


Greetings, and welcome to a week’s worth of insights sponsored by the Design and Analysis of Experiments TIG!  We are Laura Peck and Steve Bell, program evaluators with Abt Associates.

When deciding how to invest in social programs, policymakers and program managers increasingly ask for evidence of effectiveness.  A strong method for measuring a program’s impact is an experimental evaluation, which divides eligible program applicants into groups at random: a “treatment group” that gets the intervention and a “control group” that does not.  In such a design, when different outcomes emerge, they can be interpreted as a consequence of the intervention.  In this week-long blog, we examine concerns about social experiments, starting with ethics.

A common concern in planning experimental evaluations is the ethics of randomizing access to government services. Are the individuals who “lose the government lottery” and enter the control group disadvantaged unfairly or unethically?  Randomizing who gets served is just one way to ration access to a funding-constrained program.  Giving all deserving applicants an equal chance through a lottery is the fairest, most ethical way to proceed when not all can be served. Furthermore, the good news is that program staff are wonderfully creative in blending local procedures with randomization in order to serve their target populations while preserving the experiment’s integrity. For example, an ongoing evaluation of a homeless youth program lets program staff use their existing needs-assessment tools to prioritize youth for program entry while overlaying the randomization process on those preferences:  it’s a win-win arrangement!
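The blogpost does not spell out the mechanics of that overlay, but one common way to blend staff prioritization with random assignment is to run the lottery separately within each staff-assigned priority tier. The sketch below is a hypothetical illustration of that idea; the tier labels, slot counts, and function name are my own assumptions, not a description of the actual homeless youth study.

```python
import random

def assign_within_tiers(applicants, slots_by_tier, seed=42):
    """Randomize separately within each staff-assigned priority tier.

    applicants    : list of (applicant_id, tier) pairs, with tiers set by program staff
    slots_by_tier : dict mapping tier -> number of treatment slots available in that tier
    Returns a dict mapping applicant_id -> "treatment" or "control".
    """
    rng = random.Random(seed)
    assignments = {}
    for tier in {t for _, t in applicants}:
        pool = [a_id for a_id, t in applicants if t == tier]
        rng.shuffle(pool)                      # the lottery happens inside the tier
        n_slots = slots_by_tier.get(tier, 0)
        for i, a_id in enumerate(pool):
            assignments[a_id] = "treatment" if i < n_slots else "control"
    return assignments

# Hypothetical example: staff rate applicants as "high" or "moderate" need,
# and more treatment slots are reserved for high-need youth.
applicants = [("A1", "high"), ("A2", "high"), ("A3", "high"),
              ("A4", "moderate"), ("A5", "moderate"), ("A6", "moderate")]
print(assign_within_tiers(applicants, {"high": 2, "moderate": 1}))
```

Because every applicant in a tier faces the same odds, staff preferences shape who is prioritized while the lottery’s integrity is preserved within each tier.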

Even if control group members are disadvantaged in a particular instance, there is reason to think this might not be unethical (see Blustein). Society, which benefits from accurate information about program effectiveness, may be justified in allowing some citizens to be disadvantaged in order to gather information that achieves wider benefits for many. Society regularly disadvantages individuals based on government policy decisions undertaken for non-research reasons (for example, opening high-occupancy vehicle lanes that disadvantage solo commuters to the benefit of carpoolers).  Unlike control group exclusions, which are temporary, those decisions are permanent.

Moreover, in a world of scarce resources, it is unethical to continue to operate ineffective programs.  From this alternative perspective, it is unethical not to use rigorous impact evaluation to provide strong evidence to guide spending decisions.

Finally, social experiments are in widespread use, signaling that society has already judged them to be ethically acceptable. The ethics of experiments can be somewhat challenging in particular evaluation environments, but our experience suggests that ethics generally need not be an obstacle to their use.

Up for discussion tomorrow is what experiments can tell us about program effects, when researchers apply conventional and new analytic methods to experimental data.

Rad Resource:

For additional detail on the ethics question, as well as other issues that this week-long blog considers, please read On the Feasibility of Extending Social Experiments to Wider Applications.

 


Hi again, it’s Laura Peck here, that evaluator from Abt Associates.  To close out the Design & Analysis of Experiments TIG’s first week of contributions to the AEA365 blog, I focus on one of the main critiques of experimental evaluations, that known as the “black box” criticism.

When an experiment is appropriately implemented, an experimentally-designed evaluation can isolate the impact of an intervention:  the difference in the treatment and control groups’ outcomes—the impact—cannot be attributed to other forces (see Tuesday’s blogpost). But what goes on inside the program is often referred to as a “black box”—a total unknown—in experiments.

Good evaluations (all those that I am involved in) couple implementation analysis with impact analysis as a way to respond to the criticism that impact analysis alone cannot tell us what happens inside an intervention.  The implementation analysis exposes what is inside the “black box” so that the impact analysis can use that information in interpreting its results:  for large impacts, what happened in the intervention that explains why they arose; for null impacts, what did not happen in the intervention that explains why they did not.

Although implementation evaluation remains valuable in and of itself, experimental impact evaluations increasingly build on design and analytic innovations to answer “black box” questions on their own.

On the design front, for example, evaluations are adding treatment arms to isolate the effect of program variants.  The Family Options Study (funded by the U.S. Department of Housing and Urban Development) randomized homeless families to four arms to test the relative effectiveness of various treatment models.  Although the income tax experiments of the 1970s used a fractional factorial design, few evaluations have followed that lead.  Now funders have a renewed interest in factorial designs in order to help answer some of those “black box” questions (Solmeyer & Constance, 2015).
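To illustrate why factorial designs appeal to funders asking “black box” questions, here is a minimal sketch of a hypothetical 2x2 factorial experiment in which two program components (say, coaching and a financial incentive) are randomized independently, so that each component’s effect and their interaction can be estimated from a single study. The components, sample size, and effect sizes are invented for illustration and are not drawn from any of the studies named above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2_000

# Randomize each hypothetical component independently (a 2 x 2 factorial)
coaching  = rng.integers(0, 2, size=n)   # 1 = offered coaching
incentive = rng.integers(0, 2, size=n)   # 1 = offered a financial incentive

# Simulated outcome with invented component effects, a small interaction, and noise
outcome = (500 * coaching + 300 * incentive + 100 * coaching * incentive
           + rng.normal(0, 1_000, size=n))

df = pd.DataFrame({"y": outcome, "coaching": coaching, "incentive": incentive})

# One regression recovers each component's effect and their interaction
model = smf.ols("y ~ coaching * incentive", data=df).fit()
print(model.params)   # Intercept, coaching, incentive, coaching:incentive
```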

On the analysis front, substantial methodological advancements capitalize on experimental data to help inform what’s inside that black box:  what is it about programs and their participants that drives program impacts?  Advances in propensity score methods, instrumental variable estimation, and principal stratification-based analyses are expanding what evaluators have in their toolkit (find more detail here and here).

Rad Resources:


 


Keith Zvoch here. I am an Associate Professor at the University of Oregon. In this post, I would like to discuss regression discontinuity (RD) and interrupted time series (ITS) designs, two strong and practical alternatives to the randomized control trial (RCT).

Cool Trick: Take Advantage of Naturally Occurring Design Contexts

Evaluators are often charged with investigating programs or policy in situations where need-based assignment to conditions is required. In these contexts, separation of program effects from the background, maturational, and motivational characteristics of program participants is challenging. However, if performance on a preprogram measure allocates program services (RD), or if repeated measurement of an outcome exists prior to and contiguous with the intervention (ITS), then evaluators can draw on associated design frameworks to strengthen inference regarding program impact.

Lessons Learned: Design Practicality and Strength vs. Analytic Rigor

RD designs derive strength from knowledge of the selection mechanism used to assign individuals to treatment, whereas ITS designs leverage the timing of a change in policy or practice to facilitate rigorous comparison of adjacent developmental trends. Although procedurally distinct, the designs are conceptually similar in that a specific point along a continuum serves as the basis for the counterfactual. In RD designs, effects are revealed when a discontinuity in the relationship between the assignment score and the outcome exists at the cutpoint used to allocate program services. In ITS designs, effects are identified when the level or slope of the intervention time series deviates from the pre-intervention time series.
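As a rough sketch of how those two comparisons are often operationalized, the regressions below fit a simple RD model (a treatment indicator plus the cutpoint-centered assignment score) and a segmented ITS model (level-change and slope-change terms at the intervention point). The data, cutoff, and variable names are invented for illustration; real applications would add bandwidth choices, functional-form checks, and sensitivity analyses.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)

# --- Regression discontinuity: services allocated by a preprogram score ---
score = rng.uniform(0, 100, size=1_000)
treated = (score < 50).astype(int)            # need-based assignment below the cutpoint
y_rd = 20 + 0.3 * (score - 50) + 5 * treated + rng.normal(0, 3, size=1_000)
rd = pd.DataFrame({"y": y_rd, "treated": treated, "score_c": score - 50})
rd_fit = smf.ols("y ~ treated + score_c + treated:score_c", data=rd).fit()
print("RD effect at the cutpoint:", rd_fit.params["treated"])

# --- Interrupted time series: repeated measures before and after a policy change ---
t = np.arange(48)                              # e.g., 48 monthly observations
post = (t >= 24).astype(int)                   # intervention begins at month 24
y_its = 10 + 0.2 * t + 3 * post + 0.1 * post * (t - 24) + rng.normal(0, 1, size=48)
its = pd.DataFrame({"y": y_its, "t": t, "post": post, "t_since": post * (t - 24)})
its_fit = smf.ols("y ~ t + post + t_since", data=its).fit()
print("ITS level change:", its_fit.params["post"], "slope change:", its_fit.params["t_since"])
```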

RD and ITS designs are particularly appropriate for program evaluators as they are minimally intrusive and are consistent with the need-based provisioning of limited resources often found in applied service contexts. Nonetheless, it should be noted that strength of inference in both designs depends on treatment compliance, the absence of a spurious relationship coincidental with the cutpoint, and statistical conclusion validity in modeling the functional form of pre-post relationships. In many cases, inferential strength can be further enhanced by incorporating other naturally occurring design elements (e.g., multiple cutpoints, multiple treatment replications) or by drawing on administrative datasets to construct additional comparison or control groups.

The need for more extensive data collection and sensitivity analyses may of course present a nontrivial challenge in some evaluation contexts, but when considered relative to the practical and ethical difficulties that often surround implementation of a field-based RCT, an increase in analytic rigor will often prove an acceptable trade-off.

Hot Tip: Follow Within-Study Design (WSD) research!  WSD involves head-to-head comparisons of experimental and quasi-experimental designs, including the conditions under which selected quasi-experimental evaluation designs can replicate experimental results.

Rad Resource: This article explores how to use additional design elements to improve inference from RD designs.



Hello! I’m William Faulkner, Director of Flux, an M&E consultancy based in New Orleans. I want to pull back the curtain on perhaps the most famous experiment in international development history – the one conducted by Washington DC’s IFPRI (the International Food Policy Research Institute) on Mexico’s largest anti-poverty program, PROGRESA (now Prospera).

 The Down-Low:

Basically, the mainstream narrative of this evaluation ignores three things:

  • Randomization: This evaluation did not randomly assign households to treatment and control status, but only leveraged randomization. The “clustered matched-pairs design” involved non-randomly assigning participating communities first to treatment and control status.
  • Attrition: Selective sample attrition was strongly present and unaccounted for in analyses.
  • Contamination: Treatment communities were doubtless ‘contaminated’ with migrants from control communities. They even decided to end the project early because of pressure from local authorities in control communities.

The project that “proved” that experiments could be rigorously applied in developing country contexts was neither experimental nor rigorous. In other words, a blind eye was turned to the project’s pretty severe internal weaknesses, and the stakes help explain why: there was an enormous and delicate opportunity to de-politicize the image of social programming at the national level, put wind in the sails of conditional cash transfers, bolster the credibility of evidence-based policy worldwide, and sustain the direct flow of cash to poor Mexicans.

So What? (Lessons Learned):

What does this case illuminate about experiments?

Let’s leave the shouting behind and get down to brass tacks. The “experiment-as-gold-standard” agenda still commands a significant swath of the networks which commission and undertake evaluations. Claims for methodological pluralism, however, are neither new nor in need of immediate defense. Instead, M&E professionals should systematically target and correct the overzealous representations of experiments, rather than getting bogged down in theoretical discussions about what experiments can and cannot do.

Still, in 2016, we have breathless, infomercial-like articles on experiments coming out in the New York Times. This has to stop. At the same time, we absolutely must respect the admirable achievement of the randomistas: the fact that there’s a space for this fascinatingly influential methodology in M&E where previously none existed.

The individuality of each evaluation project makes talking about ‘experiments’ as a whole difficult. These things are neither packaged nor predictable. As with IFPRI-Progresa, micro-decisions matter. Context matters. History matters. This case is an ideal centerpiece with which to induce a grounded, fruitful discussion of the rewards and risks of experimental evaluation.

Rad Resources:



I’m Laura Peck, recovering professor and now full-time evaluator with Abt Associates.  For many years I taught graduate Research Methods and Program Evaluation courses. One part I enjoyed most was introducing students to the concepts of causality, internal validity and the counterfactual – summarized here as hot tips.

Hot Tips:

#1:  What is causality?

Correlation is not causation.  For an intervention to cause a change in outcomes, the two must be associated and the intervention must temporally precede the change in outcomes.  These two criteria are necessary but not sufficient.  The final criterion is that no other plausible, rival explanations can take credit for the change in outcomes.

#2: What is internal validity?  And why is it threatened?

In evaluation parlance, these “plausible rival explanations” are known as “threats to internal validity.”  Internal validity refers to an evaluation design’s ability to establish the causal connection between intervention and impact.  As such, the threats to internal validity are those factors in the world that might explain a change in outcomes that you would otherwise credit to your program.  For example, children mature and learn simply by exposure to the world, so how much of an improvement in their reading is due to your tutoring program as opposed to their other experiences and maturation processes?  Another example is job training that assists unemployed people:  one cannot be any less employed than being unemployed, and so “regression to the mean” implies that some people will improve (get jobs) regardless of the training.  These two “plausible rival explanations” are known as the “threats to validity” of maturation and regression artifact.  Along with selection bias and historical explanations (recession, election, national mood swings), these can claim credit for changes in outcomes observed in the world, regardless of what interventions try to do to improve conditions.

#3: Why I stopped worrying and learned to love the counterfactual.

I want interventions to be able to take credit for improving outcomes, when in fact they do.  That is why I like randomization.  Randomizing individuals or classes or schools or cities to gain access to an intervention—and randomizing some not to gain access—provides a reliable “counterfactual.”  In evaluation parlance, the “counterfactual” is what would have happened in the absence of the intervention.  Having a group that is randomized out (e.g., to experience business as usual) means that it experiences all the same historical, selection, regression-to-the-mean, and maturation forces as those who are randomized in.  As such, the difference between the two groups’ outcomes represents the program’s impact.
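A tiny simulation can make this concrete: both groups below experience the same outside forces (here, a simple maturation term plus noise), so the difference in their mean outcomes recovers, approximately, the true program effect. The numbers and variable names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
true_effect = 5.0                         # the impact we hope to recover

# Everyone experiences the same outside forces: a starting level plus maturation
baseline   = rng.normal(50, 10, size=n)
maturation = rng.normal(3, 1, size=n)

# Randomize access: a coin flip decides who is in the treatment group
treated = rng.integers(0, 2, size=n)

# The control group supplies the counterfactual "business as usual" experience
outcome = baseline + maturation + true_effect * treated

impact_estimate = outcome[treated == 1].mean() - outcome[treated == 0].mean()
print(f"Estimated impact: {impact_estimate:.2f} (true effect: {true_effect})")
```

Because random assignment gives both groups the same mix of baseline levels and maturation, the difference in means isolates the intervention’s contribution.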

Challenge:

As a professor, I would challenge my students to use the word “counterfactual” at social gatherings.  Try it!  You’ll be the life of the party.

Rad Resource:

For additional elaboration on these points, please read my Why Randomize? Primer.

