Experiments TIG Week: Biggest Complaints: Experiments have limited external validity, take too long, and cost too much by Laura Peck

Welcome to the final installment of the Design & Analysis of Experiments TIG-sponsored week of AEA365. It’s Laura Peck of Abt Associates, here again to address some complaints about experiments.

Experiments have limited external validity

Experimental evaluation designs are often thought to gain internal validity (the ability to claim cause and effect between program and impact) at the expense of external validity (the ability to generalize results). Although plenty of experiments do limit generalizing to their sample, there is good news from the field. Recent scholarship reveals techniques, both retrospective analyses and prospective planning, that can improve generalizability. You can read more about these advances in recent articles, here, here, and here.

Experiments take too long

Experimental evaluations have a bad reputation for taking too long. Certainly there are some evaluations that track long-term outcomes and, by definition, must take a long time. That may be a criticism of any evaluation charged with considering long-term effects. A recent push within the government is challenging the view that experiments take too long: the White House Social and Behavioral Sciences Team is helping government identify “nudge” experiments that involve tweaking processes and influencing small behaviors to affect short-term outcomes. It is my hope that these efforts will improve our collective ability to carry out faster experimental research and extend the method to other processes and outcomes of interest.

Another reason experiments may take a long time is that enrolling a study sample takes time. This depends on specific program circumstances, and it need not be the case. For example, the first round of the Benefit Offset National Demonstration enrolled about 80,000 treatment individuals into its evaluation at one time, with the treatment group getting a notification letter of the new program rules. A change of this kind can build up a large sample in a very short time.

Experiments cost too much

A rule of thumb is that evaluation should comprise one-tenth of a program budget. So, for a program that costs $3 million per year, $300,000 should be invested in its evaluation. If the evaluation shows that the program is ineffective, then society will have spent $300,000 to save $3 million per year in perpetuity. Efforts are underway to ensure that low-cost experiments become feasible in many fields, such as using administrative data, including integrating data from systems across agencies.
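The arithmetic behind that rule of thumb can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a costing method from the post: the function names are hypothetical, the evaluation is treated as a one-time cost, and an ineffective program is assumed to be ended so that its full annual cost is saved each year afterward.

```python
def evaluation_budget(annual_program_cost: int, share_percent: int = 10) -> int:
    """Rule of thumb: evaluation budget as a share (default one-tenth) of program cost."""
    return annual_program_cost * share_percent // 100


def net_savings(annual_program_cost: int, evaluation_cost: int, years: int) -> int:
    """Cumulative savings after `years` if an ineffective program is ended.

    Assumption (not from the post): the evaluation is a one-time cost and the
    full annual program cost is saved in every year after the program stops.
    """
    return annual_program_cost * years - evaluation_cost


program_cost = 3_000_000  # the post's example: $3 million per year
evaluation_cost = evaluation_budget(program_cost)

print(evaluation_cost)                                 # 300000
print(net_savings(program_cost, evaluation_cost, 1))   # 2700000
print(net_savings(program_cost, evaluation_cost, 5))   # 14700000
```

Under these assumptions the evaluation pays for itself within the first year; if only part of the program budget could be redirected, the savings would be correspondingly smaller.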

The Bottom Line

Experimental evaluations need not be more time-consuming or costly than other kinds of impact evaluation, and the future is bright for experimental evaluations to meet high standards regarding external validity.

This week's worth of posts shows that the many critiques of experiments are not damning when carefully scrutinized, thanks to recent methodological advances in the evaluation field.

Rad Resource:

For additional detail on today’s criticisms of experiments and others that this week-long blog considers, please read On the Feasibility of Extending Social Experiments to Wider Applications.

The American Evaluation Association is celebrating the Design & Analysis of Experiments TIG Week. The contributions all week come from Experiments TIG members. Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to aea365@eval.org . aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.

Dale Button

December 1, 2017 at 10:25 am

Thank you for elaborating on these important topics. Your points, especially regarding the time and cost of experimentation in program investigation, highlight core challenges in incorporating evidence-based research methodology into program evaluation.

Interestingly, your first point is the one that surprises me. With the prominence of theory- and utilization-driven evaluation, I would not expect external validity to be considered a systemic concern in program evaluation. These context-specific approaches tailor the evaluation and its methods to the given program. As a result, generalizability to populations beyond those targeted by the program would necessarily be decreased. The question that arises from this concern seems to be whether the goals of a program evaluation are based on practical or academic principles for using experimental findings.

In terms of program evaluation, I would argue that the practical approach has greater relevance to the program. Generalizability plays a large role in academic assumptions about the intervention's ability to be applied to a broader population. However, if this population is not targeted by the organization's program, is strictly controlling methodology to protect external validity cost- and time-effective?

An alternative to the top-down approach, where validity is of highest concern, is a bottom-up approach. In this approach, rather than emphasizing validity from experiment initiation, the viability of the program is first evaluated to determine whether, and which, tests of validity are warranted (Chen, 2010). From here researchers can build methodology to support internal and external validity as it applies to program context and viability. Importantly, rather than attempting to produce highly rigorous methods of ensuring external validity in an academic sense, evaluation generalization is tailored to the target population of the program.

With infinite resources and time, the experiments performed in program evaluation could provide results with high internal and external validity. However, if programs are concerned with the cost and time of evaluation, is it reasonable to expect significant external validity? Can this focus actually detract from the practicality of the program evaluation itself? And is the production of these results worth the resources required to produce them?

Dale Button

References:

Chen, H. T. (2010). The bottom-up approach to integrative validity: A new perspective for program evaluation. Evaluation and Program Planning, 33(3), 205-214.


2 thoughts on “Experiments TIG Week: Biggest Complaints: Experiments have limited external validity, take too long, and cost too much by Laura Peck”

Bernadette Wright
January 15, 2017 at 5:37 pm

$300,000 is a lot to spend to learn that a program was ineffective and not learn what would work! Too much money to spend on too few answers.
