AEA365 | A Tip-a-Day by and for Evaluators

Welcome to the final installment of the Design & Analysis of Experiments TIG-sponsored week of AEA365.  It’s Laura Peck of Abt Associates, here again to address some complaints about experiments.

Experiments have limited external validity

Experimental evaluation designs are often thought to trade internal validity (the ability to claim cause-and-effect between program and impact) for external validity (the ability to generalize results).  Although plenty of experiments do limit generalization to their study sample, there is good news from the field. Recent scholarship reveals techniques—retrospective analyses and prospective planning—that can improve generalizability. You can read more about these advances in recent articles, here, here, and here.

Experiments take too long

Experimental evaluations have a bad reputation for taking too long.  Certainly, some evaluations track long-term outcomes and, by definition, must take a long time. But that is true of any evaluation charged with considering long-term effects, not of experiments in particular.  A recent push within government is challenging the view that experiments take too long: the White House Social and Behavioral Sciences Team is helping agencies identify “nudge” experiments that tweak processes and influence small behaviors to affect short-term outcomes.  It is my hope that these efforts will improve our collective ability to carry out faster experimental research and extend the method to other processes and outcomes of interest.

Another reason experiments may take a long time is that enrolling a study sample takes time.  This depends on specific program circumstances, however, and it does not have to be the case. For example, the first round of the Benefit Offset National Demonstration enrolled about 80,000 treatment individuals into its evaluation at one time, with the treatment group receiving a letter notifying them of the new program rules.  Such an approach can build up a large sample in a very short time.

Experiments cost too much

A rule of thumb is that evaluation should comprise one-tenth of a program’s budget. So, for a program that costs $3 million per year, $300,000 should be invested in its evaluation.  If the evaluation shows that the program is ineffective, then society will have spent $300,000 to save $3 million per year in perpetuity.  Efforts are underway to make low-cost experiments feasible in many fields, for example by using administrative data, including data integrated across agency systems.
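
To make the one-tenth arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. The dollar figures are the hypothetical ones from the paragraph above; the ten-year horizon is an assumption added purely for illustration.

    # Back-of-the-envelope sketch of the one-tenth rule of thumb.
    # Dollar figures are the hypothetical ones from the post; the ten-year
    # horizon is an assumption for illustration only.
    annual_program_cost = 3_000_000   # program budget per year
    evaluation_share = 0.10           # rule of thumb: one-tenth of the budget
    evaluation_cost = evaluation_share * annual_program_cost  # $300,000

    # If the evaluation shows the program is ineffective and it is discontinued,
    # the one-time evaluation cost is recouped many times over.
    years = 10
    net_savings = annual_program_cost * years - evaluation_cost
    print(f"Evaluation cost: ${evaluation_cost:,.0f}")
    print(f"Net savings over {years} years if the program is dropped: ${net_savings:,.0f}")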

The Bottom Line

Experimental evaluations need not be more time-consuming or costly than other kinds of impact evaluation, and the future is bright for experimental evaluations to meet high standards of external validity.

This week’s worth of posts shows that the many critiques of experiments are not damning when carefully scrutinized, thanks to recent methodological advances in the evaluation field.

Rad Resource:

For additional detail on today’s criticisms of experiments and others that this week-long blog considers, please read On the Feasibility of Extending Social Experiments to Wider Applications.

The American Evaluation Association is celebrating the Design & Analysis of Experiments TIG Week. The contributions all week come from Experiments TIG members. Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to aea365@eval.org . aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.

·

I’m Laura Peck, recovering professor and now full-time evaluator with Abt Associates.  For many years I taught graduate Research Methods and Program Evaluation courses. One part I enjoyed most was introducing students to the concepts of causality, internal validity and the counterfactual – summarized here as hot tips.

Hot Tips:

#1:  What is causality?

Correlation is not causation.  For an intervention to cause a change in outcomes, the two must be associated, and the intervention must temporally precede the change in outcomes.  These two criteria are necessary but not sufficient.  The third criterion is that no other plausible, rival explanation can take credit for the change in outcomes.

#2: What is internal validity?  And why is it threatened?

In evaluation parlance, these “plausible rival explanations” are known as “threats to internal validity.”  Internal validity refers to an evaluation design’s ability to establish the causal connection between intervention and impact.  As such, the threats to internal validity are those factors in the world that might independently explain a change in outcomes that you believe your program achieved.  For example, children mature and learn simply through exposure to the world, so how much of an improvement in their reading is due to your tutoring program as opposed to their other experiences and maturation?  Another example is job training that assists unemployed people: one cannot be any less employed than unemployed, so “regression to the mean” implies that some people will improve (get jobs) regardless of the training.  These two plausible rival explanations are known as the threats of maturation and regression artifact.  Along with selection bias and historical explanations (a recession, an election, national mood swings), they can claim credit for changes in outcomes observed in the world, regardless of what interventions try to do to improve conditions.
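
To see the regression-to-the-mean threat in action, here is a small illustrative simulation in Python; the data are entirely made up and stand in for no particular study. People enrolled because their first measurement was extreme tend to look better on a second measurement even when nothing at all is done for them.

    import random

    random.seed(1)

    def noisy_score(true_ability):
        # One measurement = stable ability plus transient luck/noise.
        return true_ability + random.gauss(0, 10)

    # A hypothetical population with stable underlying abilities.
    population = [random.gauss(50, 10) for _ in range(10_000)]

    # "Enroll" only those who scored worst the first time
    # (analogous to serving only the currently unemployed).
    first = [(ability, noisy_score(ability)) for ability in population]
    enrolled = [(a, s1) for a, s1 in first if s1 < 35]

    # Measure the enrollees again, with no program at all.
    second = [noisy_score(a) for a, _ in enrolled]

    mean_first = sum(s1 for _, s1 in enrolled) / len(enrolled)
    mean_second = sum(second) / len(second)
    print(f"Enrollees' mean score at enrollment: {mean_first:.1f}")
    print(f"Enrollees' mean score later, with no program: {mean_second:.1f}")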

#3: Why I stopped worrying and learned to love the counterfactual.

I want interventions to be able to take credit for improving outcomes when in fact they do.  That is why I like randomization.  Randomizing individuals or classes or schools or cities to gain access to an intervention—and randomizing others not to gain access—provides a reliable “counterfactual.”  In evaluation parlance, the “counterfactual” is what would have happened in the absence of the intervention.  Having a group that is randomized out (e.g., to experience business as usual) means that it experiences all the same historical, selection, regression-to-the-mean, and maturation forces as those who are randomized in.  As such, the difference between the two groups’ outcomes represents the program’s impact.
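
Here is a minimal sketch of that logic in Python, using made-up numbers. Because history, maturation, and chance buffet both groups alike, the control group’s mean outcome stands in for the counterfactual, and the simple difference in means recovers the impact the simulation builds in.

    import random

    random.seed(42)

    TRUE_IMPACT = 5.0  # the effect the simulation builds in, for illustration only

    def outcome(treated):
        # History, maturation, and regression-type noise hit everyone alike.
        baseline = random.gauss(50, 10)
        return baseline + (TRUE_IMPACT if treated else 0.0)

    # Randomly assign 1,000 hypothetical people to treatment or control.
    people = list(range(1_000))
    random.shuffle(people)
    treatment, control = people[:500], people[500:]

    treat_mean = sum(outcome(True) for _ in treatment) / len(treatment)
    control_mean = sum(outcome(False) for _ in control) / len(control)

    # The control mean approximates the counterfactual, so the difference
    # in means estimates the program's impact.
    print(f"Estimated impact: {treat_mean - control_mean:.1f} (built-in impact: {TRUE_IMPACT})")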

Challenge:

As a professor, I would challenge my students to use the word “counterfactual” at social gatherings.  Try it!  You’ll be the life of the party.

Rad Resource:

For additional elaboration on these points, please read my Why Randomize? Primer.

· · ·

This is part of a series remembering and honoring evaluation pioneers leading up to Memorial Day in the USA on May 30.

My name is Mel Mark; I am a former AEA President and former editor of the American Journal of Evaluation. Don Campbell used pithy phrases to communicate complex philosophical or methodological issues. My favorite was: “Cousin to the amoeba, how can we know for certain?” It encapsulates his philosophy of science, which informed his contributions to evaluation.

Pioneering and enduring contributions:

Campbell’s pioneering contributions included work on bias in social perception, intergroup stereotyping, visual illusions, measurement, research design and validity, and evaluation, which was at the center of his vision of “an experimenting society.” He believed in the evolution of knowledge through learning: “In science we are like sailors who must repair a rotting ship while it is afloat at sea. We depend on the relative soundness of all other planks while we replace a particularly weak one. Each of the planks we now depend on we will in turn have to replace. No one of them is a foundation, nor point of certainty, no one of them is incorrigible.”

Donald T. Campbell

Campbell’s work reminds us that every approach to evaluation is founded on epistemological assumptions and that being explicit about those assumptions, and their implications, is part of our responsibility as evaluators. Campbell wanted science, and evaluation, to keep the goal of truth: testing and inferring what is real in the world. But he acknowledged that this goal is unattainable, so “we accept a . . . surrogate goal of increasing coherence even if we regard this as merely our best available approximation of the truth.”

Campbell was an intellectual giant but disarmingly modest. He was gracious and helpful to students and colleagues, and equally gracious to his critics. His openness to criticism and self-criticism modeled his vision of a “mutually monitoring, disputatious community of scholars.” Those who knew Don Campbell know with all the certainty allowed to humans just how special he was.

Reference for quotations:

Mark, M. M. (1998). The philosophy of science (and of life) of Donald T. Campbell. American Journal of Evaluation, 19(3), 399–402.

Resources:

Bickman, L., Cook, T. D., Mark, M. M., Reichardt, C. S., Sechrest, L., Shadish, W. R., & Trochim, W. M. K. (1998). Tributes to Donald T. Campbell. American Journal of Evaluation, 19(3), 397–426.

Brewer, M. B., & Collins, B. E. (Eds.). (1981). Scientific inquiry and the social sciences: A volume in honor of Donald T. Campbell. Jossey-Bass.

Campbell, D. T. (1994). Retrospective and prospective on program impact assessment. American Journal of Evaluation, 15(3), 291–298.

Campbell, D. T., & Russo, J. (2001). Social measurement. Sage.

The American Evaluation Association is celebrating Memorial Week in Evaluation: Remembering and Honoring Evaluation’s Pioneers. The contributions this week are remembrances of evaluation pioneers who made enduring contributions to our field. Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to aea365@eval.org . aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.
