LAWG Week: Joseph Gasper on Propensity Score Matching for “Real World” Program Evaluation

Welcome to the Evaluation 2013 Conference Local Arrangements Working Group (LAWG) week on aea365. I am Joseph Gasper, and I am a Senior Study Director with Westat, an employee-owned research corporation in Rockville, Maryland.

Westat was hired by New York City’s Center for Economic Opportunity (CEO) to evaluate approximately 35 programs to move economically disadvantaged New Yorkers out of poverty. Several of the evaluations used propensity score matching (PSM) to estimate program impacts, a strategy that is increasingly used when random assignment is impractical or unethical.

While the benefits of PSM have been widely touted, little attention has been given to the challenges of implementing the method thoughtfully in “real world” evaluation. This post discusses a few of these challenges based on my experiences.

Lessons Learned:

  • PSM is not a substitute for a carefully selected comparison group. It is necessary to identify a comparison group of individuals who are eligible but did not participate and are similar enough to participants to find matches. PSM can be used to further refine the comparison group so that it is similar to the treatment group on as many relevant characteristics as possible. If the comparison group is poorly chosen, there will be few good matches for participants.
  • Program data are often insufficient for PSM. Program data do not usually include information on attitudes and motivations that lead individuals to self-select (or be selected) into programs. If a program serves those most in need, having too few variables on which to match may produce a program group that has worse outcomes than the comparison group simply because its members were “worse off” to begin with. Such a finding could lead to the erroneous conclusion that the program is harmful!
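
To make the second lesson concrete, here is a small simulation sketch in Python. Everything in it is invented for illustration: the variable `need`, the true effect of +2 points, and the outcome scale are assumptions, not figures from the CEO evaluations. Because `need` is never recorded, no amount of matching on other variables could remove the gap it creates, so the simple difference in means stands in for what a matched comparison would show.

```python
import random

random.seed(0)
TRUE_EFFECT = 2.0  # hypothetical: the program raises outcomes by 2 points


def simulate(n=20000):
    """Return outcomes for participants and non-participants."""
    treated, comparison = [], []
    for _ in range(n):
        need = random.random()           # unobserved: 0 = low need, 1 = high need
        joins = random.random() < need   # higher need -> more likely to enroll
        # Higher need lowers the outcome regardless of participation.
        outcome = 50 - 20 * need + random.gauss(0, 5)
        if joins:
            treated.append(outcome + TRUE_EFFECT)
        else:
            comparison.append(outcome)
    return treated, comparison


treated, comparison = simulate()
naive_estimate = sum(treated) / len(treated) - sum(comparison) / len(comparison)
print(f"true effect: {TRUE_EFFECT:+.1f}, naive estimate: {naive_estimate:+.2f}")
```

Under these assumptions the estimate comes out negative even though the program helps every participant, because participants started out with greater need.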

Hot Tip (for the more technically oriented):

  • Choose a matching methodology that will allow for subgroup analysis. Program staff often want to know whether their programs are effective for subgroups of participants. For a subgroup analysis to be valid, both members of each matched pair must have the same value on the subgroup variable. One way to ensure this is to run the matching separately for each subgroup, which can be time-consuming. A more efficient approach is to force matches on the subgroup variables (exact matching) and then match on the propensity score.
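
The idea of exact matching first and propensity score matching second can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm of any particular macro or package: it assumes the propensity scores have already been estimated (for example, with a logistic regression), and the function name `greedy_match`, the field names, and the caliper of 0.05 are all hypothetical.

```python
from collections import defaultdict


def greedy_match(treated, controls, exact_keys, caliper=0.05):
    """Greedy 1:1 matching without replacement.

    Each record is a dict with an "id", a "pscore", and the exact_keys.
    Returns a list of (treated_id, control_id) pairs.
    """
    # Bucket controls by subgroup so candidates always share the exact-match values.
    pools = defaultdict(list)
    for c in controls:
        pools[tuple(c[k] for k in exact_keys)].append(c)

    pairs = []
    for t in treated:
        pool = pools[tuple(t[k] for k in exact_keys)]
        if not pool:
            continue  # nobody left in this subgroup
        # Nearest remaining control on the propensity score.
        best = min(pool, key=lambda c: abs(c["pscore"] - t["pscore"]))
        if abs(best["pscore"] - t["pscore"]) <= caliper:
            pairs.append((t["id"], best["id"]))
            pool.remove(best)  # each control is used at most once
    return pairs


treated = [
    {"id": "T1", "sex": "F", "pscore": 0.62},
    {"id": "T2", "sex": "M", "pscore": 0.40},
    {"id": "T3", "sex": "F", "pscore": 0.90},
]
controls = [
    {"id": "C1", "sex": "M", "pscore": 0.61},
    {"id": "C2", "sex": "F", "pscore": 0.58},
    {"id": "C3", "sex": "M", "pscore": 0.43},
    {"id": "C4", "sex": "F", "pscore": 0.30},
]

pairs = greedy_match(treated, controls, exact_keys=["sex"])
print(pairs)  # [('T1', 'C2'), ('T2', 'C3')]
```

In this toy data, C1 has the propensity score closest to T1, but T1 is paired with C2 because matches are forced within sex. T3 goes unmatched because no remaining control in her subgroup falls within the caliper. Since every pair shares the subgroup variable, the matched sample can be split by subgroup afterward without rematching.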

Rad Resource: The Mayo Clinic maintains several SAS macros to perform PSM. The GREEDY macro can perform exact matching and then match on the propensity score:

Hot Tip—Insider’s advice for Evaluation 2013 in DC: If you’re in the mood for “American style” Chinese fare close to the conference, you should check out Meiwah Restaurant. This is also a great place to spot politicians, members of the media, and other beltway celebrities.


We’re looking forward to October and the Evaluation 2013 annual conference all this week with our colleagues in the Local Arrangements Working Group (LAWG). AEA is accepting proposals to present at Evaluation 2013 through March 15 via the conference website. Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice.

2 thoughts on “LAWG Week: Joseph Gasper on Propensity Score Matching for “Real World” Program Evaluation”

  1. Michalopoulos, 2004, ‘Can Propensity-Score Methods Match the Findings from a Random Assignment Evaluation of Mandatory Welfare-to-Work Programs?’ The Review of Economics and Statistics, Vol. 86, No. 1, Pages 156-179. (the answer: occasionally, but not consistently)

  2. I use the Stata stat package and have used Becker & Ichino’s “pscore” module. See their 2002 article in the Stata Journal (pages 358-377).
