Tech TIG Week: Harnessing Machine Learning for Ethical and Effective Evaluation by Peter York

Hello! I’m Pete York, an evaluator for more than 25 years, and I’ve been integrating machine learning and AI into evaluation for the past decade. I teach courses introducing machine learning to evaluators at The Evaluator’s Institute (TEI) and the International Program for Development Evaluation Training (IPDET).

Hot Tip

As demand for big data science, machine learning, and AI in evaluation grows, evaluators must understand how to use these tools ethically and effectively.

More specifically, there are four essential learning objectives for evaluators who want to integrate machine learning, which will allow for much faster, more cost-effective, and still rigorous evaluations of social programs:

  1. Learn how to leverage program administrative data: Use existing administrative data from medium- to large-scale programs to conduct quasi-experimental causal evaluations, where machine learning (ML) algorithms can help produce real-time findings and on-demand, case-specific recommendations for frontline workers. Notably, ML algorithms can analyze qualitative and quantitative data in one model, so we can finally analyze case notes alongside program dosage, transactional, and assessment data (see the first sketch after this list).
  2. Train machine learning algorithms for causal modeling: Avoid the “correlation is not causation” pitfall by training algorithms to find naturally occurring counterfactual experiments in historical program data while controlling for selection bias. Do this by training ML to find contextually matched cases before determining and evaluating what works (see the second sketch after this list).
  3. Minimize social biases: ML algorithms control for selection bias by finding and matching similar cases and counterfactually discovering what works for each group. They must then be trained to evaluate whether everyone has equal access to what works: specifically, whether all cases within a matched group have equal access to, and actually receive, effective interventions, regardless of factors like race, gender, and sexual orientation (see the third sketch after this list).
  4. Use only transparent algorithms: Many types of machine learning algorithms exist. Some, like neural networks and deep learning models, are considered “black box” algorithms because we cannot explain how they reach their predictive conclusions. They aren’t transparent. But there are very effective ML algorithms, such as decision trees and random forests, where evaluators can see how each decision of the algorithm is made. This allows us to interrogate, and even change, the algorithm’s parameters if they don’t comport with evidence-based research or with our social and professional ethics and values (see the fourth sketch after this list).
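
To make the first objective concrete, here is a minimal Python sketch of one model that analyzes qualitative and quantitative data together, using scikit-learn to combine text features from case notes with numeric dosage and assessment fields. The file and column names (program_admin_data.csv, case_notes, dosage_hours, assessment_score, positive_outcome) are hypothetical placeholders, not data from any actual program:

```python
# Sketch: one model over case notes (text) plus dosage and assessment
# data (numeric). All file and column names are hypothetical.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("program_admin_data.csv")  # hypothetical export

preprocess = ColumnTransformer([
    # Turn free-text case notes into numeric features
    ("notes", TfidfVectorizer(max_features=500), "case_notes"),
    # Pass the quantitative fields through unchanged
    ("quant", "passthrough", ["dosage_hours", "assessment_score"]),
])

model = Pipeline([
    ("prep", preprocess),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
])

model.fit(df[["case_notes", "dosage_hours", "assessment_score"]],
          df["positive_outcome"])
```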
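
For the second objective, here is a minimal sketch of the matching idea: pair each treated case with its most contextually similar untreated case, then compare outcomes within the matched pairs. All column names are again hypothetical, and in real use you would standardize the context variables (and usually match on many more of them) before computing distances:

```python
# Sketch: match treated cases to contextually similar untreated cases,
# then compare outcomes. Column names are hypothetical.
import pandas as pd
from sklearn.neighbors import NearestNeighbors

df = pd.read_csv("program_admin_data.csv")
context = ["age", "baseline_need_score", "prior_services"]

treated = df[df["got_intervention"] == 1]
untreated = df[df["got_intervention"] == 0]

# For each treated case, find its most similar untreated "twin"
nn = NearestNeighbors(n_neighbors=1).fit(untreated[context])
_, idx = nn.kneighbors(treated[context])
twins = untreated.iloc[idx.ravel()]

# The matched difference approximates the effect on the treated
effect = treated["outcome"].mean() - twins["outcome"].mean()
print(f"Matched estimate of the effect on the treated: {effect:.3f}")
```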
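
For the third objective, here is one hedged way to audit equity: group cases with similar contexts and needs (here via k-means clustering, a stand-in for whatever matching the evaluation actually uses), then compare intervention access rates across demographic groups within each cluster:

```python
# Sketch: within groups of similar cases, check whether access to the
# intervention differs by demographic group. Columns are hypothetical.
import pandas as pd
from sklearn.cluster import KMeans

df = pd.read_csv("program_admin_data.csv")
context = ["age", "baseline_need_score", "prior_services"]

# Group cases with similar contexts and needs
df["need_cluster"] = KMeans(n_clusters=5, random_state=0).fit_predict(df[context])

# Within each cluster, compare intervention rates across racial groups;
# large gaps within a row flag unequal access among similar cases
access = (df.groupby(["need_cluster", "race"])["got_intervention"]
            .mean()
            .unstack())
print(access)
```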
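
For the fourth objective, here is a sketch of what transparency buys us: a small decision tree whose every rule can be printed and read in full, so evaluators can interrogate, and if necessary reject or retrain, any split that conflicts with evidence-based research or professional ethics. Feature and outcome names remain hypothetical:

```python
# Sketch: a transparent model whose decision rules can be printed,
# inspected, and constrained. Column names are hypothetical.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

df = pd.read_csv("program_admin_data.csv")
features = ["dosage_hours", "assessment_score", "prior_services"]

# max_depth keeps the tree small enough to read and audit in full
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(df[features], df["positive_outcome"])

# Print every decision rule the algorithm learned
print(export_text(tree, feature_names=features))
```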

Lessons Learned

Evaluators face significant hurdles in adapting to data science paradigms, particularly Bayesian analytics. Unlike our frequentist tradition, Bayesian theory, as applied in data science, often prioritizes predictive accuracy over causal relationships. That focus can inadvertently perpetuate inequities embedded in historical data.

So, evaluators must lead the integration of machine learning into our field, ensuring that we:

  1. Maintain our focus on rigorous research design and the value of causal inference.
  2. Leverage our experience in identifying and mitigating bias.
  3. Apply our expertise in outcome measurement and program theory.
  4. Uphold our commitment to ethical practices and social impact.

By taking the lead, we can harness the power of machine learning to provide more rigorous, equitable, and actionable insights while staying true to the principles of good evaluation practice.

Rad Resources

To learn more about and experiment with ML for evaluation:

  1. The Book of Why: The New Science of Cause and Effect, by Judea Pearl and Dana Mackenzie.
  2. KNIME (https://www.knime.com) is a free, user-friendly, no-code platform for data analytics and machine learning that allows evaluators to build complex models.
  3. Measuring results and impact in the age of big data: The nexus of evaluation, analytics, and digital technology, by Pete York & Michael Bamberger.

The American Evaluation Association is hosting Integrating Technology into Evaluation TIG Week with our colleagues in the Integrating Technology into Evaluation Topical Interest Group. The contributions all this week to AEA365 come from ITE TIG members. Do you have questions, concerns, kudos, or content to extend this AEA365 contribution? Please add them in the comments section for this post on the AEA365 webpage so that we may enrich our community of practice. Would you like to submit an AEA365 Tip? Please send a note of interest to AEA365@eval.org. AEA365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators. The views and opinions expressed on the AEA365 blog are solely those of the original authors and other contributors. These views and opinions do not necessarily represent those of the American Evaluation Association, and/or any/all contributors to this site.
