
What’s next for Emerging AI in Evaluation? Takeaways from the 2023 AEA Conference by Zach Tilton and Linda Raftree

Hello, AEA365 community! Liz DiLuzio here, Lead Curator of the blog. This week is Individuals Week, which means we take a break from our themed weeks and spotlight the Hot Tips, Cool Tricks, Rad Resources, and Lessons Learned from any evaluator interested in sharing. Would you like to contribute to future Individuals Weeks? Email me at AEA365@eval.org with an idea or a draft and we will make it happen.


Hello, we’re Zach Tilton, a tech-enabled evaluation consultant and PhD candidate at Western Michigan University, and Linda Raftree, founder of The MERL Tech Initiative and lead of its Natural Language Processing Community of Practice. We’ve followed the trajectory of MERL Tech (tech-enabled monitoring, evaluation, research and learning) for a decade, so we were both excited and overwhelmed by the buzz around artificial intelligence at Eval23. Here’s a summary of our longer blog post on key takeaways from the conference.

Lessons Learned

Demand for guidance on AI-enabled evaluation was high at AEA. The conference was abuzz with interest in AI’s role in evaluation, evidenced by standing-room-only, out-the-door sessions on generative AI.

Concerns around AI in evaluation were evident. Many attendee questions focused on the ethical dimensions of AI, potential biases, and data privacy. Questions like, “What does it mean if we no longer ‘swim’ in the data?” chime with the observation that the more practitioners outsource their craft, the more alienated they become from it.

We don’t really know yet what emerging AI can and can’t (or shouldn’t!) do for evaluation. While emerging evidence suggests there are gains in efficiency and quality for some tasks, the frontier of AI-enabled evaluation has a jagged edge, meaning not all tasks are well suited for AI integration. 

Some Emerging Conclusions

GenAI is more than vaporware. Despite the hype that the current wave of AI shares with blockchain and Web3, generative AI does not seem as ephemeral. MERL Tech oracle Michael Bamberger suggests that ignoring AI may widen an already problematic gap between data scientists and evaluators.

Many organizations will rush to build AI-enabled evaluation machines. Attempting to ride the AI wave and not be washed out by it may lead evaluation units to further entrench their organizational ‘evaluation machines.’ Strengthening automated surveillance and data concentration could lead to further alienation of evaluators from their craft. 

We need to define research and upskilling agendas. The research on evaluation (RoE) community is starting to pay attention to how disruptive AI may be; see, for example, work from the ICRC, the World Bank, and the latest New Directions for Evaluation (NDE) special issue on AI in Evaluation. Ongoing, adaptive research is needed given how quickly AI evolves.

Hot Tips

Work now to future-proof your and our evaluation practice. We are not saying all evaluators should uncritically adopt AI tools; rather, evaluators should consider how AI and the fourth industrial revolution may alter the evaluation landscape. What does human intelligence have to offer in evaluation that artificial intelligence can’t? How, if at all, will AI require revising evaluation-specific methodologies, competencies, and guiding principles?

Avoid “theory-free” AI-enabled evaluation. Whether subject-matter theory in evaluation is a luxury is still debatable, but to live up to the refrain heard at Eval23 to “interrogate the technology and interrogate the data,” we need a fundamental theory or understanding of how these technologies function so we don’t overextend AI’s role in our assisted sensemaking practice.

An emerging RoE agenda for AI and evaluation. To what extent can evaluation processes and products be optimized with AI? How far can (and should) AI models go in forming evaluative judgments? In what meaningful ways can communities, participants, and intended users of evaluation be involved when AI is being used? What AI-related skills do experienced professionals, institutional decision-makers, and evaluation commissioners need? What else should evaluators know about AI?

Rad Resource

Join the MERL Tech Natural Language Processing Community of Practice for access to resources, regular webinars, newsletters, and a Slack workspace with hundreds of AI-curious evaluation professionals. 


Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to aea365@eval.org. aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators. The views and opinions expressed on the AEA365 blog are solely those of the original authors and other contributors. These views and opinions do not necessarily represent those of the American Evaluation Association, and/or any/all contributors to this site.
