I’m Linda Raftree and I founded the MERL Tech community about ten years ago, back when monitoring, evaluation, research and learning (MERL) practitioners were just starting to explore mobile phones, digital devices, and digital data for MERL purposes. Technology has changed a lot since 2013 – and so have the ways that MERL practitioners use it for their work!
In 2023, large language models (LLMs) and generative artificial intelligence (AI) exploded into the mainstream with tools like ChatGPT. Imagine having a digital version of yourself, a clone containing all your accumulated knowledge and writing style, instantly accessible with just a click. Envision the convenience of requesting a bot to summarize hundreds of pages of text within a matter of seconds. These are only some of the things that are becoming more possible within the field of Natural Language Processing (NLP).
NLP and MERL: Practical and Societal Benefits and Challenges
NLP promises to improve how we do our work on the one hand, while also presenting multiple challenges and the potential for harm on the other. Identifying which MERL tasks align best with NLP capabilities requires careful consideration, so that technologies don’t create more harm than benefit.
There are two basic levels where MERL practitioners should be engaging in discussion on NLP:
- Practical issues of applying NLP to MERL; and
- Ethical issues that are emerging with the uptake of NLP.
Practical challenges relate to aspects such as factual accuracy and bias. Models are trained largely on text from the Internet, which is skewed towards English-speaking people and cultures. Commercial models are also known for confidently serving up errors, termed “hallucinations.” LLMs are more prone to hallucinations in non-dominant languages and cultures.
Examples of societal challenges include the impact of outsourcing annotation tasks for potentially harmful content to low-paid workers who often experience vicarious trauma and mental health issues due to their exposure to this content. Environmental costs of LLM development are also considerable, with the training of models like GPT-3 consuming large amounts of clean freshwater in data centers.
Additionally, epistemic harm arises from the exclusion of less dominant languages, cultures, and belief systems. This may contribute to the erosion of diverse ways of knowing, understanding, and experiencing the world, primarily due to the embedded harmful global power differentials in technology, global development and labor markets, and systems of knowledge production.
Importance of Collaboration as Evaluators
These and other emerging concerns need to be addressed if NLP is to be used safely, ethically, and responsibly in our work. MERL practitioners need to better understand how artificial intelligence and NLP function. We need to know where to look for risks, harms, biases and other types of rights violations so that we can consider if and how to use these tools.
Get Involved
To that end, we have created an NLP Community of Practice (NLP-CoP) where together, MERL practitioners and data scientists are working to better understand what NLP technologies can do for MERL and how to use these technologies ethically and responsibly, especially in Global South contexts. As part of the CoP’s work, we’ve contributed to an upcoming edition of New Directions for Evaluation where we lay out in more detail the potential benefits and harms of LLMs.
The work is only beginning, as LLMs are poised to permeate every part of our lives. We welcome additional perspectives in the NLP-CoP as we dig into how NLP is impacting MERL. NLP-CoP members will be presenting at various sessions at the AEA in October, and we’ll also host an informal lunch meeting together with Accountable Now and the ITE-TIG. Sign up here if you’d like to join us!
The American Evaluation Association is hosting Integrating Technology into Evaluation TIG Week with our colleagues in the Integrating Technology into Evaluation Topical Interest Group. The contributions all this week to AEA365 come from ITE TIG members. Do you have questions, concerns, kudos, or content to extend this AEA365 contribution? Please add them in the comments section for this post on the AEA365 webpage so that we may enrich our community of practice. Would you like to submit an AEA365 Tip? Please send a note of interest to AEA365@eval.org. AEA365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators. The views and opinions expressed on the AEA365 blog are solely those of the original authors and other contributors. These views and opinions do not necessarily represent those of the American Evaluation Association, and/or any/all contributors to this site.