
Cluster, Multi-Site, and Multi-Level Evaluation (CMME) TIG Week: Natural Language Processing and Evaluation by Pete Telaroli


Hi Folks, my name is Pete Telaroli, and I am a data translator and evaluator at Deloitte Consulting LLP. The focus of my work is primarily on public health, specifically disaster preparedness and response, which often includes multisite efforts. Recently I had the opportunity to conduct an evaluation that used traditional qualitative methods in combination with an advanced analytic technique known as natural language processing (NLP); the focus of the evaluation was to learn more about health equity in disaster preparedness and response. Our team used this evaluation to test how NLP performed and under what circumstances it might be successful in aiding evaluative studies. 

Before I explain some tips and tricks about NLP, it might be helpful to define NLP. I like this definition from Gartner the best: 

“Natural-language processing (NLP) technology involves the ability to turn text or audio speech into encoded, structured information, based on an appropriate ontology. The structured data may be used simply to classify a document, as in ‘this report describes a laparoscopic cholecystectomy,’ or it may be used to identify findings, procedures, medications, allergies and participants.” 

Our study used NLP so that we could identify promising response actions and vulnerable population groups that were being supported throughout the COVID-19 pandemic. Using NLP, our team took over 600 documents and turned their contents into structured, analyzable data to extract those promising practices and identify key population groups, such as racial and ethnic minoritized groups.

We were fortunate to have robust data, and the NLP turned out to be a success. To help you out with your own evaluative studies and to determine if NLP could help, here are some tips and tricks: 

Hot Tips

Document Structure is Key: We were able to do this study because all of the documents had a similar structure and content, which made automating data extraction possible. Although the documents differed in many small, tedious ways, automation still saved us many hours. Before committing to NLP, check both the structure of your documents and the volume of content you have to determine whether NLP is a viable solution.
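One rough way to run that structure check before committing to NLP is sketched below in Python. It is only an illustration: the section headings, file names, and the structure_report helper are hypothetical and are not taken from our study. It assumes each document is plain text with headings on their own lines.

```python
# Hypothetical section headings; replace with whatever your documents share.
EXPECTED_SECTIONS = ("Background", "Response Actions", "Populations Served")

def structure_report(documents, expected=EXPECTED_SECTIONS):
    """Map each document name to the list of expected headings it is missing."""
    report = {}
    for name, text in documents.items():
        # Compare headings case-insensitively against whole lines of the document.
        lines = {line.strip().lower() for line in text.splitlines()}
        report[name] = [h for h in expected if h.lower() not in lines]
    return report

# Made-up stand-ins for real documents.
docs = {
    "site_a.txt": "Background\n...\nResponse Actions\n...\nPopulations Served\n...",
    "site_b.txt": "Background\n...\nPopulations Served\n...",
}

report = structure_report(docs)
# Share of documents that contain every expected heading.
share_complete = sum(not missing for missing in report.values()) / len(report)

for name, missing in report.items():
    print(name, "missing:", missing or "nothing")
print("Share with full structure:", share_complete)
```

If only a small share of your documents follow the expected structure, automated extraction may cost more time than it saves.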

Do Your Prep Work: The technical pieces of NLP are important, but it’s critical to have done your research on your study question. NLP works because you know what to search for in the unstructured data. We put together keyword lists for each of the population groups and response actions that might be present in the documents so that we could train our model and identify them in all documents.
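To give a feel for how keyword lists turn into structured data, here is a minimal Python sketch. It uses plain keyword matching rather than a trained model, so it is a simplification and not the approach our team used. The keyword lists, sample text, and function names are all hypothetical.

```python
import re

# Hypothetical keyword lists; the actual lists from the study are not shown here.
KEYWORDS = {
    "older adults": ["older adults", "seniors", "elderly"],
    "people with disabilities": ["disability", "disabilities"],
}

def compile_patterns(keywords):
    """Build one case-insensitive, whole-word regex per group."""
    patterns = {}
    for group, terms in keywords.items():
        alternation = "|".join(re.escape(term) for term in terms)
        patterns[group] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return patterns

def tag_sentences(text, patterns):
    """Return (sentence, matched groups) for every sentence with at least one match."""
    # Naive sentence split on ., !, or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    tagged = []
    for sentence in sentences:
        groups = sorted(g for g, p in patterns.items() if p.search(sentence))
        if groups:
            tagged.append((sentence, groups))
    return tagged

sample = (
    "The county opened vaccination sites for seniors. "
    "Staff training was updated. "
    "Transport was arranged for elderly residents with a disability."
)

results = tag_sentences(sample, compile_patterns(KEYWORDS))
for sentence, groups in results:
    print(groups, "-", sentence)
```

Even a simple pass like this can show whether your keyword lists are picking up the groups you care about before you invest in a full model.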

Iterate and Test: Don’t be afraid to try different models. We tested three different models for our purposes, chose one, and then we continued to refine our results as we learned more about the output it was giving us.  

Interpret Results Carefully: Like the results of any statistical model, NLP results come with caveats. One of the outputs of our model was a score for how closely a sentence represented a response action. We had to decide on a cutoff and include only sentences that scored above it. We ended up choosing the median as the cutoff, but there were reasons to use the average instead, or even to include every sentence.
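The effect of the cutoff choice is easy to see with a few made-up numbers. The Python sketch below uses hypothetical sentences and scores (not output from our model) and a hypothetical keep_above helper to compare a median cutoff with an average cutoff.

```python
from statistics import mean, median

# Hypothetical (sentence, score) pairs; real scores would come from your model.
scored = [
    ("Opened mobile testing sites.", 0.91),
    ("Meeting held on Tuesday.", 0.12),
    ("Translated guidance into five languages.", 0.78),
    ("See appendix for details.", 0.20),
    ("Delivered meals to homebound residents.", 0.64),
]

def keep_above(scored_sentences, cutoff):
    """Keep only the sentences whose score is strictly above the cutoff."""
    return [sentence for sentence, score in scored_sentences if score > cutoff]

scores = [score for _, score in scored]
median_cut = median(scores)
mean_cut = mean(scores)

kept_median = keep_above(scored, median_cut)
kept_mean = keep_above(scored, mean_cut)

print("Median cutoff:", median_cut, "keeps", len(kept_median), "sentences")
print("Mean cutoff:", round(mean_cut, 2), "keeps", len(kept_mean), "sentences")
```

In this toy data a few very low scores pull the average below the median, so the average cutoff lets more sentences through. Which cutoff is right depends on how your scores are distributed and how costly false positives are for your study.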

Get Involved! Share in the comments how you have used NLP in your evaluation efforts. We look forward to hearing from you!


The American Evaluation Association is hosting the Cluster, Multi-Site, and Multi-Level Evaluation (CMME) TIG Week. The CMME TIG encompasses methodologies and tools for designs that address single interventions implemented at multiple sites, multiple interventions implemented at different sites with shared goals, and the qualitative and statistical treatments of data for these designs, including meta-analyses, statistical treatment of nested data, and data reduction of qualitative data. All of this week's contributions to AEA365 come from our CMME TIG members. Do you have questions, concerns, kudos, or content to extend this AEA365 contribution? Please add them in the comments section for this post on the AEA365 webpage so that we may enrich our community of practice. Would you like to submit an AEA365 Tip? Please send a note of interest to AEA365@eval.org. AEA365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators. The views and opinions expressed on the AEA365 blog are solely those of the original authors and other contributors. These views and opinions do not necessarily represent those of the American Evaluation Association, and/or any/all contributors to this site.
