Hello! We are Alex, John, and Jenica, data science and evaluation specialists with Deloitte Consulting, LLP. We review large sets of qualitative data on a regular basis, and to save time and pull insights we may miss during a manual review, we turn to automation. With recent advances and increased access to automated text analytics solutions, it is easier than ever to process qualitative information using artificial intelligence. Here is some advice on getting started.
Hot Tip #1:
Widen your view on what text analytics can do. Like many others, we often use automated text analytics to develop topic models or classify documents into themes. However, virtually every large-scale evaluation, summarization, or classification of a corpus of unstructured text could be enhanced with automation.
For example, natural language processing and AI can perform an initial sort to identify the articles most likely to be relevant for a systematic literature review. You can learn more from John’s guide about how machine learning can overcome the problem of scale while maintaining rigor.
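To make the idea concrete, here is a minimal sketch of that initial sort. It is purely illustrative: a real pipeline would use a trained classifier rather than keyword overlap, and the term list and abstracts below are invented for the example.

```python
# Illustrative sketch: rank abstracts by overlap with review-relevant terms.
# RELEVANT_TERMS and the abstracts are made up for demonstration.
RELEVANT_TERMS = {"evaluation", "health", "program", "outcomes"}

def relevance_score(abstract):
    """Return the fraction of target terms appearing in the abstract."""
    words = set(abstract.lower().split())
    return len(words & RELEVANT_TERMS) / len(RELEVANT_TERMS)

abstracts = [
    "A health program evaluation of patient outcomes",
    "Quarterly earnings report for a retail chain",
]

# Sort most-likely-relevant first; a reviewer screens the top of the list.
ranked = sorted(abstracts, key=relevance_score, reverse=True)
```

Even this toy version shows the payoff: the machine does the first pass at scale, and human reviewers spend their time on the articles most likely to matter.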
Hot Tip #2:
Write out your analysis strategy before collecting text data. One of the most frustrating moments is having collected your data only to realize that, because you didn’t ask the question in the right way, the data is muddled, unreliable, or invalid. Before collecting new data, it is essential to think through the details of how data will be used. Here are some items we consider:
- Confirm that the question is clear and addresses only one subject. Compound or double-barreled questions can be difficult for an algorithm to parse. Consider the prompt: “Do you like your job? Why or why not?” If you convert the first part into a closed-ended question, your text algorithm can focus on the ‘dirty work’ of pulling out common topics discussed within each segment of the sample.
- Confirm that the question is consistent with your analytical goals. A question optimized for topic modeling may look different from a question optimized for sentiment analysis. No single approach to writing questions works in all cases.
- Take time to understand the text analytics algorithms that you will employ. Different algorithms work in different ways, and your question wording should be informed by the algorithm you plan to use. Word questions in a way that elicits the right vocabulary from the respondent so that responses can be differentiated by topic.
Hot Tip #3:
Understand the settings and options at your disposal. When developing a new topic model, we always run our training data under a variety of settings and options to see which produce the most usable results out of the box. There may be more options at your disposal than you realize – depending on what you are doing, these could include the number of topics, the list of stop words, the tokenization and lemmatization algorithm, or part-of-speech tagging. Changing a few settings can lead to very different outputs – see what works and adjust from there.
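As a small illustration of why those settings matter, the sketch below runs the same tiny corpus through two preprocessing profiles. The helper names and the three example responses are our own inventions for the demonstration; the point is simply that stop-word and tokenization choices reshape the vocabulary any topic model would see downstream.

```python
# Illustrative sketch: the same corpus under two preprocessing settings.
import re
from collections import Counter

DOCS = [
    "I like my job because the team is supportive.",
    "The job is stressful but the pay is good.",
    "Supportive teams make the work easier.",
]

def tokenize(text, stop_words=frozenset(), min_len=1):
    """Lowercase, split on letters only, drop stop words and short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stop_words and len(t) >= min_len]

def vocabulary(docs, **settings):
    """Count terms across the corpus under one settings profile."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc, **settings))
    return counts

# Profile 1: no filtering -- function words dominate.
default = vocabulary(DOCS)

# Profile 2: a small stop-word list and a length floor -- topics surface.
tuned = vocabulary(
    DOCS,
    stop_words={"the", "is", "but", "my", "i", "because"},
    min_len=3,
)
```

Under the default profile, “the” and “is” top the counts; under the tuned profile, substantive terms like “job” and “supportive” rise to the top instead. The same sensitivity applies to lemmatization choices and the number of topics, which is why we sweep settings before trusting any single run.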
Hot Tip #4:
Remember to save time for iteration and manual tweaking. Even with advances in technology, it is rare that the output from a text analytics program will be exactly what you need. Build in time to review results. Terms may need to be added or removed – or weights adjusted – within each topic. Don’t expect the algorithm to do all the work, as you will likely need to iterate and add in some of your ‘human knowledge’ to the output.
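One way to picture that manual step is as a post-hoc pass over a fitted topic’s term weights. The topic and weights below are made up for illustration, and `tweak` is a hypothetical helper, but the pattern – drop an artifact term, boost an under-weighted one, renormalize – is the kind of ‘human knowledge’ adjustment we mean.

```python
# Illustrative sketch: manual tweaks to one topic's term weights.
# The topic and its weights are invented for the example.
topic = {"job": 0.30, "pay": 0.25, "survey": 0.20, "team": 0.15, "like": 0.10}

def tweak(term_weights, drop=(), boost=None):
    """Remove unwanted terms, apply weight multipliers, renormalize to 1."""
    boost = boost or {}
    tweaked = {
        term: weight * boost.get(term, 1.0)
        for term, weight in term_weights.items()
        if term not in drop
    }
    total = sum(tweaked.values())
    return {term: weight / total for term, weight in tweaked.items()}

# 'survey' is an artifact of the instrument, not a real theme, and 'team'
# matters more than the model estimated -- drop one, boost the other.
cleaned = tweak(topic, drop={"survey"}, boost={"team": 2.0})
```

After the adjustment the weights still sum to one, so the cleaned topic can feed straight back into reporting or labeling. Build review cycles like this into your timeline rather than treating the first model output as final.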
Have you automated qualitative text analytics? Share your lessons learned in the comment box below.
The American Evaluation Association is hosting Health Evaluation TIG Week with our colleagues in the Health Evaluation Topical Interest Group. The contributions all this week to AEA365 come from our HE TIG members. Do you have questions, concerns, kudos, or content to extend this AEA365 contribution? Please add them in the comments section for this post on the AEA365 webpage so that we may enrich our community of practice. Would you like to submit an AEA365 Tip? Please send a note of interest to AEA365@eval.org. AEA365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators. The views and opinions expressed on the AEA365 blog are solely those of the original authors and other contributors. These views and opinions do not necessarily represent those of the American Evaluation Association, and/or any/all contributors to this site.