Hi, this is Taj Carson, CEO at Inciter, a firm focused on helping clients change the world by using data. Many evaluators are caught flat-footed when they analyze data and run reports. Data are missing, they can’t find a client in the system because key information wasn’t entered, or the numbers just don’t add up. Having a lot of data is not very helpful unless it’s clean and complete. Data systems and their design can be part of the problem. The system you use to enter data can help or harm the quality of that data. Here are some things to look for.
Lessons Learned:
Names Gone Rogue
Names are a favorite place for people to go rogue. You might have 10 people named Dawn Smith, but one is Dawnya Smith, one is Dawn E. Smith, one is Dawn Smyth. Then there is the whole issue of nicknames or street names. Now add in the fact that you may need to know their legal name (especially if you are integrating data from other systems) and their nickname (if you are providing direct services).
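If your system can export a list of client names, a rough check for near-duplicates can surface the rogue spellings. Here is a minimal sketch in Python, using only the standard library. The function names and the 0.85 similarity threshold are my own assumptions for illustration, not something built into any particular data system, and the results are candidates for a human to review, not confirmed duplicates.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize_name(name):
    """Lowercase, drop punctuation and single-letter initials, collapse spaces."""
    cleaned = re.sub(r"[^a-z\s]", "", name.lower())
    parts = [part for part in cleaned.split() if len(part) > 1]
    return " ".join(parts)


def possible_duplicates(names, threshold=0.85):
    """Return pairs of names whose normalized forms look alike.

    The threshold is an assumed starting point; tune it on your own data.
    """
    pairs = []
    for first, second in combinations(names, 2):
        score = SequenceMatcher(
            None, normalize_name(first), normalize_name(second)
        ).ratio()
        if score >= threshold:
            pairs.append((first, second))
    return pairs


if __name__ == "__main__":
    clients = ["Dawn Smith", "Dawn E. Smith", "Dawn Smyth", "Carlos Rivera"]
    for first, second in possible_duplicates(clients):
        print(f"Check whether these are the same person: {first} / {second}")
```

A check like this will not catch nicknames or street names, which is one more reason to store legal name and preferred name in separate fields.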
I’ll Just Leave This Here….
People in a hurry (and who isn’t?) might put placeholder data into a field. Birth date is required before you can continue? 01/01/2001 will do, right? I’ve got to get this entered! Social security numbers and birth dates are particularly at risk for this kind of shenanigans. People intend to go back and correct it but often don’t.
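Placeholder values tend to follow a few patterns, so you can flag them for follow-up. The sketch below is a hedged example in Python: the list of suspicious Social Security numbers and the "January 1" rule are assumptions I picked for illustration. Some clients really are born on January 1, so treat a flag as a prompt to double-check, not as proof of bad data.

```python
from datetime import date

# Assumed examples of values people type just to get past a required field.
SUSPICIOUS_SSNS = {"000000000", "123456789", "999999999"}


def ssn_needs_review(ssn):
    """Flag SSNs that are the wrong length or look like filler."""
    digits = "".join(ch for ch in ssn if ch.isdigit())
    if len(digits) != 9:
        return True
    # All-identical digits (e.g. 111-11-1111) are also treated as filler.
    return digits in SUSPICIOUS_SSNS or len(set(digits)) == 1


def birth_date_needs_review(birth_date, today=None):
    """Flag birth dates in the future or on January 1 of any year."""
    today = today or date.today()
    if birth_date > today:
        return True
    return birth_date.month == 1 and birth_date.day == 1


if __name__ == "__main__":
    print(birth_date_needs_review(date(2001, 1, 1)))
    print(ssn_needs_review("000-00-0000"))
```

Running a report like this on a schedule gives staff a short list of records to go back and correct.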
Good Enough for Government Work
Accuracy in data is important, as we all know. But data entry isn’t always accurate. Fields that are too flexible might allow someone to enter “5th and Main” instead of “125 Main Street” into the address field. That may be fine for homeless outreach work (where you actually want to identify a general location, not a building), but decide what you need and make sure it’s entered consistently.
Missing in Action
Obviously missing data are a big problem as well. Or are they? Sometimes you have lots of fields to store case management or other information, but it’s not needed or required for quality data analysis. Often only some fields are important for data aggregation. Be clear about which those are.
Hot Tips:
- Decide how you will store names, and consider making people look up an existing client first before entering new data about them.
- Make your system (whether it’s Excel, Access, or a vendor product) force users to enter the data correctly. Limit the number of characters in a field, create required fields that demand essential information about a client before you can proceed, and use drop-downs. Trying to write the city name into that zip code field? No go.
- Be clear about which data are required and which are optional, and why.
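If you are building or scripting your own data entry checks, the rules in the tips above (required fields, character limits, drop-downs, and a zip code field that rejects city names) might look something like this minimal Python sketch. Every field name, limit, and allowed value here is hypothetical; substitute whatever your own system actually collects.

```python
import re

# Hypothetical rules; replace with the fields your analysis really needs.
REQUIRED_FIELDS = ("first_name", "last_name", "birth_date")
MAX_LENGTHS = {"first_name": 50, "last_name": 50}
ALLOWED_VALUES = {"state": {"MD", "DC", "VA"}}  # stands in for a drop-down
ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")


def validate_record(record):
    """Return a list of problems; an empty list means the record passes."""
    errors = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            errors.append(f"{field} is required")
    for field, limit in MAX_LENGTHS.items():
        if len(str(record.get(field) or "")) > limit:
            errors.append(f"{field} is longer than {limit} characters")
    for field, choices in ALLOWED_VALUES.items():
        value = record.get(field)
        if value is not None and value not in choices:
            errors.append(f"{field} must be one of {sorted(choices)}")
    # Zip code is optional here, but if present it must look like a zip code.
    zip_code = record.get("zip")
    if zip_code and not ZIP_PATTERN.match(str(zip_code)):
        errors.append("zip must be a 5-digit ZIP or ZIP+4")
    return errors


if __name__ == "__main__":
    entry = {"first_name": "Dawn", "last_name": "Smith",
             "birth_date": "1985-03-12", "state": "MD", "zip": "Baltimore"}
    for problem in validate_record(entry):
        print(problem)
```

Keeping the required list short and separate from the optional fields also documents, in one place, which data you actually need for aggregation.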
Rad Resources:
- Data Cleaning with Excel: https://exceljet.net/excel-data-validation-guide
- More Data Cleaning with Excel: https://www.howtoexcel.org/tips-and-tricks/11-awesome-examples-of-data-validation/
- Upping your Game to a Fancier Data Cleaning Tool: https://www.trifacta.com/download-trifacta-wrangler/
The American Evaluation Association is celebrating Getting Great Data Week. All posts this week are contributed by evaluators who came together to write about the theme of getting data that is accurate, timely and most of all useful. Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to aea365@eval.org . aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.