Lisa R. Holliday on Using Data Dictionaries to Improve Data Quality

My name is Lisa R. Holliday, and I am a Data Architect with The Evaluation Group. One aspect of my job is to work with evaluation teams to create data collection tools and assess the quality of data received. I use data dictionaries as a way to establish quality expectations.

Hot Tip 1: Create Data Dictionaries for All Projects

Data dictionaries are used with databases to document the data they hold, including each item's type, format, and intended use. The same approach works for evaluations.

I include the following areas in evaluation data dictionaries:

  1. Data Name
  2. Data Description
  3. Data Type
  4. Data Format
  5. Precision
  6. Acceptable Values
  7. Data Collection Cycle
  8. Data Collector Responsible
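
As a rough illustration, the eight areas above could be captured in a small Python structure so the dictionary can be shared and reused. This is only a sketch: the class name `DictionaryEntry`, the field names, and both sample entries are hypothetical and not taken from any actual project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DictionaryEntry:
    """One row of an evaluation data dictionary (hypothetical layout)."""
    name: str                    # 1. Data Name
    description: str             # 2. Data Description
    data_type: str               # 3. Data Type (numeric, text, date)
    data_format: Optional[str]   # 4. Data Format
    precision: Optional[int]     # 5. Precision (decimal places), if numeric
    acceptable_values: str       # 6. Acceptable Values
    collection_cycle: str        # 7. Data Collection Cycle
    collector: str               # 8. Data Collector Responsible


# Invented sample entries, for illustration only.
DATA_DICTIONARY = [
    DictionaryEntry(
        name="student_id",
        description="Unique identifier assigned to each participant",
        data_type="text",
        data_format="six digits",
        precision=None,
        acceptable_values="000001 to 999999",
        collection_cycle="once, at enrollment",
        collector="site coordinator",
    ),
    DictionaryEntry(
        name="attendance_rate",
        description="Share of sessions attended",
        data_type="numeric",
        data_format="percentage",
        precision=1,
        acceptable_values="0.0 to 100.0",
        collection_cycle="each semester",
        collector="program staff",
    ),
]

for entry in DATA_DICTIONARY:
    print(f"{entry.name}: {entry.data_type}, collected {entry.collection_cycle}")
```

A spreadsheet with the same eight columns would serve equally well; the point is that every data element gets all eight areas filled in.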

Hot Tip 2: Define What’s Acceptable

Be as specific as possible about what is acceptable. What is the data type (numeric, text, date)? How should data be formatted? For example, should dates be entered as mm/dd/yy or dd/mm/yyyy? For numeric data, how many decimal places do you want? If collecting names, what are acceptable values (first and last name, or first initial and last name)? Also, are any data required, such as identification numbers?
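
To make the questions above concrete, here is a minimal Python sketch of three checks: a date format, a number of decimal places, and a required value. The function names are hypothetical, and the choice of mm/dd/yy and two decimal places in the usage lines is an assumption made only for the example.

```python
import re
from datetime import datetime
from decimal import Decimal, InvalidOperation


def is_valid_date(value: str) -> bool:
    """True if value is a real calendar date written as mm/dd/yy."""
    # The regex enforces two digits per part; strptime alone would accept "3/5/24".
    if not re.fullmatch(r"\d{2}/\d{2}/\d{2}", value):
        return False
    try:
        datetime.strptime(value, "%m/%d/%y")
    except ValueError:
        return False
    return True


def has_decimal_places(value: str, places: int) -> bool:
    """True if value is a number written with exactly `places` decimal places."""
    try:
        exponent = Decimal(value).as_tuple().exponent
    except InvalidOperation:
        return False
    # NaN and Infinity have non-integer exponents; treat them as invalid.
    return isinstance(exponent, int) and -exponent == places


def is_present(value: str) -> bool:
    """True if a required value was actually supplied."""
    return bool(value and value.strip())


print(is_valid_date("03/15/24"))        # correctly formatted date
print(is_valid_date("15/03/2024"))      # wrong format for this rule
print(has_decimal_places("3.14", 2))    # two decimal places
print(is_present(""))                   # required value left blank
```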

Once you know what you want your data to look like, determine the maximum amount of non-conformance you can accept. For example, you might require that at least 85% of data submitted be formatted correctly and that 100% of required data be submitted.
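
The threshold arithmetic is simple enough to sketch in a few lines of Python. The function names and the sample counts below are hypothetical; the 85% and 100% defaults come from the example above.

```python
def conformance_rate(flags):
    """Share of True values in a sequence of pass/fail flags."""
    flags = list(flags)
    # An empty batch has nothing that fails, so treat it as fully conforming.
    return sum(flags) / len(flags) if flags else 1.0


def meets_thresholds(format_ok, required_present,
                     format_threshold=0.85, required_threshold=1.0):
    """True if both the formatting and the required-data thresholds are met."""
    return (conformance_rate(format_ok) >= format_threshold
            and conformance_rate(required_present) >= required_threshold)


# Invented batch: 18 of 20 values formatted correctly, all required values present.
format_ok = [True] * 18 + [False] * 2
required_present = [True] * 20

print(conformance_rate(format_ok))
print(meets_thresholds(format_ok, required_present))
```

Each flag would come from applying one of your data dictionary rules to one submitted value.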

Hot Tip 3: Enforce Data Quality Expectations

Use your data dictionary to create your data collection tools. Enforce data type, format, precision, and acceptable values where possible. Also, give data collectors written instructions for data entry, along with training.
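
Many collection tools have built-in validation settings, and those are usually the first thing to try. Where a tool accepts custom checks, enforcement might look something like the Python sketch below, which tests one submitted record against rules drawn from a data dictionary and returns messages a data collector could act on. The rule set, field names, and hints are all hypothetical.

```python
import re

# Hypothetical rules derived from a data dictionary.
RULES = {
    "participant_id": {"required": True, "pattern": r"\d{6}",
                       "hint": "six digits, e.g. 004217"},
    "visit_date": {"required": False, "pattern": r"\d{2}/\d{2}/\d{4}",
                   "hint": "mm/dd/yyyy"},
    "score": {"required": False, "pattern": r"\d+\.\d{2}",
              "hint": "a number with two decimal places"},
}


def check_record(record, rules=RULES):
    """Return a list of problems found in one submitted record."""
    errors = []
    for field, rule in rules.items():
        raw = record.get(field)
        value = "" if raw is None else str(raw).strip()
        if not value:
            # Blank values only matter for required fields.
            if rule["required"]:
                errors.append(f"{field}: required ({rule['hint']})")
            continue
        if not re.fullmatch(rule["pattern"], value):
            errors.append(f"{field}: expected {rule['hint']}, got {value!r}")
    return errors


for problem in check_record({"visit_date": "3/15/24", "score": "87.50"}):
    print(problem)
```

Putting the expected format in each message doubles as a data entry instruction at the moment it is needed.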

Hot Tip 4: Profile Your Data Regularly

At least twice a year, profile your data: how well do the data you collect align with the rules you specified? If you find that the data you are receiving aren’t meeting your expectations, consider modifications to the data collection tools you are using or the use of another collection method. Also, provide follow-up training to data collectors.
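
A dedicated profiling tool will do far more, but the basic idea can be sketched in Python: for each field, count the missing and nonconforming values and report the share that conform. The function name `profile`, the single date rule, and the four sample records are hypothetical.

```python
import re
from collections import Counter


def profile(records, checks):
    """Summarize, per field, how well a list of records matches the rules."""
    report = {}
    total = len(records)
    for field, check in checks.items():
        counts = Counter()
        for record in records:
            raw = record.get(field)
            value = "" if raw is None else str(raw).strip()
            if not value:
                counts["missing"] += 1
            elif check(value):
                counts["conforming"] += 1
            else:
                counts["nonconforming"] += 1
        report[field] = {
            "missing": counts["missing"],
            "nonconforming": counts["nonconforming"],
            "conforming_rate": counts["conforming"] / total if total else 0.0,
        }
    return report


# Hypothetical rule: dates must be written as mm/dd/yyyy.
checks = {
    "visit_date": lambda v: re.fullmatch(r"\d{2}/\d{2}/\d{4}", v) is not None,
}

# Invented sample data.
records = [
    {"visit_date": "03/15/2024"},
    {"visit_date": "3/15/24"},
    {"visit_date": ""},
    {"visit_date": "04/01/2024"},
]

for field, summary in profile(records, checks).items():
    print(field, summary)
```

Fields with low conforming rates are the ones to look at first when revising a collection tool or planning follow-up training.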

Rad Resource: Data Cleaner

Data Cleaner is a free data profiling tool that works with a variety of data sources, including MS Access and MS Excel, as well as numerous relational database management systems.

Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to aea365@eval.org. aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.
