I’m Paula Fearon, Co-Founder of the Analytics Research Institute. Together with my colleagues Heather Eshleman, Ami Shah, and Leigh Firestone Brooks, I am pleased to kick off the RTD TIG week on AEA 365. Our day-to-day work involves producing a myriad of analyses using all types of datasets. To ensure our work is both high quality and reproducible, we have been developing a series of checklists and templates. Today we want to share one of our worksheets with you as a Rad Resource. We hope you find it as useful as we do!
Evaluations include a laundry list of processes and procedures to ensure that any conclusions being drawn are truly supported by data. A key part of this process, especially in the research, technology, and development domain, is rigorous analysis of large, complex datasets. However, data changes rapidly, comes from a multitude of sources, and is often modified, transformed, or outright replaced. So how do you keep track of it all?
At the Analytics Research Institute (ARI), we use a standard data provenance worksheet for each data file to document its purpose, source, modifications and more. (Find our template here!)
Why is a data provenance worksheet important for our work?
Easy to incorporate into a data file, but can be maintained separately.
This worksheet is something that can be included in your data file as an easy-to-access reference for anyone viewing the data. It can also be a helpful tool for an individual evaluator to maintain a record of their analysis methods.
Keeping track of all the little details.
Raw data often needs to be modified for analyses, sometimes in complicated ways. Different methods may need to be tested before settling on one. By generating a data provenance worksheet for each pipeline, you can keep track of what worked and what didn’t, and repeat the process without forgetting a step.
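For teams that keep their worksheet alongside analysis code, the same idea can be sketched in a few lines of Python. This is a minimal illustration, not the ARI template itself: the class name, field names, file name, and example steps below are all assumptions chosen to mirror the items mentioned above (purpose, source, modifications).

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ProvenanceRecord:
    """One record per data file. Field names are illustrative assumptions,
    not the fields of the ARI worksheet."""
    file_name: str
    purpose: str
    source: str
    modifications: list = field(default_factory=list)

    def log_step(self, description, kept, when=None):
        """Append one modification and note whether the approach was kept."""
        self.modifications.append({
            "step": len(self.modifications) + 1,
            "date": (when or date.today()).isoformat(),
            "description": description,
            "kept": kept,
        })

    def steps_to_repeat(self):
        """Return, in order, the steps that worked and should be re-run."""
        return [m["description"] for m in self.modifications if m["kept"]]


# Hypothetical file and steps, for illustration only.
record = ProvenanceRecord(
    file_name="awards_example.csv",
    purpose="Portfolio analysis of funded projects",
    source="Exported from an internal grants database",
)
record.log_step("Dropped rows with missing award IDs", kept=True)
record.log_step("Matched institutions by exact name", kept=False)
record.log_step("Matched institutions by normalized name", kept=True)

for description in record.steps_to_repeat():
    print(description)
```

Logging the attempts that were abandoned, not only the ones that were kept, is what lets a colleague see why the final method was chosen.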
Improved communication.
It is important for teams, large and small, to stay in sync when performing complicated analyses with large datasets. This is especially the case when project information is frequently changing. Having a shared document that the entire team can access and monitor will cut down on miscommunications and inconsistencies.
Ensures reproducibility, verifiability, credibility, and identification.
Every analysis you and your team tackle should be reproducible, verifiable, credible, and identifiable. The data provenance worksheet enables you to achieve this by ensuring the processes of data collection and analysis are transparent and accessible.
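One optional way to support verifiability is to record a checksum of each data file in its worksheet, so that anyone can later confirm they are working with the same file. The sketch below is our own illustration and an assumption, not part of the worksheet described here: the function names are hypothetical, and it relies only on Python's standard hashlib module.

```python
import hashlib
import tempfile
from pathlib import Path


def file_fingerprint(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_worksheet(path, recorded_fingerprint):
    """Check a file against the fingerprint noted in its worksheet."""
    return file_fingerprint(path) == recorded_fingerprint


# Demonstration with a throwaway file (hypothetical contents).
with tempfile.TemporaryDirectory() as folder:
    data_file = Path(folder) / "example.csv"
    data_file.write_text("id,value\n1,10\n")
    recorded = file_fingerprint(data_file)  # value to note in the worksheet

    unchanged = matches_worksheet(data_file, recorded)

    data_file.write_text("id,value\n1,11\n")  # simulate an undocumented edit
    changed = not matches_worksheet(data_file, recorded)

print(unchanged, changed)
```

A mismatch does not say what changed, only that the file is no longer the one the worksheet describes, which is the cue to go back and update the record.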
A data provenance worksheet encourages evaluators to accurately and contemporaneously capture information about data being used in active evaluations and analyses. It creates a baseline for teams to get on the same page, and can easily be modified to suit the requirements of an analysis or the working style of the evaluation team. Easily ensure the reproducibility, verifiability, credibility, and authentication of your current and future analyses with this worksheet.
The American Evaluation Association is hosting Research, Technology and Development (RTD) TIG Week with our colleagues in the Research, Technology and Development Topical Interest Group. The contributions all this week to AEA365 come from our RTD TIG members. Do you have questions, concerns, kudos, or content to extend this AEA365 contribution? Please add them in the comments section for this post on the AEA365 webpage so that we may enrich our community of practice. Would you like to submit an AEA365 Tip? Please send a note of interest to AEA365@eval.org. AEA365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators. The views and opinions expressed on the AEA365 blog are solely those of the original authors and other contributors. These views and opinions do not necessarily represent those of the American Evaluation Association, and/or any/all contributors to this site.