Happy weekend, data aficionados! My name is Elizabeth DiLuzio, volunteer curator for AEA365, internal evaluator at Good Shepherd Services, and instructor of Quantitative Methods in Evaluation at New York University. This week, one of our professional nightmares became a reality at Public Health England (PHE): they failed to properly contact trace nearly 16,000 positive COVID cases due to the improper use of Microsoft Excel. While I don’t want to oversimplify the engineering it takes to track an entire country’s outbreak, I do want to use today’s blog to talk about how we can avoid a similar situation we regularly encounter. It lies in our ability to recognize when it’s time to change our data storage technology from a spreadsheet to a database.
Let me begin by saying that Microsoft Excel and Google Sheets are fabulous tools. Their easy-to-use interface makes storing and analyzing data convenient without the need to learn code. As such, they tend to be the go-to tool when we’re starting up any sort of data tracking system. And, I would argue that this is the best choice as long as:
- There are only a few people who need to access the data,
- There is a small amount of data,
- The data are relatively simple in their organizational structure, and
- The solution is not expected to be long term.
Once one or more of these conditions no longer apply, it’s time to consider moving to a database. Put simply, a database is a collection of spreadsheets. In a relational database, these tables of data are all connected to each other through a logical system designed to reduce duplication and minimize errors. The rules that govern databases create relationships between the data, allow data to be automatically updated, and make queries against the data.
A database is a common response to spreadsheet woes because:
- The data are stored on a server. This enables computers across a large network to access the data at the same time.
- The server’s memory is significantly larger than a computer’s. On its best day, a spreadsheet can hold one million rows of data. That number is significantly reduced if using an online spreadsheet or an older version of Excel. What’s more, Excel users will know all too well that its processing ability declines when it reaches hundreds of thousands of rows of data. This is not the case for databases.
- There are established relationships between data points. This enables a user to write more simple formulas, called queries, in order to run analyses off of the data. Goodbye, VLOOKUP!
- Databases are designed to keep data clean and secure. This is not the case for spreadsheets.
This blog is just a cursory introduction to what might trigger a move from a spreadsheet to a database. Take the time to learn the basics about databases so that, when budget and staffing permit, you are not tolerating the limitations of storing data in spreadsheets longer than you need to be.
Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to email@example.com. aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.