Tech TIG Week: Understanding Sources of Bias in Big Data by Michael Bamberger

Greetings from Portland, Oregon! My name is Michael Bamberger and I am Program Chair for the Integrating Technology into Evaluation (ITE) TIG, where we have been following the exciting developments concerning the introduction of big data and Artificial Intelligence (AI) into research and evaluation. While sharing the excitement about these game-changing developments for the practice of evaluation, it is important to add a word of caution about potential sources of bias in how big data is generated and used.

One of the essential building blocks for all of these developments is the multiple sources of big data generated from social media posts, mobile phones, satellites and remote sensors, internet searches and the huge volumes of administrative data compiled by government and private organizations. Much of this information comes from reprocessing data collected for a different purpose (e.g. social media posts were created to share information or opinions among a social network). Many evaluators, who traditionally relied on information collected through face-to-face interactions with individuals, groups or communities, have concerns about the credibility of relying on data that was originally collected for a different purpose, and where the researcher often does not fully understand how the data was generated or the quality of the data. For example, when analyzing data obtained from phone interviews or remote focus groups, it is difficult to assess whether a woman can speak freely or whether she is being monitored by her mother-in-law or husband.

In contrast, many advocates of data science argue that big data and AI are more reliable and “objective” than conventional sources of evaluation data because data science can “eliminate human error.” While AI can certainly fix computational errors, it is misleading to believe that AI is completely “objective” and that humans, with all of their cultural and other biases, are completely excluded.

Hot Tip

Always look for ways to incorporate big data into your evaluations BUT always question claims that studies based on big data are unbiased and more reliable than other forms of evaluation.

Rad Resource

I would like to propose a checklist (developed in collaboration with my colleagues Jerusha Govender, Oscar Garcia, Pete York and Miriam Sarwana) of four common kinds of bias that may affect the objectivity and reliability of big data. The purpose is not to claim that data science is more biased than other forms of evaluation, but only to provide a framework for assessing claims that data science is less prone to bias.

Human bias. Humans make critical inputs at all stages of a big data evaluation, and many of these inputs are affected by deep-seated socio-cultural and psychological factors of which most people are not aware.
Who is at the table? If a certain group (e.g. single mothers, Hispanics, people with physical or mental challenges, ethnic or racial groups) is not represented when a study is being planned, it is very likely that their interests will not be fully taken into account.
Technological and methodological bias. Much big data, in its original form, may be of low value and require significant treatment before it is usable. Unless care is taken, much of this data may be subject to selection, confirmation, and measurement biases.
Organizational and political factors. There may be organizational pressures that can affect what evaluation questions are asked (or not asked), how the study is designed or the results presented.

The American Evaluation Association is hosting Integrating Technology into Evaluation TIG Week with our colleagues in the Integrating Technology into Evaluation Topical Interest Group. The contributions all this week to AEA365 come from ITE TIG members. Do you have questions, concerns, kudos, or content to extend this AEA365 contribution? Please add them in the comments section for this post on the AEA365 webpage so that we may enrich our community of practice. Would you like to submit an AEA365 Tip? Please send a note of interest to AEA365@eval.org. AEA365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators. The views and opinions expressed on the AEA365 blog are solely those of the original authors and other contributors. These views and opinions do not necessarily represent those of the American Evaluation Association, and/or any/all contributors to this site.
