SNA TIG Week: Todd Honeycutt on Missing Data in Social Network Analysis
Hello, my name is Todd Honeycutt and I’m a researcher at Mathematica Policy Research. We’ve used social network analysis (SNA) in several evaluations, and one challenge we’ve encountered is missing data. Even with high response rates on surveys, you can still have missing data from survey and item non-response. Missing data can also result from incorrectly defining network boundaries and membership (see this week’s AEA365 tips by Stacey Friedman and Russell Cole about this issue).
When we have missing data, we are making inferences from a partial network. Such results can be misleading, particularly if the data are not missing at random.
Network data is about relationships—both to and from network members—and for each nonrespondent, there are many relationships about which you have no data. Consider a network with 10 organizations. If we have data about their communication with each other from all 10, then we would have data about 90 relationships [10 x (10-1)]. If only 8 of those 10 organizations responded to a survey, then we would have complete data about 56 relationships [8 x (8-1)] for measures that require a complete network (information both from and about each member), or 62 percent (56/90) of the network’s possible relationships. However, we also have partial data (about nonrespondents from respondents, but not vice versa), and so have data about 72 relationships (8 x [10-1]), or 80 percent of the network. These data can be used with measures that don’t require a complete network.
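The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the counting, assuming directed ties and no self-ties; the function name `tie_coverage` and its argument names are hypothetical, not part of any SNA package.

```python
def tie_coverage(n_members, n_respondents):
    """Count directed ties covered by a survey with nonresponse.

    Assumes directed ties and no self-ties, as in the example above.
    """
    # Every ordered pair of distinct members is a possible tie.
    possible = n_members * (n_members - 1)
    # Complete data: ties among respondents only (both sides reported).
    complete = n_respondents * (n_respondents - 1)
    # Partial data: everything respondents reported, including ties
    # directed at nonrespondents.
    partial = n_respondents * (n_members - 1)
    return {
        "possible": possible,
        "complete": complete,
        "complete_share": complete / possible,
        "partial": partial,
        "partial_share": partial / possible,
    }


coverage = tie_coverage(n_members=10, n_respondents=8)
for name, value in coverage.items():
    print(name, value)
```

Trying other values for `n_respondents` shows how quickly the share of completely observed relationships falls as nonresponse grows.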
Hot Tip #1: A common rule of thumb is to have data from 70 to 80 percent of your network members. However, when you have lower response rates, you should consider measures, such as indegree centralization, that are robust when data are missing at random and that can be calculated for all network members, even nonrespondents.
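To see why indegree-based measures can include nonrespondents, here is a minimal sketch that counts indegree (the building block of indegree centralization) using only the ties respondents reported. The member labels, the toy data, and the function name `indegree_from_respondents` are made up for illustration. Dividing by the number of respondents who could have named each member is one possible normalization, stated here as an assumption rather than a standard from the literature.

```python
def indegree_from_respondents(ties, members):
    """Compute indegree for every member from respondent reports only.

    ties: dict mapping each respondent to the set of members it named.
    Members that do not appear as keys in `ties` are nonrespondents.
    """
    respondents = set(ties)
    indegree = {m: 0 for m in members}
    for sender, named in ties.items():
        for receiver in named:
            # Ignore self-ties and names outside the network boundary.
            if receiver != sender and receiver in indegree:
                indegree[receiver] += 1

    # Assumed normalization: divide by the number of respondents who
    # could have named this member (a respondent cannot name itself).
    normalized = {}
    for m in members:
        possible = len(respondents) - (1 if m in respondents else 0)
        normalized[m] = indegree[m] / possible if possible else float("nan")
    return indegree, normalized


# Toy network: D did not respond but can still receive nominations.
members = ["A", "B", "C", "D"]
ties = {"A": {"B", "D"}, "B": {"A", "D"}, "C": {"D"}}
indegree, normalized = indegree_from_respondents(ties, members)
print(indegree)
print(normalized)
```

In this toy example the nonrespondent D still gets an indegree score, because the score depends only on what others said about D, not on D's own survey.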
Hot Tip #2: Consider using blockmodeling techniques for your analysis. You can include all network members, even nonrespondents, by using pre-specified conditions for members with missing data. [For more details, see Doreian, P., Batagelj, V., & Ferligoj, A. (2005). Generalized Blockmodeling. Cambridge University Press, New York.]
Rad Resources: The following papers provide information on missing data in social networks:
- Costenbader, E. and Valente, T.W. (2003). “The stability of centrality measures when networks are sampled.” Social Networks, 25, 283-307.
- Huisman, M. (2009). “Imputation of missing network data: Some simple procedures.” Journal of Social Structure, 10(1), 1-29.
- Kossinets, G. (2006). “Effects of missing data in social networks.” Social Networks, 28, 247-268.
The American Evaluation Association is celebrating SNA TIG Week with our colleagues in the Social Network Analysis AEA Topical Interest Group. The contributions all this week to aea365 come from our SNA TIG members and you can learn more about their work via the SNA TIG sessions at AEA’s annual conference. Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.