AEA365 | A Tip-a-Day by and for Evaluators

CAT | Cluster, Multi-site and Multi-level Evaluation

Hello, I am Edith Gozali-Lee, a research scientist at Wilder Research. I work primarily on research and evaluation projects related to education. I am currently working on a multi-site, longitudinal study of an early childhood initiative. The study includes three cohorts of school-based preschool program children in ten schools, five cohorts of community-based child care children in homes and centers, and comparison children with and without prior preschool experience. The study follows children from preschool to third grade. That’s a lot to track, making good data collection critical from the start.

Hot Tips:

These are a few coding tips that will help to ensure good data collection tracking:

  • Anticipate the different groups ahead of time and create intuitive codes to simplify data tracking and analyses in the following years
  • Use the categories or codes that schools already use, so that analyses are easier when you merge the data you collect with other student data collected by schools (demographic data and student outcomes)
  • Label all instruments (survey and assessment forms) with these codes prior to data collection to reduce post-collection coding work and data-entry errors
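The tips above can be sketched in code. Here is a minimal, hypothetical ID scheme in which cohort, setting, and site are readable at a glance; every field name and code below is illustrative, not the study's actual scheme:

```python
# Hypothetical student-ID scheme making cohort, setting, and site intuitive.
# All codes here are invented for illustration.

def make_student_id(cohort: int, setting: str, site: int, seq: int) -> str:
    """Build a readable ID, e.g. cohort 2, school-based, site 7, child 15 -> 'C2-S-07-015'."""
    settings = {"school": "S", "community": "C", "comparison": "X"}
    return f"C{cohort}-{settings[setting]}-{site:02d}-{seq:03d}"

print(make_student_id(2, "school", 7, 15))    # C2-S-07-015
print(make_student_id(5, "community", 3, 8))  # C5-C-03-008
```

Pre-printing such IDs on survey and assessment forms is what makes the third tip possible: the code exists before any data are entered.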

Lesson Learned:

It is helpful to hold regular project debriefs to reflect on what works well and what does not. This will make the evaluation process go more smoothly and quickly the next time around.

Rad Resources:

For practical, research-based information, visit CYFERnet, the Children, Youth and Families Education and Research Network.

Resources for research in early childhood:

We are looking forward to seeing you in Minnesota at the AEA conference this October. Results of this study (along with other Wilder Research projects and studies) will be presented during a poster session: Academic Outcomes of Children Participating in Project Early Kindergarten Longitudinal Study.

The American Evaluation Association is celebrating with our colleagues from Wilder Research this week. Wilder is a leading research and evaluation firm based in St. Paul, MN, a twin city for AEA’s Annual Conference, Evaluation 2012. Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest. aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.


My name is Gary Resnick and I am the Director of Research at Harder+Company Community Research, a California-based consulting firm. My background combines program evaluation with child development research, and I have an interest in system theory and networks.

Harder+Company has been involved in evaluating First 5 programs in a number of California counties. First 5 arose from Proposition 10 (1998), which added a tax on tobacco products, with funds distributed to counties to support local programs that improve services for children from birth to age 5 and their families. An important goal of First 5 funding is to act as a catalyst for change in each county’s systems of care. To measure system change, we focused on inter-agency coordination and collaboration. Increases in coordination and collaboration would indicate that agencies are better able to share resources and clients, reduce redundancies and service gaps, and increase efficiency.

Rad Resource: The Levels of Collaboration Scale assesses collaboration, has excellent psychometric properties, and can be administered in web-based surveys to agency respondents. To see it in action, check out this article in the American Journal of Evaluation. The scale was originally a 5-point Likert scale; we combined the two highest scale points to create a 4-point scale that is easier for respondents.
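Collapsing the two highest categories is a one-line recode. A minimal sketch in plain Python, with invented ratings:

```python
# Collapse a 5-point collaboration rating to 4 points by merging the two
# highest categories, as described above. The sample ratings are illustrative.

def collapse_scale(rating: int) -> int:
    """Map 1-5 ratings to 1-4 by combining points 4 and 5 into one top category."""
    if not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    return min(rating, 4)

ratings = [1, 3, 4, 5, 5, 2]
print([collapse_scale(r) for r in ratings])  # [1, 3, 4, 4, 4, 2]
```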

Hot Tip: Start by defining the network member agencies using objective, clear, and unbiased criteria. Later, you can expand the network by asking respondents to nominate up to three additional agencies with whom they interact.

Hot Tip: Select at least two respondents from each organization (three is better), drawn from different levels of the organization: administrators and managers as well as direct-line staff.

Lesson Learned: It is important to have complete, reciprocal ratings for each agency (even if not from all respondents). If you have too much missing data at the agency level, consider excluding the agency from the network.
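One way to operationalize this lesson is to average multiple respondents' ratings for each directed agency pair, then measure how complete each agency's ratings are. A sketch with invented data (the 50% cutoff is an assumption, not a rule from the scale):

```python
# Average respondents' ratings per directed agency pair, then flag agencies
# with too much missing data as candidates for exclusion. Data are invented.
from collections import defaultdict

raw = [  # (rater_agency, rated_agency, rating) from individual respondents
    ("A", "B", 3), ("A", "B", 4), ("B", "A", 2),
    ("A", "C", 1), ("C", "A", 1),
]
agencies = ["A", "B", "C", "D"]

by_pair = defaultdict(list)
for rater, rated, rating in raw:
    by_pair[(rater, rated)].append(rating)

matrix = {pair: sum(v) / len(v) for pair, v in by_pair.items()}

def coverage(agency):
    """Share of this agency's possible given/received ratings that are present."""
    pairs = [(agency, o) for o in agencies if o != agency] + \
            [(o, agency) for o in agencies if o != agency]
    return sum(p in matrix for p in pairs) / len(pairs)

for a in agencies:
    print(a, round(coverage(a), 2))
# Agency "D" has no ratings given or received, so it would be excluded.
```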

Hot Tip: Use Netdraw, a Windows freeware program, to produce two-dimensional network maps from agency-level Collaboration Scale ratings. See our maps here. The maps identify agencies most involved with other agencies at the center of the map (key players) and those least involved, at the periphery of the network. Add attributes of agencies (e.g. geographic region served) to map subgroups of your network.
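Netdraw is a point-and-click tool, but the "key player vs. periphery" idea it visualizes can be approximated in a few lines: agencies with the most ties sit at the center of the map. A plain-Python sketch with invented ties, using degree as a stand-in for visual centrality:

```python
# Identify central ("key player") and peripheral agencies by tie count.
# The ties below are invented; degree approximates Netdraw's visual centrality.

ties = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]

degree = {}
for a, b in ties:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1

ranked = sorted(degree, key=degree.get, reverse=True)
print(ranked[0])   # most-connected agency (center of the map): A
print(ranked[-1])  # least-connected agency (periphery): E
```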

Hot Tip: Produce two sets of maps, one with no agency labels for public reporting, and another with agency labels, for internal discussions with clients and agencies. Convene a meeting with the agency respondents and show them the maps with agency labels, to help them understand where they stand in the network and to foster collaboration.


· · · · ·

My name is Leland Lockhart, and I am a graduate student at the University of Texas at Austin and a research assistant at ACT, Inc.’s National Center for Educational Achievement (NCEA).  The NCEA is a department of ACT, Inc., a not-for-profit organization committed to helping people achieve education and workplace success. NCEA builds the capacity of educators and leaders to create educational systems of excellence for all students. We accomplish this by providing research-based solutions and expertise in higher performing schools, school improvement, and best practice research that lead to increased levels of college and career readiness.

In applied research, unfamiliarity with advanced procedures often leads researchers to conduct inappropriate assessments.  More specifically, unfamiliarity with the cross-classified family of random effects models frequently causes researchers to avoid this approach in favor of less complicated methods.  The results are frequently biased, leading to incorrect statistical inferences.  This has direct implications for the field of program evaluation, as inaccurate conclusions can spell doom for both a program and an evaluator.

Hot Tip: Use cross-classified random effects models (CCREMs) when lower-level units are identified by some combination of higher-level factors.  For example, students are nested within neighborhoods, but neighborhoods often feed students into multiple high schools.  In this scenario, because neighborhoods are not perfectly nested within high schools, students are cross-classified by neighborhood and high school designations.  Use the following steps to diagnose and model cross-classified structures:

1)  Examine the data structure. Is a lower-level unit nested within higher-level units?  If so, what is the relationship between the higher-level units?  If they are not perfectly hierarchically related, use a cross-classified random effects model.

2)  Include the appropriate classifications. Many applied researchers simply avoid cross-classified analyses by ignoring one of the cross-classified factors.  This severely limits the generalizability of your results and drastically alters statistical inferences.

3)  Provide parameter interpretations. Properly specified CCREMs are analogous to regression analyses.  Interpret the parameters in the same fashion, being sure to provide non-technical interpretations for lay audiences.

4)  Have software do the heavy lifting. Fitting CCREMs is straightforward in a variety of statistical packages.  HLM6 provides a user-friendly point-and-click interface, while SAS offers more flexibility for the programming savvy.

5)  Use previously applied CCREMs. Peer reviewed methodological journals are rife with exemplar CCREMs and the procedures used to estimate them.  When in doubt, follow the steps outlined in the methods section of a relevant journal article.
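Step 1 above, diagnosing the structure, can itself be automated: a grouping is perfectly nested only if every lower-level factor level maps to exactly one higher-level unit. A minimal Python sketch with invented records:

```python
# Diagnose cross-classification: neighborhoods nest within high schools only
# if each neighborhood feeds exactly one school. Records are invented.

records = [  # (student, neighborhood, high_school)
    (1, "N1", "HS-A"), (2, "N1", "HS-A"),
    (3, "N2", "HS-A"), (4, "N2", "HS-B"),  # N2 feeds two schools
]

def is_nested(rows, lower_idx, higher_idx):
    """True if every lower-factor level maps to a single higher-level unit."""
    mapping = {}
    for row in rows:
        lo, hi = row[lower_idx], row[higher_idx]
        if mapping.setdefault(lo, hi) != hi:
            return False
    return True

# N2 appears under two high schools, so students are cross-classified:
print(is_nested(records, 1, 2))  # False -> fit a CCREM, not a strict hierarchy
```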

Rad Resource: Beretvas, S. N. (2008). Cross-classified random effects models. In A. A. O’Connell & D. B. McCoach (Eds.), Multilevel modeling of educational data (pp. 161-198). Charlotte, NC: Information Age Publishing.  This chapter provides an excellent introduction to CCREMs for those familiar with multiple regression analyses.



My name is Monica Hargraves and I work with Cooperative Extension associations across New York State as part of an evaluation capacity building effort in the Cornell Office for Research on Evaluation (CORE).  My work with Extension is shaped, in part, by insights we gained through a Concept Mapping research project we did in late 2008.  We wanted to explore, from practitioners’ perspectives, what factors contribute to supporting evaluation practice in an organization.

We used Concept Mapping software from Concept Systems, Inc. to gather ideas in response to this prompt: “One specific thing an Extension organization can do to support the practice of evaluation is …” Contributors included county-based educators and Executive Directors, as well as state-level Extension administrators and Cornell staff.  The raw ideas were pared down to a working set of 80, and then participants sorted the ideas into clusters and rated them on two criteria: potential for making a difference, and relative difficulty.

The individual ideas become points on a “Cluster Map” that gives a visual representation of how participants conceptualized the patterns and themes in the ideas (see below). For information on the Concept Systems technology and the statistical techniques that underlie it, see the Concept Systems, Inc. website. The ratings are useful for thinking strategically about what to give priority to when trying to improve and sustain evaluation practice in organizations.
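The sorting step behind a cluster map can be sketched simply: each participant's piles become a co-occurrence matrix, and ideas that many participants sort together end up near each other on the map. A minimal illustration with invented sorts (the real analysis uses multidimensional scaling and cluster analysis on this matrix):

```python
# Build an idea-by-idea co-occurrence count from participants' card sorts.
# The sorts below are invented; real Concept Mapping data would be larger.
from itertools import combinations

sorts = [  # each participant groups idea IDs into piles
    [[1, 2, 3], [4, 5]],
    [[1, 2], [3, 4, 5]],
    [[1, 2, 3], [4], [5]],
]

cooccur = {}
for piles in sorts:
    for pile in piles:
        for a, b in combinations(sorted(pile), 2):
            cooccur[(a, b)] = cooccur.get((a, b), 0) + 1

print(cooccur[(1, 2)])           # 3: all participants sorted ideas 1 and 2 together
print(cooccur.get((1, 5), 0))    # 0: ideas 1 and 5 were never sorted together
```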

Rad Resource: For more detail on the study, including a handout with the individual idea statements and their ratings on potential for making a difference, see

Cluster Map of Ideas in Response to the Prompt: One specific thing an Extension organization can do to support the practice of evaluation is …

Lessons Learned:

  • Technical assistance and training are not enough! The top-rated cluster in terms of potential for making a difference was “Communicate the Value of Evaluation.”  The ideas there included educating organization leaders, staff, and volunteers on the importance of evaluation (not the how-to), using evaluation results well and demonstrating how they lead to better programming, having an evaluation champion in-house, and making evaluation results easy to understand and user-friendly.
  • Communication is important. Communication should be used to motivate evaluation and build organizational commitment to it, and as a practical tool for sharing what works, fostering collaborations, and saving time.
  • Leadership and Structure matter. The second and third most important clusters were “Set Expectations and Requirements” and “Integrate into Organization Structure”.  Respondents wanted clarity and consistency, and to have evaluation woven into a wide range of organization functions and practices.



My name is Mika Yoder Yamashita. I am the qualitative evaluation lead for the Center for Educational Policy and Practice at the Academy for Educational Development. Our Center has been conducting process and outcome evaluations of the federally funded program Gaining Early Awareness and Readiness for Undergraduate Programs (GEAR UP).  This program aims to increase college access among disadvantaged students.  As we are evaluating programs implemented in several sites, we are beginning to explore the possibility of conducting a multi-site evaluation. Today I will share my Center’s thoughts on how to effectively approach a multi-site evaluation that uses qualitative data to understand the process of program implementation. Then I will share how we use the literature to guide our data collection and analysis.

Our evaluation utilizes a similar approach to cluster evaluation (W.K. Kellogg Foundation, 1998). We draw upon Davidson’s (2000) approach to build hypotheses and theories of which strategies seem to work in different contexts.  The end goal of our cluster evaluation is to provide the client with a refined understanding of how programs are implemented at the different sites.

Cluster evaluation presents us with the following challenge: How to effectively collect and analyze qualitative data in a limited time to generate information on program implementation. To help us to guide qualitative data collection and analysis, we draw on a literature review.

Hot Tip: Start with a literature review to create statements of what is known about how a program works and why. Bound the review according to the available time and the evaluation questions. Document keywords, search engines, and decisions about which articles are reviewed, so that others can retrace the search path. Create literature review protocols consisting of specific questions, and have reviewers write answers as they review each article. The evaluation team members then review two to three summaries together to refine the questions and the degree of description to be included. We use qualitative data analysis software for easy management and retrieval of literature summaries. With this information, we draw diagrams that articulate what the literature reveals about how a program works and in what context; the diagrams help share ideas with team members who were not involved in the review.  Finally, create statements of how and why the program works in what context, and compare these statements with the data from the multiple sites.
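A documented search path and protocol can be as simple as structured records. A minimal, hypothetical sketch (all article names, questions, and decisions are invented for illustration):

```python
# Hypothetical structure for a bounded, retraceable literature search and
# per-article protocol answers, as suggested above. All entries are invented.

search_log = [
    {"keywords": "college access AND mentoring", "database": "ERIC",
     "article": "Smith 2008", "decision": "included"},
    {"keywords": "college access AND mentoring", "database": "ERIC",
     "article": "Jones 2007", "decision": "excluded: no implementation data"},
]

protocol_answers = {
    "Smith 2008": {
        "How does the program work?": "Weekly mentoring builds awareness of financial aid.",
        "In what context?": "Urban public high schools.",
    },
}

included = [e["article"] for e in search_log if e["decision"] == "included"]
print(included)  # ['Smith 2008']
```

Keeping the decision string with each excluded article is what lets another reviewer reconstruct the search path later.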

Rad Resources: Davidson, E. J. (2000). Ascertaining causality in theory-based evaluation. New Directions for Evaluation, 87, 17-26.*

W. K. Kellogg Foundation (1998). W.K. Kellogg Foundation Evaluation Handbook. Battle Creek, Michigan: Author. Retrieved from:

*AEA members have free online access to all back content from New Directions for Evaluation. Log on to the AEA website and navigate to the journals to access this or other archived articles.

This aea365 contribution is part of College Access Programs week sponsored by AEA’s College Access Programs Topical Interest Group. Be sure to subscribe to AEA’s Headlines and Resources weekly update in order to tap into great CAP resources! And, if you want to learn more from Mika, check out the CAP Sponsored Sessions on the program for Evaluation 2010, November 10-13 in San Antonio.
