AEA365 | A Tip-a-Day by and for Evaluators

CAT | Quantitative Methods: Theory and Design

Hello! We are Silvana Bialosiewicz and Kelly Murphy from Claremont Graduate University. Working as Senior Research Associates at the Claremont Evaluation Center, we have often been tasked with analyzing large quantitative databases and drawing conclusions about program effectiveness based on our results. Today, we’d like to share some hot tips about pesky sources of bias that may be lurking in your data and provide some rad resources about how to uncover this bias.

Hot Tip: Conduct tests of Measurement Invariance on your evaluation surveys

We all know that the conclusions we draw from the self-report measures we use to assess multi-dimensional constructs (e.g., self-esteem, organizational commitment) depend on the reliability and validity of those measures. But did you know that these measurement properties do not always generalize to the different populations and program contexts we encounter in our evaluations? For example, have you ever wondered…

  • If program participants from different cultures or socioeconomic backgrounds are interpreting your survey questions in the same way?
  • If a participant’s gender, age, or literacy level affects the way they respond to your survey?
  • If participating in the program changes the way participants think about your survey questions?

Answering questions such as these in a statistically rigorous manner helps us ensure that the comparisons we make (either across time or across groups) represent true differences in our constructs of interest!

Measurement Invariance:

What is it?

Measurement Invariance is the statistical property of a measurement that indicates that the same underlying construct is being measured across groups or across time.

How do we know if we have it?

When the relationships between manifest indicator variables (scale items, subscales, etc.) and the underlying construct are the same across groups or across time.

How do we test for it?

Because measurement invariance is too dense a topic to cover in a single blog post, we have put together some rad resources to help you learn more about it and about the steps used to assess it.
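For a rough intuition of what non-invariance looks like, here is a small Python sketch (all response data below are invented for illustration). It compares each item's corrected item-total correlation across two groups. This is only a crude screen, not the formal test (which is multi-group confirmatory factor analysis), but it can flag items that behave very differently across groups:

```python
# A crude screen for measurement non-invariance: compare each item's
# corrected item-total correlation (item vs. sum of the other items)
# across two groups. This is NOT the formal test (that is multi-group
# confirmatory factor analysis); it only flags items that behave
# differently. All responses below are invented for illustration.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def item_total_correlations(responses):
    """responses: one list of item scores per respondent."""
    n_items = len(responses[0])
    corrs = []
    for i in range(n_items):
        item = [r[i] for r in responses]
        rest = [sum(r) - r[i] for r in responses]  # total minus this item
        corrs.append(pearson(item, rest))
    return corrs

# Item 3 deliberately behaves differently in group B.
group_a = [[4, 4, 5], [2, 3, 2], [5, 4, 4], [1, 2, 1], [3, 3, 3], [4, 5, 4]]
group_b = [[4, 4, 1], [2, 3, 5], [5, 4, 2], [1, 2, 4], [3, 3, 1], [4, 5, 5]]

for i, (ca, cb) in enumerate(zip(item_total_correlations(group_a),
                                 item_total_correlations(group_b)), 1):
    flag = "  <- behaves differently" if abs(ca - cb) > 0.3 else ""
    print(f"item {i}: group A r={ca:.2f}, group B r={cb:.2f}{flag}")
```

Note that a shifted item also changes the totals the other items are correlated against, so its neighbors can get flagged too; that is one reason the formal factor-analytic machinery in the resources below exists.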

Rad Resource #1: Based on our AEA13 demonstration session, we have put together an extensive resource packet for practitioners who are interested in learning more about measurement invariance and how to test for it.

Rad Resource #2: If you’re interested in learning more about the software used to assess measurement invariance, here is a link to a discussion thread on the strengths and weaknesses of available software.

Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to aea365@eval.org . aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.

 


Hello! I’m Kathy McKnight, Principal Director of Research, Center for Educator Effectiveness at Pearson.

Today I completed my annual two-day introductory workshop on Quantitative Methods, which I’ve offered at AEA’s annual conference every year since… well, I’ve lost track. Over the years, I’ve seen a lot of evaluators come through my workshop, hungry to learn something about statistics and quantitative methods.

Lessons Learned: A few observations to share:

1. It’s difficult for program evaluators to find quality workshops and educational opportunities for continuing their education in quantitative methods. I find this is the case at the introductory, intermediate, and advanced levels alike, unless you’re located within a university (and even then, there’s no guarantee you can find what you need).

2. I’m further convinced each year that training in statistics is not enough: evaluators need training in measurement and research methods/evaluation design as well. Knowledge of any one of these alone is not sufficient. I’ve noticed that the greatest engagement in my workshop tends to be around methodological and philosophy-of-science issues: how program evaluations are carried out, and what we can learn from them. Studying statistics helps bring out these issues; it’s not only about what tools are available, but how we can best use them, given our evaluation goals. These issues are what attracted me to program evaluation and keep me interested in this work, and that seems to be the case for many others.

Hot Tips: For those interested in furthering their knowledge and skills in quantitative methods, AEA has a Quantitative TIG, and the good news is, we don’t bite! It’s a supportive, engaged group of individuals who share a strong interest in the methods by which we conduct evaluations, how we measure constructs we care about, and how we model relationships between those variables quantitatively. New members could help us identify ways to provide more and better training to our membership, and share resources. Additionally, AEA offers e-Studies (I offered one this past spring on basic inferential statistics) and “coffee break webinars” (brief presentations of a specific topic — I offered one on descriptive statistics). These are just a few of the online resources available to our membership*. The annual meeting also offers 1-day, 3-hour and 90-minute workshops, and a host of presentations focused on quantitative methods. These are well worth checking out as part of your continued education in the broad area of quantitative methods.

Rad Resource: Don’t forget your friend the internet — there are countless YouTube videos and statistics, measurement, and research methods websites that provide tutorials as well as a multitude of resources.

I wish you all a productive, educational conference this year in Washington DC! Please do check out the presentations from the Quantitative TIG.

*Coffee break webinars, e-Study workshops, and Professional Development workshops at the conference are paid content.



Greetings fellow evaluators! My name is Johanna Morariu and I’m a Director at Innovation Network, a nonprofit consulting firm that builds the evaluation capacity of nonprofit organizations and foundations. I’m also co-chair of the Data Visualization & Reporting (DVR) Topical Interest Group (TIG), along with Amy Germuth, Stuart Henderson, and David Shellard, and the DVR TIG is hosting aea365 all this week.

Hot Tip: Have you heard about treemaps? Treemaps are a relatively new data visualization technique—especially to evaluators. The technique was created in the 1990s by Dr. Ben Shneiderman for mapping computer hard drive usage.

Treemaps are useful for visualizing hierarchical data, or tree structure data. Here’s an example: there are 100 program participants. Of those participants, 55 are female and 45 are male. Using traditional dataviz techniques, the data looks like this:

[Image: traditional chart of the 55 female / 45 male participant split]

Program participant data may contain additional categories of information, such as age. Keeping with traditional dataviz, the data might look like this:

[Image: traditional chart adding age categories within each gender]

At this point we have two levels of data. 1) Gender: is the program participant female or male? 2) Age: is the program participant between the ages of 14 – 17 or 18 – 21?

What if we want to add a third level of data to our visualization about attendance? Let’s try a treemap, which is designed for hierarchical data:

[Image: treemap of participants by gender, age, and attendance]

Area is used to proportionally illustrate differences in values, i.e., number of participants. The larger a rectangle, the more program participants it represents. Nested rectangles reflect the three levels of data—gender, age, and attendance—so that each level can be analyzed.

For example, the proportion of females to males can be ascertained by looking at the female and male rectangles (the outermost rectangles). Also, within male and female, the proportion of 14 – 17 year olds compared to 18 – 21 year olds can be estimated (first level of nested rectangles). And within those rectangles, the proportion of females and males in each age group who partially attended or completed the program is also represented. In this treemap, color is used to underscore largest vs. smallest values, like a heatmap.

Rad Resources: Wondering how to get started making your own treemaps?
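As a starting point, here is a minimal, illustrative Python sketch of the “slice-and-dice” treemap layout, using the made-up participant counts from above. Each rectangle’s area is proportional to its count, and children are nested inside their parent (real treemap tools typically use the squarified algorithm, which produces more square rectangles):

```python
# A minimal "slice-and-dice" treemap layout in pure Python: each node's
# rectangle area is proportional to its count, and children are nested
# inside their parent, alternating horizontal/vertical splits by depth.
# The participant counts below are illustrative, not real program data.

def layout(node, x, y, w, h, depth=0, out=None):
    """node: (label, count, children); returns list of (label, x, y, w, h)."""
    if out is None:
        out = []
    label, count, children = node
    out.append((label, x, y, w, h))
    offset = 0.0
    for child in children:
        share = child[1] / count              # child's fraction of parent area
        if depth % 2 == 0:                    # split horizontally at even depth,
            layout(child, x + offset * w, y, w * share, h, depth + 1, out)
        else:                                 # vertically at odd depth
            layout(child, x, y + offset * h, w, h * share, depth + 1, out)
        offset += share
    return out

participants = ("all", 100, [
    ("female", 55, [("14-17", 30, []), ("18-21", 25, [])]),
    ("male", 45, [("14-17", 20, []), ("18-21", 25, [])]),
])

# In a 100 x 100 canvas, each rectangle's area equals its count x 100.
for label, x, y, w, h in layout(participants, 0, 0, 100, 100):
    print(f"{label:7s} area={w * h:7.1f}")
```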

Lesson Learned: There are more dataviz options than the usual charts and graphs we’re used to! Varying visualization types is akin to paying attention to varied word choice to keep a reader’s interest. It’s not necessary, but it sure can help!

So, how can you use treemaps? And are there other treemap tools you’d recommend to the AEA community?

aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators. AEA is celebrating Data Visualization and Reporting Week. The contributions all this week to aea365 come from members of AEA’s Data Visualization and Reporting Topical Interest Group. Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice.

I’m Paul Bakker, the founder and lead consultant of Social Impact Squared. I help agencies with a social purpose understand, measure, communicate, and improve their outcomes. One of the services I provide is data analysis, so I deal with statistical significance quite a lot.

Hot Tip: Abandon the 95% rule. In statistics classes, you are taught to reject the null hypothesis (of no difference) when the p-value is below 5%, that is, when results at least as extreme as yours would occur less than 5% of the time if the null hypothesis were true. A rule of thumb like that makes sense for students working on textbook examples, but people don’t apply such a rule when making real-life decisions. Before you analyze your data, discuss with your clients and the relevant decision makers the level of confidence they need in order to act. Maybe they want to be 95% confident, or maybe being 80% confident is good enough for them.

Hot Tip: Consider the need to adjust for increases in error due to multiple tests. Often, you need to run multiple tests to analyze your data. The chance that at least one test will conclude that there is a difference when there isn’t is equal to:

1 - (1 - α)^k

where α is the per-test significance level and k is the number of tests.

For instance, if you ran 20 tests at the 95% confidence level, the chance that at least one of those tests gives you a wrong answer is:

1 - (1 - 0.05)^20 = 1 - 0.95^20 ≈ 0.64

that is, roughly a 64% chance.

The typical advice is to make each individual test more stringent, for example by requiring 99% rather than 95% confidence (a Bonferroni-style adjustment). However, consider the following possible scenario. Out of those 20 tests, 6 are significant at the 95% level, but only one is significant at the 99% level. What is more important to your client: acting on one difference that may not be real, or failing to act on as many as five that are?
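The arithmetic above can be checked in a few lines of Python; this sketch also shows a Bonferroni-style adjustment (dividing the per-test α by the number of tests), one common way of making each test more stringent:

```python
# Familywise error rate: the chance that at least one of k independent
# tests at per-test significance level alpha gives a false positive.

def familywise_error(alpha, k):
    return 1 - (1 - alpha) ** k

print(round(familywise_error(0.05, 20), 3))       # 0.642: ~64% chance
print(round(familywise_error(0.05 / 20, 20), 3))  # 0.049: Bonferroni-adjusted
```

Note the independence assumption: when the tests are correlated, the true familywise error rate is lower than this formula suggests.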



I am Susan Kistler, the American Evaluation Association’s Executive Director and aea365’s regular Saturday contributor. Last week, at the Minnesota Evaluation Studies Institute, we had a discussion about when to use derived variables.

Lessons Learned – What is a derived variable? Derived variables are variables that are computed based on other variables.

We began by looking at the distribution of AEA members residing in the United States and quickly found that California dominated the states in terms of members residing there. But this didn’t really tell the whole story. The top ten states by total number of members are shown at right. Having over 700 members in California would be judged differently from having over 700 members in Rhode Island, given each state’s population.

It was time to add population to the mix.

First, we divided the total members in each state by the state’s population. This gave us numbers like 0.0000391204283265605 which, while accurate, were difficult to read and interpret. Multiplying by one million resulted in a derived variable representing the number of members per million state residents.

The next graph at right is based on the derived variable, members per million population. California, once so dominant, doesn’t even make this new top ten list, and states with smaller populations, such as Alaska, Vermont, and Hawaii, now feature more prominently in the mix.

The derived variable in this case helps to make more direct state-to-state comparisons, taking into account population.
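The derivation is simple enough to sketch in a few lines of Python (the member counts and state populations below are illustrative, not actual AEA figures):

```python
# Deriving "members per million residents" from raw member counts.
# The counts and populations below are illustrative, not actual AEA data.

states = {
    "California": {"members": 700, "population": 38_000_000},
    "Rhode Island": {"members": 50, "population": 1_050_000},
}

# Rhode Island ends up denser than California despite far fewer members.
for name, s in states.items():
    per_million = s["members"] / s["population"] * 1_000_000
    print(f"{name}: {per_million:.1f} members per million residents")
```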

Want to check out total membership and member density in your state? Click on the graph below to go to the full interactive option showing every state.

[Interactive graph: AEA membership totals and member density by state]


Hello! We are Johanna Morariu, Kat Athanasiades, and Ann Emery from Innovation Network. For 20 years, Innovation Network has helped nonprofits and foundations evaluate and learn from their work.

In 2010, Innovation Network set out to answer a question that was previously unaddressed in the evaluation field—what is the state of nonprofit evaluation practice and capacity?—and initiated the first iteration of the State of Evaluation project. In 2012 we launched the second installment of the State of Evaluation project. A total of 546 representatives of 501(c)3 nonprofit organizations nationwide responded to our 2012 survey.

Lessons Learned–So what’s the state of evaluation among nonprofits? Here are the top ten highlights from our research:

1. 90% of nonprofits evaluated some part of their work in the past year. However, only 28% of nonprofits exhibit what we feel are promising capacities and behaviors to meaningfully engage in evaluation.

2. The use of qualitative practices (e.g. case studies, focus groups, and interviews—used by fewer than 50% of organizations) has increased, though quantitative practices (e.g. compiling statistics, feedback forms, and internal tracking forms—used by more than 50% of organizations) still reign supreme.

3. 18% of nonprofits had a full-time employee dedicated to evaluation.

[Image: Morariu graphic 1]

4. Organizations were positive about working with external evaluators: 69% rated the experience as excellent or good.

5. 100% of organizations that engaged in evaluation used their findings.

[Image: Morariu graphic 2]

6. Large and small organizations faced different barriers to evaluation: 28% of large organizations named “funders asking you to report on the wrong data” as a barrier, compared to 12% overall.

7. 82% of nonprofits believe that discussing evaluation results with funders is useful.

8. 10% of nonprofits felt that you don’t need evaluation to know that your organization’s approach is working.

9. Evaluation is a low priority among nonprofits: it was ranked second to last in a list of 10 priorities, only coming ahead of research.

10. Among both funders and nonprofits, the primary audience of evaluation results is internal: for nonprofits, it is the CEO/ED/management, and for funders, it is the Board of Directors.

Rad Resource—The State of Evaluation 2010 and 2012 reports are available online for your reading pleasure.

Rad Resource—What are evaluators saying about the State of Evaluation 2012 data? Look no further! You can see examples here by Matt Forti and Tom Kelly.

Rad Resource—Measuring evaluation in the social sector: Check out the Center for Effective Philanthropy’s 2012 Room for Improvement and New Philanthropy Capital’s 2012 Making an Impact.

Hot Tip—Want to discuss the State of Evaluation? Leave a comment below, or tweet us (@InnoNet_Eval) using #SOE2012!



Welcome to the Evaluation 2013 Conference Local Arrangements Working Group (LAWG) week on aea365. I am Joseph Gasper, and I am a Senior Study Director with Westat, an employee-owned research corporation in Rockville, Maryland.

Westat was hired by New York City’s Center for Economic Opportunity (CEO) to evaluate approximately 35 programs designed to move economically disadvantaged New Yorkers out of poverty. Several of the evaluations used propensity score matching (PSM) to estimate program impacts, a strategy that is increasingly used when random assignment is impractical or unethical.

While the benefits of PSM have been widely touted, there has been little discussion of the challenges involved in implementing this method thoughtfully in “real world” evaluation. This post discusses a few of these challenges based on my experiences.

Lessons Learned:

  • PSM is not a substitute for a carefully selected comparison group. It is necessary to identify a comparison group of individuals who are eligible but did not participate and are similar enough to participants to find matches. PSM can then further refine the comparison group so that it resembles the treatment group on as many relevant characteristics as possible. If the comparison group is poorly chosen, there will be few good matches for participants.
  • Program data are often insufficient for PSM. Program data do not usually include information on the attitudes and motivations that lead individuals to self-select (or be selected) into programs. If a program serves those most in need, matching on too few variables may leave a program group with worse outcomes than the comparison group simply because participants were “worse off” to begin with. Such a finding could lead to the erroneous conclusion that the program is harmful!

Hot Tip (for the more technically-oriented):

  • Choose a matching methodology that will allow for subgroup analysis. Program staff often want to know whether their programs are effective for subgroups of participants. Matching must be done separately for each subgroup to ensure that members of each matched pair are the same on the subgroup variable, which can be time-consuming. A more efficient approach is to force matches on the subgroup variables (exact matching) and then match on the propensity score.
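To make the idea concrete, here is a small Python sketch of greedy 1:1 matching that matches exactly on a subgroup variable and then by nearest propensity score. It assumes the propensity scores have already been estimated (e.g., via logistic regression), and all records are invented for illustration:

```python
# Greedy 1:1 matching: exact on a subgroup variable, then nearest
# propensity score, without replacement. Propensity scores are assumed
# to be pre-estimated (e.g., logistic regression); records are made up.

def match(treated, comparison):
    """Each record: (id, subgroup, propensity). Returns matched id pairs."""
    available = list(comparison)
    pairs = []
    # Match high propensity scores first: they often have the fewest
    # comparison candidates, so matching them early avoids poor matches.
    for t in sorted(treated, key=lambda r: r[2], reverse=True):
        candidates = [c for c in available if c[1] == t[1]]  # exact on subgroup
        if not candidates:
            continue  # no available match within this subgroup
        best = min(candidates, key=lambda c: abs(c[2] - t[2]))
        pairs.append((t[0], best[0]))
        available.remove(best)  # matching without replacement
    return pairs

treated = [("t1", "male", 0.81), ("t2", "female", 0.42), ("t3", "female", 0.77)]
comparison = [("c1", "male", 0.80), ("c2", "female", 0.44),
              ("c3", "female", 0.70), ("c4", "male", 0.30)]

print(match(treated, comparison))
```

In practice you would estimate the propensity scores from observed covariates and check covariate balance after matching; a caliper (a maximum allowed score distance) is also commonly added.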

Rad Resource: The Mayo Clinic maintains several SAS macros to perform PSM. The GREEDY macro can perform exact matching and then match on the propensity score: http://mayoresearch.mayo.edu/mayo/research/biostat/sasmacros.cfm

Hot Tip—Insider’s advice for Evaluation 2013 in DC: If you’re in the mood for “American style” Chinese fare close to the conference, you should check out Meiwah Restaurant. This is also a great place to spot politicians, members of the media, and other beltway celebrities.

 

We’re thinking forward to October and the Evaluation 2013 annual conference all this week with our colleagues in the Local Arrangements Working Group (LAWG). AEA is accepting proposals to present at Evaluation 2013 through until March 15 via the conference website. Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice.


Hi, fellow evaluators! I’m Jess Chandler, a consultant at Energy Market Innovations, a small firm primarily focused on energy efficiency and renewable energy program evaluation.

Rad Resource – jessachandler.com: This blog combines several of my voices, from personal concerns to evaluation methods. The content has adapted over time and has only recently become more focused on evaluation. I’d also invite folks to check out The Oil Drum and The Energy Collective, two great energy-focused blogs that sometimes reflect on evaluation.

 

Hot Tips – favorite posts: I find that I am more enamored of other evaluators’ posts than of my own, but here are a few of mine that I like:

  • 10/4/2012 – More better Better Evaluation: This is a quick post about the revamped Better Evaluation site. The folks at Better Evaluation are doing great work to collect and distribute evaluation ideas. I also couldn’t resist the opportunity to encourage more front-end work on evaluation.
  • 9/3/2011—A more meaningful regression analysis: This post also redirects readers to another blog, but I love sharing resources like this that can help people. Methods are so necessary!
  • 2/3/2010 – Energy Efficiency Potential: This post considers the value of energy efficiency potential studies, which are critically important, but can be completed prematurely in the planning cycle—I propose a two stage process here. Outside of efficiency, other planning or forecasting exercises may face similar constraints.

Lessons Learned – why I blog: I’m certainly not an example of what I see as an ideal blogger (yet) – someone who consistently posts new and exciting content. I sometimes tweet about something and never remember to update my blog.  I blog because:

  • I love to share great resources with colleagues and friends
  • The outlet is an ideal place to test out ideas and let them gel
  • It provides a record of thoughts over time, and is helpful for reflection on changing ideas

Lessons Learned – what I’ve learned: The big picture: Blogging is real work. However, I have learned that I get about as much out of blogging as I put in. I love to hear from people who are interested in my tweets and blog posts.

This winter, we’re continuing our series highlighting evaluators who blog. Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to aea365@eval.org. aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.


Hi, I’m Stacy Johnson, Senior Research Analyst at the Improve Group, a national and international evaluation consulting organization based in Saint Paul, MN. The Improve Group works with organizations to make the most of their information, navigate complexity, and ensure their investments of time and money lead to meaningful, sustained impact. This past October at Evaluation 2012, I had the opportunity to present on my experiences analyzing less-than-perfect longitudinal data.

Being thoughtful in the planning and data collection process plays a crucial role in successfully collecting the data you need to address your evaluation questions. But, what happens if you are not involved in this process? The first step is to assess what you have to work with and figure out how to move forward. Sometimes you may be pleasantly surprised at what a great job was done in gathering data over time. Other times you need to figure out how to make the best out of what you have and focus on the control you have over what happens from this point forward. This aea365 includes some of the lessons I have learned and tips that helped me along the way.

Lesson Learned: Changes are often made to data collection tools over time, including changes to item wording, changes to response options, and the elimination or addition of items.

Hot Tips:

  • Recommend selecting key variables to track and keep consistent
  • Explain implications of making changes
  • Facilitate thinking ahead
  • Create a database to use as a guide for what should be tracked
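As a concrete illustration of keeping a key variable comparable across waves, here is a small Python sketch that recodes a response scale that changed between waves. The 5-point-to-3-point mapping is hypothetical; the important part is documenting whatever rule you choose:

```python
# When a survey item's response options change between waves, recode
# both waves onto one shared scale before pooling. The 5-point to
# 3-point mapping below is hypothetical; document whatever rule you use.

RECODE_WAVE2 = {  # wave 2 used a 5-point scale; wave 1 used 3-point
    1: 1, 2: 1,   # "strongly disagree"/"disagree" -> "disagree"
    3: 2,         # "neutral" -> "neutral"
    4: 3, 5: 3,   # "agree"/"strongly agree" -> "agree"
}

def harmonize(records):
    """records: list of (wave, raw_response); returns shared-scale values."""
    out = []
    for wave, raw in records:
        out.append(raw if wave == 1 else RECODE_WAVE2[raw])
    return out

print(harmonize([(1, 2), (2, 5), (2, 3), (1, 1)]))  # -> [2, 3, 2, 1]
```

Collapsing categories loses information, so it is worth also keeping the raw wave-2 values in the dataset alongside the harmonized variable.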

Lesson Learned: The process of collecting data can be unclear with data collected by a variety of people.

Hot Tips:

  • Create a formal process with detailed instructions and protocols
  • Facilitate clear communication and training of data collectors

Lesson Learned: Messy datasets! Data are collected in different formats, poorly cleaned, or incorrectly merged; data-cleaning decisions are undocumented; and variable names, labels, and coding are unknown.

Hot Tips:

  • Request original data
  • Talk to those involved about the decisions that were made
  • Get copies of instruments

Lesson Learned: There is a need to manage unrealistic expectations when you are asked questions the data cannot answer.

Hot Tips:

  • Discuss what can be shown (and what cannot) with the data collected
  • Balance expectations with reality – you may need to guide others on how feasible their requests are and the limitations of the data
  • Facilitate thinking ahead (again) – help others think about their future evaluation needs and how their work may evolve over time


Hello! My name is Alex (Chi-Keung) Chan and I am currently a Senior Lecturer at Hong Kong Shue Yan University and a partner of A Data-Driven Consulting Group, LLP, based in Minnesota. Prior to that, I was a Senior Evaluation and Research Fellow at the Minneapolis Public Schools. I have adopted a value-added evaluation approach to investigate the effectiveness of various social and educational programs and to identify beat-the-odds teachers and schools. Unfortunately, stakeholders often emphasize the summative interpretation of value-added results without fully understanding the potential formative uses of these measures. So how can we add value to value-added to make it more meaningful and useful?

Hot Tips:

1. Choose value-added measures that are sensitive to the implementation process (e.g., improving teaching), not just to accountability outcomes (e.g., determining performance pay). In other words, a value-added measure is meaningless if it cannot tell whether what a teacher does is really adding value to students’ learning.

2. Link the value-added findings to implementation (e.g., link value-added scores with instructional observation scores) to identify best practices we can learn from (what works) and practices that need to be improved (what doesn’t work).

3. Gather additional evidence to verify the value-added × implementation interaction results (e.g., interview the principals, coaches, and teachers who are part of the professional development program). In this way, we can find out the details of what works and what does not, beyond the numbers.
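To illustrate the basic logic (not any particular district’s model), here is a deliberately simplified value-added sketch in Python: each teacher’s “value added” is the average residual of students’ actual post-test scores against the scores predicted from their pre-tests. Real value-added models use many covariates, multiple years, and shrinkage estimators; all scores below are made up.

```python
# A deliberately simplified value-added sketch: a teacher's value-added
# is the mean residual of students' actual post-test scores vs. the
# scores predicted from their pre-tests by simple linear regression.
# Real models use many covariates and shrinkage; scores are invented.

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

students = [  # (teacher, pre_score, post_score)
    ("A", 50, 58), ("A", 60, 69), ("A", 70, 78),
    ("B", 50, 52), ("B", 60, 61), ("B", 70, 72),
]

pre = [s[1] for s in students]
post = [s[2] for s in students]
slope, intercept = fit_line(pre, post)

value_added = {}
for teacher, x, y in students:
    resid = y - (slope * x + intercept)  # actual minus predicted growth
    value_added.setdefault(teacher, []).append(resid)

for teacher, resids in value_added.items():
    print(teacher, round(sum(resids) / len(resids), 2))  # A +3.33, B -3.33
```

The formative move the post recommends is then to ask what teacher A is doing differently from teacher B, rather than stopping at the ranking.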

Lessons Learned:    

1. Collaborate with stakeholders to select value-added measures. At this initial but critical stage, stakeholders’ opinions can be either too rigid or too vague.

2. Communicate and emphasize to stakeholders the goals and benefits of the formative uses of value-added measures. This is a very challenging process, because our focus is too often driven by the limited goal of “accountability” at the expense of the ultimate goal of “improvement”.

3. Encourage and understand the different perspectives of various stakeholders in interpreting the findings. We often miss some voices or ignore some perspectives because of the uneven power structure in a system. We need to ensure there is a process in place to hear the voices of those with the least power, who are usually the most affected by a program.

Rad Resources:

  1. Value-Added Research Center (VARC) at the University of Wisconsin (click on the Tutorials tab)
  2. Value-Added Measures in Education by Professor Douglas Harris (an easy-to-read book for the layperson on the concepts and applications of value-added)

Hot Tip: Minneapolis/Saint Paul Activities:

My family loved to walk in the Sculpture Garden when we lived in Minnesota, and we love the Quang Vietnamese Restaurant at Nicollet Mall.

The American Evaluation Association is celebrating Minnesota Evaluation Association (MN EA) Affiliate Week with our colleagues in the MNEA AEA Affiliate. The contributions all this week to aea365 come from our MNEA members. Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to aea365@eval.org. aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.
