AEA365 | A Tip-a-Day by and for Evaluators

CAT | Quantitative Methods: Theory and Design

I am Susan Kistler, the American Evaluation Association’s Executive Director and aea365’s regular Saturday contributor. Last week, at the Minnesota Evaluation Studies Institute, we had a discussion about when to use derived variables.

Lessons Learned – What is a derived variable? Derived variables are variables that are computed based on other variables.

We began by looking at the distribution of AEA members residing in the United States and quickly found that California dominated the states in terms of members residing there. But this didn’t really tell the whole story. The top ten states by total number of members are shown at right. Having over 700 members in California would be judged differently from having over 700 members in Rhode Island, given each state’s population.

It was time to add population to the mix.

First, we divided the total members in each state by the state’s population. This gave us numbers like 0.0000391204283265605 which, while accurate, were difficult to read and interpret. Multiplying by one million resulted in a derived variable representing the number of members per million state residents.
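The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the spreadsheet or tool actually used for the graphs: the function name `per_million` and the two example states with their counts are hypothetical, while the raw proportion is the one quoted in the text.

```python
def per_million(members, population):
    """Derived variable: members per one million state residents."""
    return members / population * 1_000_000

# The raw proportion quoted in the post, rescaled to a readable number.
raw_rate = 0.0000391204283265605
print(round(raw_rate * 1_000_000, 1))  # 39.1 members per million residents

# Hypothetical (made-up) counts, only to show how rankings can flip.
states = {
    "Large State": {"members": 700, "population": 38_000_000},
    "Small State": {"members": 60, "population": 700_000},
}
density = {
    name: per_million(s["members"], s["population"]) for name, s in states.items()
}
ranked = sorted(density, key=density.get, reverse=True)
print(ranked)  # the small state ranks first on the derived variable
```

Ranking on the derived variable rather than the raw count is what lets a small state outrank a large one.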

The next graph at right is based on the derived variable, members per million population. California, once so dominant, doesn’t even make this new top ten list, and states with smaller populations, such as Alaska, Vermont, and Hawaii, now feature more prominently.

The derived variable in this case helps to make more direct state-to-state comparisons, taking into account population.

Want to check out total membership and member density in your state? Click on the graph below to go to the full interactive option showing every state.

Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to aea365@eval.org. aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.


Hello! We are Johanna Morariu, Kat Athanasiades, and Ann Emery from Innovation Network. For 20 years, Innovation Network has helped nonprofits and foundations evaluate and learn from their work.

In 2010, Innovation Network set out to answer a question that was previously unaddressed in the evaluation field—what is the state of nonprofit evaluation practice and capacity?—and initiated the first iteration of the State of Evaluation project. In 2012 we launched the second installment of the State of Evaluation project. A total of 546 representatives of 501(c)3 nonprofit organizations nationwide responded to our 2012 survey.

Lessons Learned–So what’s the state of evaluation among nonprofits? Here are the top ten highlights from our research:

1. 90% of nonprofits evaluated some part of their work in the past year. However, only 28% of nonprofits exhibit what we feel are promising capacities and behaviors to meaningfully engage in evaluation.

2. The use of qualitative practices (e.g. case studies, focus groups, and interviews—used by fewer than 50% of organizations) has increased, though quantitative practices (e.g. compiling statistics, feedback forms, and internal tracking forms—used by more than 50% of organizations) still reign supreme.

3. 18% of nonprofits had a full-time employee dedicated to evaluation.


4. Organizations were positive about working with external evaluators: 69% rated the experience as excellent or good.

5. 100% of organizations that engaged in evaluation used their findings.


6. Large and small organizations faced different barriers to evaluation: 28% of large organizations named “funders asking you to report on the wrong data” as a barrier, compared to 12% overall.

7. 82% of nonprofits believe that discussing evaluation results with funders is useful.

8. 10% of nonprofits felt that you don’t need evaluation to know that your organization’s approach is working.

9. Evaluation is a low priority among nonprofits: it was ranked second to last in a list of 10 priorities, only coming ahead of research.

10. Among both funders and nonprofits, the primary audience of evaluation results is internal: for nonprofits, it is the CEO/ED/management, and for funders, it is the Board of Directors.

Rad Resource—The State of Evaluation 2010 and 2012 reports are available online for your reading pleasure.

Rad Resource—What are evaluators saying about the State of Evaluation 2012 data? Look no further! You can see examples here by Matt Forti and Tom Kelly.

Rad Resource—Measuring evaluation in the social sector: Check out the Center for Effective Philanthropy’s 2012 Room for Improvement and New Philanthropy Capital’s 2012 Making an Impact.

Hot Tip—Want to discuss the State of Evaluation? Leave a comment below, or tweet us (@InnoNet_Eval) using #SOE2012!


Welcome to the Evaluation 2013 Conference Local Arrangements Working Group (LAWG) week on aea365. I am Joseph Gasper, and I am a Senior Study Director with Westat, an employee-owned research corporation in Rockville, Maryland.

Westat was hired by New York City’s Center for Economic Opportunity (CEO) to evaluate approximately 35 programs to move economically disadvantaged New Yorkers out of poverty. Several of the evaluations used propensity score matching (PSM) to estimate program impacts, a strategy that is increasingly used when random assignment is impractical or unethical.

While the benefits of PSM have been widely touted, little discussion has been given to the challenges involved in a thoughtful implementation of this method in “real world” evaluation. This blog discusses a few of these challenges based on my experiences.

Lessons Learned:

  • PSM is not a substitute for a carefully selected comparison group. It is necessary to identify a comparison group of individuals who are eligible but did not participate and are similar enough to participants to find matches. PSM can be used to further refine the comparison group so that it is similar to the treatment group on as many relevant characteristics as possible. If the comparison group is poorly chosen, there will be few good matches for participants.
  • Program data are often insufficient for PSM. Program data do not usually include information on attitudes and motivations that lead individuals to self-select (or be selected) into programs. If a program serves those most in need, having too few variables on which to match may yield a program group with worse outcomes than the comparison group simply because its members were “worse off” to begin with. Such a finding could lead to the erroneous conclusion that the program is harmful!

Hot Tip (for the more technically-oriented):

  • Choose a matching methodology that will allow for subgroup analysis. Program staff often want to know whether their programs are effective for subgroups of participants. Matching must be done separately for each subgroup to ensure that members of each matched pair are the same on the subgroup variable, which can be time consuming. A more efficient approach is to force matches on the subgroup variables (exact matching) and then match on the propensity score.
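The "exact match first, then match on the propensity score" idea can be sketched as follows. This is a simplified Python illustration, not the Mayo Clinic GREEDY macro and not the code used in the CEO evaluations. It assumes propensity scores have already been estimated (for example, with logistic regression). The record keys (`id`, `subgroup`, `pscore`), the caliper value, and all data are hypothetical.

```python
from collections import defaultdict

def match_within_subgroups(treated, comparison, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on the propensity score,
    restricted to comparison cases in the same subgroup (exact matching)."""
    # Pool comparison cases by subgroup so matches never cross subgroups.
    pool = defaultdict(list)
    for c in comparison:
        pool[c["subgroup"]].append(c)

    pairs = []
    # Process treated cases from highest to lowest score (one common ordering).
    for t in sorted(treated, key=lambda r: r["pscore"], reverse=True):
        candidates = pool[t["subgroup"]]
        if not candidates:
            continue
        best = min(candidates, key=lambda c: abs(c["pscore"] - t["pscore"]))
        # Accept the match only if it falls within the caliper.
        if abs(best["pscore"] - t["pscore"]) <= caliper:
            pairs.append((t["id"], best["id"]))
            candidates.remove(best)  # matching without replacement
    return pairs

# Hypothetical records for illustration only.
treated = [
    {"id": "T1", "subgroup": "female", "pscore": 0.62},
    {"id": "T2", "subgroup": "male", "pscore": 0.40},
    {"id": "T3", "subgroup": "female", "pscore": 0.90},
]
comparison = [
    {"id": "C1", "subgroup": "male", "pscore": 0.61},
    {"id": "C2", "subgroup": "female", "pscore": 0.60},
    {"id": "C3", "subgroup": "male", "pscore": 0.42},
    {"id": "C4", "subgroup": "female", "pscore": 0.30},
]
pairs = match_within_subgroups(treated, comparison)
print(pairs)
```

In this toy data, C1 has the score closest to T1 but belongs to a different subgroup, so T1 is paired with C2 instead. T3 goes unmatched because no comparison case in its subgroup falls within the caliper, which is the kind of lost participant worth reporting alongside subgroup results.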

Rad Resource: The Mayo Clinic maintains several SAS macros to perform PSM. The GREEDY macro can perform exact matching and then match on the propensity score: http://mayoresearch.mayo.edu/mayo/research/biostat/sasmacros.cfm

Hot Tip—Insider’s advice for Evaluation 2013 in DC: If you’re in the mood for “American style” Chinese fare close to the conference, you should check out Meiwah Restaurant. This is also a great place to spot politicians, members of the media, and other beltway celebrities.

 

We’re thinking forward to October and the Evaluation 2013 annual conference all this week with our colleagues in the Local Arrangements Working Group (LAWG). AEA is accepting proposals to present at Evaluation 2013 through March 15 via the conference website.


Hi, fellow evaluators! I’m Jess Chandler, a consultant at Energy Market Innovations, a small firm primarily focused on energy efficiency and renewable energy program evaluation.

Rad Resource – jessachandler.com: This blog combines several of my voices, from personal concerns to evaluation methods. The content has evolved over time and has only recently become more focused on evaluation. I’d also invite folks to check out The Oil Drum and The Energy Collective, two great energy-focused blogs that sometimes reflect upon evaluation.

 

Hot Tips – favorite posts: I find that I am more enamored with posts of other evaluators than my own, but here are a few posts that I like:

  • 10/4/2012 – More better Better Evaluation: This is a quick post about the revamped Better Evaluation site. The folks at Better Evaluation are doing great work to collect and distribute evaluation ideas. I also couldn’t resist the opportunity to encourage more front-end work on evaluation.
  • 9/3/2011 – A more meaningful regression analysis: This post also redirects readers to another blog, but I love sharing resources like this that can help people. Methods are so necessary!
  • 2/3/2010 – Energy Efficiency Potential: This post considers the value of energy efficiency potential studies, which are critically important, but can be completed prematurely in the planning cycle—I propose a two stage process here. Outside of efficiency, other planning or forecasting exercises may face similar constraints.

Lessons Learned – why I blog: I’m certainly not an example of what I see as an ideal blogger (yet) – someone who consistently posts new and exciting content. I sometimes tweet about something and never remember to update my blog.  I blog because:

  • I love to share great resources with colleagues and friends
  • The outlet is an ideal place to test out ideas and let them gel
  • It provides a record of thoughts over time, and is helpful for reflection on changing ideas

Lessons Learned – what I’ve learned: The big picture: Blogging is real work. However, I have learned that I get about as much out of blogging as I put in. I love to hear from people who are interested in my tweets and blog posts.

This winter, we’re continuing our series highlighting evaluators who blog.

Hi, I’m Stacy Johnson, Senior Research Analyst at the Improve Group, a national and international evaluation consulting organization based in Saint Paul, MN. The Improve Group works with organizations to make the most of information, navigate complexity, and ensure their investments of time and money lead to meaningful, sustained impact. This past October at Evaluation 2012, I had the opportunity to present on my experiences analyzing longitudinal data when the data are less than perfect.

Being thoughtful in the planning and data collection process plays a crucial role in successfully collecting the data you need to address your evaluation questions. But, what happens if you are not involved in this process? The first step is to assess what you have to work with and figure out how to move forward. Sometimes you may be pleasantly surprised at what a great job was done in gathering data over time. Other times you need to figure out how to make the best out of what you have and focus on the control you have over what happens from this point forward. This aea365 includes some of the lessons I have learned and tips that helped me along the way.

Lesson Learned: Changes are often made to data collection tools over time including changes to wording of items, changes to response options, and eliminating and adding items.

Hot Tips:

  • Recommend selecting key variables to track and keep consistent
  • Explain implications of making changes
  • Facilitate thinking ahead
  • Create a database to use as a guide for what should be tracked

Lesson Learned: The process of collecting data can be unclear when data are collected by a variety of people.

Hot Tips:

  • Create a formal process with detailed instructions and protocols
  • Facilitate clear communication and training of data collectors

Lesson Learned: Messy datasets! Data are collected in different formats, poorly cleaned, or incorrectly merged; data cleaning decisions are unclear; and variable names, labels, and coding are unknown.

Hot Tips:

  • Request original data
  • Talk to those involved about the decisions that were made
  • Get copies of instruments

Lesson Learned: There is a need to manage unrealistic expectations when you are asked questions the data cannot answer.

Hot Tips:

  • Discuss what can be shown (and what cannot) with the data collected
  • Balance expectations with reality – you may need to guide others on how feasible their requests are and the limitations of the data
  • Facilitate thinking ahead (again) – help others think about their future evaluation needs and how their work may evolve over time


Hello!  My name is Alex (Chi-Keung) Chan and I am currently a Senior Lecturer at the Hong Kong Shue Yan University and a partner of A Data-Driven Consulting Group, LLP based in Minnesota.  Prior to that, I was a Senior Evaluation and Research Fellow at the Minneapolis Public Schools.  I have been adopting the value-added evaluation approach to investigate the effectiveness of various social and educational programs or to identify the beat-the-odds teachers or schools.  Unfortunately, stakeholders always emphasize the summative interpretation of the value-added results without fully understanding the potential formative uses of these measures. So how can we add value to value-added to make it more meaningful and useful?

Hot Tips:

1. Choose value-added measures that are sensitive to the implementation process (e.g. improving teaching), not just to accountability outcomes (e.g. determining performance pay). In other words, a value-added measure is meaningless if it cannot tell whether what a teacher does is really adding value to student learning.

2. Link the value-added findings to implementation (e.g. link the value-added scores with the instructional observation scores) to identify best practices that we can learn from (what works) and practices that need to be improved (what doesn’t work).

3. Gather additional evidence to verify the value-added X implementation interaction results (e.g. interview the principals, coaches, and teachers who are part of the professional development program).  Thus, we want to find out the details of what works and what does not work beyond the numbers.

Lessons Learned:    

1.  Collaborate and select value-added measures with the stakeholders.  Sometimes, the opinions of stakeholders could be either too strong or too loose at this initial but critical stage.

2.  Communicate and emphasize the goals and benefits of the formative uses of value-added measures to stakeholders. This is a very challenging process because our focus is too often driven by the limited goal of “accountability” at the expense of the ultimate goal of “improvement”.

3.  Encourage and understand the different perspectives of various stakeholders in interpreting the findings.  We often miss some voices or ignore some perspectives due to the uneven power structure in a system.  We need to ensure that there is a process in place to hear the voices of powerless people who usually are impacted the most by a program.

Rad Resources:

  1. Value-Added Research Center (VARC) at University of Wisconsin (click on the Tutorials tab)
  2. Value-added Measures in Education by Professor Douglas Harris (an easy-to-read book for the layperson to understand the concepts and applications of value-added)

Hot Tip: Minneapolis/Saint Paul Activities:

My family loved walking in the Sculpture Garden when we lived in Minnesota, and we love the Quang Vietnamese Restaurant at Nicollet Mall.

The American Evaluation Association is celebrating Minnesota Evaluation Association (MN EA) Affiliate Week with our colleagues in the MN EA AEA Affiliate. The contributions all this week to aea365 come from our MN EA members.

My name is Lindsay Demers. I am an evaluator and quantitative methodologist at TERC in Cambridge, MA. I have worked as the lead data analyst on several projects at TERC, Lesley University, Boston College and Brandeis University.

Hot Tip: You must calculate inter-rater reliability! Calculating inter-rater reliability (IRR) is of the utmost importance when using scoring rubrics to gather data from classroom observations, student assessments…really any data source from which humans are doing the extrapolation. The purpose of calculating inter-rater reliability is manifold, but here is one really important reason: You want clean, accurate, and replicable data. If you have coders who have different understandings of how to apply a scoring rubric, you’re going to end up with error-laden, inaccurate estimates of your variables of interest, which can lead to false conclusions.

Hot Tip: IRR calculation should be an ongoing process. Of course, IRR should be calculated at the beginning of the coding process to be sure that coders are starting at an adequate level of agreement. (Here, the “beginning of the coding process” means after coders have undergone extensive training on the coding rubric and appear ready to begin coding independently.) However, IRR should be re-calculated intermittently throughout the coding process to ensure that coders are staying consistent and have not “drifted” from one another.

Hot Tip: Percent agreement does not count as a measure of inter-rater reliability. Neither does a standard correlation coefficient.  Percent agreement is insufficient because it does not take into account agreement due to chance. Coders can have a very high percentage of agreement and still have very low IRR when chance is taken into account. With regard to the correlation, this measure is insufficient because two things can change together without being equal. A high correlation does not necessarily indicate a high level of agreement among coders.
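Both points can be seen in a short Python sketch. Cohen's kappa is used here as one common chance-corrected agreement statistic for two coders; the post does not name a specific statistic, so that choice is an assumption, and all ratings below are made up for illustration.

```python
from collections import Counter
from math import sqrt

def percent_agreement(a, b):
    """Share of items on which the two coders gave the same code."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Agreement between two coders, corrected for chance agreement."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of each coder's marginal proportions, summed.
    p_e = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

def pearson(x, y):
    """Standard Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical ratings: both coders say "yes" almost all the time.
coder_a = ["yes"] * 90 + ["yes"] * 5 + ["no"] * 5
coder_b = ["yes"] * 90 + ["no"] * 5 + ["yes"] * 5
print(percent_agreement(coder_a, coder_b))       # 0.9
print(round(cohens_kappa(coder_a, coder_b), 3))  # -0.053

# Hypothetical rubric scores: coder B is always exactly one point higher.
scores_a = [1, 2, 3, 4, 5]
scores_b = [2, 3, 4, 5, 6]
r = pearson(scores_a, scores_b)           # essentially 1.0
agree = percent_agreement(scores_a, scores_b)  # 0.0: they never agree
```

In the first example the coders agree on 90% of items, yet kappa is slightly below zero because two coders who each say "yes" 95% of the time would agree about that often by chance alone. In the second, the scores move together perfectly while the coders never assign the same score.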

Rad Resource:

  • Handbook of Inter-rater Reliability: The Definitive Guide to Measuring the Extent of Agreement Among Multiple Raters, 3rd Edition by Kilem L. Gwet (2012)

This contribution is from the aea365 Tip-a-Day Alerts, by and for evaluators, from the American Evaluation Association. Please consider contributing – send a note of interest to aea365@eval.org. Want to learn more from Lindsay? She’ll be presenting as part of the Evaluation 2012 Conference Program, October 24-27 in Minneapolis, MN.

Greetings! We are Manuel Voelkle from the Max Planck Institute for Human Development in Berlin (Germany) and Han Oud from Radboud University in Nijmegen (The Netherlands). We are both interested in longitudinal data analysis and are trying to advance statistical methods for the analysis of change, as well as to improve their use and communication.

Hot Tip: Longitudinal data analysis is special!

Consider the following model which postulates an effect of Z (e.g., an intervention) on an outcome variable Y:

This model can be interpreted in two different ways: Either in a descriptive way (i.e., it describes the relationship between two variables and can be used to predict Y from Z) or in a causal way (i.e., Z causes Y). Which of the two interpretations is correct is up to the evaluator to find out by means of clever research designs.

What about longitudinal data analysis? In the figure below, we replaced Z by time. This is what we do when we use multilevel models or latent growth curve models as illustrated by the equation to the right.

It is important to realize that this model always remains at the descriptive level. Time never causes anything!

If we are interested in causal mechanisms, we should aim at models like the one shown below, that allow us to study the relationship between variables as they evolve and unfold over time, without time being a predictor.

Hot Tip: Most real-world processes develop continuously over time!

Although extremely useful and popular in modern evaluation research, the latter class of models suffers from the problem that only the order of the measurements, but not the actual time intervals between them, is taken into account. This causes problems when time intervals are different. For example, if a time interval of 1 week is used in one study, but a time interval of 2 weeks in another study, the parameter estimates will be different, but it is unclear whether this is because the actual process, or the time intervals, differed.

Hot Tip: Use continuous time models!

Put more generally, most phenomena in the real world develop continuously over time and should be modeled as such. For this purpose continuous time models have been developed. In short, instead of predicting X directly, we predict the derivative of X with respect to time. This provides us with parameter estimates that are independent of the discrete time intervals a researcher happens to have chosen, and allows us to compare parameter estimates across studies (or individuals) based on different time intervals.
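A deliberately simplified Python sketch shows why the interval matters. It assumes a single variable following the deterministic first-order model dX/dt = a·X, for which the discrete-time autoregressive coefficient over an interval Δ is exp(a·Δ). The models discussed in this post are multivariate and stochastic; the function names and the drift value of -0.5 per week are assumptions made only for illustration.

```python
import math

def discrete_ar_coefficient(drift, interval):
    """Autoregressive coefficient implied by dX/dt = drift * X
    when X is observed every `interval` time units."""
    return math.exp(drift * interval)

def implied_drift(ar_coefficient, interval):
    """Recover the continuous-time drift from a discrete-time coefficient."""
    return math.log(ar_coefficient) / interval

drift = -0.5  # hypothetical drift, per week

# Same underlying process, observed weekly in one study and biweekly in another.
phi_1week = discrete_ar_coefficient(drift, interval=1)
phi_2weeks = discrete_ar_coefficient(drift, interval=2)
print(round(phi_1week, 3), round(phi_2weeks, 3))  # 0.607 0.368

# Translating back to continuous time gives the same parameter in both studies.
print(implied_drift(phi_1week, 1), implied_drift(phi_2weeks, 2))
```

The two discrete-time coefficients differ although the process is identical, whereas the continuous-time parameter is the same for both studies, which is the comparability argued for above.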


This contribution is from the aea365 Tip-a-Day Alerts, by and for evaluators, from the American Evaluation Association. Please consider contributing – send a note of interest to aea365@eval.org. Want to learn more from Manuel and Han? They’ll be presenting as part of the Evaluation 2012 Conference Program, October 24-27 in Minneapolis, MN.


Hello aea365. My name is Carlen Erickson and I am a student learning the statistics program R! I want to share two resources I have found invaluable.

Rad Resource: R is a free statistical software package for conducting analyses of quantitative data. It is a free competitor to SAS or SPSS. R can be downloaded from its website here http://www.r-project.org/.

Hot Tip: Anthony Damico has posted 93 “twotorials” – two minute, primarily beginner, tutorials on different aspects of using R. Each is a short screencast with audio. You can find them here http://www.twotorials.com/

Hot Tip: R-Bloggers is a compilation of multiple blogs focusing on using R. It has something for everyone, from the beginner to the expert. They are online here http://www.r-bloggers.com/

 


Greetings, I am Cindy Weng, a Biostatistician II at Pediatrics Research Enterprise in the Department of Pediatrics at the University of Utah. This post was written together with my colleagues Chris Barker, SWB project manager, and Larry George, statistician at Problem Solving Tools.

I learned about this methodology through a project assigned by ASA Statistics Without Borders (SWB) in 2011. The goal of this project was to analyze under-5 (U5) mortality of children before (“baseline”) and after (“endline”) humanitarian aid was given at Afghan refugee camps in Pakistan. Survival analysis was used to estimate the probability distribution of age at death from current status and admissible age-at-death data. Inadmissible ages at death placed the date of death after the survey dates!

The International Rescue Committee survey data contained inadmissible ages at death, so the Kaplan-Meier nonparametric maximum likelihood estimator was used, along with estimators from current status data only.
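For readers unfamiliar with the Kaplan-Meier estimator mentioned above, here is a minimal Python sketch of the standard right-censored version. It is not the analysis code from this project and does not cover the current status estimators. The function name and the ten observations are hypothetical, and U5 mortality is read off as one minus the estimated probability of surviving to age 5.

```python
def kaplan_meier(observations):
    """Kaplan-Meier survivor function.

    observations: list of (age, died) pairs, where died is True if a death
    was observed at that age and False if the child was alive at last
    contact (right-censored).
    Returns a list of (age, survival probability) at each observed death age.
    """
    death_ages = sorted({age for age, died in observations if died})
    survival = 1.0
    curve = []
    for t in death_ages:
        at_risk = sum(1 for age, _ in observations if age >= t)
        deaths = sum(1 for age, died in observations if died and age == t)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical data: ages in years for ten children.
observations = [(0.5, True), (1, False), (2, True), (3, False)] + [(5, False)] * 6

curve = kaplan_meier(observations)
# Survival up to age 5 is the last value at or before age 5 on the curve.
survival_to_5 = [s for age, s in curve if age <= 5][-1]
u5_mortality = 1 - survival_to_5
print(curve)
print(round(u5_mortality, 4))  # 0.2125
```

Censored children still count in the at-risk group up to the age at which they were last observed, which is why the second death is divided by 8 children rather than 9.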

Tips:

  • Maximum likelihood and least squares estimators differ. We estimated survivor functions from baseline and endline surveys. “MLE” and “LSE” denote maximum likelihood estimation and least squares estimation. They don’t always agree, because the methods are different approaches to estimation. In particular, LSE does not account for noise. If noise is not uniform across the sample, LSE might be incorrect. The MLE takes noise into consideration. The MLE estimates in the figure are from current status data. They agreed pretty well with the Kaplan-Meier estimators from admissible ages at death.

Lessons learned:

  • Survey data is not always what is expected. Surveys should have cross-checking validation opportunities. Current status data provided the opportunity to make two estimates of survivor functions.
  • Expect unexpected outcomes. The baseline U5 estimates are over 10%, and the endline U5 estimate is approximately 4%. The country-level U5 for Pakistan is 8.7%. The standard deviation of the endline U5 estimate is less than 0.5%. The apparent reduction in U5 appears to be primarily a reduction in deaths after the first year of life; infant mortality was almost 4% both before and after.



The American Evaluation Association is celebrating Statistics Without Borders Week. The contributions all week come from SWB members.
