AEA365 | A Tip-a-Day by and for Evaluators

CAT | Quantitative Methods: Theory and Design

Hello! My name is Alex (Chi-Keung) Chan and I am currently a Senior Lecturer at Hong Kong Shue Yan University and a partner of A Data-Driven Consulting Group, LLP, based in Minnesota. Prior to that, I was a Senior Evaluation and Research Fellow at the Minneapolis Public Schools. I have been using the value-added evaluation approach to investigate the effectiveness of various social and educational programs and to identify beat-the-odds teachers and schools. Unfortunately, stakeholders too often emphasize the summative interpretation of value-added results without fully understanding the potential formative uses of these measures. So how can we add value to value-added to make it more meaningful and useful?

Hot Tips:

1. Choose value-added measures that are sensitive to the implementation process (e.g., improving teaching), not just to accountability outcomes (e.g., determining performance pay). In other words, a value-added measure is meaningless if it cannot tell whether what a teacher does is really adding value to students' learning.

2. Link the value-added findings to implementation (e.g., link the value-added scores with instructional observation scores) to identify best practices that we can learn from (what works) and practices that need to be improved (what doesn't work); see the sketch after these tips for one way to make that link.

3. Gather additional evidence to verify the value-added × implementation interaction results (e.g., interview the principals, coaches, and teachers who are part of the professional development program). In other words, we want to find out the details of what works and what does not work beyond the numbers.
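
To make the second tip concrete, here is a minimal sketch (in R, not the model actually used in Minneapolis) of how teacher value-added estimates might be linked to instructional observation scores. The data frames and column names (students, observations, post_score, pre_score, teacher_id, obs_score) are hypothetical.

```r
# Minimal sketch of a simple value-added model with lme4, then a link to observations
library(lme4)

# Post-test regressed on prior achievement, with a random intercept per teacher
# standing in for the teacher's "added value"
va_model <- lmer(post_score ~ pre_score + (1 | teacher_id), data = students)

# Empirical Bayes estimates of the teacher effects
teacher_va <- ranef(va_model)$teacher_id
teacher_va$teacher_id <- rownames(teacher_va)
names(teacher_va)[1] <- "value_added"

# Link value-added to implementation by merging with observation scores
linked <- merge(teacher_va, observations, by = "teacher_id")
cor(linked$value_added, linked$obs_score, use = "complete.obs")
```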

Lessons Learned:    

1. Collaborate with stakeholders to select the value-added measures. Sometimes stakeholder opinions can be either too strong or too loose at this initial but critical stage.

2. Communicate and emphasize to stakeholders the goals and benefits of the formative uses of value-added measures. This is a very challenging process because our focus is too often driven by the limited goal of “accountability” at the expense of the ultimate goal of “improvement”.

3. Encourage and try to understand the different perspectives that various stakeholders bring to interpreting the findings. We often miss some voices or ignore some perspectives because of the uneven power structure in a system. We need to ensure that there is a process in place to hear the voices of those with the least power, who are usually the people most affected by a program.

Rad Resources:

  1. Value-Added Research Center (VARC) at the University of Wisconsin (click on the Tutorials tab)
  2. Value-Added Measures in Education by Professor Douglas Harris (an easy-to-read book for the layperson on the concepts and applications of value-added)

Hot Tip: Minneapolis/Saint Paul Activities:

My family loved to take walks at the Sculpture Garden when we were living in Minnesota, and we love the Quang Vietnamese Restaurant at Nicollet Mall.

The American Evaluation Association is celebrating Minnesota Evaluation Association (MNEA) Affiliate Week with our colleagues in the MNEA AEA Affiliate. The contributions all this week to aea365 come from our MNEA members. Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to aea365@eval.org. aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.

My name is Lindsay Demers. I am an evaluator and quantitative methodologist at TERC in Cambridge, MA. I have worked as the lead data analyst on several projects at TERC, Lesley University, Boston College and Brandeis University.

Hot Tip: You must calculate inter-rater reliability!  Calculating inter-rater reliability (IRR) is of the utmost importance when using scoring rubrics to gather data from classroom observations, student assessments…really any data source where human judgment is involved. The purposes of calculating inter-rater reliability are manifold, but here is one really important reason: you want clean, accurate, and replicable data. If you have coders with different understandings of how to apply a scoring rubric, you will end up with error-laden, inaccurate estimates of your variables of interest, which can lead to false conclusions.

Hot Tip: IRR calculation should be an ongoing process. Of course, IRR should be calculated at the beginning of the coding process to be sure that coders are starting at an adequate level of agreement. (Here, the “beginning of the coding process” means after coders have undergone extensive training on the coding rubric and appear ready to begin coding independently.) However, IRR should be re-calculated intermittently throughout the coding process to ensure that coders are staying consistent and have not “drifted” from one another.

Hot Tip: Percent agreement does not count as a measure of inter-rater reliability. Neither does a standard correlation coefficient.  Percent agreement is insufficient because it does not take into account agreement due to chance. Coders can have a very high percentage of agreement and still have very low IRR when chance is taken into account. With regard to the correlation, this measure is insufficient because two things can change together without being equal. A high correlation does not necessarily indicate a high level of agreement among coders.
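
As a quick illustration (a toy sketch, not code from Lindsay's projects), percent agreement, a correlation, and Cohen's kappa can all be computed in R for two coders; the kappa2() call assumes the irr package has been installed.

```r
# Two coders rating the same 8 observations on a 3-point rubric (toy data)
rater1 <- c(1, 2, 3, 2, 1, 3, 2, 2)
rater2 <- c(1, 3, 3, 2, 2, 3, 2, 1)

mean(rater1 == rater2)   # percent agreement: ignores agreement due to chance
cor(rater1, rater2)      # correlation: tracks co-variation, not exact agreement

# Cohen's kappa corrects observed agreement for chance agreement
library(irr)             # assumes install.packages("irr") has been run
kappa2(cbind(rater1, rater2))
```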

Rad Resource:

  • Handbook of Inter-rater Reliability: The Definitive Guide to Measuring the Extent of Agreement Among Multiple Raters, 3rd Edition by Kilem L. Gwet (2012)

This contribution is from the aea365 Tip-a-Day Alerts, by and for evaluators, from the American Evaluation Association. Please consider contributing – send a note of interest to aea365@eval.org. Want to learn more from Lindsay? She’ll be presenting as part of the Evaluation 2012 Conference Program, October 24-27 in Minneapolis, MN.

Greetings! We are Manuel Voelkle from the Max Planck Institute for Human Development in Berlin (Germany) and Han Oud from Radboud University in Nijmegen (The Netherlands). We are both interested in longitudinal data analysis and are trying to advance statistical methods for the analysis of change, as well as to improve their use and communication.

Hot Tip: Longitudinal data analysis is special!

Consider a simple model that postulates an effect of Z (e.g., an intervention) on an outcome variable Y.

This model can be interpreted in two different ways: Either in a descriptive way (i.e., it describes the relationship between two variables and can be used to predict Y from Z) or in a causal way (i.e., Z causes Y). Which of the two interpretations is correct is up to the evaluator to find out by means of clever research designs.

What about longitudinal data analysis? Now replace Z with time. This is what we do when we use multilevel models or latent growth curve models.
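
As a generic illustration (a sketch of this model class, not the authors' original figure or equation), a latent growth curve of this kind can be fit as a multilevel model with time as the predictor. The long-format data frame and column names below are hypothetical.

```r
# Minimal sketch: "time as a predictor" in a multilevel growth model
library(lme4)
growth <- lmer(score ~ wave + (wave | person_id), data = long_data)
fixef(growth)   # average starting level and average rate of change over time
```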

It is important to realize that this model always remains at the descriptive level. Time never causes anything!

If we are interested in causal mechanisms, we should aim for models that allow us to study the relationship between variables as they evolve and unfold over time, without time being a predictor.
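
One generic example of such a model (a sketch of the model class, not the authors' original figure) is a bivariate cross-lagged panel model, in which each variable at occasion t is predicted by both variables at the previous occasion rather than by time itself:

```latex
X_{t} = a_{xx}\,X_{t-1} + a_{xy}\,Y_{t-1} + e_{x,t}
Y_{t} = a_{yx}\,X_{t-1} + a_{yy}\,Y_{t-1} + e_{y,t}
```

Here the cross-lagged coefficients (a_xy and a_yx) carry the substantive interest: they describe how each variable relates to later values of the other.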

Hot Tip: Most real-world processes develop continuously over time!

Although extremely useful and popular in modern evaluation research, this second class of models suffers from the problem that only the order of the measurement occasions, and not the actual time intervals between them, is taken into account. This causes problems when time intervals differ. For example, if a time interval of 1 week is used in one study but a time interval of 2 weeks in another, the parameter estimates will be different, and it is unclear whether this is because the actual process differed or because the time intervals did.

Hot Tip: Use continuous time models!

Put more generally, most phenomena in the real world develop continuously over time and should be modeled as such. For this purpose continuous time models have been developed. In short, instead of predicting X directly, we predict the derivative of X with respect to time. This provides us with parameter estimates that are independent of the discrete time intervals a researcher happens to have chosen, and allows us to compare parameter estimates across studies (or individuals) based on different time intervals.
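
For a single variable, the core idea can be sketched in a few lines of base R (a toy illustration of the general principle, not the authors' software). With the derivative model dX/dt = a·X(t), a continuous-time drift parameter a implies a different discrete-time autoregressive effect for every interval length, and knowing the interval lets you convert back.

```r
a <- -0.35     # hypothetical continuous-time drift parameter (per week)

exp(a * 1)     # implied discrete-time autoregression for a 1-week interval
exp(a * 2)     # implied discrete-time autoregression for a 2-week interval

# Two studies with different intervals give different discrete-time estimates,
# yet both are consistent with the same underlying drift parameter:
log(exp(a * 2)) / 2   # recovers a from the 2-week autoregressive effect
```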

This contribution is from the aea365 Tip-a-Day Alerts, by and for evaluators, from the American Evaluation Association. Please consider contributing – send a note of interest to aea365@eval.org. Want to learn more from Manuel and Han? They’ll be presenting as part of the Evaluation 2012 Conference Program, October 24-27 in Minneapolis, MN.

Hello aea365. My name is Carlen Erickson and I am a student learning the statistics program R! I want to share two resources I have found invaluable.

Rad Resource: R is a free, open-source statistical software package for conducting analyses of quantitative data and a no-cost alternative to SAS or SPSS. R can be downloaded from its website: http://www.r-project.org/.

Hot Tip: Anthony Damico has posted 93 “twotorials”: two-minute, primarily beginner-level tutorials on different aspects of using R. Each is a short screencast with audio. You can find them at http://www.twotorials.com/

Hot Tip: R-Bloggers is a compilation of multiple blogs focusing on using R. It has something for everyone, from the beginner to the expert. It is online at http://www.r-bloggers.com/
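
If you are brand new to R, here are a few first commands to try once it is installed (a toy example; the CSV file name and variables are hypothetical).

```r
scores <- c(72, 85, 90, 66, 78)   # create a small vector of data
mean(scores)                      # basic descriptive statistics
sd(scores)
summary(scores)

# Read your own data from a CSV file and run a simple analysis:
# mydata <- read.csv("survey_results.csv")
# t.test(score ~ group, data = mydata)   # e.g., a two-group comparison
```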

 

Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to aea365@eval.org. aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.

Greetings, I am Cindy Weng, a Biostatistician II in the Pediatrics Research Enterprise, Department of Pediatrics, at the University of Utah. This post was written together with my colleagues Chris Barker, SWB project manager, and Larry George, statistician at Problem Solving Tools.

I learned about this methodology through a project assigned by ASA Statistics Without Borders (SWB) in 2011. The goal of this project was to analyze under-five (U5) mortality of children before (“baseline”) and after (“endline”) humanitarian aid was provided at Afghan refugee camps in Pakistan. Survival analysis was used to estimate the probability distribution of age at death from current status and admissible age-at-death data. Inadmissible ages at death placed the date of death after the survey dates!

The International Rescue Committee survey data contained inadmissible ages at death, so the Kaplan-Meier nonparametric maximum likelihood estimator was used, along with estimators from current status data only.
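
For readers unfamiliar with the method, here is a minimal sketch of a Kaplan-Meier fit in R (not the SWB team's actual analysis). It assumes the survival package and a hypothetical data frame u5_data with each child's age in months at death or at the survey (age_months) and a death indicator (died: 1 = death observed, 0 = alive at the survey date).

```r
library(survival)
km <- survfit(Surv(age_months, died) ~ 1, data = u5_data)
summary(km, times = c(12, 24, 36, 48, 60))   # survival probabilities at ages 1-5
plot(km, xlab = "Age (months)", ylab = "Probability of survival")
```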

Tips:

  • Maximum likelihood and least squares estimators differ. We estimated survivor functions from the baseline and endline surveys. “MLE” and “LSE” denote maximum likelihood estimates and least squares estimates. They don’t always agree, because the two methods are different approaches to estimation. In particular, LSE does not model the noise; if the noise is not uniform across the sample, LSE can be misleading, whereas MLE takes the noise into account. The MLE from current status data agreed fairly well with the Kaplan-Meier estimates from admissible ages at death.

Lessons learned:

  • Survey data are not always what you expect. Surveys should build in opportunities for cross-checking and validation. Current status data provided the opportunity to make two estimates of the survivor functions.
  • Expect unexpected outcomes. The baseline U5 estimates are over 10%, and the endline U5 estimate is approximately 4%; Pakistan’s national U5 rate is 8.7%. The standard deviation of the endline U5 estimate is less than 0.5%. The apparent reduction in U5 mortality appears to be primarily a reduction in deaths after the first year of life; infant mortality was almost 4% both before and after.


The American Evaluation Association is celebrating Statistics Without Borders Week. The contributions all week come from SWB members. Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to aea365@eval.org. aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.

Hello! We’re Allan Porowski from ICF International and Heather Clawson from Communities In Schools (CIS). We completed a five-year, comprehensive, mixed-method evaluation of CIS, which featured  several study components – including three student-level randomized controlled trials; a school-level quasi-experimental study; eight case studies; a natural variation study to identify what factors distinguished the most successful CIS sites from others; and a benchmarking study to identify what lessons CIS could draw from other youth-serving organizations.  We learned a lot about mixed-method evaluations over the course of this study, and wanted to share a few of those lessons with you.

Lessons Learned:

  • Complex research questions require complex methods. Disconnects exist between research and practice because the fundamental research question in an impact evaluation (i.e., does the intervention work?) provides little practical utility for practitioners in their daily work. CIS leadership wanted to know not only whether CIS worked, but also how it worked, why it worked, and in what situations it worked, so they could engage in evidence-informed decision making. These more nuanced research questions required a mixed-methods approach. Moreover, CIS field staff already believed in what they were doing – they wanted to know how to be more effective. Mixed-methods approaches are therefore a key prerequisite for capturing the nuance and the process evaluation findings that practitioners want.
  • Practitioners are an ideal source of information for determining how much “evaluation capital” you have. CIS serves nearly 1.3 million youth in 25 states, which raises the likelihood that different affiliates may be employing different language, processes, and even philosophies about best practice. In working with such a widespread network of affiliates, we saw the need to convene an “Implementation Task Force” of practitioners to help us set parameters around the evaluation. This group met monthly and proved incredibly helpful in (a) identifying language commonly used by CIS sites nationwide to include in our surveys, (b) reviewing surveys and ensuring that they were capturing what was “really happening” in CIS schools, and (c) identifying how much “evaluation capital” we had at our disposal (e.g., how long surveys could take before they posed too much burden).
  • The most important message you can convey: “We’re not doing this evaluation to you; we’re doing this evaluation with you.” Although it was incumbent upon us as evaluators to be dispassionate observers, that did not preclude us from engaging the field. Evaluation – and especially mixed-methods evaluation – requires the development of relationships to acquire data, provide assistance, build evaluation capacity, and message findings. As evaluators, we share the desire of practitioners to learn what works. By including practitioners in our Implementation Task Force and our Network Evaluation Advisory Committee, we were able to ensure that we were learning together and that we were working toward a common goal: to make the evaluation’s results useful for CIS staff working directly with students.

Resources:

  • Executive Summary of CIS’s Five-Year National Evaluation
  • Communities In Schools surrounds students with a community of support, empowering them to stay in school and achieve in life. Through a school-based coordinator, CIS connects students and their families to critical community resources, tailored to local needs. Working in nearly 2,700 schools, in the most challenged communities in 25 states and the District of Columbia, Communities In Schools serves nearly 1.26 million young people and their families every year.

Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to aea365@eval.org. aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.

Hello! My name is Tina Phillips and I am the evaluation program manager at the Cornell Lab of Ornithology. I lead an NSF-funded project called DEVISE (Developing, Validating and Implementing Situated Evaluations), which is aimed at providing practitioners and evaluators with tools to assess individual learning outcomes from citizen science, or public participation in scientific research (PPSR), projects. Within the context of citizen science, we intend to test and validate a suite of instruments across different projects and assess how they perform in different settings. The first thing we did was to assess the state of citizen science evaluations, which formed the basis for a draft framework for assessing learning outcomes. This framework includes six major constructs that capture common outcomes across diverse projects: interest in science, motivation to participate, knowledge of the nature of science, skills of science inquiry, environmental stewardship behaviors, and science identity.

Lessons Learned: Developing and validating scales is hard! If you’ve done this before, you know what I mean. If you haven’t, don’t underestimate the amount of time it will take to do this well. For instance, prior to developing scales, we conducted an extensive inventory of existing scales that were aligned to our framework and relevant to STEM (science, technology, engineering, and mathematics) and informal science learning environments. Gathering these scales and the associated literature to document their psychometric properties was labor-intensive. Next, as a team, we reviewed and rated each of these scales to determine their contextual relevance to citizen science. From there, we devised a plan for testing or modifying an existing scale, or developing a brand-new instrument. For example, one scale is being developed using concept mapping, another from existing scales, and another as an item data bank. Once these scales are drafted, they still need to be tested with a variety of audiences and in a variety of contexts to meet satisfactory validity and reliability criteria.
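
As one small example of the kind of testing involved (a sketch of a common psychometric step, not the DEVISE team's actual procedure), internal-consistency reliability can be checked in R with the psych package; the pilot data frame and item names are hypothetical.

```r
library(psych)   # assumes install.packages("psych") has been run
alpha(pilot_data[, c("item1", "item2", "item3", "item4",
                     "item5", "item6", "item7", "item8")])
# Low item-total correlations flag items to revise or drop before moving on to
# fuller validation work (factor structure, test-retest reliability, etc.)
```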

Hot Tip: Seek the help of psychometricians and others who have developed valid and reliable scales.

Rad resource: Once finalized, the DEVISE toolkit will be openly available via the Citizen Science Toolkit website. This dynamic site is geared towards citizen science practitioners and provides featured projects and a host of resources for working within the citizen science arena.

Rad resource: Another great resource is the Assessment Tools for Informal Science (ATIS) website. The site offers detailed information for over 60 instruments categorized by age, domain, and assessment type. They are currently seeking reviews of instruments by end users.

The American Evaluation Association is celebrating Environmental Program Evaluation Week with our colleagues in AEA’s Environmental Program Evaluation Topical Interest Group. The contributions all this week to aea365 come from our EPE TIG members. Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.

My name is Staci Wendt and I am a Research Associate at RMC Research in Portland, Oregon. Last year, I completed my Ph.D. in Applied Psychology at Portland State University. After finishing my degree, I was concerned about how to stay current with statistical literature and how to practice techniques that I learned in school, but wasn’t currently using in my work.

Hot Tip - One day, a friend was talking with me about her fiction book club and I had an “Aha!” moment—a book club where we discussed statistics!

Who: We have a small group of people with varying knowledge of and experience with statistics and research methods. Our group comprises 6 members, which eases scheduling and allows each of us the opportunity to contribute meaningfully.

When: While our regular meetings are held monthly, we are also available to each other via email throughout the month. The email discussions allow for quick feedback on questions or issues that might arise within our day-to-day work.

What: At our first meeting, we discussed our goals and expectations for the group, brainstormed a list of topics we wanted to discuss, and decided on the format for our group. After this discussion, the group decided that, in order to make the group both useful and doable, we would meet monthly but vary the meeting type. On odd-numbered months, we have formal meetings, where we discuss a pre-determined topic (such as structural equation modeling). We take turns facilitating these formal meetings. The facilitator is responsible for selecting pertinent sub-topics of the theme (e.g., model fit, assumptions of the statistical test, how-to) and assigning them to each member. Each member is then responsible for creating a small “cheat sheet” on that topic and presenting the information at our meeting. Our presentations are mostly casual in order to encourage a good environment for discussion. We also try to bring pertinent “real-world” examples, either from the literature or from our own work.

On the even-numbered months, we have informal meetings. At these meetings, we bring any specific question or topic that we want to discuss, or review information from the previous meeting. The main difference between the formal and informal meetings is that we don’t have any preparation work for the informal meetings.

Where: We rotate meeting at different group members’ homes for the formal meetings. This allows one person to take notes (which are later distributed to the group) and we have room for reference books. For the informal meetings, we try to meet at restaurants, to add to the relaxed nature of the meeting.

The most important thing is to set group goals, make adjustments as you try it out, and HAVE FUN!

Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to aea365@eval.org. aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.

Greetings. My name is Ricardo Gomez and I currently work as a Research and Evaluation Associate for the National Collegiate Inventors and Innovators Alliance. I am also a doctoral candidate in International Education at the University of Massachusetts Amherst, Center for International Education, and alumnus of the AEA-Duquesne University Graduate Diversity Internship Program.

The opinions of stakeholders are crucial because they can shape the direction of programs and can have an impact on program execution, scalability, and performance. Hence, as a researcher and evaluator, I have always been interested in finding ways to gauge the subjectivity (i.e., opinions, perceptions, attitudes, and motivations) of evaluation participants, and incorporate these into the different phases of my evaluation activities.

Lesson Learned – Q methodology is a powerful tool that evaluators can use to explore the perspectives of evaluation participants. First used and advanced by William Stephenson in the 1930s, Q is a research method that statistically identifies different points of view (or subjectivities) on a given topic based on how individuals sort a set of statements about that topic.

Traditionally, evaluators have relied on interviews or surveys with Likert-type items to gauge the opinions of evaluation participants. These approaches are not without their drawbacks: the typical outcome of the analysis of Likert-type items is a description of pre-specified independent categories deemed relevant by the evaluator; and interviews can be time consuming and intrusive.

The outcome of a Q study, on the other hand, is a more authentic set of factors that capture people’s attitudes and perspectives about an issue. In Q method, a group of participants (the p-set) sort a sample of items (the q-set) into a subjectively meaningful pattern (the q-sort). The resulting q-sorts are analyzed using correlation and factor analysis (q-analysis), yielding a set of factors whose interpretation reveals a set of points of view (the f-set).
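
For readers who want to try this, a Q analysis can be run in R. The sketch below assumes the qmethod package and a hypothetical matrix qsorts with statements in rows and one column per participant's completed Q-sort; it is an illustration of the analysis steps, not Stephenson's original procedure.

```r
library(qmethod)                          # assumes install.packages("qmethod")
results <- qmethod(qsorts, nfactors = 3)  # correlate sorts, then factor-analyze
summary(results)                          # factor characteristics and statement scores
results$loa                               # each participant's loading on each factor
```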

Rad Resource: Click here www.broaderimpacts.org/aea2011 for an online example of a Q-sort process.

Lesson Learned: Q methodology is an important bridge between qualitative and quantitative methods in that it provides a means for analyzing the phenomenological world of a small number of individuals without sacrificing the power of statistical analysis.

Rad Resource – The International Society for the Scientific Study of Subjectivity (ISSSS) is the official organization committed to the ideas and concepts of Q methodology as enunciated by William Stephenson. ISSSS administers an email discussion list dedicated to the exchange of information related to Q methodology. To learn more about Q methodology, join ISSSS, or become a member of the email discussion list, please visit www.qmethod.org.

Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to aea365@eval.org. aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.

Hello, my name is Juan Paulo Ramírez, independent consultant and sole owner of “GIS and Human Dimensions, L.L.C.” How many times have you used spreadsheets or sophisticated statistical software (e.g., SAS, SPSS) to estimate frequencies for a population and asked yourself: is it really necessary to do this with very expensive and sophisticated software? Or to spend hours and hours cleaning up the data to make it consistent within and between records and variables? Is there a better and more efficient way to complete these trivial and time-consuming tasks? There is, and Google Refine is the answer!

Lessons learned: Google Refine is a free desktop application (not a web service) that you install on your computer (you can download it here). Google Refine allows users to seamlessly and efficiently calculate frequencies and cross-tabulate data from large datasets (e.g., hundreds of thousands of records), along with cleaning up your data. What I found is that you learn more by trial and error with Google Refine, and you discover how easy it is to get the information you need in a few steps. Google Refine has saved me days of hard work! Google Refine works with numeric, time, and text data and allows you to work directly with Excel files.

The following are a few examples of how I have used Google Refine: 1) getting demographic frequencies (e.g., gender, age) and cross-tabulating them with economic variables (e.g., income) and location (e.g., county); 2) cleaning up inconsistent data, since people sometimes answer questions without any written restrictions (e.g., lengthy responses, spelling errors, blank spaces); 3) when you select a date variable, Google Refine creates a bar chart with two ends that you can drag with your mouse to define specific time periods; 4) if you make a mistake, Google Refine allows you to undo everything you have done!
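
If you prefer to script these steps, the same kinds of clean-up and tabulation can also be done in base R (a rough analogue offered for comparison, not a Google Refine feature; the data frame and column names are hypothetical).

```r
survey$county <- trimws(tolower(survey$county))   # fix stray spaces and casing
table(survey$gender)                              # frequencies
table(survey$county, survey$income_bracket)       # cross-tabulation
aggregate(income ~ county + gender, data = survey, FUN = median)
```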

Rad resource: There are three videos available that show the potential applications of Google Refine. You can watch them here. I watched the first video once and it was enough to convince me that this was a must-have application. I started using it right away, and it has become one of the most essential tools that I now use in my work.

Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to aea365@eval.org. aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.
