AEA365 | A Tip-a-Day by and for Evaluators

CAT | Quantitative Methods: Theory and Design

My name is Kai Chi Yam, and I am a graduate student at Washington State University. I recently had an opportunity to evaluate a university-based mentoring program. We used Exploratory Factor Analysis (EFA) to provide score validity evidence by examining the internal structure of study-specific measures. Such evidence can increase the utility of the instruments in evaluation and, ultimately, the credibility and utility of evaluation results. While working with my colleagues, I found that some evaluators may be familiar with EFA, but most follow software default settings with little consideration of the procedural choices that can affect its results. Here are some general guidelines for the appropriate use of EFA.

Hot Tip: Follow these steps when conducting an EFA:

1. Data screening: Check Chi Square (p > .05), the Kaiser-Meyer-Olkin measure of sampling adequacy (MSA > .70), Bartlett’s test of sphericity (which should be significant), and univariate and multivariate normality to determine whether the data are appropriate for factor analytic procedures.
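Step 1 relies on two statistics that most packages print without showing the arithmetic. As a minimal sketch, assuming complete numeric data with a non-singular correlation matrix, the Python below (NumPy and SciPy) computes Bartlett’s chi-square from the determinant of the correlation matrix and the overall KMO from the correlations and the partial (anti-image) correlations. The function names and the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats


def bartlett_sphericity(data):
    """Bartlett's test that the correlation matrix is an identity matrix.

    Returns (chi_square, df, p_value). A small p-value suggests the
    variables share enough correlation to be worth factoring.
    """
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)  # log of the determinant, computed stably
    chi_square = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) / 2
    return chi_square, df, stats.chi2.sf(chi_square, df)


def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy (0 to 1)."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations between each pair of items, controlling for all others.
    partial = -inv / np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    a2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + a2)


# Hypothetical data: 300 respondents answering 6 items driven by one common factor.
rng = np.random.default_rng(0)
factor = rng.normal(size=(300, 1))
items = 0.8 * factor + 0.6 * rng.normal(size=(300, 6))

chi_square, df, p_value = bartlett_sphericity(items)
print(f"Bartlett chi2({df:.0f}) = {chi_square:.1f}, p = {p_value:.3g}")
print(f"KMO = {kmo(items):.2f}")
```

A significant Bartlett test only says the correlation matrix is not an identity matrix; it is the KMO value that maps onto the MSA > .70 guideline above.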

2. Extraction methods: Use principal component analysis if the goal of the EFA is purely data reduction. Use principal axis factoring if the goal is to extract latent variables.

Note: Principal axis factoring is preferred because it takes measurement error into account.
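The difference in Step 2 comes down to what sits on the diagonal of the correlation matrix. In this minimal sketch, which assumes a correlation matrix as input, principal components keep 1s on the diagonal (all of each item’s variance), while iterated principal axis factoring replaces them with communality estimates (shared variance only), starting from squared multiple correlations. The function names and simulated data are hypothetical, and real software adds safeguards (e.g., for communalities above 1) that this sketch omits.

```python
import numpy as np


def principal_axis_factoring(corr, n_factors, max_iter=200, tol=1e-6):
    """Iterated principal axis factoring on a correlation matrix."""
    # Squared multiple correlations as starting communality estimates.
    communalities = 1 - 1 / np.diag(np.linalg.inv(corr))
    for _ in range(max_iter):
        reduced = corr.copy()
        np.fill_diagonal(reduced, communalities)  # shared variance only
        eigvals, eigvecs = np.linalg.eigh(reduced)
        idx = np.argsort(eigvals)[::-1][:n_factors]
        loadings = eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))
        new_communalities = np.sum(loadings ** 2, axis=1)
        if np.max(np.abs(new_communalities - communalities)) < tol:
            break
        communalities = new_communalities
    return loadings


def principal_components(corr, n_components):
    """PCA loadings from the full correlation matrix (1s on the diagonal)."""
    eigvals, eigvecs = np.linalg.eigh(corr)
    idx = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, idx] * np.sqrt(eigvals[idx])


# Hypothetical one-factor data: true loadings of .80 on all 6 items.
rng = np.random.default_rng(0)
factor = rng.normal(size=(300, 1))
items = 0.8 * factor + 0.6 * rng.normal(size=(300, 6))
corr = np.corrcoef(items, rowvar=False)

paf = principal_axis_factoring(corr, n_factors=1)
pca = principal_components(corr, n_components=1)
print("PAF loadings:", np.round(np.abs(paf[:, 0]), 2))
print("PCA loadings:", np.round(np.abs(pca[:, 0]), 2))
```

Because PCA keeps each item’s unique variance, its loadings typically come out somewhat larger than the PAF loadings for the same data.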

3. Rotation methods: Use direct oblimin or promax rotation when factors are non-orthogonal. Use varimax rotation when factors are orthogonal.

Note: In most evaluations, especially when the measurements are psychological in nature, evaluators should assume factors to be non-orthogonal and proceed with direct oblimin or promax rotation.
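Step 3’s contrast between orthogonal and oblique rotation shows up clearly with a small, made-up loading matrix. The sketch below assumes a two-factor structure whose factors correlate at .50, expressed in orthogonal (unrotated) axes. It then applies raw varimax (Kaiser’s pairwise-angle method, without the Kaiser row normalization many packages apply by default) and promax (varimax followed by a least-squares fit to a target whose loadings are raised to the 4th power, following the usual Hendrickson and White steps). All names and values are illustrative assumptions.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Raw varimax using Kaiser's pairwise planar rotations (no row normalization)."""
    L = np.array(loadings, dtype=float)
    p, k = L.shape
    for _ in range(max_iter):
        largest_angle = 0.0
        for i in range(k - 1):
            for j in range(i + 1, k):
                x, y = L[:, i].copy(), L[:, j].copy()
                u, v = x ** 2 - y ** 2, 2 * x * y
                a, b = u.sum(), v.sum()
                c, d = np.sum(u ** 2 - v ** 2), 2 * np.sum(u * v)
                # Angle that maximizes the varimax criterion for this pair of columns.
                phi = 0.25 * np.arctan2(d - 2 * a * b / p, c - (a ** 2 - b ** 2) / p)
                L[:, i] = x * np.cos(phi) + y * np.sin(phi)
                L[:, j] = -x * np.sin(phi) + y * np.cos(phi)
                largest_angle = max(largest_angle, abs(phi))
        if largest_angle < tol:
            break
    return L


def promax(loadings, power=4):
    """Promax: varimax, then an oblique least-squares fit to a sharpened target."""
    L = varimax(loadings)
    target = L * np.abs(L) ** (power - 1)  # keeps signs, shrinks small loadings
    U, *_ = np.linalg.lstsq(L, target, rcond=None)
    # Rescale columns so the implied factor correlation matrix has 1s on the diagonal.
    U = U * np.sqrt(np.diag(np.linalg.inv(U.T @ U)))
    pattern = L @ U
    factor_corr = np.linalg.inv(U.T @ U)
    return pattern, factor_corr


# Hypothetical: items 1-3 measure factor A, items 4-6 factor B, factors correlate .50.
# Multiplying by the Cholesky factor of the factor correlation matrix gives an
# equivalent unrotated, orthogonal solution to start from.
true_pattern = np.array([[0.8, 0.0]] * 3 + [[0.0, 0.8]] * 3)
factor_corr_true = np.array([[1.0, 0.5], [0.5, 1.0]])
unrotated = true_pattern @ np.linalg.cholesky(factor_corr_true)

varimax_loadings = varimax(unrotated)
promax_pattern, promax_corr = promax(unrotated)
print("varimax:\n", np.round(varimax_loadings, 2))
print("promax pattern:\n", np.round(promax_pattern, 2))
print("promax factor correlation:", round(float(promax_corr[0, 1]), 2))
```

Varimax has to spread the factor correlation into cross-loadings (about .21 on every item here), whereas promax returns near-zero cross-loadings plus an estimated factor correlation close to .50. That is the kind of result the note above anticipates for psychological measures.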

4. Criteria for retaining factors: Decide the number of factors based on at least two criteria (e.g., scree plot, parallel analysis, conceptual meaningfulness of the factor) other than Kaiser’s rule of eigenvalue > 1.
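Parallel analysis, one of the criteria listed in Step 4, compares observed eigenvalues with those from random data of the same size. Here is a minimal sketch of Horn’s procedure. It assumes principal-component eigenvalues of the full correlation matrix and a 95th-percentile cutoff (some programs use the mean or the reduced correlation matrix instead). The function and the simulated two-factor data are hypothetical.

```python
import numpy as np


def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Horn's parallel analysis on correlation-matrix eigenvalues.

    Retains leading factors whose observed eigenvalue exceeds the chosen
    percentile of eigenvalues from random normal data of the same shape.
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    random_eigs = np.empty((n_sims, p))
    for s in range(n_sims):
        sim = rng.normal(size=(n, p))
        random_eigs[s] = np.sort(np.linalg.eigvalsh(np.corrcoef(sim, rowvar=False)))[::-1]
    threshold = np.percentile(random_eigs, percentile, axis=0)
    # Count leading eigenvalues that beat the random-data threshold.
    n_factors = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:
            break
        n_factors += 1
    return n_factors, observed, threshold


# Hypothetical data: two uncorrelated factors, three items each, 500 respondents.
rng = np.random.default_rng(1)
f = rng.normal(size=(500, 2))
pattern = np.array([[0.8, 0.0]] * 3 + [[0.0, 0.8]] * 3)
items = f @ pattern.T + 0.6 * rng.normal(size=(500, 6))

k, observed, threshold = parallel_analysis(items)
print("factors to retain:", k)
print("observed: ", np.round(observed, 2))
print("threshold:", np.round(threshold, 2))
```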

5. Interpretation: Name each factor with theoretical (i.e., supporting theories) and statistical justifications (i.e., factor loadings). A general rule of thumb for acceptable factor loading is .40 or above.

Note: All interpretations must be theoretically justifiable! Don’t base your judgment solely on factor loadings.
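For the statistical half of Step 5, a small helper can list each item’s strongest loading and flag weak or cross-loading items against the .40 rule of thumb. This is a hypothetical sketch with a made-up pattern matrix; the flags are prompts for the theoretical judgment the note calls for, not decisions.

```python
import numpy as np


def summarize_loadings(loadings, item_names, cutoff=0.40):
    """Assign each item to the factor with its largest absolute loading.

    "weak": the largest loading is below the cutoff.
    "cross-loading": the item reaches the cutoff on more than one factor.
    """
    summary = {}
    for name, row in zip(item_names, np.abs(loadings)):
        primary = int(np.argmax(row))
        n_salient = int(np.sum(row >= cutoff))
        if row[primary] < cutoff:
            status = "weak"
        elif n_salient > 1:
            status = "cross-loading"
        else:
            status = "ok"
        summary[name] = (primary, round(float(row[primary]), 2), status)
    return summary


# Hypothetical pattern matrix from an oblique rotation (5 items, 2 factors).
loadings = np.array([
    [0.72, 0.05],
    [0.65, 0.12],
    [0.30, 0.25],
    [0.45, 0.52],
    [0.02, 0.81],
])
item_names = ["q1", "q2", "q3", "q4", "q5"]
for item, (factor, value, status) in summarize_loadings(loadings, item_names).items():
    print(f"{item}: factor {factor + 1}, loading {value}, {status}")
```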

Rad Resource: Making Sense of Factor Analysis: The Use of Factor Analysis for Instrument Development in Health Care Research (ISBN: 0761919503). This introductory textbook is easy to follow and requires minimal knowledge of statistics and math.

Exploratory factor analysis can be a useful tool in evaluation when study-specific measures are employed. Please note that these are general guidelines, not definitive rules. Evaluators should consult with methodologists, textbooks, or journal articles before attempting EFA.

Want to learn more about Kai Chi’s work using EFA? Join us at the American Evaluation Association’s Annual Conference, Evaluation 2010, in San Antonio this November and check out the poster exhibition on Wednesday evening.

My name is Jane Davidson and I run an evaluation consulting business called Real Evaluation Ltd. In my work, I advise and support organizations on strategic evaluation; provide evaluation capacity building and professional development; develop tools and templates to help organizations conduct, interpret, and use evaluations themselves; and conduct independent and collaborative evaluations and meta-evaluations.

Over several years of working with clients and reviewing disappointing evaluation reports (at clients’ request), I have noticed several critically important elements that make or break evaluation work but are often missing from evaluators’ methodological toolkits.

Hot tip: Clients find it incredibly frustrating to wade through an evaluation report full of evidence and still be none the wiser at the end about whether the documented outcomes (let alone the entire program, policy, etc.) are any good or not. A key part of an evaluator’s work is to say clearly and explicitly how practically, educationally, socially, or economically (not just statistically) significant outcomes are (severally, and as a set). This is what makes evaluation ‘e-VALU-ation’!

Hot tip: A useful tool for generating real evaluative conclusions is an evaluative rubric. This is a table describing what different levels of performance, value, or effectiveness ‘look like’ in terms of the mix of evidence on each criterion. Grading rubrics have been used for many years in student assessment. Evaluative rubrics make transparent how quality and value are defined and applied. I sometimes refer to rubrics as the antidote to both ‘Rorschach inkblot’ (“You work it out”) and ‘divine judgment’ (“I looked upon it and saw that it was good”)-type evaluations.

Hot tip: Collaborative development of rubrics is a great way to get stakeholders thinking about how ‘quality’ and ‘value’ should be defined for the work they do. It helps build the evaluative thinking needed to generate, understand, accept, and use evaluation findings.

Rad resources:

This contribution is from the aea365 Daily Tips blog, by and for evaluators, from the American Evaluation Association. Please consider contributing – send a note of interest to aea365@eval.org. Want to learn more from Jane? She’ll be presenting as part of the Evaluation 2010 Conference Program, November 10-13 in San Antonio.

My name is Susan Kistler. I am the Executive Director of the American Evaluation Association and I contribute each Saturday’s aea365 post.

Resource: Have you ever wanted to hear “riveting talks by remarkable people?” That is the tagline for TED Talks, brought to you by TED, a nonprofit dedicated to “ideas worth spreading.”  TED hosts conferences with some of the world’s best, and most provocative, speakers and then posts those speeches on the web for the world to see for free. Al Gore? Regular speaker. Bill Clinton?  Bill Gates? Jane Goodall? Amy Tan? Yes. Yes. Yes. Yes.

So what does this have to do with evaluation? TED Talks can help you to think ‘out of the box,’ to explore the intersection of art and ideas, to ponder profound issues on which evaluation can have a mitigating effect, to examine our assumptions, and to refine and expand methodologies. Here are three that I have found particularly compelling and that I believe share ideas that can impact practice. Each is free via the link provided.

  • Hans Rosling, a Swedish professor of global health, is one of TED’s most frequently invited speakers. He has given six presentations to date and is one of my favorites. Why? His talks “The best stats you’ve ever seen,” “Let my dataset change your mindset,” and “New insights on poverty” expanded my understanding not only of global health issues but also of how we can convey data so that people will listen and care about what is being said. http://www.ted.com/speakers/hans_rosling.html
  • Sheena Iyengar studies how people choose, examining the ways in which personal history, cultural norms, and contextual factors impact ‘free choice’ and even how the concept of free choice is culturally laden. Her TED talk prompted me to purchase her book, The Art of Choosing, and also think more deeply about how choice – and our assumptions about choice – influence evaluation. http://www.ted.com/speakers/sheena_iyengar.html
  • Anna Deavere Smith is an actress, playwright, researcher, and storyteller. In search of the American character, she interviews people from across the United States and performs excerpts from those interviews in the interviewee’s own voice, using their words verbatim. Her commentary on race, equity, justice, and optimism, reflected in her TED talk, gains its gravitas and urgency from sharing the authentic voices and words of stakeholders. http://www.ted.com/speakers/anna_deavere_smith.html

Hot Tip: I saved the best for last. Anna Deavere Smith is going to give the opening keynote at Evaluation 2010 this November in San Antonio where you can learn more from her and over 500 other speakers. Why? Because evaluation is an idea worth spreading.

Hi, my name is Christopher Moore.  I am a doctoral student in Quantitative Methods in Education at the University of Minnesota and a Quantitative Analyst at the Minnesota Department of Education.  My interests include preventing educational and health disparities, latent variable models, spatial statistical methods, and causal theory and inference.

Hot Tip: So you’re conducting a theory-driven program evaluation?  You’ve developed a solid logic model, you’ve collected relevant quantitative data, and now you’re interested in estimating the degree to which the program has been effective?  Structural equation modeling is a statistical approach that is well-suited for estimating relationships specified by a logic model.

As described by Paul Mattessich in The Manager’s Guide to Program Evaluation, logic models feature program elements and paths from causal elements to outcomes.  Elements in the middle represent both causes and outcomes, mediating the influence of inputs on longer-term outcomes.  Theory-driven evaluators like to pull mediators out of the “black box.”

Figure 1. Elements of a logic model

In the analysis phase of a theory-driven evaluation, structural equation modeling can simultaneously operationalize elements as latent factors and estimate multiple causal paths.  It does so by modeling the observed covariance matrix.  If the data contain dichotomous or ordinal dependent variables, then a polychoric correlation matrix should be modeled.  A sequential strategy (e.g., scaling followed by regression analysis for each dependent variable) requires more steps and can underestimate causal paths by not accounting for measurement error.
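The claim that a sequential strategy can underestimate paths is easy to see in a simulation. This sketch is not SEM. It averages noisy items into scale scores, regresses one scale on the other, and then applies the classical correction for attenuation (dividing by the predictor scale’s reliability, estimated with Cronbach’s alpha) to show roughly what a latent-variable model accounts for. The variable names, the true path of .50, and the error variances are all assumptions.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) array."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)


rng = np.random.default_rng(42)
n = 20_000
true_path = 0.5

# Hypothetical latent variables: a mediator drives a later outcome.
mediator = rng.normal(size=n)
outcome = true_path * mediator + rng.normal(scale=np.sqrt(0.75), size=n)

# Each latent variable is measured by three noisy survey items.
med_items = mediator[:, None] + rng.normal(size=(n, 3))
out_items = outcome[:, None] + rng.normal(size=(n, 3))

# Sequential strategy: average the items into scale scores, then regress.
med_score = med_items.mean(axis=1)
out_score = out_items.mean(axis=1)
naive_path = np.polyfit(med_score, out_score, 1)[0]  # slope of the fitted line

# Classical correction for attenuation (a stand-in for what SEM handles directly).
corrected_path = naive_path / cronbach_alpha(med_items)

print(f"true {true_path:.2f}, naive {naive_path:.2f}, corrected {corrected_path:.2f}")
```

With three items whose error variance equals the true-score variance, the scale’s reliability is about .75, so the naive path lands near .375 instead of .50.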

A logic model can be adapted into a structural equation model path diagram (see Figure 2).  Observed variables are represented by rectangles, and latent variables are represented by ellipses.  For simplicity, the example below features no error terms and only one input, activity, output, and outcome.  The outcomes are treated as latent variables reflected by repeatedly observed indicators (e.g., survey questions).  The intercept and slope capture initial status and change over time, respectively.
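To make “modeling the observed covariance matrix” concrete for the growth part of such a diagram, here is the standard linear latent growth model. The notation is assumed rather than taken from the figure: T repeated indicators y_ti, an intercept factor η0 and a slope factor η1, time scores fixed at 0, 1, ..., T − 1 for equally spaced occasions, and x_i standing in for the output that predicts both growth factors. Ψ is the covariance matrix of the two growth factors, Θ holds the residual variances, and S is the sample covariance matrix of the T indicators.

```latex
\begin{aligned}
y_{ti} &= \eta_{0i} + \lambda_t\,\eta_{1i} + \varepsilon_{ti}, \qquad \lambda_t = t - 1,\quad t = 1,\dots,T \\
\eta_{0i} &= \alpha_0 + \gamma_0\,x_i + \zeta_{0i}, \qquad \eta_{1i} = \alpha_1 + \gamma_1\,x_i + \zeta_{1i} \\
\Sigma(\theta) &= \Lambda\,\Psi\,\Lambda^{\top} + \Theta, \qquad
\Lambda = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ \vdots & \vdots \\ 1 & T-1 \end{pmatrix} \\
F_{\mathrm{ML}}(\theta) &= \ln\lvert\Sigma(\theta)\rvert + \operatorname{tr}\!\big(S\,\Sigma(\theta)^{-1}\big) - \ln\lvert S\rvert - T
\end{aligned}
```

Maximum likelihood estimation chooses θ (factor means, variances and covariance, regression paths, residual variances) to make Σ(θ) as close as possible to S. A full model would also bring the input, activity, and output variables into Σ.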

Figure 2. A partial mediation growth model adapted from a logic model

Moving to a real-world scenario in which structural equation modeling could be applied, Kathryn Tout and colleagues at Child Trends have identified a need for theory-driven evaluations of child care Quality Rating Systems (QRS). QRS represent a relatively new approach to helping parents choose high-quality child care, which is believed to promote child development. Using Tout and colleagues’ article as a guide, I developed a path diagram that could be estimated with data being collected by QRS evaluators. The actual path diagram would have more inputs, outputs, and item scores.

Figure 3. A path diagram for evaluating a child care Quality Rating System

Structural equation modeling requires familiarity with matrix algebra and formal training in latent variable models and related software.  Melanie Wall, David Garson, and Alan Reifman have created helpful course web pages.  Amos is a popular add-on to SPSS that can specify structural equation models by drawing path diagrams.  Mplus is another popular program and my favorite because it can handle multilevel, categorical data sampled in a complex manner (i.e., with unequal probabilities of selection), although it does not produce path diagrams.  The sem package in R is free and another favorite of mine.  When using Mplus or the sem package, Graphviz can be used to create path diagrams, as I did above.
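Because Mplus and the sem package do not draw diagrams themselves, the diagram has to be described in Graphviz’s DOT language. As a hedged sketch, the Python function below writes DOT source using the usual path-diagram convention (rectangles for observed variables, ellipses for latent ones). The function name and the variable names in the example (Input, Activity, Output, Intercept, Slope, y1 to y3) are hypothetical, and the direct Input-to-Intercept path is an assumption standing in for partial mediation.

```python
def path_diagram_dot(observed, latent, paths, rankdir="LR"):
    """Return Graphviz DOT source for an SEM-style path diagram.

    observed: names drawn as rectangles (shape=box)
    latent:   names drawn as ellipses (shape=ellipse)
    paths:    (source, target) pairs drawn as directed arrows
    """
    lines = ["digraph sem {", f"  rankdir={rankdir};"]
    for name in observed:
        lines.append(f'  "{name}" [shape=box];')
    for name in latent:
        lines.append(f'  "{name}" [shape=ellipse];')
    for source, target in paths:
        lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical logic-model elements adapted into a growth model.
observed = ["Input", "Activity", "Output", "y1", "y2", "y3"]
latent = ["Intercept", "Slope"]
paths = [
    ("Input", "Activity"),
    ("Activity", "Output"),
    ("Output", "Intercept"),
    ("Output", "Slope"),
    ("Input", "Intercept"),  # assumed direct path (partial mediation)
]
# Reflective indicators: arrows point from each growth factor to the repeated measures.
paths += [(factor, y) for factor in latent for y in ["y1", "y2", "y3"]]

print(path_diagram_dot(observed, latent, paths))
```

Saving the printed text to a file such as sem.dot and running dot -Tpng sem.dot -o sem.png renders the diagram; the file names are placeholders.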

I hope this “tip” has encouraged you to at least consider structural equation modeling during the data collection and analysis phases of a theory-driven evaluation.  Even though evaluators skillfully develop theories of change that recognize multiple causes and outcomes inside the “black box,” a search of evaluation publications suggests that structural equation modeling could be utilized more fully.

Hi, I’m Lihshing Leigh Wang, and I’m an Associate Professor of Psychometrics and Quantitative Methodology at the University of Cincinnati. Today I want to share with you a tip about database design.

Evaluation research that involves large-scale, multi-level, multi-year, and multi-cohort data presents special challenges to evaluators. Most training programs and publication venues focus on the research design, data collection, and data analysis phases, but largely leave the database design phase out of the research cycle. This knowledge gap presents special obstacles in today’s climate, which encourages interdisciplinary collaboration and systems integration to inform scientific discovery and policy decision making. Many funding bodies, such as the National Science and Technology Council, the National Institutes of Health, and the Institute of Education Sciences, are recognizing database design and management as a priority research area for the near future.

Hot Tip: A powerful web-based platform that supports complex database design and multi-site collaboration is SAS Enterprise Guide (http://www.sas.com/technologies/bi/query_reporting/guide/). Its graphical user interface provides visualization tools that users can easily navigate from multiple remote sites. Its centralized data warehouse collects data from distributed locations and controls data security through a hierarchical command structure. Its integrated analytics system provides seamless information flow in a shared framework that maximizes data transportability and minimizes data processing errors.

In a recent statewide endeavor to examine teacher preparation accountability, we explored the causal relationships among three clusters of variables: one exogenous cluster (teacher education), one direct endogenous cluster (teacher quality), and one indirect endogenous cluster (student learning). The two endogenous clusters were repeated over seven years and collected from six cohorts at more than seventy sites. We used SAS EG as the shared platform for collaboration. The biggest challenge we encountered was political rather than technical, involving issues such as ownership of data collected through local sites but centrally deposited in the data warehouse. Another challenge was linking multiple relational databases with unique identifiers, which again was a design issue rather than a technical one. Without a web-based platform such as SAS EG, conducting evaluation research on such a complex scale would be unimaginable.
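Linking relational tables through unique identifiers is a design decision that can be prototyped in any relational database. As a hedged sketch, using SQLite from Python’s standard library as a stand-in for the SAS EG setup described above, the hypothetical tables below key teacher education, teacher quality, and student learning records on teacher and student IDs, enforce the links with foreign keys, and join the three clusters by teacher and year. Table names, columns, and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on
conn.executescript("""
CREATE TABLE teacher_education (
    teacher_id TEXT PRIMARY KEY,
    program    TEXT NOT NULL
);
CREATE TABLE teacher_quality (
    teacher_id        TEXT NOT NULL REFERENCES teacher_education (teacher_id),
    year              INTEGER NOT NULL,
    observation_score REAL,
    PRIMARY KEY (teacher_id, year)
);
CREATE TABLE student_learning (
    student_id TEXT NOT NULL,
    teacher_id TEXT NOT NULL REFERENCES teacher_education (teacher_id),
    year       INTEGER NOT NULL,
    test_score REAL,
    PRIMARY KEY (student_id, year)
);
""")
conn.executemany("INSERT INTO teacher_education VALUES (?, ?)",
                 [("T01", "Program A"), ("T02", "Program B")])
conn.executemany("INSERT INTO teacher_quality VALUES (?, ?, ?)",
                 [("T01", 2009, 3.2), ("T01", 2010, 3.5), ("T02", 2009, 2.8)])
conn.executemany("INSERT INTO student_learning VALUES (?, ?, ?, ?)",
                 [("S01", "T01", 2009, 410.0), ("S02", "T01", 2009, 430.0),
                  ("S03", "T02", 2009, 395.0), ("S01", "T01", 2010, 445.0)])

# A student record pointing at an unknown teacher ID is rejected, not silently orphaned.
try:
    conn.execute("INSERT INTO student_learning VALUES ('S04', 'T99', 2009, 400.0)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)

# Link the three clusters (education -> quality -> student learning) by teacher and year.
rows = conn.execute("""
    SELECT e.teacher_id, e.program, q.year, q.observation_score, AVG(s.test_score)
    FROM teacher_education AS e
    JOIN teacher_quality AS q ON q.teacher_id = e.teacher_id
    JOIN student_learning AS s ON s.teacher_id = q.teacher_id AND s.year = q.year
    GROUP BY e.teacher_id, e.program, q.year, q.observation_score
    ORDER BY e.teacher_id, q.year
""").fetchall()
for row in rows:
    print(row)
```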

Hi, we are Mende Davis (assistant research professor) and Mei-kuang Chen (advanced graduate student) in the Department of Psychology at the University of Arizona. We are also members of the Evaluation Group for Analysis of Data (EGAD) led by Lee Sechrest. G*Power is a useful tool for estimating the minimum sample size, or the likely power, of a planned study. In our own work as researchers applying for numerous grants, G*Power has been a handy tool.

Hot Tip: An evaluation without enough cases may not be able to answer the research questions. You don’t want to be in the position of telling stakeholders that the study was only powered to detect a real difference 15% of the time. Power analysis is used to estimate the number of cases needed to detect a true difference if it exists.

Three things are needed for a run-of-the-mill power analysis: alpha (α, the probability of committing a Type I error, i.e., rejecting a true null hypothesis), beta (β, the probability of committing a Type II error, i.e., failing to reject a false null hypothesis), and the expected effect size. Alpha is the familiar Type I error rate, often set at .05 (α = .05 means you are willing to accept a false positive one time out of twenty when there is no real difference). Beta is determined by the statistical power you select (power = 1 − β), which is often set at .80, so β = .20. This means you want to be able to detect a real difference 80% of the time. The effect size is the strength of the relationship between two variables (e.g., the amount of change you expect in your outcome variable). Effect sizes are usually reported in standardized units, such as r, f², or odds ratios.

Pilot studies and the literature can help us make an educated guess about the effect size, and checking the literature for an effect size can be a real eye-opener. With the statistical analysis to be used (e.g., a t-test or a regression equation), plus the levels of α and β and the estimated effect size at hand, you can use G*Power to estimate the minimum sample size. If you know the available sample size instead, G*Power can tell you how much statistical power your study would have for a given effect size.
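For the common case of comparing two group means, the calculation G*Power performs can be sketched with SciPy’s noncentral t distribution. This is a minimal sketch assuming a two-sided, two-sample t-test with equal group sizes and Cohen’s d as the effect size; it is not G*Power’s code, and other analyses (regression, chi-square, etc.) need different formulas.

```python
import math

from scipy import stats


def power_two_sample_t(d, n_per_group, alpha=0.05):
    """Power of a two-sided, two-sample t-test with equal group sizes."""
    df = 2 * n_per_group - 2
    ncp = d * math.sqrt(n_per_group / 2)  # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # Probability the test statistic lands in either rejection region.
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)


def n_per_group_for_power(d, power=0.80, alpha=0.05):
    """Smallest equal group size whose power reaches the target."""
    n = 2
    while power_two_sample_t(d, n, alpha) < power:
        n += 1
    return n


n = n_per_group_for_power(0.5)
print(f"d = 0.5: {n} per group ({2 * n} total), power = {power_two_sample_t(0.5, n):.3f}")
print(f"power with 30 per group: {power_two_sample_t(0.5, 30):.2f}")
```

With α = .05, power = .80, and d = 0.5, the search lands on the familiar figure of 64 per group (128 total); calling power_two_sample_t with a fixed group size gives the achievable power instead.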

Rad Resource: G*Power is free. The newest version, G*Power 3, can be obtained at http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/, and the G*Power 2 manual is still useful for working with G*Power 3. It can be found at http://bit.ly/GPower2Manual.

Rad Resource: How power and required sample size are calculated depends on which statistical test you will use in your study. Some background knowledge about power analysis will be helpful for evaluators: http://www.statsoft.com/textbook/power-analysis/

My name is Nina Potter and I am currently the Director of Assessment for the College of Education at San Diego State University. I’d like to share a little about a tool we are using for data visualization.

One of my responsibilities is to work with program directors and department chairs to evaluate academic programs across the college’s eight departments and 30+ programs. Our programs vary greatly in size and each has its own goals and student learning outcomes. Plus, we have some common goals across the college. We wanted to have a common tool that would allow us to share data across the college, but it had to be very flexible in terms of the kinds of data that it could handle as well as the kinds of reports that it could generate. After a lot of exploring, we chose Tableau.

Rad Resource: Before coming to SDSU, I had never heard of Tableau; in fact, I had not even heard the term “data visualization tool.” First, I will tell you what it is NOT. Tableau is not a tool for data entry. You use Tableau to access data from other data sources, such as spreadsheets or databases. This was important because our programs use many different tools to collect data, from electronic portfolio systems to paper-and-pencil tracking (we do require them to at least put the data in a spreadsheet). Also, Tableau does not do advanced statistics, although it does do simple regression and t-tests. For statistical tests, we still use other statistical packages.

So what does Tableau do? Tableau allows you to link into multiple data sources, and quickly and easily create interactive graphs and charts that are updated in real time as your data sources are updated. It has a variety of choices for visualizations such as tables, line graphs, bar charts, pie charts and geographical maps. With just a few clicks you can easily change the type of chart, add colors, add filters and drill down to data that fits certain criteria. The charts are interactive so that anyone viewing the charts can apply filters and view the data they want to focus on.

For example, we have some assessments that are given across multiple programs. We can create a chart that looks at student progress over time and add filters such as program, gender, ethnicity, and age. A person who is evaluating the program as a whole can compare the results from program X to program Y to see if there is equity across multiple demographic groups. Additionally, a person who is working with individual students can download a list of students who have failed more than one assessment in a given program.
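Tableau handles this kind of filtering through its point-and-click interface. For readers who prepare or check data before it reaches Tableau, the same logic (students in a given program who failed more than one assessment) might look like this in plain Python. The records and field layout are hypothetical.

```python
from collections import Counter

# Hypothetical assessment records: (student_id, program, assessment, passed)
records = [
    ("s01", "Program X", "Assessment 1", False),
    ("s01", "Program X", "Assessment 2", False),
    ("s02", "Program X", "Assessment 1", True),
    ("s02", "Program X", "Assessment 2", False),
    ("s03", "Program Y", "Assessment 1", False),
    ("s03", "Program Y", "Assessment 2", False),
    ("s03", "Program Y", "Assessment 3", True),
]


def students_failing_multiple(records, program, min_failures=2):
    """Sorted IDs of students in `program` with at least `min_failures` failed assessments."""
    failures = Counter(
        student
        for student, prog, _assessment, passed in records
        if prog == program and not passed
    )
    return sorted(s for s, count in failures.items() if count >= min_failures)


print(students_failing_multiple(records, "Program X"))
```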

Want to hear more about Tableau from Nina? Join her on April 29 for “Data in, Brilliance Out with Tableau” as part of AEA’s Coffee Break Demonstration Series. More information and registration may be found at http://comm.eval.org/EVAL/coffee_break_webinars/Home/Default.aspx. Free for AEA members!

Hello! I am Maryann Durland. I own an independent consulting firm, Durland Consulting, and have been using Network Analysis (NA) since the early 1990s, hence my focus on our evaluation network. My post today is about the methodology of NA in evaluation applications: first, what it is; second, how to do it; and third, what it looks like.

Hot Tip: Network analysis is the methodology for studying relationships among and between members of one or more sets. A set can be people, references, roads and towns, organizations, and so on. Relationships defined for a set can be analyzed at three levels (individuals, subgroups, or the whole set) and can come from a variety of contexts, such as friendship, co-membership in groups, being related (and how), working together, or reading the same book. Applying network analysis requires three components:

  1. Define the Network and the Relationship: In some applications the network is self-defining (for example, the members of an extended family).
  2. Measures Used to Analyze the Data: The choice of measures is usually based on a theory about the relationship.
  3. The Sociogram: The sociogram illustrates the network and shows the position of individuals within it, which helps us interpret the analysis and may point to further analyses.
Sociogram legend:
Generation   Shape      Color
great        circle     orange
grand        square     pink
parent       triangle   blue
child        box x      green

As an example, in family money exchanges, the data might indicate that dad loans to more people, but daughter 2 is engaged in larger loans. In the sociogram we see how location is important. Node size reflects the amount of money, and shape and color indicate generation.


We see that dad and mom have similar locations in the network and, except for son 2, they connect to different individuals, suggesting further analysis. Though daughter 2 is connected less as measured by outdegree, she is involved in larger loan amounts. Son 2 is in a pivotal location in the network, bridging mom and dad’s subgroups, indicating that subgroup membership might be another level of analysis.
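The measures behind this kind of reading can be computed directly. The sketch below uses hypothetical loan data loosely modeled on the example (not the author’s data). It computes outdegree and total amount lent per person, and it uses Brandes’ algorithm for betweenness centrality on the undirected ties as one common way to quantify the bridging position described for son 2. Libraries such as NetworkX offer tested implementations of these measures.

```python
from collections import defaultdict, deque

# Hypothetical family loans: (lender, borrower, amount)
loans = [
    ("dad", "son1", 50), ("dad", "uncle", 40), ("dad", "cousin", 30), ("dad", "son2", 20),
    ("son1", "uncle", 15), ("son1", "cousin", 10), ("uncle", "cousin", 10),
    ("mom", "daughter1", 60), ("mom", "grandma", 30), ("mom", "son2", 25),
    ("daughter2", "daughter1", 500), ("daughter2", "grandma", 400),
]

outdegree = defaultdict(int)
amount_lent = defaultdict(int)
neighbors = defaultdict(set)  # undirected ties, used for betweenness
for lender, borrower, amount in loans:
    outdegree[lender] += 1
    amount_lent[lender] += amount
    neighbors[lender].add(borrower)
    neighbors[borrower].add(lender)


def betweenness(adj):
    """Brandes' algorithm: unnormalized betweenness for an unweighted, undirected graph."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each unordered pair was counted twice


between = betweenness(neighbors)
for person in sorted(neighbors, key=between.get, reverse=True):
    print(f"{person:10s} outdegree={outdegree[person]} lent={amount_lent[person]:4d} "
          f"betweenness={between[person]:.1f}")
```

In this made-up network, dad has the highest outdegree, daughter 2 lends the most money through only two ties, and son 2 has the highest betweenness because every path between the two subgroups runs through him.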

This provides a small glimpse into network analysis. What I like about network analysis is that it forces you to focus on how we assume behavior will play out in our initiatives; for example, what do we assume mentors will do in a mentoring relationship? It also allows us to explore the complexity of our programs and initiatives. It requires thinking about the systems within which our initiatives are situated, and it is fun.

Rad Resource: Want to learn more about Social Network Analysis? An introductory text is available online at http://ow.ly/1rlcQ

Greetings colleagues! I am Elizabeth Harris, Ph.D., Vice President of Evaluation, Management and Training Associates, Inc. (EMT). The focus of this post is a free resource for measuring youth resiliency that we developed out of necessity. For over 25 years, we have focused our evaluation, technical assistance, and training work on the prevention of substance abuse and other behavioral health needs; on policies and programs promoting the positive social-emotional and behavioral development of children and youth; on family service needs; and on related fields of public health.

The evaluation of youth initiatives represents a critical aspect of our work. Unfortunately, as we sought to evaluate one of the largest federally funded national youth initiatives, available instruments reflected the prevailing thinking that the only aspects worth measuring were attitudes of despair and hopelessness and illegal and risky behaviors. Available measures did not reflect the programmatic objectives of local programs, many of which provided activities grounded in youth development theory that sought to strengthen existing assets.

Resource:  In order to honor the evaluation tradition of measuring what programs actually intend to impact, we developed an instrument of youth resiliency, the Individual Protective Factors Index (IPFI).  The IPFI is a 71-item questionnaire which provides a single measure that captures the various protective factors that have been identified as contributing to individual resiliency in youth between the ages of 10-16 years who may be at risk for developing substance use and other problems. For the national study, we combined the IPFI with federal GPRA measures in order to also measure the risky behaviors and attitudes that were required by the funding agency.

The instrument has been widely used and is the product of extensive conceptual development and empirical testing, including norming and validation studies with 2,416 youth in 15 states nationwide.

The IPFI is available free of charge to our colleagues and can be downloaded at http://www.emt.org/ipfi.html.  A Spanish language version is also available.  Our only request is that you share the results of your evaluation with us.

We are P. Antonio Olmos-Gallo, Kathryn K. DeRoche, and C.J. McKinney. We work at the Mental Health Center of Denver, a nonprofit community mental health center that has become the de facto mental health authority for the City and County of Denver. On any given day we provide services to about 4,000 adults and 1,000 youth. In addition to collecting and analyzing outcomes for the center, we provide evaluation services for the many federal and local grants our center receives every year, which cover not only treatment but also prevention services for individuals across Colorado. We have made a concerted effort to involve multiple stakeholders in every evaluation we conduct, including youth, parents, adult consumers, and clinical and manager-level staff.

In the last 5-7 years, a big part of our work has concentrated on developing instruments to measure recovery from mental illness. Although we have degrees in psychology (MA or Ph.D.), our training is not in clinical psychology; therefore, we rely heavily on the expertise of others for clinical interpretation of the data. We also teach graduate and undergraduate statistics and experimental methods at several colleges and universities in Colorado. We believe this unique combination gives us an edge when it comes to evaluation.

Hot tip: Do not short-change your evaluation efforts by trying to use techniques/tools that may not fully answer your questions. In our private practice, we sometimes have to step in to evaluate programs that never managed to answer the key questions because the techniques were not the most appropriate. Evaluators are sometimes afraid to use anything more sophisticated than a t-test or a chi-square, because “stakeholders do not understand statistics”. This takes us to our next hot tip:

Hot tip: Despite what you and your stakeholders may think, they can understand very sophisticated evaluation concepts if you give them enough background and there is a willingness to learn (and to teach). During the last 5 years, our stakeholders have learned about logic models, instrument reliability and validity, item response theory (Rasch models), hierarchical linear models, cost-benefit analysis, and, more recently, quality control charts. They may not believe it, and may not even admit it in public, but they are able to understand when an instrument is not working (and why that matters), the power of predictive models, and the importance of using the right tools to improve day-to-day operations. More importantly, they also understand the limitations of some of these tools.

In 2009, we shared at AEA several examples of how we have explained sophisticated concepts in evaluation and statistics to our stakeholders in very intuitive ways. Please visit our website (http://www.outcomesmhcd.com/pubs.htm) to see those and many other examples of our work in mental health evaluation.

Sponsored by the American Evaluation Association
