AEA365 | A Tip-a-Day by and for Evaluators


Hello! I’m Judy Savageau from the Center for Health Policy and Research at UMass Medical School following up on yesterday’s post with Part II of basic data analyses. A number of posts outlining statistical/analytic details are in AEA365’s archives. For example, there are some great posts on “Readings for Numbers People (or Those Who Wish They Were)”, “Starting a Statistics [Book] Club”, and “Explaining Statistical Significance”. These posts discuss multivariate modeling, longitudinal data analysis, propensity score matching, factor analysis, structural equation modeling, and more. But what defines multivariate analyses and how do they differ from bivariate analyses?

Hot Tip:

Decisions about bivariate statistics (i.e., assessing the relationship between 2 variables; e.g., gender and school performance) are made based on the ‘type’ of data (e.g., categorical vs continuous; see yesterday’s Part I post). There are many reputable resources that show simple tables for determining which statistic to use (see Rad Resources below), including:

  • Chi-square test: 2 categorical variables (e.g., program participation: yes/no and job type)
  • T-test: 1 categorical variable with 2 levels (e.g., gender: male/female) and 1 continuous variable (e.g., IQ, SAT scores)
  • ANOVA – Analysis of Variance: 1 categorical variable with 3 or more levels (e.g., program performance: low / moderate / high) and 1 continuous variable (e.g., years of education)
  • Correlation coefficient: 2 continuous variables (e.g., years of employment and number of correct responses to knowledge about job-related standards)
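As a rough sketch, these pairings can be written as a small lookup. The `choose_bivariate_test` helper below is hypothetical, for illustration only; it names the appropriate test based on data types but does not run it:

```python
# Illustrative helper: map the types of two variables to a common
# bivariate test, following the pairings listed above.
# (Hypothetical function, not from any statistics library.)

def choose_bivariate_test(var1, var2):
    """Each argument is ('categorical', n_levels) or ('continuous', None)."""
    types = sorted([var1, var2], key=lambda v: v[0])  # 'categorical' sorts first
    (t1, levels1), (t2, _levels2) = types
    if t1 == 'categorical' and t2 == 'categorical':
        return 'chi-square test'
    if t1 == 'categorical' and t2 == 'continuous':
        return 't-test' if levels1 == 2 else 'ANOVA'
    return 'correlation coefficient'

print(choose_bivariate_test(('categorical', 2), ('continuous', None)))  # t-test
```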

Hot Tip:

Finally, use multivariate analyses when you want to look at a large number of variables and their relationship (collectively) to one outcome. The most appropriate multivariate statistic depends, in large part, on the categorical or continuous nature of the outcome variable. For example, in one federally-funded study assessing the multiple factors related to return to work after a work-related injury (e.g., severity of injury, years until anticipated retirement, pre-injury job satisfaction, employer assessment of re-injury potential, etc.), our outcome variable was ‘return to work’ measured in multiple ways:

  • Categorical measure: return to work – Yes/No. Which factors are most predictive of whether a person with a work-related injury will come back to work might best be explored using logistic regression.
  • Continuous measure: how quickly (in weeks) a person returns to work following a work-related injury might best be explored using linear regression.

There are many decisions to be made when developing a data analysis plan. I’m hoping that this 2-part introduction to the basics of statistical analyses gets you started in thinking about the best way to explore and analyze your quantitative data. Of course, having a statistician/data analyst sitting ‘at the table’ with the team as early as possible will ensure that you collect data in the best format to answer your research questions.

Rad Resources:

Here are just a couple of web pages that help with some decision-making about when it’s most appropriate to choose one statistical test over another – depending on the type of data you have.

Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to aea365@eval.org . aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.

Hello! I’m Judy Savageau from the Center for Health Policy and Research at UMass Medical School. A recent post from Pei-Pei Lei, my colleague in our Office of Survey Research, introduced some options for statistical programming in R. I wondered whether a basic introduction to statistics might be in order for those contemplating ‘where do I begin’, ‘what statistics do I need to compute’, and ‘how do I choose the appropriate statistical test’. While most AEA365 blogs don’t cover every topic in detail, perhaps a basic 2-part introduction will help here. Analyses are very different with qualitative versus quantitative data; thus, I’ve concentrated on the quantitative side of statistical computations.

Hot Tip:

Analyses fall into 3 general categories: descriptive, bivariate, and multivariate; they’re typically computed in that order as we:

  • explore our data (descriptive analyses) with frequencies, percentile distributions, means, medians, and other measures of ‘central tendency’;
  • begin to look at associations between an independent variable (e.g., age, gender, level of education) and an outcome variable (e.g., knowledge, attitudes, skills; bivariate analyses); and
  • try to identify a set of factors that might be most ‘predictive’ of the outcome of interest (multivariate analyses).
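The first of those steps can be sketched with Python's built-in `statistics` module; the sample values are invented:

```python
# Descriptive analyses: measures of central tendency and spread
# for a toy sample of ages.
from statistics import mean, median, mode, stdev, quantiles

ages = [23, 25, 25, 28, 31, 34, 35, 40, 47, 62]

print(mean(ages))            # 35.0
print(median(ages))          # 32.5
print(mode(ages))            # 25
print(round(stdev(ages), 1))
print(quantiles(ages, n=4))  # quartile cut points
```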

Hot Tip:

The decision about which statistical test to use to describe data and their various relationships depends on the ‘nature’ of the data. Is it:

  • Categorical data:
    • nominal; e.g., gender, race, ethnicity, smoking status, participation in a program: yes/no;
    • ordinal: e.g., a Likert-type scale score of 1=Strongly disagree to 5=Strongly agree or 5 levels of education: ‘Less than high school’, ‘High school graduate/GED’, ‘Some college/Associate degree’, ‘College graduate – 4-year program’, and ‘Post-graduate (Masters or PhD degree)’;
    • interval: ordinal data in fixed/equal-sized categories; e.g., age groups in 10-year intervals or salary in $25,000 intervals; or is it:
  • Continuous data:
    • For example: age, years of education, days of school missed due to asthma exacerbations, etc.

Of course, data are often collected in one mode and then ‘collapsed’ for particular analyses (e.g., age recoded into meaningful age groups, Likert-type scales recoded as ‘agree’/’neutral’/ ’disagree’).
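Collapsing data this way is just a recode; a minimal sketch, with arbitrary cut points and labels:

```python
# Recode a continuous variable into groups, and collapse a 5-point
# Likert item into agree/neutral/disagree. Cut points are illustrative.

def age_group(age):
    if age < 25:
        return '18-24'
    elif age < 45:
        return '25-44'
    else:
        return '45+'

def collapse_likert(score):  # 1-2 disagree, 3 neutral, 4-5 agree
    return {1: 'disagree', 2: 'disagree', 3: 'neutral',
            4: 'agree', 5: 'agree'}[score]

print(age_group(37))        # 25-44
print(collapse_likert(5))   # agree
```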

Hot Tip:

Decisions must take into consideration whether the data are ‘normally distributed’ (i.e., is there ‘skewness’ in the data, such that the values for age are mostly in persons under 45 though you have a small number of people in their 60s, 70s, and 80s?). Most statistical tests have a number of underlying assumptions that one must meet – all starting with data being normally distributed. Thus, one typically begins by looking descriptively at the data: frequencies and percentile distributions, means, medians, and standard deviations. Sometimes, graphing the data shows the ‘devil in the detail’ with regard to how data are distributed. There are some statistics one can compute to measure the degree of skewness in the data and whether distributions are significantly different from ‘normal’. And, if the data are not normally distributed, there are several non-parametric statistics that can be computed to take this into account.
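One such statistic is the sample skewness coefficient. A minimal sketch using the adjusted Fisher-Pearson formula (one common choice among several):

```python
# Sample skewness (adjusted Fisher-Pearson coefficient): positive values
# indicate a right-skewed distribution; near zero suggests symmetry.
from statistics import mean, stdev

def sample_skewness(data):
    n = len(data)
    m, s = mean(data), stdev(data)
    return (n / ((n - 1) * (n - 2))) * sum(((x - m) / s) ** 3 for x in data)

symmetric = [1, 2, 3, 4, 5]
right_skewed = [20, 25, 30, 35, 40, 45, 85]   # one much older age pulls the tail

print(round(sample_skewness(symmetric), 2))     # 0.0
print(round(sample_skewness(right_skewed), 2))  # positive (right-skewed)
```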

Tomorrow’s post will focus on bivariate and multivariate statistics. Stay tuned!


Hi! I’m Laura Sefton, Project Analyst in the University of Massachusetts Medical School’s Center for Health Policy and Research. Many AEA365 readers use sophisticated statistical software like SPSS, SAS, or Stata for their data analytic needs, but others don’t have ready access to such software and depend on more freely available options like Microsoft Excel. Excel is built to do many computational functions, but it can take time to research its help pages to get it to do what you want. I found a few functions helpful for summarizing both numeric and text-based data that my colleagues and I collected during a recent set of over 30 interviews with Medicaid members. Using the sample data below, I’d like to share what I’ve learned with you.

[Image: sample data (sefton-1)]

Computing means, modes, and medians: To calculate the mean of the number of ER (Emergency Room) visits, the formula instructs Excel to ‘average’ data in a range of cells. Using the sample data above, the formula below computes a value of 4.71 for the data in column A. To compute the median or mode, substitute ‘median’ or ‘mode’ for ‘average’.

=AVERAGE(A2:A8)

Counting text data: Excel can also count the frequency of text responses. The formula is structured much like computing a mean, with the instruction to ‘count if’ certain criteria are met in a set of cells. The formula below asks Excel to count how many times the word ‘Physician’ appears in rows 2 through 8 of column C. Note the use of double quotes around the criterion. You will have to create a separate calculation for each criterion, but you can easily copy/paste and update the formula.

=COUNTIF(C2:C8, "Physician")

Counting data grouped in ranges: We often report data in ranges, and Excel can easily do this using the ‘countif’ function. Your formula needs to account for 2 criteria: the minimum and maximum numbers in the range. Note that you’ll need to add an ‘S’ to the ‘countif’ command since 2 criteria are being evaluated.

=COUNTIFS(B2:B8,">=19",B2:B8,"<=25")

Hot Tip: Create, format, and populate your summary data display in Excel. The formulas will work behind the scenes to make updates if you make any changes to your source data.

[Image: summary data display (sefton-2)]

Hot Tip: Right-click on the status bar at the bottom of your Excel window and select the calculations that you’d like to see on the bar. Excel will automatically compute and display the average, minimum, maximum, count, and sum, among other options, for a set of highlighted cells.

Rad Resource: Beyond searching the Web using keywords, a good resource is Microsoft Office Support, which provides an Excel Help page, a descriptive list of Excel’s statistical functions and some tutorials.



We are Lisa Holliday and Olivia Stevenson. We are data architects with The Evaluation Group where we have recently begun to transition to R for data analysis.  R is a free program for statistical analysis that is powerful, but can have a steep learning curve if you want to utilize its scripting capabilities.

Is R worth it for evaluators?

In short, yes! R makes advanced analyses, such as propensity score models, social network analyses, and hierarchical linear models, possible with just a few lines of code. R opens up possibilities for highly customizable data visualizations while maintaining fast reproducibility. R is an essential tool for evaluators, especially as the demand for more rigorous designs and analyses continues to grow.

Where can I start?

The Rcmdr package offers a point and click interface that makes R much more user-friendly.  Not only is it easy to install, but it also generates R scripts, which can be saved, modified, and re-run. This can be a big time-saver!

Cool Trick 1:  The Latest Version of R

To get started, you will need to install the latest version of R, which can be found here.  The most recently released version will appear in the “News” feed.  You should also install RStudio Desktop.  RStudio is an integrated development environment (IDE) that makes working with R and installing R packages quick and easy.  When you want to use R, open RStudio to get started.

Cool Trick 2: Installing Rcmdr

Within RStudio, select “Install” from the “Packages” pane in the lower right hand corner of your screen.  In the pop-up window, enter “Rcmdr” and select “Install.”


 

[Image: installing the Rcmdr package in RStudio (rcmdr)]

Cool Trick 3: Using Rcmdr

Once you have installed Rcmdr, all you need to do is select it from the “Packages” pane. It will open automatically, but the first time you use it, you may receive a message that you need to download additional packages. If this happens, approve the installations, and Rcmdr will open in a new window when the process is complete.


[Image: Rcmdr opening in a new window (opening_rcmdr)]

Rad Resources: Rcmdr Training

There are a lot of great resources available on how to get started with Rcmdr.  Here is a brief introduction to Rcmdr that includes how to import data.  A good introduction to using RStudio (and R in general) is Lynda’s “Getting Started with the R Environment,” which you may be able to view for free through your public library.



My name is Spectra Myers, and I am a graduate student at the University of Minnesota’s Organizational Leadership Policy and Development working on a Masters in Evaluation Studies.

I have been working closely with a Minneapolis agency addressing homelessness among youth on a service evaluation. The project included fielding a paper survey, data entry, and analysis. The agency was thrilled with the actionable information generated and the program changes suggested by staff as a result of the survey. They committed to fielding the survey twice a year to track their progress and generate further insights for program improvements. The only problem: their staff does not have training in, or access to, programs like SPSS or R to generate the descriptive statistics needed for analysis. Even Excel seemed too cumbersome for their needs.

Rad Resource: Statwing is an online subscription data analysis program with straightforward data uploading and intuitive features. It automatically codes missing data; generates descriptive statistics; reports p-values, effect sizes, and confidence intervals; and even includes basic regression. They offer a free 14-day trial and monthly and annual plans at www.statwing.com.

Hot Tip: Want to share data with collaborators or clients? Statwing makes it easy to generate a link that you can share to provide read-only access, regardless of whether others have an account.

Lesson Learned: Commit to supporting your clients’ learning process in data analysis. Just because Statwing is intuitive doesn’t mean you’re off the hook. It still takes some knowledge of statistics to know when or when not to use the provided features, including the suggested statistical analyses and visualizations. I’m taking the approach of analyzing the second round of data with agency staff to ensure a successful transition.


My name is Lisa R. Holliday, and I am a Data Architect with The Evaluation Group. Cleaning data is a large part of my job.  For studies that need to match participants’ names or IDs, it is critical to ensure that information is entered consistently from one data collection cycle to the next. However, depending on the data collection tools available, it’s not always possible to do this. Cleaning and matching data can be time-consuming.

I’ve found two useful tools that have helped to reduce the amount of time I spend working with these data and increased matching accuracy.

Rad Resource 1: FuzzyMatch Macro for Excel

FuzzyMatch is a free Excel macro available from the Mr.Excel message board. It creates three functions:

  1. FuzzyVlookup
  2. FuzzyHLookup
  3. FuzzyPercent

FuzzyVlookup and FuzzyHlookup look up values against a master list and suggest possible matches. FuzzyPercent allows you to assess the similarities between the proposed matches and the master list. In my testing, FuzzyMatch was able to correctly match values that differed by as many as nineteen characters in length.

One drawback to this macro is that it can be time-consuming to run. During testing, it took Excel approximately one minute to process 942 rows using FuzzyVlookup and FuzzyPercent concurrently. Additionally, the functions have volatile characteristics (they sometimes recalculate when you save the workbook). To avoid this, after running the functions, copy and paste the results as text.
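A comparable fuzzy lookup can be sketched with Python's standard `difflib` module. This is an analogous technique, not the FuzzyMatch macro itself, and the names below are invented:

```python
# Fuzzy-match an entered name against a master list using difflib.
from difflib import get_close_matches, SequenceMatcher

master = ['John Smith', 'Jane Doe', 'Maria Garcia']
entered = 'Jon Smith'   # a typo from a later data collection cycle

best = get_close_matches(entered, master, n=1, cutoff=0.6)
print(best)  # ['John Smith']

# Similarity score, analogous in spirit to FuzzyPercent:
print(round(SequenceMatcher(None, entered, best[0]).ratio(), 2))  # 0.95
```

Raising or lowering `cutoff` trades false matches against missed matches, much like the matching thresholds in the tools described here.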

This video demonstrates how to use FuzzyMatch.

Rad Resource 2: iugum Data Matching Software

This software performs much like the Excel macro, except it is faster and more user-friendly. There is a 14-day free trial available, and subscriptions run from $99 to $400. During testing, it matched over 8,000 rows of data in less than one minute. If unable to perform an exact match, iugum provides possible matches, and it is easy to accept or reject suggested matches. If you regularly clean data using a master list, this is a valuable tool.

This video provides an overview of iugum Data Matching Software.

We’re celebrating 2-for-1 Week here at aea365. With tremendous interest in the blog lately, we’ve had many authors eager to share their evaluation wisdom, so for one special week, readers will be treated to two blog posts per day!



Greetings, I am June Gothberg, Lead Curator for aea365 and Research Associate at Western Michigan University. As Lead Curator, I am always looking for ways to expand the knowledge of evaluators through hot tips, cool tricks, lessons learned, and rad resources. While working on my dissertation, I made a great find and thought I would share it with you. I was looking for a way to measure variables in conversations between participants. In my search of the literature, I ran across the Linguistic Inquiry and Word Count (LIWC) software and discovered its multitude of uses.

Linguistic Inquiry and Word Count Software

LIWC is a computerized text analysis program with Mac and Windows versions. It calculates the degree to which people use different categories of words across texts, including emails, speeches, poems, or transcribed daily speech. A few of the most interesting categories include positive and negative emotions, self-references, and causal words, alongside some 70 other language dimensions. A new area in which LIWC is being used is social network analysis.

Lessons Learned:

  • Don’t be afraid to go outside your field. For example, the roots of modern text analysis are found in the field of psychology.
  • In general, LIWC categorizes words hierarchically. For example, insight is a subgroup of cognitive processes and anger is a subgroup of negative emotions. So, you must decide what level to measure.
  • LIWC offers a nice triangulation for analyzing data. It helped validate the rater/coder findings of my study in an unbiased manner.
  • Except for raw word count and words per sentence, all variables reflect the percentage of total words.
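The percentage-of-total-words output can be illustrated with a toy scorer; the category word lists below are invented for illustration, not LIWC's actual dictionaries:

```python
# Toy LIWC-style scoring: percent of total words falling in each category.
# The category vocabularies are invented, not LIWC's licensed dictionaries.

categories = {
    'posemo': {'happy', 'good', 'hopeful'},
    'work': {'job', 'employer', 'work'},
}

def score(text):
    words = text.lower().split()
    total = len(words)
    return {cat: 100 * sum(w in vocab for w in words) / total
            for cat, vocab in categories.items()}

print(score('I feel hopeful about my job and my work'))
```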

Hot Tips:

  • LIWC offers a truncated free online version. This is a good way to try before you buy. You must supply the gender and age of the participant from whom the text was derived.

LIWC free version

  • LIWC allows customized dictionaries of words and phrases. We are currently working on an evidence-based dictionary to identify words in speech as markers for resiliency and self-determination.
  • Read the manual! The manual explains how to deal with abbreviations, punctuation, numerals, contractions, time stamps, slang, nonfluencies, and filler words.
  • Use a transcriber who understands the manual. If transcriptionists follow the LIWC guidelines, much time and effort is saved.
  • Use the option for batch processing.
  • Combine variables. If you have a certain variable of interest, you may move LIWC output into your statistical analysis software and combine variables. One of our variables of interest was participant feelings of a positive employment outlook. We combined positive emotion, future tense, and employment (posemo+future+work). We were then able to compare those who participated in a skills training session and those who did not.



October 30, 2011

Käri Greene on NVivo

My name is Käri Greene and I’m a Senior Research Analyst at Program Design & Evaluation Services, an intergovernmental agency for the Oregon Public Health Division and Multnomah County Health Department. I first started using software with qualitative data analysis back when the main QSR product was called NUD*IST. And I’ll admit – I got a lot more attention when I said I “did research on HIV risk behaviors in NUD*IST” than I do now that the software has the less racy name of NVivo.

In applied public health, our qualitative evaluation projects tend to be relatively straightforward. We do not delve into existential questions or build elaborate theories. Instead, we have evaluation clients in public health departments who need to use our findings for policy-making and program design or improvement. But I have still found value in using NVivo in my routine qualitative inquiries.

While some may debate the value of software in qualitative analysis, I have found benefit and utility in using NVivo in my everyday evaluation practice through increased efficiency, transparency, and responsiveness. Using NVivo has helped me manage and document my analytic process, from initial coding to data visualization and report writing.

For example, the qualitative coding process can appear to be a “magical, artistic step” to clients, program staff, or even fellow evaluators. But NVivo has helped make the process more transparent and accessible by allowing me to quickly and easily call up data to respond to client questions like “did respondents on Medicaid talk about health reform differently than privately-insured respondents?” With NVivo I am able to offer an immediate response as well as the supporting data, which is invaluable for building trust and validation with evaluation clients.

Rad Resource: The QSR NVivo online forum offers tips, suggestions, and answers from thousands of users.

Rad Resource: If you’re on the LinkedIn networking site, the NVivo Users Group connects you to users to find solutions to your questions.

Hot Tip: Use a “great quotes” code in your coding structure for finding that perfect quote later when you are writing up your reports and manuscripts.

This contribution is from the aea365 Tip-a-Day Alerts, by and for evaluators, from the American Evaluation Association. Please consider contributing – send a note of interest to aea365@eval.org. Want to learn more from Käri? She’ll be presenting as part of the Evaluation 2011 Conference Program, November 2-5 in Anaheim, California.



My name is Susan Keskinen. I work for Ramsey County (Minnesota) Community Human Services as a Senior Program Evaluator. My evaluation projects are related to employment and supportive services provided to Minnesota Family Investment Program (MFIP) participants, Minnesota’s version of the Temporary Assistance to Needy Families (TANF) program. I am the Communications Chair for the Minnesota Evaluation Association.

Hot Tip:

My biggest challenge with qualitative research has been the analysis of the data.  I have developed the following process that enables me to do it systematically and effectively.

  1. Read the notes from three to five interviews and determine a preliminary set of themes.
  2. Create a ‘theme’ document in Word with each theme listed on a separate page.
  3. Create a ‘working’ copy of the notes from each interview in Word and give the text of each of those ‘working’ copies a unique color and/or style of font.
  4. Read each ‘working’ copy document and move each comment related to a theme from the ‘working’ copy to the appropriate theme in the ‘theme’ document. (By using ‘cut’ and ‘paste’ instead of ‘copy’ and ‘paste’, you can keep track of the portions of an interview that you have not yet been able to attribute to any existing theme.)
  5. After going through all ‘working’ copy documents once, read the portions of the notes that did not fit into any preliminary theme and determine what additional themes to add.  Add those themes to the ‘theme’ document and do Step 4 again.

The result is one Word document that lists all themes and the specific comments (in various colors and styles of font) related to each theme.  The different types of font make it easy to count the number of unique people who gave a comment related to a theme.  It is also easy to combine themes and know how it affects the number of individual comments.

If you want to analyze the data by different groups of interviewees, give all interview notes in a group the same color font but give each individual in that group a distinctly different style of font.  As shown in the example below, five different people from three different groups (red, green and blue) gave comments about Communication.
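The same bookkeeping can be sketched in code: tally the unique interviewees who commented on each theme. The respondents, themes, and comments below are invented:

```python
# Count how many unique interviewees commented on each theme,
# mirroring the color/font bookkeeping described above.
from collections import defaultdict

# (respondent_id, theme, comment) tuples -- invented example data
coded = [
    ('P1', 'Communication', 'Staff kept me informed.'),
    ('P2', 'Communication', 'Hard to reach my worker.'),
    ('P1', 'Communication', 'Letters were confusing.'),
    ('P3', 'Transportation', 'Bus passes helped a lot.'),
]

respondents_by_theme = defaultdict(set)
for pid, theme, _ in coded:
    respondents_by_theme[theme].add(pid)

counts = {t: len(p) for t, p in respondents_by_theme.items()}
print(counts)  # {'Communication': 2, 'Transportation': 1}
```

Because a set holds each respondent once, repeated comments from the same person do not inflate the count, just as the distinct font styles do in the Word document.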

The American Evaluation Association is celebrating Minnesota Evaluation Association (MN EA) Affiliate Week with our colleagues in the MNEA AEA Affiliate. The contributions all this week to aea365 come from our MNEA members.


I’m Gene Shackman, an applied sociologist in Albany, NY. A project of mine is the website “Free Resources for Program Evaluation and Social Research Methods.” One topic the site covers is data and data analysis. There are many commercial programs, but for small nonprofits, especially in these poor economic times, the cost of these packages can be too high. Fortunately, there are many no-cost alternatives, which I’ll briefly review in this Tip-A-Day.

Rad Resources: One very popular package is R. R is “a freely available language and environment for statistical computing and graphics which provides a wide variety of statistical and graphical techniques.” According to one review, The Popularity of Data Analysis Software by Robert A. Muenchen, R is increasing in popularity and even ranks higher than many commercial packages for use in data mining. However, R is fairly difficult to learn. Recently, a number of graphical interfaces have been developed, listed here, which should, presumably, make it easier to use R. Several of these graphical interfaces are regarded as top or up-and-coming graphical interfaces here, in a list compiled by Ajay Ohri.

Rad Resources: There are many other free statistical packages, listed here and here, among other places. One page shows a comparison of which statistical procedures are offered by which packages, with R offering the most, but several others offering many procedures. Most of these packages are menu-driven, and most can import data from Excel or CSV files. One main difference is how the packages treat missing data. Many of the packages, like PSPP or MicrOsiris, can handle blanks as missing, while others, like WinIDAMS, need placeholders, like -9. Another difference is that some of the packages, like MicrOsiris, OpenStat, and EasyReg, were developed by individuals, while others, like EpiInfo and WinIDAMS, were developed by organizations. The packages developed by individuals, consequently, may have limited support. A more detailed review of the various packages is in a Citizendium entry, covering these points and more.
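That difference in missing-data conventions amounts to a simple recode when moving data between packages; a minimal sketch, with -9 as the placeholder code:

```python
# Convert placeholder missing-value codes (e.g., -9) to blanks, or the
# reverse, when moving data between packages with different conventions.

MISSING_CODE = -9

def code_to_blank(values):
    return [None if v == MISSING_CODE else v for v in values]

def blank_to_code(values):
    return [MISSING_CODE if v is None else v for v in values]

ages = [34, -9, 51, 28]
print(code_to_blank(ages))   # [34, None, 51, 28]
```

One caution: run the recode before computing any statistics, or the placeholder values will silently distort means and distributions.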

I want to make two final points. I reviewed many of the packages and they all gave exactly the same results for at least two procedures (correlation and regression), so any of the programs can be used with a high degree of confidence about the results (see my reviews here). However, when I did try out many of the packages, MicrOsiris did the best job of reading my data set, perhaps because I had an odd data set. So while all will work well, my personal choice is MicrOsiris.

