My name is Lisa R. Holliday, and I am a Data Architect with The Evaluation Group. Cleaning data is a large part of my job. For studies that need to match participants’ names or IDs, it is critical to ensure that information is entered consistently from one data collection cycle to the next. However, depending on the data collection tools available, it’s not always possible to do this. Cleaning and matching data can be time-consuming.
I’ve found two useful tools that have helped to reduce the amount of time I spend working with these data and increased matching accuracy.
Rad Resource 1: FuzzyMatch Macro for Excel
FuzzyMatch is a free Excel macro available from the Mr.Excel message board. It creates three functions:
FuzzyVlookup and FuzzyHlookup look up values against a master list and suggest possible matches. FuzzyPercent allows you to assess the similarities between the proposed matches and the master list. In my testing, FuzzyMatch was able to correctly match values that differed by as many as nineteen characters in length.
One drawback to this macro is that it can be time-consuming to run. During testing, it took Excel approximately one minute to process 942 rows using FuzzyVlookup and FuzzyPercent concurrently. Additionally, the functions have volatile characteristics (they sometimes recalculate when you save the workbook). To avoid this, after running the functions, copy and paste the results as text.
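For readers who want to try the same idea outside Excel, here is a minimal sketch of looking up entered names against a master list. It does not reproduce the FuzzyMatch macro's algorithm; it uses the similarity ratio from Python's standard `difflib` module instead. The function names `normalize` and `fuzzy_lookup`, the 0.6 cutoff, and the sample names are all assumptions made for illustration.

```python
from difflib import SequenceMatcher, get_close_matches


def normalize(name):
    """Lower-case and collapse whitespace so trivial differences don't count."""
    return " ".join(name.lower().split())


def fuzzy_lookup(value, master_list, cutoff=0.6):
    """Return (best master entry, similarity 0-1), or (None, 0.0) if nothing is close.

    Hypothetical helper: loosely analogous to a fuzzy lookup plus a
    percent-similarity score, but built on difflib, not on the macro.
    """
    # Map normalized spellings back to the original master entries.
    normalized = {normalize(entry): entry for entry in master_list}
    target = normalize(value)
    candidates = get_close_matches(target, list(normalized), n=1, cutoff=cutoff)
    if not candidates:
        return None, 0.0
    best = candidates[0]
    score = SequenceMatcher(None, target, best).ratio()
    return normalized[best], round(score, 2)


# Made-up sample data for illustration only.
master = ["Katherine Johnson", "Robert Alvarez", "Mei-Ling Chen"]
entered = ["Kathrine Jonson", "robert  alvarez", "Zed"]

matches = {name: fuzzy_lookup(name, master) for name in entered}
for name, (match, score) in matches.items():
    print(f"{name!r} -> {match!r} (similarity {score})")
```

As with the macro, low-scoring suggestions deserve a manual look before they are accepted; the cutoff only controls which candidates are offered at all.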
This video demonstrates how to use FuzzyMatch.
Rad Resource 2: iugum Data Matching Software
This software performs much like the Excel macro, except it is faster and more user-friendly. There is a 14-day free trial available, and subscriptions run from $99 to $400. During testing, it matched over 8,000 rows of data in less than one minute. If unable to perform an exact match, iugum provides possible matches, and it is easy to accept or reject suggested matches. If you regularly clean data using a master list, this is a valuable tool.
This video provides an overview of iugum Data Matching Software.
We’re celebrating 2-for-1 Week here at aea365. With tremendous interest in the blog lately, we’ve had many authors eager to share their evaluation wisdom, so for one special week, readers will be treated to two blog posts per day! Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to firstname.lastname@example.org . aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.
Greetings, I am June Gothberg, Lead Curator for aea365 and Research Associate at Western Michigan University. As Lead Curator, I am always looking for ways to expand the knowledge of evaluators through hot tips, cool tricks, lessons learned, and rad resources. While working on my dissertation, I made a great find and thought I would share it with you. I was looking for a way to measure variables in conversations between participants. In my search of the literature, I ran across the Linguistic Inquiry and Word Count (LIWC) software and discovered its multitude of uses.
LIWC is a computerized text analysis program with Mac and Windows versions. It calculates the degree to which people use different categories of words across texts, including emails, speeches, poems, or transcribed daily speech. Among the most interesting categories are positive or negative emotions, self-references, and causal words, alongside 70 other language dimensions. A new area in which LIWC is being used is social network analysis.
- Don’t be afraid to go outside your field. For example, the roots of modern text analysis are found in the field of psychology.
- In general, LIWC categorizes words hierarchically. For example, insight is a subgroup of cognitive processes and anger is a subgroup of negative emotions. So, you must decide what level to measure.
- LIWC offers a nice triangulation for analyzing data. It helped validate the rater/coder findings of my study in an unbiased manner.
- Except for raw word count and words per sentence, all variables reflect the percentage of total words.
- LIWC offers a truncated free online version. This is a good way to try before you buy. You must supply the gender and age of the participant from whom the text was derived.
- LIWClite7 is an affordable student version with slightly fewer options.
- LIWC2007 allows customized dictionaries of words and phrases. We are currently working on an evidence-based dictionary to identify words in speech as markers for resiliency and self-determination.
- Read the manual! The manual explains how to deal with abbreviations, punctuation, numerals, contractions, time stamps, slang, nonfluencies, and filler words.
- Use a transcriber who understands the manual. If transcriptionists follow the LIWC guidelines, much time and effort is saved.
- Use the option for batch processing.
- Combine variables. If you have a certain variable of interest, you may move LIWC output into your statistical analysis software and combine variables. One of our variables of interest was participant feelings of a positive employment outlook. We combined positive emotion, future tense, and employment (posemo+future+work). We were then able to compare those who participated in a skills training session and those who did not.
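To make the "percentage of total words" and "combine variables" tips concrete, here is a small sketch. It is not LIWC itself: the three-category dictionary, the function names, and the sample transcripts are invented for illustration, and real LIWC dictionaries are far larger and handle word stems and other cases that this exact-match toy ignores. Only the category labels posemo, future, and work come from the tip above.

```python
import re
from statistics import mean

# Hypothetical mini-dictionary; stands in for LIWC's real category lists.
CATEGORIES = {
    "posemo": {"good", "happy", "hope", "great"},
    "future": {"will", "soon", "tomorrow"},
    "work": {"job", "work", "career", "hire"},
}


def category_percentages(text):
    """Return each category's share of total words, as a percentage."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {name: 0.0 for name in CATEGORIES}
    return {
        name: 100.0 * sum(word in vocab for word in words) / total
        for name, vocab in CATEGORIES.items()
    }


def outlook_score(text):
    """Composite variable: posemo + future + work, as described in the tip."""
    pct = category_percentages(text)
    return pct["posemo"] + pct["future"] + pct["work"]


# Made-up transcripts, labelled by whether the speaker attended training.
transcripts = [
    ("trained", "I hope I will find a good job soon"),
    ("trained", "My career will be great"),
    ("untrained", "I do not know what to do"),
]

scores = {}
for group, text in transcripts:
    scores.setdefault(group, []).append(outlook_score(text))

group_means = {group: mean(values) for group, values in scores.items()}
print(group_means)
```

In practice the percentages would come straight from the LIWC output file, and the comparison between groups would be run in your statistical software rather than with a simple mean.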
My name is Käri Greene and I’m a Senior Research Analyst at Program Design & Evaluation Services, an intergovernmental agency for the Oregon Public Health Division and Multnomah County Health Department. I first started using software with qualitative data analysis back when the main QSR product was called NUD*IST. And I’ll admit – I got a lot more attention when I said I “did research on HIV risk behaviors in NUD*IST” than I do now that the software has the less racy name of NVivo.
In applied public health, our qualitative evaluation projects tend to be relatively straightforward. We do not delve into existential questions or build elaborate theories. Instead, we have evaluation clients in public health departments who need to use our findings for policy-making and program design or improvement. But I have still found value in using NVivo in my routine qualitative inquiries.
While some may debate the value of software in qualitative analysis, I have found benefit and utility in using NVivo in my everyday evaluation practice through increased efficiency, transparency, and responsiveness. Using NVivo has helped me manage and document my analytic process, from the initial coding process to the data visualization and report writing.
For example, the qualitative coding process can appear to be a “magical, artistic step” to clients, program staff, or even fellow evaluators. But NVivo has helped make the process more transparent and accessible by allowing me to quickly and easily call up data to respond to client questions like “did respondents on Medicaid talk about health reform differently than privately-insured respondents?” With NVivo I am able to offer an immediate response as well as the supporting data, which is invaluable for building trust and validation with evaluation clients.
Rad Resource: The QSR NVivo online forum offers tips, suggestions, and answers from thousands of users.
Rad Resource: If you’re on the LinkedIn networking site, the NVivo Users Group connects you to users to find solutions to your questions.
Hot tip: Use a “great quotes” code in your coding structure for finding that perfect quote later when you are writing up your reports and manuscripts.
This contribution is from the aea365 Tip-a-Day Alerts, by and for evaluators, from the American Evaluation Association. Please consider contributing – send a note of interest to firstname.lastname@example.org. Want to learn more from Käri? She’ll be presenting as part of the Evaluation 2011 Conference Program, November 2-5 in Anaheim, California.
My name is Susan Keskinen. I work for Ramsey County (Minnesota) Community Human Services as a Senior Program Evaluator. My evaluation projects are related to employment and supportive services provided to Minnesota Family Investment Program (MFIP) participants, Minnesota’s version of the Temporary Assistance to Needy Families (TANF) program. I am the Communications Chair for the Minnesota Evaluation Association.
My biggest challenge with qualitative research has been the analysis of the data. I have developed the following process that enables me to do it systematically and effectively.
1. Read the notes from three to five interviews and determine a preliminary set of themes.
2. Create a ‘theme’ document in Word with each theme listed on a separate page.
3. Create a ‘working’ copy of the notes from each interview in Word and give the text of each of those ‘working’ copies a unique color and/or style of font.
4. Read each ‘working’ copy document and paste a comment related to a theme from the ‘working’ copy to the appropriate theme in the ‘theme’ document. (By using ‘delete’ and ‘paste’ instead of ‘copy’ and ‘paste’, you can keep track of the portion of an interview that you have not been able to attribute to any existing theme.)
5. After going through all ‘working’ copy documents once, read the portions of the notes that did not fit into any preliminary theme and determine what additional themes to add. Add those themes to the ‘theme’ document and do Step 4 again.
The result is one Word document that lists all themes and the specific comments (in various colors and styles of font) related to each theme. The different types of font make it easy to count the number of unique people who gave a comment related to a theme. It is also easy to combine themes and see how doing so affects the number of individual comments.
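If the coded comments are also kept in a simple table, the same tallies can be checked with a few lines of code. This is only a sketch of the counting logic, not part of the Word-based process described above; the interviewee IDs, themes, comments, and function names are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical coded data: (interviewee, theme, comment).
coded_comments = [
    ("P1", "Communication", "Staff return calls quickly."),
    ("P1", "Communication", "Letters are hard to read."),
    ("P2", "Communication", "I never know who to call."),
    ("P3", "Transportation", "The bus takes two hours."),
    ("P2", "Transportation", "I share a car."),
]


def people_per_theme(rows):
    """Count unique interviewees (not comments) under each theme."""
    people = defaultdict(set)
    for person, theme, _comment in rows:
        people[theme].add(person)
    return {theme: len(ids) for theme, ids in people.items()}


def merge_themes(rows, old_themes, new_theme):
    """Relabel every comment in old_themes as new_theme."""
    return [
        (person, new_theme if theme in old_themes else theme, comment)
        for person, theme, comment in rows
    ]


counts = people_per_theme(coded_comments)
merged_counts = people_per_theme(
    merge_themes(coded_comments, {"Communication", "Transportation"}, "Access")
)
print(counts)         # unique people per original theme
print(merged_counts)  # unique people after combining the two themes
```

Note that the merged count is three, not four: P2 commented under both original themes and is counted once after they are combined, which is the same effect you see when the same font appears under two themes in the Word document.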
If you want to analyze the data by different groups of interviewees, give all interview notes in a group the same color font but give each individual in that group a distinctly different style of font. As shown in the example below, five different people from three different groups (red, green and blue) gave comments about Communication.
The American Evaluation Association is celebrating Minnesota Evaluation Association (MN EA) Affiliate Week with our colleagues in the MNEA AEA Affiliate. The contributions all this week to aea365 come from our MNEA members.
I’m Gene Shackman, an applied sociologist in Albany, NY. A project of mine is the website “Free Resources for Program Evaluation and Social Research Methods.” One kind of tool researchers need is software for data analysis. There are many commercial programs, but for small nonprofits, especially in these poor economic times, the cost of these packages can be too high. Fortunately, there are many no-cost alternatives, which I’ll briefly review in this Tip-A-Day.
Rad Resources: One very popular package is R, “a freely available language and environment for statistical computing and graphics which provides a wide variety of statistical and graphical techniques.” According to one review, The Popularity of Data Analysis Software, by Robert A. Muenchen, R is increasing in popularity and even ranks higher than many commercial packages for use in data mining. However, R is fairly difficult to learn. Recently, a number of graphical interfaces have been developed, listed here, which should make R easier to use. Several of these are ranked among the top or up-and-coming graphical interfaces here, in a list compiled by Ajay Ohri.
Rad Resources: There are many other free-to-use statistical packages, listed here and here, among other places. One page shows a comparison of which statistical procedures are offered by which packages, with R offering the most, but several others offering many procedures. Most of these packages are menu-driven, and most can import data from Excel or CSV files. One main difference is how the packages treat missing data. Many of the packages, like PSPP or MicrOsiris, can handle blanks as missing, while others, like WinIDAMS, need placeholders, like -9. Another difference is that some of the packages, like MicrOsiris, OpenStat and EasyReg, were developed by individuals, while others, like EpiInfo and WinIDAMS, were developed by organizations. The packages developed by individuals, consequently, may have limited support. A more detailed review of the various packages is in a Citizendium entry, covering these points and more.
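The missing-data difference matters because a placeholder such as -9 will be treated as a real value unless the software is told otherwise. The sketch below illustrates the point with Python's standard `csv` module rather than any of the packages named above; the column names, the sample values, and the `read_with_missing` helper are assumptions for illustration.

```python
import csv
import io

# Assumption: blanks and the placeholder -9 both mean "missing" in this file.
MISSING_CODES = {"", "-9"}


def read_with_missing(csv_text, missing_codes=MISSING_CODES):
    """Parse numeric CSV text, turning blanks and placeholder codes into None."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            column: None if (value or "").strip() in missing_codes else float(value)
            for column, value in record.items()
        })
    return rows


# Made-up data: one blank age and one score coded as -9.
sample = "age,score\n34,7.5\n,8\n41,-9\n"
rows = read_with_missing(sample)

valid_scores = [row["score"] for row in rows if row["score"] is not None]
mean_score = sum(valid_scores) / len(valid_scores)

# For contrast: the mean if -9 were wrongly treated as a real score.
naive_mean = (7.5 + 8 - 9) / 3

print(rows)
print(mean_score, naive_mean)
```

Whichever package you choose, it is worth confirming how it was told to treat blanks and placeholder codes before trusting any summary statistic.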
I want to make two final points. I reviewed many of the packages and they all gave exactly the same results for at least two procedures (correlation and regression), so any of the programs can be used with a high degree of confidence about the results (see my reviews here). However, when I did try out many of the packages, MicrOsiris did the best job of reading my data set, perhaps because I had an odd data set. So while all will work well, my personal choice is MicrOsiris.
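One way to run this kind of cross-check yourself is to compute the two procedures from their textbook formulas on a small data set and compare the numbers with what each package reports. The sketch below is an illustration only, not the procedure used in the reviews mentioned above; the data values and function names are made up.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation from the textbook sums-of-squares formula."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ols_line(x, y):
    """Slope and intercept of the simple least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
slope, intercept = ols_line(x, y)
print(round(r, 4), round(slope, 4), round(intercept, 4))
```

If a package's output differs from a hand computation like this, the usual culprits are how the data were read in or how missing values were handled, rather than the statistics themselves.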