Data Analysis Basics – Part I by Judy Savageau

Hello! I’m Judy Savageau from the Center for Health Policy and Research at UMass Medical School. A recent post from Pei-Pei Lei, my colleague in our Office of Survey Research, introduced some options for statistical programming in R. I wondered whether a basic introduction to statistics might be in order for those contemplating ‘where do I begin’, ‘what statistics do I need to compute’, and ‘how do I choose the appropriate statistical test’. While no single AEA365 post can cover every topic in detail, perhaps a basic 2-part introduction will help here. Analyses are very different with qualitative versus quantitative data; thus, I’ve concentrated on the quantitative side of statistical computations.

Hot Tip:

Analyses fall into 3 general categories: descriptive, bivariate, and multivariate; they’re typically computed in that order as we:

  • explore our data (descriptive analyses) with frequencies, percentile distributions, means, medians, and other measures of ‘central tendency’;
  • begin to look at associations (bivariate analyses) between an independent variable (e.g., age, gender, level of education) and an outcome variable (e.g., knowledge, attitudes, skills); and
  • try to identify a set of factors that might be most ‘predictive’ of the outcome of interest (multivariate analyses; see the sketch after this list).
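To make that progression concrete, here is a minimal sketch in base R (the language Pei-Pei’s post introduced). The data frame and all of its column names are hypothetical, invented purely for illustration:

```r
# Illustrative only: a small, made-up data frame with one continuous
# predictor (age), one nominal predictor (gender), and one continuous
# outcome (knowledge_score). All names and values are hypothetical.
df <- data.frame(
  age             = c(34, 52, 41, 67, 29, 45, 58, 38),
  gender          = factor(c("F", "M", "F", "F", "M", "M", "F", "M")),
  knowledge_score = c(72, 65, 80, 60, 75, 68, 77, 70)
)

# 1. Descriptive: frequencies and measures of central tendency
table(df$gender)            # frequency counts
summary(df$age)             # min, quartiles, median, mean, max
sd(df$knowledge_score)      # standard deviation

# 2. Bivariate: association between one predictor and the outcome
t.test(knowledge_score ~ gender, data = df)

# 3. Multivariate: several predictors of the outcome at once
summary(lm(knowledge_score ~ age + gender, data = df))
```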

Hot Tip:

The decision about which statistical test to use to describe data and their various relationships depends on the ‘nature’ of the data. Is it:

  • Categorical data:
    • nominal; e.g., gender, race, ethnicity, smoking status, participation in a program: yes/no;
    • ordinal: e.g., a Likert-type scale score of 1=Strongly disagree to 5=Strongly agree or 5 levels of education: ‘Less than high school’, ‘High school graduate/GED’, ‘Some college/Associate degree’, ‘College graduate – 4-year program’, and ‘Post-graduate (Masters or PhD degree)’;
    • interval: ordinal data in fixed/equal-sized categories; e.g., age groups in 10-year intervals or salary in $25,000 intervals; or is it:
  • Continuous data:
    • For example: age, years of education, days of school missed due to asthma exacerbations, etc. (see the sketch below for how each type is typically represented in R).
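For readers following along in R, here is a minimal, purely illustrative sketch of how each of these data types is commonly represented (all variable names and values are invented):

```r
# Nominal: unordered categories
smoker <- factor(c("yes", "no", "no", "yes"))

# Ordinal: ordered categories, e.g., a 5-point agreement scale
agree <- factor(c("Agree", "Neutral", "Strongly agree"),
                levels  = c("Strongly disagree", "Disagree", "Neutral",
                            "Agree", "Strongly agree"),
                ordered = TRUE)

# Interval: ordered, equal-sized categories (10-year age groups)
age_group <- factor(c("20-29", "40-49", "30-39"),
                    levels  = c("20-29", "30-39", "40-49"),
                    ordered = TRUE)

# Continuous: plain numeric values
age <- c(24, 47, 33)
```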

Of course, data are often collected in one mode and then ‘collapsed’ for particular analyses (e.g., age recoded into meaningful age groups, Likert-type scales recoded as ‘agree’/‘neutral’/‘disagree’).
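A minimal base-R sketch of that kind of collapsing, assuming hypothetical raw values (the cut points and labels below are illustrative, not prescriptive):

```r
# Hypothetical raw values
age    <- c(23, 37, 41, 55, 68, 72, 30, 49)
likert <- c(1, 2, 3, 4, 5, 4, 2, 5)  # 1 = Strongly disagree ... 5 = Strongly agree

# Continuous age collapsed into interval categories
age_group <- cut(age,
                 breaks = c(0, 29, 39, 49, 59, Inf),
                 labels = c("<30", "30-39", "40-49", "50-59", "60+"))
table(age_group)

# Five-point Likert scale collapsed to three categories
agree3 <- cut(likert,
              breaks = c(0, 2, 3, 5),
              labels = c("Disagree", "Neutral", "Agree"))
table(agree3)
```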

Hot Tip:

Decisions must take into consideration whether the data are ‘normally distributed’ (i.e., is there ‘skewness’ in the data such that, say, the values for age are mostly in persons under 45, though a small number of people are in their 60s, 70s, and 80s?). Most statistical tests have a number of underlying assumptions that one must meet, all starting with data being normally distributed. Thus, one typically begins by looking descriptively at the data: frequencies and percentile distributions, means, medians, and standard deviations. Sometimes, graphing the data shows the ‘devil in the details’ with regard to how data are distributed. There are statistics one can compute to measure the degree of skewness in the data and whether distributions are significantly different from ‘normal’. And, if the data are not normally distributed, there are several non-parametric statistics that can be computed to take this into account.
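Here is a minimal sketch of these checks in base R, using simulated, right-skewed ages purely for illustration. The Shapiro-Wilk test is one common test of normality (add-on packages such as e1071 also offer a skewness() statistic), and the Wilcoxon/Mann-Whitney test is one non-parametric alternative to the t-test:

```r
set.seed(42)
# Simulated right-skewed ages: most under 45, with a tail into the 60s-80s
age   <- round(c(rnorm(90, mean = 38, sd = 6), runif(10, 60, 85)))
group <- rep(c("program", "comparison"), 50)  # hypothetical grouping

hist(age)           # graphing often reveals the skew
summary(age)        # compare the mean against the median
shapiro.test(age)   # Shapiro-Wilk test of normality

# If the data depart from normality, a non-parametric test
# (Wilcoxon/Mann-Whitney) can replace the parametric t-test:
wilcox.test(age ~ group)
```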

Tomorrow’s post will focus on bivariate and multivariate statistics. Stay tuned!

Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to aea365@eval.org. aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.

2 thoughts on “Data Analysis Basics – Part I by Judy Savageau”

  1. November 14, 2017

    Hi Judy,

    I am an educator and M.Ed. graduate student exploring the effectiveness of early literacy intervention programs. I live and work in Alberta, Canada.

    I found your post “Data Analysis Basics – Part I” from November 1, 2017 very relevant to some of the analytic considerations that must be explored when thinking about and analyzing data. Currently, I am in the process of analyzing the literacy data: improvement in the reading levels of students in grades 1-3 who are reading below grade level. I am evaluating the effectiveness of a short-term, small-group, teacher-led, intensive and responsive literacy intervention program known as Levelled Literacy Intervention.

    In your post, you raise an important point about the nature of analyses and how they can vary depending on whether qualitative or quantitative data are being analyzed. In my particular context, examining the feasibility and sustainability of this LLI program rests on using extensive quantitative data to illustrate and support that student reading levels have improved and that students are, in fact, now reading at or above grade level due to this intervention. Coupled with these data, which come from a pre- and post-intervention BAS (Benchmark Assessment) one-on-one reading test indicating reading level performance and any demonstrated gains, qualitative measures, including self-efficacy, teacher responsiveness to learner needs, observational walkthroughs, and anecdotal evidence, must also be considered and equally scrutinized in order to determine whether the intervention is effective and worth subsequent years of ongoing district program funding.

    As I examine, analyze, and reflect on the collected data as well as the qualitative measures listed above, I am drawn to several ideas that you mentioned regarding underlying assumptions and how data are distributed. As you noted, looking at data descriptively in terms of percentiles and frequencies is a starting point, but as you quite rightly point out, there can be a degree of skewness in how those data are graphed and ultimately presented. Defining a ‘normal’ data distribution in itself can create a degree of skewness. And further, looking at this intervention program through the district’s lens on future sustainability, I know that the data I collect and ultimately present to various stakeholders, including superintendents and elected school trustees, must be presented in a manner that is not only clear about measurable growth and gains but also considers the guiding questions that each stakeholder may be asking, given their respective purpose and context.

    Thank you for giving me a great deal to think about with respect to ‘the devil in the details’ and for causing me to stop and think about how I collapse data, both qualitative and quantitative, so they can be appropriately analyzed through multiple stakeholders’ lenses and varying contexts.

    Thanks!
    Melissa
