AEA365 | A Tip-a-Day by and for Evaluators

TAG | R

Greetings AEA community, I’m Pei-Pei Lei, a biostatistician in the Office of Survey Research at the University of Massachusetts Medical School. Have you been looking to expand your skill set in statistical programming? Have you wondered if R is the appropriate statistical software package for your needs? The purpose of this post is to help you decide whether R is right for you and, if so, how you can get started using it.

R may be the right tool if you:

  • Need to manage and/or analyze quantitative data
  • Are looking for a free alternative to commercial software packages, such as SAS, SPSS, and Stata
  • Don’t mind writing computer code: does print("Hello, world!") look easy enough to you?
  • Want to create nice-looking and informative figures and graphics (see this website for examples)
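To give a feel for what that code looks like, here is a minimal base-R sketch. The small scores dataset is made up purely for illustration.

```r
# A tiny, made-up dataset: one row per survey respondent
scores <- data.frame(
  group = c("A", "A", "B", "B", "B"),
  score = c(70, 80, 60, 70, 80)
)

print("Hello, world!")   # the one-liner from the list above

summary(scores$score)    # minimum, quartiles, median, mean, maximum

# Mean score for each group
group_means <- aggregate(score ~ group, data = scores, FUN = mean)
print(group_means)
```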

If you’re not sure, here are some places for you to get a feel for the R language:

  • TryR: This website provides interactive, step-by-step practice right in your browser
  • DataCamp: This website also provides interactive, step-by-step online practice, with more material than TryR

Hot Tips:

The following is a list of MOOCs (Massive Open Online Courses) that can help you learn R for free (or pay a fee for a verified certificate):

  • R Programming on Coursera: A 4-week course covering basic R programming, with a weekly quiz and a final project to test your skills. Good for beginning to intermediate users.
  • Introduction to R for Data Science on edX: A self-paced 4-week course covering basic R programming. It uses DataCamp for class materials and exercises. Good for beginners.
  • R Basics – R Programming Language Introduction on Udemy: A self-paced course that covers basic setup, such as downloading the software, as well as basic coding. Good for beginners.
  • Data Analysis with R on Udacity: This course takes about 2 months to finish (it’s also part of the Data Analyst nanodegree program). Its tutorial videos show coding processes in RStudio. Good for beginning to intermediate users.

You can also install the swirl R package to learn R within R. It gives you interactive instructions on different topics. This is good for intermediate users.
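A minimal sketch of getting started with swirl from the R console. It assumes you have internet access to install from CRAN; the mirror URL below is one common choice, not a requirement.

```r
# Install swirl from CRAN only if it is not already installed
if (!requireNamespace("swirl", quietly = TRUE)) {
  install.packages("swirl", repos = "https://cloud.r-project.org")
}

library(swirl)

# swirl() starts an interactive lesson menu in the console,
# so only launch it when a person is at the keyboard
if (interactive()) {
  swirl()
}
```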

Rad Resources:

  • R-bloggers: This is a repository of R-related articles, including tutorials. You can subscribe to the mailing list to receive the latest articles.
  • Stack Overflow: This is a forum where you can post your questions and get answers, or even better, answer other people’s questions!

Lessons Learned:

Don’t be intimidated by the many choices you have in learning R. They are the means to reach your goal. So pick one that you like and get started!

Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to aea365@eval.org . aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.

· ·

Hi! Our names are Carrie Wiley and Matt Reeder and we are Senior Research Scientists at the Human Resources Research Organization (HumRRO). We would like to share an abbreviated version of our demonstration session, presented at the 2016 annual meeting in Atlanta, on how to map data in R. It sounds like a daunting task, but it is far easier than it seems.

In addition to the many tools and resources that help evaluators create more effective tables and graphs, geographic mapping can help identify and demonstrate geographic patterns. Many evaluators may see Geographic Information System (GIS) mapping as an intimidating technique, since most of us are not formally trained in GIS. In our work, we often deal with naturally occurring large-scale data (e.g., state-level data, school districts, counties, ZIP codes) that can be displayed more effectively on a map than in a traditional table. Drawing maps really just requires coordinates, and for very basic maps, R provides those coordinates in a nicely formatted file.

Hot Tips:

All you need to get started is:

GIS Basics:

In order to map data, you need to draw boundaries. Those boundary data are stored in shapefiles (.shp), which contain the latitude and longitude coordinates of the boundaries you want to draw. The Census Bureau’s TIGER (Topologically Integrated Geographic Encoding and Referencing) program makes various cartographic boundary shapefiles available for download, or you can use R packages that pull the boundary data for you.
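To illustrate what a shapefile holds, the sketch below reads one with the sf package. sf is our substitution here and is not one of the packages used in this post. The sketch relies on the sample North Carolina counties shapefile that ships with sf; a downloaded TIGER file would be read the same way from its own path. Note that a shapefile is really a set of companion files (.shp, .shx, .dbf, and usually .prj) that must sit in the same folder.

```r
library(sf)

# Path to the sample shapefile bundled with sf (North Carolina counties)
nc_path <- system.file("shape/nc.shp", package = "sf")

# Read the boundaries: one row per county, polygons stored in a geometry column
nc <- st_read(nc_path, quiet = TRUE)

print(st_crs(nc))       # coordinate reference system of the boundaries
print(st_bbox(nc))      # min/max longitude and latitude of the whole layer
plot(st_geometry(nc))   # draw just the county outlines
```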

Mapping the Data:

Our example plots a heatmap of the number of craft breweries in each state.

  1. Retrieve the publicly available craft brewery directory: https://www.brewersassociation.org/directories/breweries/

2. Install the following R packages (with install.packages()) and load them:

 a. library(dplyr)

 b. library(ggplot2)

 c. library(mapproj)

3. Data excerpt:

4. Load the state boundary data from the maps package using ggplot2’s map_data() function:

 a. states <- map_data("state")

 b. Data excerpt:

5. Get counts of breweries by state and merge with the coordinates file:

6. Plot the heatmap:
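The code for steps 2 through 6 did not survive in this archive, so here is a minimal sketch of how they could fit together. It assumes the downloaded directory has been read into a data frame with a `state` column of full state names (that column name is an assumption; check your file), and it uses a small made-up stand-in so the code runs on its own. `map_data()` also needs the maps package installed.

```r
library(dplyr)
library(ggplot2)
library(mapproj)   # used by coord_map() for the map projection

# Made-up stand-in for the brewery directory; replace with something like
# breweries <- read.csv("breweries.csv")   # hypothetical file name
breweries <- data.frame(
  name  = c("Brewery A", "Brewery B", "Brewery C", "Brewery D", "Brewery E"),
  state = c("California", "California", "Colorado", "Washington", "Oregon")
)

# Step 4: state boundary polygons from the maps package
# (columns: long, lat, group, order, region, subregion)
states <- map_data("state")

# Step 5: count breweries per state; map_data() stores state names in
# lower case in `region`, so convert before joining
counts <- breweries %>%
  mutate(region = tolower(state)) %>%
  count(region, name = "n_breweries")

# Keep every boundary point; states with no breweries get NA counts
map_df <- states %>%
  left_join(counts, by = "region") %>%
  arrange(group, order)   # polygons must be drawn in their original point order

# Step 6: shade each state by its brewery count
p <- ggplot(map_df, aes(x = long, y = lat, group = group, fill = n_breweries)) +
  geom_polygon(color = "white") +
  coord_map() +
  scale_fill_gradient(low = "lightyellow", high = "darkred", na.value = "grey90") +
  labs(fill = "Craft breweries") +
  theme_void()

print(p)
```

A left join (rather than an inner join) keeps states with zero breweries on the map, shaded with the `na.value` color. If you use base R’s `merge()` instead, re-sort by `group` and `order` afterwards, because `merge()` reorders the rows.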

So, based on this map, if you are an avid fan of craft beer, California, Washington, and Colorado are good places to check out. Of course, these are raw counts; a heatmap that accounts for population (for example, breweries per 100,000 residents) would be more useful. If you are a coffee drinker, find a publicly available coffee shop database and practice your new skills by plotting a heatmap of coffee shops!

Rad Resources:

Using different combinations of R packages and Census data, you can make heatmaps by county or school district, and bubble charts by ZIP code.

Useful Census data:

Useful R packages:

  • library(zipcode)
  • library(maps)
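A minimal sketch of the county-level starting point, assuming the maps and mapproj packages are installed. `map_data("county")` returns county outlines for the contiguous states, with the state name in `region` and the county name in `subregion`. To shade counties, you would join your own county-level counts on both columns before plotting. Massachusetts is chosen only as an example.

```r
library(ggplot2)   # map_data() reads boundaries from the maps package
library(mapproj)   # used by coord_map()

# County outlines for the contiguous United States
counties <- map_data("county")

# Keep one state; names are stored in lower case
ma <- subset(counties, region == "massachusetts")

p <- ggplot(ma, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "grey85", color = "white") +
  coord_map() +
  theme_void()

print(p)
```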

The American Evaluation Association is celebrating Ed Eval TIG Week with our colleagues in the PreK-12 Educational Evaluation Topical Interest Group. The contributions all this week to aea365 come from our Ed Eval TIG members.

 

·

We are Lisa Holliday and Olivia Stevenson. We are data architects with The Evaluation Group, where we have recently begun to transition to R for data analysis. R is a free and powerful program for statistical analysis, but it can have a steep learning curve if you want to use its scripting capabilities.

Is R worth it for evaluators?

In short, yes! R makes advanced analyses, such as propensity score models, social network analyses, and hierarchical linear models, possible with just a few lines of code. R also supports highly customizable data visualizations that are quick to reproduce. R is an essential tool for evaluators, especially as the demand for more rigorous designs and analyses continues to grow.
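As a small illustration of the "few lines of code" point, here is a base-R sketch of the first step of a propensity score analysis: estimating each participant’s probability of being in a program with logistic regression (`glm()`) and turning it into inverse-probability weights. The data are simulated and the variable names (`treated`, `age`, `prior_score`) are hypothetical; the balance checks and outcome model that a full analysis needs are omitted.

```r
set.seed(123)

# Simulated program data; all variable names are hypothetical
n <- 500
age         <- round(runif(n, min = 12, max = 18))
prior_score <- rnorm(n, mean = 70, sd = 10)
# Students with higher prior scores are more likely to join the program
treated <- rbinom(n, 1, plogis(-7 + 0.1 * prior_score))
dat <- data.frame(treated, age, prior_score)

# Propensity score: predicted probability of treatment from a logistic model
ps_model <- glm(treated ~ age + prior_score, data = dat, family = binomial)
dat$ps <- predict(ps_model, type = "response")

# Inverse-probability-of-treatment weights for the average treatment effect
dat$w <- ifelse(dat$treated == 1, 1 / dat$ps, 1 / (1 - dat$ps))

summary(dat$ps)
```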

Where can I start?

The Rcmdr package offers a point-and-click interface that makes R much more user-friendly. Not only is it easy to install, but it also generates R scripts, which can be saved, modified, and re-run. This can be a big time-saver!

Cool Trick 1:  The Latest Version of R

To get started, you will need to install the latest version of R, which can be found here.  The most recently released version will appear in the “News” feed.  You should also install RStudio Desktop.  RStudio is an integrated development environment (IDE) that makes working with R and installing R packages quick and easy.  When you want to use R, open RStudio to get started.

Cool Trick 2: Installing Rcmdr

Within RStudio, select “Install” from the “Packages” pane in the lower right hand corner of your screen.  In the pop-up window, enter “Rcmdr” and select “Install.”

[Image: installing Rcmdr from the RStudio Packages pane]

Cool Trick 3: Using Rcmdr

Once you have installed Rcmdr, all you need to do is select it from the “Packages” pane, and it will open automatically. The first time you use it, however, you may receive a message that you need to download additional packages. If this happens, approve the installations, and Rcmdr will open in a new window when the process is complete.
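If you prefer typing to clicking, the same install-and-open steps can be run from the RStudio console. This is a minimal sketch; the CRAN mirror URL is one common choice, and attaching Rcmdr is what opens the R Commander window.

```r
# Install Rcmdr from CRAN only if it is not already installed
if (!requireNamespace("Rcmdr", quietly = TRUE)) {
  install.packages("Rcmdr", repos = "https://cloud.r-project.org")
}

# Attaching the package launches the R Commander window,
# so only do it in an interactive session
if (interactive()) {
  library(Rcmdr)
}
```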

[Image: opening Rcmdr from the Packages pane]

Rad Resources: Rcmdr Training

There are a lot of great resources available on how to get started with Rcmdr.  Here is a brief introduction to Rcmdr that includes how to import data.  A good introduction to using RStudio (and R in general) is Lynda’s “Getting Started with the R Environment,” which you may be able to view for free through your public library.


·

Hello! I’m Tony Fujs, Data Scientist at the World Bank. Like many fellow data nerds, I love to visualize the data I have in my hands, but I also like to spend time visualizing data that I don’t have: missing data.

Why would I do that? Because bad handling of missing data can seriously bias the results of a statistical analysis.

Hot Tip:

  • Removing records with missing values from a dataset (listwise deletion) is generally not the right way to handle missing data!
  • For more information about missing data, and techniques to address it, see:

Baraldi, Amanda N., and Craig K. Enders. “An introduction to modern missing data analyses.” Journal of School Psychology 48.1 (2010): 5-37.
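To see why deleting records can mislead, here is a small base-R simulation (all data are simulated and the coefficients are arbitrary). Reading grades are made positively correlated with IQ and then deleted for students below the median IQ; the mean of the remaining complete cases is compared with the mean of the full data.

```r
set.seed(1)

n  <- 1000
iq <- rnorm(n, mean = 100, sd = 15)
reading <- 50 + 0.4 * iq + rnorm(n, sd = 5)   # positively correlated with IQ

# Missingness depends on IQ: reading is missing whenever IQ is below the median
reading_obs <- ifelse(iq < median(iq), NA, reading)

full_mean <- mean(reading)                    # what we would like to estimate
cc_mean   <- mean(reading_obs, na.rm = TRUE)  # estimate after dropping missing records

round(c(full = full_mean, complete_case = cc_mean), 1)
```

Because the students who keep their reading grades are the higher-IQ half, the complete-case mean comes out higher than the full-data mean.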

In this post, I will present a common variation of the heatmap that is used to visualize missing data and identify potentially harmful missingness patterns.

STEP 1: The data table

Here is a dataset containing fake education records. In this dataset, each row represents a unique student, and contains information about the following variables:

  • Gender
  • IQ scores
  • Reading grades
  • Math grades

[Image: the fake education records table]

It’s relatively easy to spot missing data in this table, but I want to make the missing cells impossible to ignore. Let’s color each missing data cell in a bright, easy-to-spot color!

STEP 2: Choose a catchy color to highlight empty cells

Now, I really can’t ignore these empty cells, but I still can’t see – at least not easily – if there is any interesting pattern in the missing data.

[Image: the table with empty cells highlighted in red]

Next, I will use a heatmap technique to color the non-missing cells of this table.

STEP 3: Color the non-empty cells according to their values

I will use a monochrome color scale. Since I am interested in comparing empty cells with non-empty cells, I want to keep the color scheme as simple as possible to facilitate this comparison. The shades of grey highlight the change in values of the non-empty cells, while maintaining a strong contrast with the red empty cells.

Low values are colored in light grey, while high values are colored in dark grey. Since we are only interested in checking missing data patterns, the actual values of the non-empty cells can be hidden.

[Image: non-empty cells shaded in grey according to their values]

STEP 4: Remove the values

A pattern now seems to emerge, but it is still not as obvious as it could be…

[Image: the shaded table with the cell values removed]

 

Let’s reorder the rows according to the values in the IQ score column.

STEP 5: Sort the rows by IQ score

It is now hard to miss the pattern of the missing data: reading grades are missing for the lower half of the IQ score distribution. Assuming that IQ scores and reading grades are positively correlated, removing the rows with empty cells from the table would overestimate the average reading grade for this group of students.

[Image: the table sorted by IQ score, with reading grades missing for the lower IQ scores]
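Here is a base-R sketch of the whole technique (steps 1 through 5), using simulated records rather than the dataset shown above; the variable names and coefficients are assumptions. For real work, the VIM package’s `matrixplot()` function, which has a `sortby` argument, is built for this kind of plot.

```r
set.seed(42)

# STEP 1: simulated (fake) education records, one row per student
n  <- 40
iq <- round(rnorm(n, mean = 100, sd = 15))
students <- data.frame(
  gender  = rbinom(n, 1, 0.5),                       # coded 0/1
  iq      = iq,
  reading = round(50 + 0.4 * iq + rnorm(n, sd = 5)),
  math    = round(40 + 0.5 * iq + rnorm(n, sd = 5))
)
# Reading grades missing for the lower half of the IQ distribution
students$reading[students$iq < median(students$iq)] <- NA
# A few math grades missing at random
students$math[sample(n, 4)] <- NA

# STEP 3: rescale each column to [0, 1] so one grey scale fits every variable
rescale01 <- function(x) {
  (x - min(x, na.rm = TRUE)) / diff(range(x, na.rm = TRUE))
}
scaled <- sapply(students, rescale01)

# STEP 5: sort the rows by IQ score (lowest IQ at the bottom of the plot)
scaled <- scaled[order(students$iq), ]

# STEPS 3-4: grey shades for observed values (light = low, dark = high),
# with no cell values printed; missing cells are left blank for now
x <- seq_len(ncol(scaled))
y <- seq_len(nrow(scaled))
image(x, y, t(scaled), col = grey(seq(0.9, 0.2, length.out = 100)),
      axes = FALSE, xlab = "", ylab = "Students (sorted by IQ)")

# STEP 2: paint the missing cells bright red on top
missing <- ifelse(is.na(scaled), 1, NA)
image(x, y, t(missing), col = "red", add = TRUE)

axis(1, at = x, labels = colnames(scaled))
box()
```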

 

Rad Resources:

  • This visualization can be easily produced using the R package VIM:

http://www.statistik.tuwien.ac.at/forschung/CS/CS-2008-1complete.pdf

  • It can also be done in Excel using a couple of hacks:

http://policyviz.com/create-a-heatmap-in-excel/

 

The American Evaluation Association is celebrating Data Visualization and Reporting (DVR) Week with our colleagues in the DVR Topical Interest Group. The contributions all this week to aea365 come from DVR TIG members.

·
