My name is Guili Zhang. I am an Assistant Professor of Research and Evaluation Methodology at East Carolina University. Over the past ten years, I have evaluated the National Science Foundation’s SUCCEED program and developed and analyzed the SUCCEED longitudinal database, which includes data from nine universities and spans 20 years. Our research team’s publications based on this database have received two Best Paper Awards, from the American Society for Engineering Education and from the Frontiers in Education conference. Today I’d like to share some information about longitudinal data management and analysis.
Lessons Learned: There are two very different organizations for longitudinal data—the “person-level” format and the “person-period” format. A person-level data set, also known as the multivariate format, has as many records as there are people in the sample. As additional waves of data are collected, the file gains new variables, not new cases. A person-period data set, also known as the univariate format, has multiple records for each person—one for each person-period combination. As additional waves of data are collected, the file gains new records, but not new variables.
Besides the derived variable approach to longitudinal data analysis, which reduces the repeated measurements to a single summary variable, there are two classical approaches: ANOVA and MANOVA. Both rest on well-understood methodology, and software for them is widely available. Unfortunately, both are of limited use in longitudinal data analysis because of their restrictive and often unrealistic assumptions and because missing data undermine the statistical properties of their estimates. Several alternative approaches now overcome these limitations. They are variously known as mixed-effects regression models, covariance pattern models, generalized estimating equations (GEE) models, individual growth models, multilevel models, hierarchical linear models, random regression models, survival analysis, event history analysis, failure time analysis, and hazard models.
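To make the derived variable approach concrete, here is a minimal Python sketch that reduces each person's repeated measurements to one summary number, an ordinary least squares slope of score on wave. The function name `per_person_slope`, the `(person_id, wave, score)` record layout, and the sample values are all hypothetical and chosen only for illustration; a slope is just one possible summary (a mean or a final-minus-first difference would work the same way).

```python
from collections import defaultdict


def per_person_slope(records):
    """Reduce each person's repeated measures to a single OLS slope.

    records: iterable of (person_id, wave, score) tuples in person-period form.
    Returns a dict mapping person_id to the slope of score on wave.
    """
    by_person = defaultdict(list)
    for person_id, wave, score in records:
        by_person[person_id].append((wave, score))

    slopes = {}
    for person_id, points in by_person.items():
        n = len(points)
        mean_t = sum(t for t, _ in points) / n
        mean_y = sum(y for _, y in points) / n
        sxx = sum((t - mean_t) ** 2 for t, _ in points)
        if sxx == 0:
            # A slope is undefined with fewer than two distinct waves,
            # so this sketch simply leaves the person out.
            continue
        sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
        slopes[person_id] = sxy / sxx
    return slopes


# Made-up illustration data: three people, up to three waves each.
records = [
    (1, 1, 2.0), (1, 2, 3.0), (1, 3, 4.0),
    (2, 1, 3.5), (2, 2, 3.0), (2, 3, 2.5),
    (3, 1, 3.0),
]
slopes = per_person_slope(records)
```

The resulting slopes can then be analyzed like any cross-sectional variable. Note that the person with a single wave drops out entirely, which shows how sensitive this approach is to missing data.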
Hot Tip #1 – The person-period format most naturally supports meaningful analysis of change over time.
Hot Tip #2 – Most statistical software packages can convert a longitudinal data set from one format to the other. For example, in SAS, Singer (1998, 2001) provides simple code for the conversion; in Stata, the “reshape” command can be used.
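For readers without SAS or Stata, the conversion itself can be sketched in a few lines of plain Python. This is not Singer's SAS code or Stata's reshape; it is an illustrative sketch under some assumptions: the person-level file stores each wave in a column named with a common stub plus the wave number (for example `gpa_1`, `gpa_2`), missing waves are stored as `None`, and time-invariant variables are ignored. The function names, column names, and values are hypothetical.

```python
def to_person_period(person_level, id_key, stub, value_key):
    """Person-level (wide) to person-period (long).

    Each column named stub + <wave number> becomes its own record.
    Waves with a missing (None) value produce no record.
    """
    rows = []
    for person in person_level:
        for name, value in person.items():
            suffix = name[len(stub):]
            if name.startswith(stub) and suffix.isdigit() and value is not None:
                rows.append({id_key: person[id_key],
                             "wave": int(suffix),
                             value_key: value})
    return sorted(rows, key=lambda r: (r[id_key], r["wave"]))


def to_person_level(person_period, id_key, stub, value_key):
    """Person-period (long) to person-level (wide).

    Each record adds one stub + <wave number> column to its person's row.
    """
    people = {}
    for row in person_period:
        person = people.setdefault(row[id_key], {id_key: row[id_key]})
        person[f"{stub}{row['wave']}"] = row[value_key]
    return list(people.values())


# Made-up person-level data: one record per person, one column per wave.
wide = [
    {"id": 1, "gpa_1": 3.1, "gpa_2": 3.4, "gpa_3": 3.6},
    {"id": 2, "gpa_1": 2.8, "gpa_2": None, "gpa_3": 3.0},
]

long_rows = to_person_period(wide, "id", "gpa_", "gpa")
wide_again = to_person_level(long_rows, "id", "gpa_", "gpa")
```

Here the two person-level records become five person-period records, because person 2 has no wave-2 measurement. Converting back restores one record per person, with the missing wave simply absent as a column for that person.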
- Data Cleaning 101 (pdf)
- How can I create lag and lead variables in longitudinal data? (webpage)
- Transforming Multiple-Record Data into Single-Record Format when Number of Variables is Large (pdf)
- Longitudinal Data Techniques: Looking Across Observations (pdf)
Two introductory books that I have found useful are:
- Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence
- Applied Longitudinal Analysis (Wiley Series in Probability and Statistics)
Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to firstname.lastname@example.org. aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.