AEA365 | A Tip-a-Day by and for Evaluators

TAG | p-value

I’m Tamara Young, an associate professor in Educational Evaluation and Policy Analysis at North Carolina State University, where I teach evaluation theory and practice in education. Today, I’m going to discuss the American Statistical Association’s (ASA) Statement on p-values, which responds to the decades-old, highly contentious debate about null hypothesis statistical significance testing (NHSST). I also describe the implications of the debate and the ASA’s response for the evaluation community.

The Debate

The NHSST process is flawed, and “misconceptions and misuse” of NHSST are widespread. As Ronald Wasserstein and Nicole Lazar explain in their editorial on the Context, Process, and Purpose of the ASA statement on p-values, NHSST has faced serious critique for decades. In recent years, Tom Siegfried has called attention to the flaws of NHSST, describing the process as “science’s dirtiest secret” and concluding that “statistical techniques for testing hypotheses …have more flaws than Facebook’s privacy policies.” The journal Basic and Applied Social Psychology went so far as to ban NHSST.

Hot Tip: The Current Resolution

In 2016, the American Statistical Association issued a statement delineating six principles (quoted directly below) that should guide the use and interpretation of p-values and, ultimately, improve practice and move us into a post-“p < .05” era:

  1. “P-values can indicate how incompatible the data are with a specified statistical model.”
  2. “P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.”
  3. “Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.”
  4. “Proper inference requires full reporting and transparency.”
  5. “A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.”
  6. “By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.”

Implications for the Evaluation Community

Evaluators who use NHSST need to become more familiar with the debate about NHSST and read the ASA’s six guiding principles. Instructors of quantitative methods need to discuss the debate and give students opportunities to critically reflect on the ASA’s principles and apply them in data analysis simulations. Additionally, the evaluation community, especially journal editors, needs to encourage the use of other methods (e.g., Bayesian methods) that can serve as alternatives or supplements to NHSST. Lastly, funders, decision-makers, and evaluators need to consider the ASA principles when designing studies and when interpreting and using results.

Rad Resources:

Statistical errors: P values, the ‘gold standard’ of statistical validity, are not as reliable as many scientists assume.

Odds Are, It’s Wrong: Science Fails to Face the Shortcomings of Statistics

The ASA’s Statement on p-Values: Context, Process, and Purpose, which includes the ASA statement, online supplemental materials related to NHSST, and alternatives to NHSST.

Do you have questions, concerns, kudos, or content to extend this aea365 contribution? Please add them in the comments section for this post on the aea365 webpage so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to aea365@eval.org. aea365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators.


· ·

My name is Steve Fleming, and I work for the National Center for Educational Achievement, a department of ACT, Inc. whose mission is to help people achieve education and workplace success. I also earned an M.S. in Statistics from the University of Texas at Austin.

I have been thinking a lot lately about how to explain statistical significance. Setting aside the problem of overemphasizing statistical significance relative to the practical significance of results, my objective for this post is to provide a visual explanation of statistical significance testing and to suggest a display for the statistical significance of results.

Statistical significance testing begins with a null hypothesis, which we typically hope to show is not true, and an alternative hypothesis. From the sample data, a p-value is computed that summarizes the evidence against the null hypothesis. The p-value is compared to a fixed significance level, α. If the p-value is smaller than the significance level, the null hypothesis is rejected; otherwise we fail to reject it.
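The decision rule above can be written out in a few lines. This is a minimal sketch, not from the original post, using a two-sided z-test with a known standard deviation; the function names are my own.

```python
import math

def two_sided_p_value(z):
    """Two-sided p-value for a standard-normal test statistic z."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def decide(sample, mu0, sigma, alpha=0.05):
    """Test H0: mean == mu0 against a two-sided alternative.

    Rejects H0 when p < alpha; otherwise we fail to reject (we never
    'accept' H0 -- absence of evidence is not evidence of absence).
    """
    n = len(sample)
    mean = sum(sample) / n
    z = (mean - mu0) / (sigma / math.sqrt(n))  # standardized test statistic
    p = two_sided_p_value(z)
    return p, ("reject H0" if p < alpha else "fail to reject H0")

# Example: a sample whose mean sits well away from the hypothesized 0
p, decision = decide([1.2, 0.9, 1.1, 1.0, 0.8], mu0=0.0, sigma=1.0)
```

For the sample above, p is roughly 0.025, so at α = .05 we reject, but at α = .01 we would fail to reject the same data.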

Hot Tip: What effect does choosing a different significance level have? In the following diagram the combined blue and red regions represent the possible sample data results if the null hypothesis is true. The blue regions show where we would fail to reject the null hypothesis and the red regions where we would reject. It is clear that smaller levels of α make it less likely to reject the null hypothesis. In the language of errors, smaller levels of α offer more protection against false positives.

Hot Tip: The APA style guide suggests using asterisks next to the sample estimates to indicate the p-value when space does not allow printing the p-value itself. Using increasing intensities of color as an alternative way to indicate the most significant results saves even more space. Consider:
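The star convention can be captured in a tiny helper (a sketch of my own; the .05/.01/.001 thresholds are the conventional APA cutoffs, and the function name is hypothetical):

```python
def significance_stars(p):
    """APA-style stars: *** p < .001, ** p < .01, * p < .05."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

# Example: annotate sample estimates with their significance
estimates = {"Group A": 0.0004, "Group B": 0.03, "Group C": 0.40}
labels = {k: significance_stars(p) for k, p in estimates.items()}
```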

Rad Resource: How do you choose a consistent set of colors of increasing intensity? I have found Color Brewer to be a good source for this information.
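One way to wire a sequential palette to significance levels is to index darker colors by stricter thresholds. This is a sketch of my own; the hex codes below are illustrative placeholders in the spirit of a light-to-dark blue sequence, and for a properly vetted palette you should pull the codes from Color Brewer itself.

```python
# Illustrative light-to-dark sequence (placeholder hex codes);
# substitute a sequential palette exported from Color Brewer.
PALETTE = ["#eff3ff", "#bdd7e7", "#6baed6", "#2171b5"]

def cell_color(p):
    """Map a p-value to a fill color: darker = more significant."""
    if p < 0.001:
        return PALETTE[3]
    if p < 0.01:
        return PALETTE[2]
    if p < 0.05:
        return PALETTE[1]
    return PALETTE[0]  # non-significant cells stay lightest
```

A table cell holding an estimate with p = .004 would then be filled with the second-darkest color, letting readers scan for the darkest cells first.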

What do you think? Does this visual clarify or obfuscate the meaning of statistical significance? I look forward to the discussion online.


