I’m Paul Bakker, the founder and lead consultant of Social Impact Squared. I help agencies with a social purpose understand, measure, communicate, and improve their outcomes. One of the services I provide is data analysis, so I deal with statistical significance quite a lot.
Hot Tip: Abandon the 95% rule. In statistics classes, they teach you to reject the null hypothesis (of no difference) when the p-value falls below 5%, that is, when there is less than a 5% chance of seeing a difference at least as large as yours if the null hypothesis were true. It makes sense to give students working on textbook examples a rule of thumb, but people don’t use such a rule when making real-life decisions. Before you analyze your data, discuss with your clients and the relevant decision makers the level of confidence they need to make a decision. Maybe they want to be 95% confident, or maybe being 80% confident is good enough for them to act.
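To make that concrete, here is a minimal sketch in Python (with made-up outcome scores) of what this looks like in practice: run the test once, then compare the p-value against whatever threshold you and the decision makers agreed on, rather than an automatic 5%.

```python
# Minimal sketch with invented data: compare one test's p-value against
# a decision threshold agreed on with stakeholders, not a fixed 5%.
from scipy import stats

program_group = [72, 68, 75, 80, 71, 69, 77, 74]      # hypothetical outcome scores
comparison_group = [70, 65, 66, 72, 68, 64, 71, 67]

result = stats.ttest_ind(program_group, comparison_group)

for confidence in (0.95, 0.80):          # levels the client might choose
    alpha = 1 - confidence
    decision = "act on the difference" if result.pvalue < alpha else "hold off"
    print(f"At {confidence:.0%} confidence (alpha = {alpha:.2f}): "
          f"{decision} (p = {result.pvalue:.3f})")
```

The threshold is an input you agree on up front; nothing about the calculation itself changes.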
Hot Tip: Consider the need to adjust for increases in error due to multiple tests. Often, you need to run multiple tests to analyze your data. The chance that at least one test will conclude that there is a difference when there isn’t is equal to 1 − (1 − α)^n, where α is the chance of a false positive on a single test (5% when you use the 95% level) and n is the number of independent tests.
For instance, if you ran 20 tests at the 95% significance level, then the chance that at least one of those tests provides you with a wrong answer is 1 − (1 − 0.05)^20 = 1 − 0.95^20 ≈ 0.64, or about a 64% chance.
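If you want to check that arithmetic yourself, a small sketch (assuming independent tests at a 5% per-test threshold) is:

```python
# Chance that at least one of n independent tests at per-test threshold
# alpha produces a false positive (a "difference" that isn't there).
def familywise_error(alpha: float, n_tests: int) -> float:
    return 1 - (1 - alpha) ** n_tests

print(familywise_error(0.05, 1))    # 0.05 for a single test
print(familywise_error(0.05, 20))   # roughly 0.64 for 20 tests
```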
The typical advice is to demand a higher level of confidence from each test. However, consider the following possible scenario. Out of those 20 tests, 6 are significant at the 95% level, but only one is significant at the 99% level. With 20 tests at the 95% level you would expect roughly one false positive, so about five of those six differences are probably real; keeping only the one result that clears the 99% bar means ignoring roughly four real differences. What is more important to your client? Acting on one incorrect difference or not acting on four real differences?
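As a rough illustration of that trade-off, the sketch below uses 20 made-up p-values matching the scenario above (six clear the 5% threshold, one clears the 1% threshold) and counts how many results you would act on under each rule, including a Bonferroni-adjusted threshold of 0.05/20:

```python
# Illustrative only: invented p-values standing in for the 20 tests above.
p_values = [0.003, 0.012, 0.021, 0.030, 0.038, 0.047,
            0.09, 0.14, 0.18, 0.22, 0.31, 0.38,
            0.44, 0.52, 0.60, 0.67, 0.74, 0.81, 0.88, 0.95]

thresholds = [("95% level (alpha = 0.05)", 0.05),
              ("99% level (alpha = 0.01)", 0.01),
              ("Bonferroni (0.05 / 20)", 0.05 / 20)]

for label, threshold in thresholds:
    n_significant = sum(p < threshold for p in p_values)
    print(f"{label}: act on {n_significant} of 20 results")
```

The stricter the rule, the fewer false positives you act on, but the more real differences you risk leaving on the table; that is the trade-off to put in front of your client.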
Molly Engle (2002 AEA President) wrote a follow-up post on her blog at http://blogs.oregonstate.edu/programevaluation/2013/05/16/minimum-rule/ – further thoughts on where the rule came from.
Thanks, Brian and Rick.
It is true that academic journals typically require 95% confidence levels, but I see that as a disconnect between academia and the community. Anyone conducting applied research, academics included, should find out the level of confidence their stakeholders need.
If you are trying to confirm a universal truth or social law, then perhaps adhering to the 95% confidence level makes sense, but if you are conducting exploratory research then a lower level of confidence may be appropriate.
I think what both of you highlight, and what I completely agree with, is that the level of confidence needed is context specific. Perhaps the rule should be: 1) consider the type/purpose of your research, and 2) consider the level of confidence your stakeholders need. As Rick highlights, when only dollars are at stake, gamblers may need less confidence, but when people’s lives are at stake, you want to be pretty certain.
I tend to agree that my clients don’t really care about statistical significance levels, though I focus more on evaluations for business strategy or marketing work. While I always run the calculations, at best they might make a footnote in my typical report.
I tend to focus a lot more on effect sizes, which usually do not really make it into the reports I make for clients, but often form the basis of the recommendations I make.
Also, I agree with Brian about adjusting the tests, especially when I do anything for academic publication.
It is true that academic journals typically want adjustments for multiple tests; however, there is a growing debate among statisticians and academics about whether such adjustments are helpful. For example, see: http://beheco.oxfordjournals.org/content/15/6/1044.full
Steve, the article supports your focus on effect sizes rather than statistical significance.
I take this more as a convention for different contexts.
When performing academic research, especially for scientific journals, a 5% threshold, or a tougher two-tailed test, is standard. I’ve even used Bonferroni corrections to control Type I error risk, which is stricter still.
However, non-academic situations are different. I recently told a client that a particular correlation did not meet the scientific standard for statistical significance, but the graph looked like there could be a trend. There are also times when the significance tests fail, yet a more complex trend is present that requires you to dig into the data.
Perhaps 5% could be called a minimum rule in some contexts, and a rule of thumb in others.
Good points made here
In gambling circles, and on the stock market, a rule that proved correct 55% of the time would make you a lot of money over time. But if you were a heart surgeon operating with a rule that was true only 55% of the time, you would quickly be fired, sued and gaoled.
The costs of failure × their probability have to be weighed against the benefits of success × their probability.
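To make that weighing explicit, here is a tiny sketch with invented probabilities, costs, and benefits:

```python
# Hypothetical numbers only: expected value of acting on a finding.
p_real = 0.55             # chance the observed difference is real
benefit_if_real = 10_000  # payoff from acting when it is real
cost_if_false = 2_000     # loss from acting when it is not

expected_value = p_real * benefit_if_real - (1 - p_real) * cost_if_false
print(expected_value)  # positive here, so acting looks worthwhile;
                       # for the heart-surgeon case, cost_if_false is huge
                       # and the same 55% rule flips to a clear "don't".
```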
Yes, I agree that “Before you analyze your data, discuss with your clients and the relevant decision makers the level of confidence they need to make a decision.”