Fisher did not take Neyman and Pearson’s criticism well. In response, he called their approach “naive” and “absurdly academic.” In particular, Fisher disagreed with the idea of deciding between two hypotheses, rather than calculating the “significance” of the available evidence as he had proposed. Whereas a decision was final, his significance tests gave only a provisional opinion, which could be revised later. Even so, Fisher insisted that researchers should use a 5% cutoff for a “significant” p-value, and claimed that he would “ignore entirely all results which fail to reach this level.”
The acrimony would give way to decades of ambiguity, as textbooks gradually muddled Fisher’s null hypothesis testing together with Neyman and Pearson’s decision-based approach. A nuanced debate over how to interpret evidence, with its discussion of statistical reasoning and the design of experiments, instead became a set of fixed rules for students to follow.
Mainstream scientific research would come to rely on simplistic p-value thresholds and true-or-false decisions about hypotheses. In this rote-learned world, an experimental effect was either present or it was not. A drug either worked or it didn’t. It was not until the 1980s that major medical journals finally began to break free of these habits.
Ironically, much of the shift can be traced back to an idea that Neyman developed in the early 1930s. With economies struggling in the Great Depression, he had noticed a growing demand for statistical insights into the lives of populations. Unfortunately, governments had limited resources available to study these problems. Politicians wanted results within months or even weeks, and there was not enough time or money for a comprehensive study. As a result, statisticians had to rely on sampling a small subset of the population. This was an opportunity to develop some new statistical ideas. Suppose we want to estimate a particular value, such as the proportion of the population who have children. If we sample 100 adults at random and none of them are parents, what does this suggest about the country as a whole? We can’t say definitively that nobody has children, because if we sampled a different group of 100 adults, we might find some parents. We therefore need a way to measure how confident we should be in our estimate. This is where Neyman’s innovation came in. He showed that we can calculate a “confidence interval” for a sample, which tells us how often we should expect the true population value to lie within a certain range.
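To make the 100-adults example concrete, here is a minimal sketch of one standard way to put an interval around a sampled proportion: the exact binomial (Clopper-Pearson) interval. It is offered as an illustration rather than as Neyman’s own calculation. The function names `binom_cdf` and `exact_interval` are hypothetical, and the bounds are found by simple bisection so that the code needs only the standard library.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve(f, target, increasing, tol=1e-10):
    """Bisection on [0, 1] for f(p) = target, with f monotonic in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid  # the root lies to the right of mid
        else:
            hi = mid  # the root lies to the left of mid
    return (lo + hi) / 2


def exact_interval(successes, n, level=0.95):
    """Clopper-Pearson interval for a proportion (illustrative sketch)."""
    alpha = 1 - level
    if successes == 0:
        lower = 0.0
    else:
        # Smallest p at which seeing this many successes or more is still plausible.
        lower = _solve(lambda p: 1 - binom_cdf(successes - 1, n, p),
                       alpha / 2, increasing=True)
    if successes == n:
        upper = 1.0
    else:
        # Largest p at which seeing this few successes or fewer is still plausible.
        upper = _solve(lambda p: binom_cdf(successes, n, p),
                       alpha / 2, increasing=False)
    return lower, upper


# The example from the text: 100 adults sampled, none of them parents.
low, high = exact_interval(0, 100)
print(f"95% interval for the proportion of parents: {low:.3f} to {high:.3f}")
```

Under these assumptions the interval for zero parents out of 100 runs from 0 up to about 3.6%, so the sample is compatible with a small but non-zero share of parents in the population, not with proof that nobody has children.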
Confidence intervals can be a slippery concept, because they require us to interpret tangible real-life data by imagining many other hypothetical samples being collected. Like type I and type II errors, Neyman’s confidence intervals address an important question, just in a way that often perplexes students and researchers. Despite these conceptual hurdles, there is value in having a measurement that can capture the uncertainty in a study. It is often tempting, particularly in media and politics, to focus on a single average value. A single value may feel more confident and precise, but ultimately it is an illusory conclusion. In our public-facing epidemiological analysis, my colleagues and I have therefore chosen to report only the confidence intervals, to avoid misplaced attention falling on specific values.
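The “many other hypothetical samples” idea can be sketched with a small simulation. The code below assumes a made-up true proportion (`true_p=0.3`), draws repeated samples of 100 people, and counts how often a 95% interval computed from each sample contains the true value. It uses the Wilson score interval as one common choice for proportions; the function names `wilson_interval` and `coverage`, the seed, and the number of repeats are all illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, level=0.95):
    """Wilson score interval for a proportion (illustrative sketch)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


def coverage(true_p, n, repeats, seed=1):
    """Fraction of simulated samples whose interval contains true_p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(repeats):
        # One hypothetical sample: n people, each a "success" with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        lower, upper = wilson_interval(successes, n)
        hits += lower <= true_p <= upper
    return hits / repeats


print(coverage(true_p=0.3, n=100, repeats=2000))
```

The printed fraction should land close to 0.95. That is the sense in which the interval is a “95%” one: the figure describes how the procedure behaves across repeated samples, not the probability attached to any single interval.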
Since the 1980s, medical journals have put more focus on confidence intervals than on standalone true-or-false claims. However, habits can be hard to break. The relationship between confidence intervals and p-values has not helped. Suppose our null hypothesis is that a treatment has zero effect. If the 95% confidence interval for our estimated effect does not contain zero, then the p-value will be less than 5%, and based on Fisher’s approach, we will reject the null hypothesis. As a result, medical papers are often less interested in the uncertainty interval itself than in the values it does (or doesn’t) contain. Medicine may be trying to move beyond Fisher, but the influence of his arbitrary 5% cutoff remains.
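The link between the interval and the 5% cutoff can be shown with a short sketch. It assumes the simplest textbook setting, a normally distributed effect estimate with a known standard error, and the function name `wald_summary` and the example numbers are hypothetical. In this setting, the 95% interval excludes zero exactly when the two-sided p-value falls below 0.05.

```python
from statistics import NormalDist


def wald_summary(estimate, std_error, level=0.95):
    """Normal-approximation interval and two-sided p-value against a null of zero effect."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% level
    lower = estimate - z_crit * std_error
    upper = estimate + z_crit * std_error
    z = estimate / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return lower, upper, p_value


# Made-up effect estimates, each with a standard error of 1.
for estimate in (1.0, 1.9, 2.0, 3.0):
    lower, upper, p_value = wald_summary(estimate, std_error=1.0)
    excludes_zero = lower > 0 or upper < 0
    print(f"estimate={estimate}: interval=({lower:.2f}, {upper:.2f}), "
          f"p={p_value:.3f}, excludes zero={excludes_zero}")
```

Because the two quantities are built from the same ingredients, reading an interval only for whether it contains zero amounts to running the significance test under another name.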
Excerpt adapted from Proof: The Uncertain Science of Certainty, by Adam Kucharski. Published by Profile Books in the UK on March 20, 2025.