A note on observing zero successes
Say that you have a sample of size \(n\) and that you observe \(x\) successes. The estimated proportion of successes is \(\hat{p} = x/n\), and you can construct an approximate 95% confidence interval for the true proportion \(p\) using

\[\hat{p} \pm 1.96 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.\]
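As a quick illustration, here is a minimal R sketch of this interval (the counts are made up for the example):

x <- 40                              # hypothetical number of successes
n <- 200                             # hypothetical sample size
p_hat <- x / n                       # estimated proportion
se <- sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error
print(p_hat + c(-1, 1) * 1.96 * se)  # approximate 95% confidence interval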
So far, so good.
Now say you have observed zero successes, i.e. \(x = 0\). Then \(\hat{p} = 0\), the estimated standard error is zero, and the interval above collapses to the single point 0, which is clearly not a useful expression of the remaining uncertainty. Say that someone wants to know what proportion of human beings can fly, and say that you observe zero successes in a sample of \(n\) people. Intuitively, you would still not claim with complete certainty that no human being can fly.
In business statistics we don't have the time and resources to check every individual in the population. Furthermore, the complete absence of something is seldom of great concern from a business perspective.
Nonetheless, even in business, the absence question can sometimes be important. Think about the presence or absence of errors, for example. In many cases you can't check every unit for errors, so you often sample cases instead. If it is of business interest to know whether the whole population of interest is free of errors, it's clear that you will have to work with confidence intervals, credible intervals, or some other way of expressing uncertainty.
What can you do?
As explained by John D. Cook in his blog post Estimating the chances of something that hasn’t happened yet (see https://www.johndcook.com/blog/2010/03/30/statistical-rule-of-three/), you could argue that we want to find the largest proportion \(p\) that is still plausibly consistent with observing zero successes. If the true proportion is \(p\), the probability of observing zero successes in \(n\) independent trials is \((1-p)^n\), so we look for the largest \(p\) such that

\[(1-p)^n \geq 0.05,\]

or

\[(1-p)^n = 0.05.\]

The 0.05 comes from the fact that we want to have a 95% confidence interval.
We want to solve this last equation for \(p\). Taking the natural logarithm of both sides gives

\[n \ln(1-p) = \ln(0.05),\]

and we work out further to:

\[\ln(1-p) = \frac{\ln(0.05)}{n}.\]

We then apply the Taylor series expansion for \(\ln(1-p)\):

\[\ln(1-p) = -p - \frac{p^2}{2} - \frac{p^3}{3} - \cdots\]

For small values of \(p\) the higher-order terms are negligible, so \(\ln(1-p) \approx -p\) and therefore \(-p \approx \ln(0.05)/n\). Since \(\ln(0.05) \approx -3\), this leads to the rule of thumb that:

\[p \approx \frac{3}{n}.\]
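The 3 in this rule is simply \(-\ln(0.05)\) rounded to the nearest integer, which is easy to verify in R:

-log(0.05)  # 2.9957..., approximately 3
3 / 100     # rule-of-thumb upper bound for a sample of size 100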
This rule of thumb is attributed to Ronald A. Fisher, who described it in his book Statistical Methods for Research Workers, published in 1925.
Notice that from a frequentist perspective, more specifically given how confidence intervals should be interpreted, this rule of thumb is not without problems. Why this is the case would lead us too far here, but there's a Bayesian argument to be made that leads to the same rule of thumb.
In the blog post I mentioned earlier, John D. Cook gives a Bayesian derivation: "Suppose you start with a uniform prior on p. The posterior distribution on p after having seen 0 successes and N failures has a beta(1, N+1) distribution. If you calculate the posterior probability of p being less than 3/N you get an expression that approaches 1 – exp(-3) as N gets large, and 1 – exp(-3) ≈ 0.95."
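We can check this claim numerically in R with pbeta, the cumulative distribution function of the beta distribution:

N <- c(10, 100, 1000, 10000)
print(pbeta(3 / N, shape1 = 1, shape2 = N + 1))  # posterior P(p < 3/N)
print(1 - exp(-3))                               # the limit, about 0.9502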
This derivation relies on knowing how to derive an analytical expression for the posterior distribution in this particular case. If you don't have that knowledge, I present a more intuitive illustration here.
We start again from a uniform prior distribution. The likelihood is simply the binomial. We don't know which value of \(p\) generated the data, so we evaluate the likelihood of observing zero successes over a fine grid of candidate values for \(p\), multiply by the prior, and normalize to obtain the posterior. This is known as grid approximation.
Below you can find example code that illustrates how simply this idea can be implemented in R:
n <- 100                                              # sample size with zero successes
p_grid <- seq(from = 0, to = 1, length.out = 10000)   # grid of candidate values for p
prior <- rep(1, 10000)                                # uniform prior
likelihood <- dbinom(0, size = n, prob = p_grid)      # P(0 successes | p)
unstd.posterior <- likelihood * prior                 # unnormalized posterior
posterior <- unstd.posterior / sum(unstd.posterior)   # normalize so it sums to 1
samples <- sample(p_grid, prob = posterior, size = 1e5, replace = TRUE)
q95 <- as.numeric(quantile(samples, 0.95))            # 95th percentile of the posterior
print(q95)
The result (about 0.03) is close to Fisher's rule of thumb of \(3/n = 3/100 = 0.03\).
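As a cross-check, the posterior in this case is the beta(1, n + 1) distribution from Cook's derivation, so the exact 95% posterior quantile is available directly through qbeta and should match the grid approximation closely:

print(qbeta(0.95, shape1 = 1, shape2 = n + 1))  # about 0.0292 for n = 100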
In addition to the Bayesian approach with grid approximation (abbreviated as Bayes) and Fisher's rule of three (abbreviated as Fisher), we also work out the exact binomial case (abbreviated as Binomial).
To do this we go back a few equations, to

\[\ln(1-p) = \frac{\ln(0.05)}{n},\]

and exponentiate both sides:

\[1-p = 0.05^{1/n},\]

which finally leads to:

\[p = 1 - 0.05^{1/n}.\]
I did this for a range of sample sizes \(n\) and compared the three approaches.
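A minimal sketch of such a comparison in R (the particular values of \(n\) are my own choice here, and for the Bayes column I use the exact beta posterior quantile rather than grid approximation, for brevity):

n_values <- c(10, 30, 50, 100, 500, 1000)
fisher <- 3 / n_values                    # rule of three
binomial <- 1 - 0.05^(1 / n_values)       # exact binomial upper bound
bayes <- qbeta(0.95, 1, n_values + 1)     # 95% quantile of the beta(1, n + 1) posterior
print(data.frame(n = n_values, Fisher = fisher, Binomial = binomial, Bayes = bayes))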