Sunday, September 24, 2023

A Frequentist and a Bayesian go to a bar ...

(Note: you might want to refresh this page in your browser if the equations don't render correctly.) 

In the first installment of this blogpost, I illustrated that Fisher's rule of thumb of using $\frac{3}{n}$ for the upper limit of a 95% confidence/credible interval is a good approximation as soon as $n \geq 25$. This was inspired by a blogpost from John D. Cook on the subject.

At the end I made a remark about something odd that happens when $n=1$. Fisher's rule of thumb results in 1, which is not very informative. The binomial solution is 0.95. When $n=1$ the binomial reduces to its special case, the Bernoulli distribution:

$$P(S_1=0)= {1\choose 0}p^0(1-p)^1=0.05$$
$$= 1(1-p)=0.05$$
$$p=1-0.05=0.95.$$
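As a quick sanity check in R: the general binomial solution from the previous installment, $p=1-\exp(-2.995732/n)$, is the same as $1-0.05^{1/n}$, and for $n=1$ it indeed gives 0.95:

n <- 1
1 - 0.05^(1 / n)   # 0.95, the exact binomial upper limit for n = 1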
Yet, in the Bayesian analysis, the result is $p=0.78$. Why?


First let's recalculate that number in an even simpler, more manual way than I showed in the first installment of this blogpost. We know that the distribution we're interested in is the Bernoulli distribution, so the likelihood of a failure is $1-p$. The prior is a uniform distribution. Imagine a grid going from 0 to 1 in steps of 0.0001. The prior assigns the same constant to every possible value of $p$; for simplicity's sake, let's take 1. Applying the likelihood $1-p$ to the grid yields a series starting at 1, followed by $1-0.0001=0.9999$, then 0.9998, 0.9997, and so on, down to 0. That last value reflects the fact that the likelihood of observing a 0 when $p=1$ is 0. The posterior is then simply the prior multiplied by the likelihood, which in this case is exactly the same series as the likelihood. Sampling from this posterior and taking the 95th percentile results in 0.77791, or approximately 0.78, as can be verified with the following one-liner:

quantile(x=sample(x=seq(from = 0, to = 1, by = 0.0001), 
                size=10000, 
                replace=TRUE, 
                prob=seq(from = 1, to = 0, by = -0.0001)), 
         probs=.95)
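For this simple posterior the percentile can also be computed exactly: the normalized posterior density is $2(1-p)$ (as derived further down), its CDF is $1-(1-p)^2$, and setting that equal to 0.95 gives $p=1-\sqrt{0.05}\approx 0.7764$, which the sampling one-liner approximates:

1 - sqrt(0.05)   # exact 95th percentile of the posterior, ~0.7764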

The difference between the Frequentist and Bayesian approach is subtle. Let me illustrate this with the following tale:

Freddy (a Frequentist) and Barry (a Bayesian) go to a bar. After serving them a drink, Ronny, the bartender, has a little quiz for them. They observe one trial, with a failure as the outcome. They don't know $p$, but they need to predict the next outcome. The only thing they know is that there can only be a success (1) or a failure (0), and that the second trial uses the same unknown $p$ as the first. Freddy shouts "maximum likelihood" and answers 0. Barry mumbles something about preferring to answer with a distribution, but the bartender insists on one answer only. Barry then grudgingly agrees, whispers "maximum a posteriori", and also answers 0. They witness a new trial and sure enough the outcome is 0 again. Freddy and Barry do a happy dance and continue drinking.

"Not so fast," says the bartender, and asks how sure they were after the first trial. Freddy, who is a Frequentist, answers first and says: "I have a 95% confidence interval from 0 to 0.95, so if we were to repeat this exercise 100 times under the same conditions, I would expect the true $p$ to be in a similarly constructed confidence interval in 95 out of the 100 cases, so I'm pretty sure." Barry, who is a Bayesian, is a bit more thoughtful and takes his time to answer. He jots a few numbers on a napkin and finally says: "I have a 95% credible interval from 0 to 0.78, so there is a 95% probability that the true parameter value falls between 0 and 0.78."

The bartender now needs to decide whom he will crown the winner. Luckily Ronny happens to know some R from a previous job, so he decides to simulate a whole series of quizzes. He heard Barry mention a uniform prior, so he decides, for each simulated quiz, to randomly pick a $p$ between 0 and 1 with equal probability. Next, just like in the actual quiz, he selects only those trials that resulted in a failure, and for each of these he runs a new trial using the same $p$. He then counts the number of times the second trial is a failure, because that's what Freddy and Barry would predict each time. Finally he expresses the counts as proportions. To avoid any discussion he decides to also consider the complete set of outcomes, i.e. both those with a failure in the first trial and those with a success in the first trial.

After running 10,000 simulated quizzes Ronny finds that, among the quizzes where the first trial was a failure, the proportion where the second trial was also a failure is 0.6602. The proportion of simulated quizzes where the second trial was a failure irrespective of the outcome of the first trial was 0.5028.
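The blogpost doesn't show Ronny's script, but a minimal sketch of such a simulation could look like this (the object names and the seed are mine, and the exact proportions will vary from run to run):

set.seed(1)                                      # for reproducibility; any seed will do
n_quizzes <- 10000
p      <- runif(n_quizzes)                       # one p per simulated quiz, uniform on [0, 1]
trial1 <- rbinom(n_quizzes, size = 1, prob = p)  # first trial
trial2 <- rbinom(n_quizzes, size = 1, prob = p)  # second trial, same p

# proportion of second-trial failures among quizzes where the first trial failed
mean(trial2[trial1 == 0] == 0)
# proportion of second-trial failures irrespective of the first trial
mean(trial2 == 0)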

When Ronny sees that the proportion of simulated quizzes where the second trial is a failure irrespective of the outcome of the first trial is approximately 0.5, he quickly realizes why:
$$ E(X)=\int_0^1 E(X|p)f(p)dp,$$
in which $X$ is the Bernoulli random variable,  representing the outcome of the second trial in an experiment, $E(X|p)$ is the expected value of the Bernoulli random variable $X$ given a specific value of $p$, and $f(p)$ is the probability density function of $p$. Because the expected value of a Bernoulli random variable $X$ given a specific value of $p$ is trivially equal to $p$, and since $p$ is uniformly distributed over $[0, 1]$, $f(p)=1$, we now get:
$$E(X)=\int_0^1p \times 1~dp$$
$$ E(X)=\left[\frac{p^2}{2}\right]_0^1=\frac{1^2}{2}-\frac{0^2}{2}=\frac{1}{2}=0.5.$$
So, under these conditions, if you repeated the quiz many times you would expect about as many failures as successes, just like Ronny observed.

If you first only select the cases where the first trial was a failure - just like in the original quiz - things get a tiny bit more complicated. Let's call the outcome of the first Bernoulli trial $X_1$ and the second $X_2$. We want to know $E(X_2|X_1=0)$ so that we can then derive the (expected) proportion of failures in the second trial. Let's start by using the law of total expectation, conditioning on $p$:
$$E(X_2|X_1=0)=\int_0^1E(X_2|X_1=0,p)\times f(p|X_1=0)dp$$
Given $p$, the second trial is independent of the first, so $E(X_2|X_1=0,p)=E(X_2|p)=p$. Let's work out the second factor, $f(p|X_1=0)$, separately:
$$f(p|X_1=0)=\frac{f(X_1=0|p)\times f(p)}{f(X_1=0)},$$
using Bayes' rule. Here $f(X_1=0|p)$ is the probability of observing a failure on the first trial given $p$, which is $1-p$, and $f(X_1=0)$ is the marginal probability of a failure on the first trial. To express that last part let's start with the law of total probability:
$$f(X_1=0)=\int_0^1f(X_1=0|p)\times f(p)dp$$
All elements in that equation were discussed before so we can perform the integration:
$$f(X_1=0)=\int_0^1 (1-p) \times 1 dp=\left[p - \frac{p^2}{2}\right]_0^1=\left[1 - \frac{1}{2}\right] -\left[0 - \frac{0^2}{2}\right]=\frac{1}{2}.$$
All of this leads to:
$$f(p|X_1=0)=\frac{(1-p) \times 1 }{\frac{1}{2}}=2(1-p).$$
Now we can go back to $E(X_2|X_1=0)$ and write:
$$E(X_2|X_1=0)=\int_0^1 p \times 2(1-p) dp,$$
$$E(X_2|X_1=0)=2\int_0^1 (p-p^2) dp,$$
$$E(X_2|X_1=0)=2\left[\frac{p^2}{2}-\frac{p^3}{3}\right]_0^1=2 \left[ \left(\frac{1}{2}-\frac{1}{3}\right) - \left(\frac{0}{2}-\frac{0}{3} \right)\right]=2\left[\frac{1}{6}\right]=\frac{1}{3}.$$

If $E(X_2|X_1=0)=\frac{1}{3}$, the probability of a failure in the second trial, given a failure in the first, is $1-\frac{1}{3}=\frac{2}{3}\approx 0.6667$, again very close to what Ronny observed.  
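The integral can also be checked numerically in base R:

# E(X_2 | X_1 = 0) = integral of p * 2(1 - p) over [0, 1]
integrate(function(p) p * 2 * (1 - p), lower = 0, upper = 1)$value   # 0.3333...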

Before we continue let me add a comment from Romke Bontekoe on an earlier version of this blogpost. He remarked that Ronny was not the first to figure this out. Pierre-Simon Laplace, who lived from 1749 to 1827, had established the rule of succession, which states that:
$$P(X_{n+1}=1|X_1+X_2+ \dots+X_n=s)=\frac{s+1}{n+2}, $$
in which $s$ is the number of successes and $n$ the number of trials. Applied to this case, where we are interested in the probability of a failure, given 1 previous failure, we get:
$$P(X_2=0| X_1=0)=1-P(X_2=1| X_1=0)$$
$$P(X_2=0| X_1=0)=1-\frac{0+1}{1+2}=1-\frac{1}{3}=\frac{2}{3}.$$
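Laplace's rule is easy to turn into a small helper function (the name rule_of_succession is mine):

rule_of_succession <- function(s, n) (s + 1) / (n + 2)   # P(next trial is a success)
1 - rule_of_succession(s = 0, n = 1)                     # P(X_2 = 0 | X_1 = 0) = 2/3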

So now Ronny understands where his results are coming from, but does that help him decide who the winner will be? He decides to look at the 95th percentile of the distribution of the generated $p$'s, both when all $p$'s are considered and when only those $p$'s are considered that returned a failure on the first trial. The results are 0.9478 and 0.7829 respectively. These numbers are very close to the 0.95 and 0.78 that Freddy and Barry had mentioned. Ronny thus concludes that, while both Freddy and Barry answered 0, Barry gets the advantage for properly taking into account the result of the first trial; Freddy didn't learn anything from that first trial. If we did the same exercise but had witnessed a success in trial 1, and only continued with the $p$'s that led to that success, Freddy would still insist on a 95% confidence interval from 0 to 0.95 for a failure in trial 2, while Barry would adjust his credible interval so that it would go from 0 to 0.9746 instead of from 0 to 0.78.
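Reusing the objects from the simulation sketch above, Ronny's percentiles can be reproduced approximately:

quantile(p[trial1 == 0], 0.95)   # ~0.78: 95th percentile of p given a failure in trial 1
quantile(p[trial1 == 1], 0.95)   # ~0.97: 95th percentile of p given a success in trial 1
quantile(p, 0.95)                # ~0.95: 95th percentile of p ignoring the first trial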

What can this be used for? Not much, I will admit, but it shows that, sometimes, even if you only get a sample of 1 you can already come to some conclusions. 
A second thing we can learn from this is that you don't need the integrals for Bayesian analysis; often you can just rely on simulations or other alternatives. One of the reasons why I only picked up Bayesian statistics at a later age is that, when I was younger, as soon as a paper used integrals instead of summations, I would mentally block and most often give up. But if you're careful about the books and articles you read, Bayesian statistics is also accessible to people who are less fond of integrals.    


Tuesday, September 19, 2023

A note on observing zero successes

Say that you have a sample of size $n=1000$ and you observed $S_n=100$ successes. Traditionally you would use $\hat p=\frac{S_n}{n}=\frac{100}{1000}=0.1$ as a point estimate of the population proportion $p$. From a frequentist perspective you would probably also report a confidence interval:

$$p_-=\hat p - z_\alpha\sqrt{\frac{\hat p(1-\hat p)}{n}}=0.1-1.96\sqrt{\frac{0.1 \times 0.9}{1000}}=0.08140581,$$

and

$$p_+=\hat p + z_\alpha\sqrt{\frac{\hat p(1-\hat p)}{n}}=0.1+1.96\sqrt{\frac{0.1 \times 0.9}{1000}}=0.1185942,$$

using $z_\alpha=1.96$ for a 95% confidence interval (assuming the sampling fraction is small, i.e. the universe size $N$ is large relative to $n$; I will also not go into how such a confidence interval needs to be interpreted).
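In R this interval can be computed in a couple of lines (a small sketch, using 1.96 as in the text):

n <- 1000; s <- 100
p_hat <- s / n
z <- 1.96
p_hat + c(-1, 1) * z * sqrt(p_hat * (1 - p_hat) / n)   # 0.08140581 0.11859419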

So far, so good. 

Now say you have observed zero successes, i.e. $S_n=0$, and you want to apply the procedure above. To start with, you can't: with $\hat p=0$ the estimated standard error is zero and the interval collapses to a single point, violating the non-zero sample proportion assumption. There are some alternatives such as the Wilson score interval or the Clopper-Pearson interval. I will not discuss those; instead I will focus on how Bayesian analysis can help.

Say that someone wants to know what proportion of human beings can fly and say that you observe zero successes in a sample of $n=1000$. Many people would be willing to conclude that people can't fly. This is partly because people have very strong priors about the human capacity to fly. From an inductive reasoning point of view, you can never, with full certainty, conclude that humans can't fly until you have checked that for each individual in the population. The absence of evidence does not necessarily mean evidence of absence. 

In business statistics we don't have the time and resources to check all individuals of the population; furthermore, the complete absence of something is seldom of great concern from a business perspective. 

Nonetheless, sometimes, even in business, the absence question can be important. Think about the presence or absence of errors, for example. Clearly, in a lot of cases you can't check every unit for errors, so you often sample cases. If it is of business interest to know whether the whole population of interest is free of errors, it's clear you will have to work with confidence intervals, credible intervals or some other way of expressing uncertainty.

What can you do? 

As explained by John D. Cook in his blogpost Estimating the chances of something that hasn’t happened yet (See https://www.johndcook.com/blog/2010/03/30/statistical-rule-of-three/), you could argue that we want to find the largest proportion $p$ such that:

$$P(S_n=0)= {n\choose 0}p^0(1-p)^n=0.05$$

or

$$(1-p)^n=0.05.$$

The 0.05 comes from the fact that we want a 95% confidence interval. 

We want to solve this last equation for $p$. First we take the logarithm of both sides:

$$\log((1-p)^n)=\log(0.05),$$

and we work out further to:

$$n \log(1-p)= -2.995732 \approx -3$$

We then apply the Taylor series expansion for $\log(1-p)$ and get:

$$n \left(-p -\frac{p^2}{2}-\frac{p^3}{3} - \dots \right) \approx -3$$

For small values of $p$  the higher-order terms become negligible and we can truncate the series after the first term, so that:

$$n (-p) \approx -3,$$

which leads to the rule of thumb that:

$$p \approx \frac{3}{n}$$

This rule of thumb is attributed to Ronald A. Fisher, who described it in his book Statistical Methods for Research Workers, published in 1925.  

Notice that from a frequentist perspective, more specifically how confidence intervals should be interpreted, this rule of thumb is not without problems. Why this is the case would lead us too far, but there's a Bayesian argument to be made that leads to the same rule of thumb.

In the blogpost I mentioned earlier John D. Cook gives a Bayesian derivation:

Suppose you start with a uniform prior on p. The posterior distribution on p after having seen 0 successes and N failures has a beta(1, N+1) distribution. If you calculate the posterior probability of p being less than 3/N you get an expression that approaches 1 – exp(-3) as N gets large, and 1 – exp(-3) $\approx$ 0.95.

This derivation really relies on knowing how to derive an analytical expression for the posterior distribution in this particular case. If you don't have that knowledge, I present a more intuitive illustration here. 
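Before moving to that illustration, Cook's statement is easy to verify numerically with the Beta CDF, for example for $N=100$:

N <- 100
pbeta(3 / N, shape1 = 1, shape2 = N + 1)   # posterior P(p < 3/N), ~0.954
1 - exp(-3)                                # limiting value, ~0.950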

We start again from a uniform prior distribution. The likelihood is simply the binomial. We don't know which $p$ to use, so we use grid approximation with 10,000 values of $p$ evenly distributed between 0 and 1. We multiply prior and likelihood to obtain an unstandardized posterior, then standardize it. Next we sample from the posterior and summarize by taking the 95th percentile.

Below you can find example code that illustrates how simple it is to implement this idea in R:

n <- 100

# grid of 10,000 candidate values for p between 0 and 1
p_grid <- seq(from = 0, to = 1, length.out = 10000)

# uniform prior: the same constant for every value on the grid
prior <- rep(1, 10000)

# binomial likelihood of observing 0 successes in n trials, for each p
likelihood <- dbinom(0, size = n, prob = p_grid)

# unstandardized posterior = prior * likelihood, then standardize
unstd.posterior <- likelihood * prior
posterior <- unstd.posterior / sum(unstd.posterior)

# sample from the posterior and take the 95th percentile
samples <- sample(p_grid, prob = posterior, size = 1e5, replace = TRUE)
q95 <- as.numeric(quantile(samples, 0.95))
print(q95)

The result is close to Fisher's rule of thumb $p=\frac{3}{100}=0.03$. 

Next to the Bayesian approach with grid-approximation (abbreviated as Bayes), and Fisher's rule of three (abbreviated as Fisher), we also work out the binomial case (abbreviated as Binomial).

To do this we go back a few equations and work it out further:

$$\log(1-p)= \frac{-2.995732}{n},$$

and exponentiate both sides:

$$1-p= \exp\left(\frac{-2.995732}{n}\right),$$

which finally leads to:

$$p= 1- \exp\left(\frac{-2.995732}{n}\right).$$

I did this for $n$ going from 1 up to 50 and summarized it in the chart below.
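For readers who want to reproduce the comparison, here is a minimal sketch (the helper bayes_q95 just wraps the grid-approximation recipe shown above; the names are mine):

# 95th percentile of the grid-approximation posterior for a given n (uniform prior)
bayes_q95 <- function(n) {
  p_grid    <- seq(from = 0, to = 1, length.out = 10000)
  posterior <- dbinom(0, size = n, prob = p_grid)   # flat prior drops out
  posterior <- posterior / sum(posterior)
  samples   <- sample(p_grid, size = 1e5, replace = TRUE, prob = posterior)
  as.numeric(quantile(samples, 0.95))
}

n        <- 1:50
fisher   <- pmin(3 / n, 1)              # rule of three, capped at 1
binomial <- 1 - exp(-2.995732 / n)      # exact binomial solution
bayes    <- sapply(n, bayes_q95)        # grid approximation

matplot(n, cbind(fisher, binomial, bayes), type = "l", ylab = "upper limit")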


First notice that the blue line with the results of the Bayesian analysis is more wobbly than the others. That's because we're sampling and using grid approximation in that approach. But for all practical purposes we see that all three approaches are pretty much equal to each other as soon as $n>25$. Finally, the attentive reader will have observed that something odd happens when $n=1$. This will be discussed in a follow-up blogpost!