Friday, July 26, 2013

An introduction to probability theory with Elvis Costello

Last week I released a paper entitled "The Generalized $S^3$-problem. A probabilistic view on Elvis Costello's Spectacular Spinning Songbook". You can find the pdf here.
The paper is a bit of a parody of statistical papers, so it shouldn't be taken too seriously. At the same time, though, it gives a very gentle introduction to some concepts of probability theory (Laplace, independence, the birthday paradox, ...).

Enjoy!

Wednesday, July 10, 2013

Are partygoers in Belgium using more cocaine?

Last week the Belgian newspaper De Morgen ran an article on drug use amongst Belgian partygoers. The headline of the article was "Partygoers use less cannabis and more cocaine" ("Minder cannabis, meer cocaïne bij feestvierders"). The graph that accompanied the article looked like this:


While this is in Dutch, the language of drugs is universal, so I'm sure you will have no difficulty understanding what it says. There are a couple of remarks to make about this graph:

  • While there are small grey bars separating the 3 groups (Alcohol/Cannabis, Xtc/Cocaine and LSD/GHB/Ketamine), I was initially fooled into thinking they all shared the same Y-axis. They don't, so you need to be careful to take the scale into account.
  • Secondly, at first glance there seems to be a drop in cannabis use, but the increase in cocaine use mentioned in the headline is less clear-cut (no pun intended).
  • Thirdly, alcohol use seems to decline as well, although this is difficult to judge without significance tests.
  • Fourthly, just to prove my nerdiness, the label for LSD in 2009 is missing.
  • Finally, what the hell happened in 2007? The use of every type of drug studied increased compared to the 2005 study. OK, I celebrated my 40th birthday in 2007, but surely that can't explain all of this ;-)
Before I discuss the details, let me first say that, as far as I can judge, the journalist did an OK job of writing up the article based on the press release, so this time I'm not blaming the journalist. Secondly, the study was carried out by the VAD, a non-profit association for alcohol and other drug problems, which has a decent reputation for carrying out these types of studies. Thirdly, the VAD publishes papers on their methodological approach (see here for a detailed methodological note on this particular study) and VAD collaborators publish in scientific journals (see for instance here). And finally, from a methodological perspective, studying drug-related issues is notoriously difficult.

Nonetheless, some things look strange to me. First, although I can't prove it, I don't believe the 2007 figures are correct. I stress the word believe here because without further information I can't judge it. I scanned through the reports and could not find much explanation for 2007, although several parts of the report acknowledged that the 2007 figures were odd (which is good). The only explanation I could find was a passage admitting that the 2007 figures were influenced by the 'special' group of respondents they had that year.

This observation, together with the scale issue mentioned above, led me to make a similar chart, but using a single scale for all drugs and with the year 2007 interpolated from 2005 and 2009:



In this graphical representation the variation between the years is put into a different perspective. Notice that:
  • We assume the 2007 data was indeed flawed.
  • We slightly understate the actual variation between years by interpolating the 2007 figures.
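The post doesn't say exactly how 2007 was interpolated; a straight line between the 2005 and 2009 values is the natural assumption, and because 2007 sits exactly halfway, that boils down to the average of the two. A minimal sketch under that assumption (the function name and the example figures are hypothetical, not the VAD data):

```python
def interpolate(year, known):
    """Linearly interpolate a proportion for `year` from two surrounding survey years.

    `known` is a pair of (year, proportion) tuples, e.g. the 2005 and 2009 values.
    Assumes straight-line interpolation; for 2007 between 2005 and 2009 this is
    simply the midpoint of the two proportions.
    """
    (y0, p0), (y1, p1) = known
    if not y0 < year < y1:
        raise ValueError("year must lie strictly between the two known years")
    weight = (year - y0) / (y1 - y0)  # 0.5 for 2007
    return p0 + weight * (p1 - p0)


# Hypothetical proportions for illustration only, NOT the survey figures.
estimate_2007 = interpolate(2007, ((2005, 0.20), (2009, 0.16)))
print(f"Interpolated 2007 value: {estimate_2007:.2f}")
```

Because the interpolated point always lies between its neighbours, it can never produce a spike like the reported 2007 figures, which is why the interpolation slightly understates the real year-to-year variation.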
Based on a visual inspection of the new graph I would be inclined to say that cannabis use has dropped, but cocaine has remained more or less at the same level.

Of course, that is just visual inspection. According to the VAD methodological note, the sample sizes were between 600 and 700. In 2012 the sample size was 618.
I've learned from Dries Benoit that, from a Bayesian perspective, you need to be careful with classical confidence intervals in cases like this (see here for a Dutch blog post on this subject). Nonetheless, with 13.6% of the 618 respondents in 2012 reporting cocaine use, a 95% confidence interval for the proportion of cocaine use amongst partygoers in 2012 would be between
$$0.136 - 1.96 \sqrt{{0.136\times0.864 \over 618}}=0.10897$$
and
$$0.136 + 1.96 \sqrt{{0.136\times0.864 \over 618}}=0.16303$$.
All of the previously observed proportions, except in the odd year of 2007, were within that confidence interval, so I would be more inclined to say that over the last 10 years cocaine use amongst partygoers has remained at the same level, which is exactly the opposite of the headline of the De Morgen article.
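The interval above is the classical normal-approximation (Wald) interval, $\hat p \pm 1.96\sqrt{\hat p(1-\hat p)/n}$. A small sketch that reproduces the arithmetic for the 2012 figures ($\hat p = 0.136$, $n = 618$); the function names are my own, and only the standard library is used:

```python
import math


def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    z = 1.96 gives the usual 95% interval used in the text above.
    """
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def within(p, interval):
    """True if proportion p falls inside the (low, high) interval."""
    low, high = interval
    return low <= p <= high


# 2012 survey: 13.6% reported cocaine use, sample size 618.
low, high = wald_interval(0.136, 618)
print(f"95% CI: [{low:.5f}, {high:.5f}]")  # 95% CI: [0.10897, 0.16303]
```

To repeat the comparison from the last paragraph, pass each earlier year's published proportion to `within(p, (low, high))`. As Dries Benoit's remark suggests, this interval relies on a large-sample approximation, so treat it as a rough check rather than a formal test.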