Eerie similarity: Stock market predicts Germany-Greece soccer game result!

Forget Paul the Octopus, Chanakya the Fish, and all the other football-predicting animals. It appears that the stock market can be used to predict the outcome of football matches! Now that the stock markets are not doing so well, at least we can use them to make a few bucks on football betting sites.
Take last weekend's game, for instance, when Germany played Greece. It turns out that the score evolution of this game followed a pattern very similar to that of the German-Greek spread from mid-March to mid-June 2012. The pattern is obvious in the graph below. The correlation between the two series is an impressive 93%.
The attentive reader will notice that there is a gap between the 45- and 60-minute marks. Indeed, the stock markets predicted that the third and fourth German goals would be scored earlier. Specialists are investigating whether this has to do with the break after the first half. But, other than that small gap, the fit between the two lines is very close.
These results don't come as a surprise. Earlier it was reported that Twitter can be used to predict the stock market (see Twitter Can Predict the Stock Market in Wired). And now it turns out that the stock market, in turn, can be used to predict the results of football games. Whether Twitter itself can predict football results directly remains an open question.

And now what really happened:

A couple of months ago I read an interesting article entitled "I Wanted to Predict Elections with Twitter and all I got was this Lousy Paper" by Daniel Gayo-Avello, in which he explained that, despite some claims in the popular press, the predictive power of Twitter is far from proven. The following quote is from that paper, and to me it summarizes well what the problem is with a lot of these claims:
"It’s not prediction at all! I have not found a single paper predicting a future result. All of them claim that a prediction could have been made; i.e. they are post-hoc analysis and, needless to say, negative results are rare to find."
Some other interesting reading in this area is "The junk science behind the ‘Twitter Hedge Fund’" and "Sour Grapes: Seven Reasons Why That Twitter Prediction Model is Cooked" by Ben Gimpert. One of the recurring themes is that, by being selective about which data you show, you can easily relate many events that logically should be independent.


This weekend I was watching the European Football Championship. More specifically, I was watching Germany play Greece. For well-known political and economic reasons this game was getting more attention than usual. On Twitter, too, you could enjoy a whole series of football- and economy-related jokes.
Thinking back to the aforementioned papers, I wanted to verify whether it is indeed easy to misrepresent actual data so that they falsely suggest a relationship, and I took the Germany-Greece game as a case in point. It took me less than half an hour to find data that matched relatively well with some events during the game. It then took me just a little bit longer to represent it graphically in such a way that it looks more convincing.

Of course, as such, this little experiment proves nothing, but it nicely illustrates how easy it is to find data that fits any given hypothesis in a post-hoc analysis. BTW, I'm not suggesting that the authors of the Twitter Prediction model did that, I'm just pointing out how easy it is.

I conclude with some tips & tricks for those who want to try this at home. They should also help you recognize others who are trying to fool you:
  • In a lot of cases you can describe your data as some kind of pattern that can be found in other sources. In this case I took the sequence of goals that were scored and considered the difference between the two teams. This yields 0,1,2,1,2,3,1, a pattern that you can easily detect in other data sources.
  • A great source of selectivity is playing with the start and end dates of events. Also, you can be creative with the precision at which something is measured.
  • Use statistics only when they sound spectacular: in this case, a correlation of 0.93 sounds pretty convincing.
  • Express correlations and proportions in percentages to make it obvious to the layman that the maximum is 100.
  • Suggest higher precision than you actually have. Here our axis with the goal difference shows decimals, while a goal difference can only be an integer.
  • Be gracious and admit that the relationship you are describing is not perfect; this will increase your credibility. In our example the discussion about the third and fourth German goals serves that function.
  • Use dotted lines.
  • If needed use thicker lines.
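
For those who prefer code to graphs, here is a rough sketch of the "play with the start and end dates" trick. It is not how the graph in this post was made: the "market" series below is pure random noise that I generate on the spot, and the function names (pearson, random_walk, best_window), the series length and the seed are all made up for illustration. The only thing taken from the post is the goal-difference sequence. The idea is to compare one window that you commit to in advance with the best window you can find after looking at every possible start date.

```python
import random

# Goal-difference sequence as quoted in the post.
GOAL_DIFF = [0, 1, 2, 1, 2, 3, 1]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def random_walk(length, rng):
    """Synthetic 'market' series: cumulative sum of Gaussian steps, i.e. pure noise."""
    level, series = 0.0, []
    for _ in range(length):
        level += rng.gauss(0, 1)
        series.append(level)
    return series


def best_window(series, pattern):
    """Try every start position; return (highest correlation, its start index)."""
    k = len(pattern)
    return max(
        (pearson(series[i:i + k], pattern), i)
        for i in range(len(series) - k + 1)
    )


rng = random.Random(2012)  # arbitrary seed, only so that reruns give the same numbers
series = random_walk(500, rng)

# Honest version: one window, fixed before looking at the data.
fixed_r = pearson(series[:len(GOAL_DIFF)], GOAL_DIFF)

# Post-hoc version: look at every window and report only the best one.
best_r, start = best_window(series, GOAL_DIFF)

print(f"pre-chosen window: r = {fixed_r:.2f}")
print(f"best of {len(series) - len(GOAL_DIFF) + 1} windows: r = {best_r:.2f} (start {start})")
```

By construction the post-hoc number can never be lower than the pre-chosen one, because the pre-chosen window is just one of the candidates. With only seven points to match and hundreds of windows to choose from, the search has a lot of room to find something that looks impressive, even though the series has nothing to do with football.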
