All Things Data Science

Posts

Showing posts from October, 2013

On Lampedusa, asylum applications, Europe and a chart in De Standaard

October 26, 2013

On Friday, 25 October 2013, "De Standaard" published an article under the headline "Vluchtelingen moeten het doen met beloftes" ("Refugees have to make do with promises"). The article itself is fine; it deals with the problem of refugees in Europe, which has risen high on the European agenda because of the disaster off Lampedusa. The chart accompanying the article, however, is not exactly a bull's-eye. The problem with this chart is that it uses the area of circles to compare proportions, and that is particularly hard to do. Take the United Kingdom, for example. About half (14,600) of the 28,200 asylum applications are approved. The area of the red circle is therefore also about half that of the blue circle. I checked the calculation, and it holds up fairly well, but the average reader will probably not immediately think of that ratio. But fine, the numbers themselves are printed neatly alongside, so even though the chart does not work well visually, you still have the numbers ...
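To make the circle comparison concrete, here is a minimal sketch of the arithmetic for the United Kingdom figures quoted above. It assumes the chart scales circle area (not radius) in proportion to the counts, which is what the excerpt describes; the variable names are just for illustration.

```python
import math

# Figures quoted in the excerpt for the United Kingdom.
applications = 28200  # blue circle: asylum applications
approved = 14600      # red circle: approved applications

# If area is proportional to the count, the area ratio equals the count ratio.
area_ratio = approved / applications

# Area grows with the square of the radius, so the radius (and diameter)
# ratio is the square root of the area ratio.
radius_ratio = math.sqrt(area_ratio)

print(f"area ratio:   {area_ratio:.2f}")
print(f"radius ratio: {radius_ratio:.2f}")
```

Under that assumption the red circle holds about half the area of the blue one, yet its diameter is still roughly 72% of the blue circle's diameter, which is one reason a reader is unlikely to see "half" at a glance.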

Managing Data Scientists

October 23, 2013

With the rise of the 'Data Scientist', a lot has been said about the definition, role, qualifications and skills of the Data Scientist, and how to hire them. A somewhat neglected topic is how to manage data scientists. Indeed, data scientists, by their very nature, are hard to manage. They love to solve problems, but those problems are not always the business problems you want them to tackle. They are ace players, but they're not always the best team players, and some of them have difficulty dealing with (higher) management. They can have bright ideas, but they often lose interest when it comes to turning those ideas into a profit-making activity. They will find clever solutions for you, but they don't always excel at making sure that a structured process is in place, let alone at the administrative follow-up that comes with it. Some of them were hired as 'rock stars' and have developed an ego that goes with that... On the other hand, they are...

A small experiment with Twitter's language detection algorithm

October 17, 2013

Some time ago I captured quite a lot of geo-located tweets for a spatial statistics project I'm doing. The tweets I collected were all confined to Belgium. One of the things I looked at was the language of the tweets. As you might know, Belgium officially has three languages: Dutch, French and German. Of course, when you analyze a large set of tweets, you can't determine the language manually; on the other hand, blindly relying on Twitter's language detection algorithm doesn't feel good either. That's why I set up a little experiment to assess to what extent Twitter's language detection algorithm can be trusted in the context of my geo-location project. I stress this because I don't have the ambition to make overall judgments on how Twitter takes care of language detection. First, let's look at the languages, as determined by the Twitter language detection algorithm, of the 150,000 or so tweets I collected. The bar chart below shows the frequency of...
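As a rough illustration of the kind of tally such a bar chart is built on, here is a minimal sketch that counts tweets per detected language. It assumes each tweet is a dictionary carrying Twitter's `lang` field; the four sample records are made up for the example and are not taken from the collected data, and `language_counts` is a hypothetical helper name.

```python
from collections import Counter

# Made-up sample records for illustration only; real input would be the
# parsed tweet objects, each carrying Twitter's detected "lang" code.
tweets = [
    {"text": "Goeiemorgen Gent!", "lang": "nl"},
    {"text": "Bonjour Bruxelles", "lang": "fr"},
    {"text": "Guten Morgen Eupen", "lang": "de"},
    {"text": "Mooi weer vandaag", "lang": "nl"},
]


def language_counts(tweets):
    """Count tweets per detected language code.

    Tweets without a "lang" field are grouped under "und"
    (assumed here as the placeholder for an undetermined language).
    """
    return Counter(tweet.get("lang", "und") for tweet in tweets)


counts = language_counts(tweets)
for lang, n in counts.most_common():
    print(lang, n)
```

Checking how far these labels can be trusted would additionally need a hand-labelled sample to compare against, which this sketch does not attempt.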