Tuesday, December 18, 2012

Why The Nielsen Company is an #mrx Big Data company avant la lettre

Recently there was an interesting discussion on the Next Gen Market Research (NGMR) LinkedIn group. The original discussion was started by Emma Kingham, Marketing Manager at Warc. She asked us to "Make a prediction about the future of market research that you believe is GUARANTEED to happen". Pretty soon there was a subthread on market research and Big Data.

Most of us agreed that the analytical capabilities found in the market research world can be valuable for Big Data, especially given the strong IT leaning Big Data seems to have. What we didn't agree on was, surprise surprise, the definition of Big Data itself. Basically there were two schools of thought. One group saw Big Data as a relative concept: what is small for one community might be big for another. This school basically saw no difference between Big Data and normal data: it's just more of it. The other group, which included me, stressed the technical challenges (Hadoop, Mahout, and so on). In my view a lot of the so-called big data done by traditional market research firms is not big data by those standards.

Independently of that, Stijn Viaene (@destivia) pointed to an article in Forbes by Quentin Gallivan (Pentaho Corp) on retailers and Big Data (Why Big Data Is All Retailers Want for Christmas). I was struck by the fact that The Nielsen Company was not mentioned at all, even though they have traditionally been an important information provider for retailers and manufacturers. They're also the largest market research company (source: The Honomichl Top 50 Report).

As an ex-Nielsen guy, I was surprised by this, because Nielsen is one of the few market research companies that have actually been dealing with big data for a long time, even before the term big data was coined. OK, they might not use Hadoop (yet), but they have some features that fit the bill quite well. I can't give all the details here because I don't want their lawyers chasing me. But without giving away their trade secrets I can give a few reasons why you could consider The Nielsen Company a Big Data company:
  • The Nielsen Company has over half a petabyte of data on Netezza in the US, and 45 terabytes of data on Oracle in its European Customer Information Factory (2009 figures). The Buy segment processes approximately nine trillion purchasing data points each month. I'm pretty sure these numbers are much higher today. While they're mainly using traditional database systems to process the data, they have a tradition of using different strategies to improve performance, for instance in the phase where the store-oriented retailer data is aggregated into market- and product-oriented databases.
  • Nielsen has developed "online" algorithms to avoid batch processing. Online algorithms process input in a serial fashion without needing the entire input to be available. A very simple example is the "online" version of the mean: $$\bar x_n = \bar x_{n-1} + \frac{x_n - \bar x_{n-1}}{n}.$$ I will not disclose here what they're doing with it exactly, but it is pretty clever.
  • Another technique they use to improve speed is "memoization" (i.e. an optimization technique that caches the results of function calls to avoid repeating the calculation for inputs that were processed earlier).
  • They use machine learning algorithms to automatically classify products into meaningful buckets before they are processed by humans.
  • The company does not only process traditional forms of data; it also has years of experience in processing video and images (commercials for the Watch division and pictures of products for the Buy division).
  • Somewhere around 2005, the European branch of the Watch division of Nielsen toyed with Tuple Spaces. According to Wikipedia, a tuple space is "an implementation of the associative memory paradigm for parallel/distributed computing. It provides a repository of tuples that can be accessed concurrently. As an illustrative example, consider that there are a group of processors that produce pieces of data and a group of processors that use the data. Producers post their data as tuples in the space, and the consumers then retrieve data from the space that match a certain pattern. This is also known as the blackboard metaphor. Tuple space may be thought as a form of distributed shared memory". Again, a Tuple Space is not Hadoop, but nonetheless it is worth noting that the company was experimenting with distributed processing quite early in the game. BTW, I can report on these experiments because the project got canceled at an early stage and never made it into production.
  • The statistics departments in Nielsen are called "Measurement Science", pretty close to the now popular term "Data Science" that is often associated with Big Data.
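
For readers who have not met the two speed-up techniques from the list before, here is a minimal Python sketch of the "online" mean and of memoization. To be clear, this is my own toy illustration and not Nielsen's code: the names RunningMean and slow_square are hypothetical, and the memoization part simply relies on lru_cache from Python's standard functools module.

```python
from functools import lru_cache


class RunningMean:
    """Online mean (hypothetical helper): one observation at a time.

    Implements x̄_n = x̄_{n-1} + (x_n - x̄_{n-1}) / n, so the full
    input never has to be held in memory.
    """

    def __init__(self):
        self.n = 0        # number of observations seen so far
        self.mean = 0.0   # current estimate of the mean

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n
        return self.mean


@lru_cache(maxsize=None)
def slow_square(x):
    """Stand-in for an expensive calculation (hypothetical example).

    lru_cache stores the result per input, so a repeated call with
    the same argument is answered from the cache instead of being
    recomputed.
    """
    return x * x


if __name__ == "__main__":
    rm = RunningMean()
    for value in [2, 4, 6, 8]:
        rm.update(value)
    print(rm.mean)                   # mean of the values seen so far

    slow_square(3)                   # computed
    slow_square(3)                   # served from the cache
    print(slow_square.cache_info())  # hit/miss counters kept by lru_cache
```

The point of the online version is that only the count and the current mean are kept between updates, which is what makes it suitable for data that arrives as a stream.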

Based on that I think that The Nielsen Company deserves the title of a #mrx Big Data company Avant La Lettre.
