Binpacking: an underused algorithm in the market research industry

In these days of ChatGPT  (Chat Generative Pre-Trained Transformer) and the quest for AGI (Artificial General Intelligence), many companies invest a lot in LLM’s (Large Language Models). This also the case for Ipsos, the Market Research company I work for. That said, we also pay attention to what happens outside of the world of AI, even if it might be less fashionable at the moment. One of those good-old algorithms we recently rediscovered is the "bin packing” algorithm". 

I will admit that I had forgotten about these types of algorithms since the “linear programming and operational research” class when I studied computer science back in the early nineties. But recently we had to solve a problem in Ipsos that led to an age-old algorithm that is used a lot in logistics, chip design  and cloud computing, but, to my knowledge, not so much in the market research industry. If you are aware of other applications in market research, feel free to add them in the comments section of this blog post.


An important component of the Ipsos Global Influentials (IGI)  study is tracking the media usage of top 20% of income and business leaders in 40+ markets. To do that, the study presents logo's of mediabrands in an online survey.  There's an advantage in having mediabrands that are related to each other on the same screen. Since screen real estate is limited, we need to fit as many mediabrands as possible on one screen without exceeding six logos per screen. For example mediabrands that can easily be confused should be on the same screen. Mediabrands that belong to the same family of brands should be presented together, and finally, mediabrands that belong to the same category (f.e. sports) should be presented together as well. Moreover, as is always the case in online survey research, the number of  screens needs to be minimized.  

While sorting mediabrands based on parentbrand and category can get you a long way, combining incomplete screens such that you minimize the number of screens overall is timeconsuming to do manually. In the IGI study this needs to be done for more than 40 countries, each with dozens of mediabrands. That's a lot of work, so my team was asked to find an algorithm that could help with the process.  

It turns out that this problem can be casted as a bin packing problem. We start by giving a description of the bin packing problem taken from wikipedia:

The bin packing problem is an optimization problem, in which items of different sizes must be packed into a finite number of bins or containers, each of a fixed given capacity, in a way that minimizes the number of bins used. The problem has many applications, such as filling up containers, loading trucks with weight capacity constraints, creating file backups in media, and technology mapping in FPGA semiconductor chip design.

There are many variants to the bin packing problem such as packing by weight, packing by cost, 2-dimensional packing. There’s also a link with two other famous problems:

  1. The cutting stock problem: As the name suggests, this problem stems from the paper industry and can be described as determining how to cut standard rolls of paper of a certain width into smaller widths to fulfill orders in such a way that it minimises the amount of scrap.
  2. The knapsack problem: In the knapsack problem you need to pack a set of items, with given values and sizes (f.e. weights or volumes) in a knapsack with a maximum capacity. If the total size of the items exceeds the capacity of the knapsack, you obviously can’t pack them all. The problem is then to choose a subset of the items of maximum total value that will fit in the knapsack.

In the context of IGI we actually use a hierarchical bin packing approach in that items are organized in a hierarchical or nested structure. The goal of hierarchical bin packing is to pack the items into the bins in a way that respects the hierarchical structure and minimizes the total cost or number of bins used. F.e. mediabrands are nested into parentbrands, mediabrands are nested into categories., and so on.


In the future we will investigate whether we can push the current solution even further. Two possible variants we might want to investigate are: 
  1. Online Bin Packing with Repacking: In general, online bin packing means that items arrive one by one in a sequence. In our context that would mean that someone has updated, for example added,one mediabrand. You then need to assign a screen to that brand with possible ripple effects on other screens. Additionally, in Online Bin Packing with Repacking, some limited repacking of items already placed in bins is allowed. From an IGI perspective you would want to allow some repacking to make sure you have balanced screens, but you would want to limit the number of repacks because that would violate the constraint of having a low number of changes between waves.
  2. Bin packing by costs: You could introduce a cost factor to express the fact that we prefer not to pack mediabrands from the News category with those from the Fashion category. Bin packing by costs could then be used to implement that strategy.
The binpacking problem is an optimization problem that can be resolved with many different methods. To start with there are exact algorithms that come from the linear programming and operational research world, such as Branch and Bound, Integer Programming, and Dynamic Programming.
A second class of methods are heuristic algorithms. Generally, these methods are not guaranteed to be
optimal, but they are easier and faster. In many practical situation they yield good enough results. Here's a list of the most common approaches:
  • Next Fit (NF): Loop through the list of items in order. Check to see if the current item can fit into the current bin. If so, assign it to the current bin and update the remaining free capacity of the current bin. Otherwise, record the final free capacity of the current bin, start a new bin, assign the current item to the new bin, and update the free capacity of the new bin.
  • First Fit (FF): Scan the list of bins in order to find the first bin that is large enough to hold the current item. Loop through each item in the list in order. Place the new item in the first bin that is large enough to hold it. A new bin is created only when the current item does not fit in any of the bins.
  • First Fit Decreasing (FFD): A problem with the First Fit algorithm is that packing large items is difficult, especially if they occur late in the sequence. The solution is to first sort the items by size, from largest to smallest, then run the First Fit algorithm.
  • Best Fit: Scan the list of bins in order to find a bin where the item fits the tightest. Loop through each item in the list in order. Place the new item in a bin where it fits tightest. It it does not fit in any bin, then start a new bin.
  • Best Fit Decreasing (BFD):A problem with the Best Fit algorithms is that packing large items is difficult, especially if they occur late in the sequence. The solution is to first sort the items by size, from largest to smallest, then run the First Fit algorithm.
It's very well possible that the binpacking algorithm is not of great use to you. That's fine. The message I try to convey is not so much that this algorithm is great, but rather that you need to look for solutions that are best suited for solving the problems your business has. Sometimes that will be AI, but very often it will not be of much use to you. Unfortunately many business leaders seem to set their hopes exclusively on AI and are missing out on better and cheaper solutions. They are like rabbits blinded by the light. In business this is seldom a good strategy. 

Comments

Popular posts from this blog

(small) samples versus alternative (big) data sources

Market Research and Big Data: A difficult relationship

Why The Nielsen Company is an #mrx Big Data company avant la lettre