“This thing called ‘price’ is really, really important. I still think that a lot of people under-think it through. You have a lot of companies that start and the only difference between the ones that succeed and fail is that one figured out how to make money, because they were deep in thinking through the revenue, price, and business model. I think that’s under-attended to, generally.” – Steve Ballmer, Former CEO, Microsoft.
The business of monetizing data has grown significantly over the past decade and has created unprecedented opportunities for firms to market their products, advance their predictive abilities, and target customers with surgical precision. As proclaimed in The Economist, data is now the most valuable resource in the world. The growth in the value of data is primarily fueled by the intelligence that data-driven analytics provides in making critical business decisions. While there are many articles in the popular press about the perks of monetizing data, little is known about how data should actually be priced. Motivated by the practical and theoretical significance of data monetization, we (my advisors and I) have developed a framework that captures both the purchasing of data by a buyer and the corresponding pricing by the seller. In this article, I will briefly explain this framework and highlight the key findings of our research paper.
An initial challenge that any firm in the business of data monetization faces is to build an efficient data warehouse to integrate data from disparate sources. Even when a firm manages to put together a high-quality, uninferred, and well-structured dataset that complies with user-privacy requirements (which can be a daunting task in its own right), appropriately pricing that dataset is crucial to the success of monetization. A dataset is essentially an information good: its value is derived from the information that it contains. While economists have studied how to price information goods in general, the selling of a dataset, as I will explain below, is more nuanced than that of information goods like telephone minutes and internet bandwidth.
Consider the illustrative dataset on the mobile activity of smartphone users in North America shown in Table 1. Such a dataset is highly valuable for marketers as it helps them execute targeted campaigns. Moreover, the valuations for different subsets of records by different marketers vary; that is, the records of interest and the corresponding value for them differ across buyers. A natural question then arises: How should the data-seller price such a dataset? Should he specify a price for each potential set of records that a buyer can select? Clearly, it would be impractical to even specify such a set-based pricing policy due to its exponential size. Should he make a simple take-it-or-leave-it offer, that is, a pricing policy in which buyers must either purchase all the records in the dataset at a given price or buy nothing? This, of course, would turn away a lot of buyers who are interested in purchasing only a few records (of their choice) from the dataset. To answer these questions, we develop a mathematical model with the following features:
Table 1: An illustrative dataset consisting of information on mobile activity by smartphone users.
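As a rough illustration of why a set-based pricing policy is impractical to even write down, the short sketch below counts how many prices the seller would need if he quoted one price for every non-empty set of records a buyer could select. The function name `num_set_prices` is hypothetical and chosen only for this example; it is not part of our model.

```python
def num_set_prices(n_records: int) -> int:
    """Count the non-empty subsets of a dataset with n_records records.

    A set-based pricing policy would need one price per such subset.
    """
    if n_records < 0:
        raise ValueError("n_records must be non-negative")
    # Each record is either in or out of a subset: 2**n subsets in total,
    # minus the empty set, which needs no price.
    return 2 ** n_records - 1


if __name__ == "__main__":
    for n in (10, 20, 50):
        print(f"{n} records -> {num_set_prices(n)} prices to specify")
```

Even a tiny dataset of 50 records would require more than a quadrillion separate prices, whereas a take-it-or-leave-it offer needs exactly one.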
For each individual buyer, her ideal record and decay rate are her private information, which she uses along with the seller’s pricing policy to purchase the records of her choice. Anticipating the buyers’ decisions, the seller’s goal is to design a pricing policy that maximizes his expected revenue. The multi-dimensional private information of the buyers coupled with their endogenous selection of records makes the seller’s problem of optimally pricing the dataset a challenging one. I will now highlight the key results from our analysis.
This research, which focused on a monopolistic data-seller interested in monetizing a dataset, is a first step towards understanding how datasets should be priced. The ongoing explosive growth in the supply and demand of data has led to the emergence of data-selling platforms, such as Narrative, that cater to both sides of the market: sellers list their datasets on the platform and buyers purchase data from one or more sellers. Such two-sided settings become richer due to several interesting constraints:
Further, from a market-design perspective, the analysis of two-sided data-selling platforms presents interesting theoretical and practical challenges. For instance, the utility of a dataset to a buyer may depend on the number of other buyers who have shared access to that dataset. I believe that our current findings can serve as a foundation for future work on optimal and/or approximate mechanisms for such settings and, more generally, on the design of efficient data-selling platforms.