Nonparametric Pricing Bandits: Leveraging Informational Externalities to Learn the Demand Curve

Marketing Science
Articles
Published: Forthcoming
Author(s): I. Weaver, V. Kumar, and L. Jain

Abstract

We propose a novel, theory-based approach to the reinforcement learning problem of maximizing profits when faced with an unknown demand curve. Our method, rooted in multi-armed bandits, balances exploration and exploitation across various prices (arms) to maximize rewards. Traditional Gaussian process bandits capture one informational externality in price experimentation: correlation of rewards through an underlying demand curve. We extend this framework by incorporating a second externality, monotonicity, into Gaussian process bandits by introducing monotonic versions of both the GP-UCB and GP-TS algorithms. Through reduction of the demand space, this informational externality limits exploration and experimentation, outperforming benchmarks by enhancing profitability. Moreover, our approach can also complement methods such as partial identification. Additionally, we present algorithm variants that account for heteroscedastic noise in purchase data. We provide theoretical guarantees for our algorithm, and empirically demonstrate its improved performance across a broad range of willingness-to-pay distributions (including discontinuous, time-varying, and real-world) and price sets. Notably, our algorithm increased profits, especially for distributions where the optimal price lies near the lower end of the considered price set. Across simulation settings, our algorithm consistently achieved over 95% of the optimal profits.
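To make the setup concrete, the following is a minimal sketch of a standard (non-monotonic) GP-UCB pricing bandit of the kind the paper builds on: a Gaussian process posterior over expected profit at each candidate price, with the next price chosen by an upper confidence bound. The kernel parameters, noise level, toy logistic willingness-to-pay curve, and function names here are illustrative assumptions, not the authors' implementation; the paper's monotonic variants additionally constrain the learned demand curve to be non-increasing in price.

```python
import numpy as np

def rbf_kernel(a, b, length=2.0, var=1.0):
    # Squared-exponential kernel; encodes correlation of profits across nearby prices
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(grid, X, y, noise=0.05):
    # GP posterior mean and std over the price grid, given observed (price, profit) pairs
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(grid, X)
    Kss = rbf_kernel(grid, grid)
    mu = Ks @ np.linalg.solve(K, y)
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mu, np.sqrt(np.clip(np.diag(cov), 0.0, None))

def gp_ucb_pricing(profit_fn, grid, T=60, beta=2.0, rng=None):
    # Repeatedly post the price maximizing mean + beta * std (optimism under uncertainty)
    if rng is None:
        rng = np.random.default_rng(0)
    X, y = [], []
    for _ in range(T):
        if not X:
            p = rng.choice(grid)  # first round: explore at random
        else:
            mu, sd = gp_posterior(grid, np.array(X), np.array(y))
            p = grid[np.argmax(mu + beta * sd)]
        X.append(float(p))
        y.append(profit_fn(p) + 0.05 * rng.standard_normal())  # noisy profit observation
    return X, y

# Toy example: logistic willingness-to-pay, so expected profit = p * P(WTP >= p)
grid = np.linspace(1.0, 10.0, 30)
true_profit = lambda p: p / (1.0 + np.exp(p - 5.0))
X, y = gp_ucb_pricing(true_profit, grid)
print(f"mean price over last 10 rounds: {np.mean(X[-10:]):.2f}, "
      f"optimal price: {grid[np.argmax(true_profit(grid))]:.2f}")
```

In this sketch, the informational externality appears through the kernel: observing profit at one price updates the posterior at all correlated prices, which is what lets the bandit avoid exhaustive experimentation.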