We consider a problem in which customers repeatedly interact with a platform. During each interaction, the customer is shown an assortment of items and chooses among them according to a Multinomial Logit (MNL) choice model. The probability that a customer returns to the platform in the next period depends on the customer's cumulative number of past purchases. The platform's goal is to maximize the total revenue obtained from each customer over a finite time horizon. We first study a non-learning version of the problem in which consumer preferences are completely known. We formulate this problem as a dynamic program and prove structural properties of the optimal policy. Next, we formulate the problem in a contextual episodic reinforcement learning setting, where the parameters governing consumer preferences and return probabilities are unknown and are learned over multiple episodes. We develop an algorithm for this contextual reinforcement learning problem based on the principle of optimism under uncertainty and provide a regret bound for it.
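The abstract describes the known-preference dynamic program only at a high level. Below is a minimal sketch, under assumptions of our own, of what such a finite-horizon program could look like: the state is (period `t`, cumulative purchases `n`), choices follow MNL with a no-purchase weight of 1, and a customer who does not return is assumed to leave for good. The revenues, weights, horizon, and `return_prob` function are hypothetical, and brute-force enumeration of assortments stands in for whatever assortment optimization the authors actually use.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical toy instance (not taken from the paper).
REVENUES = (1.0, 0.8, 0.5)   # r_i: revenue earned if item i is purchased
WEIGHTS = (0.6, 1.0, 1.5)    # v_i: MNL preference weight; no-purchase weight is 1
HORIZON = 4                  # periods 1..HORIZON


def return_prob(n: int) -> float:
    """Hypothetical probability of returning next period after n cumulative purchases."""
    return min(0.95, 0.5 + 0.1 * n)


def mnl_probs(assortment):
    """MNL probabilities of choosing each offered item, and of choosing nothing."""
    denom = 1.0 + sum(WEIGHTS[i] for i in assortment)
    return {i: WEIGHTS[i] / denom for i in assortment}, 1.0 / denom


# Every subset of items, including the empty assortment (feasible only for toy sizes).
ASSORTMENTS = [s for k in range(len(REVENUES) + 1)
               for s in combinations(range(len(REVENUES)), k)]


@lru_cache(maxsize=None)
def solve(t: int, n: int):
    """Return (optimal expected revenue-to-go, best assortment) in state (t, n)."""
    if t > HORIZON:
        return 0.0, ()
    best_value, best_set = float("-inf"), ()
    for s in ASSORTMENTS:
        buy, no_buy = mnl_probs(s)
        # Purchase of item i: earn r_i, count becomes n + 1, return w.p. q(n + 1).
        value = sum(p * (REVENUES[i] + return_prob(n + 1) * solve(t + 1, n + 1)[0])
                    for i, p in buy.items())
        # No purchase: count stays n, return w.p. q(n).
        value += no_buy * return_prob(n) * solve(t + 1, n)[0]
        if value > best_value:
            best_value, best_set = value, s
    return best_value, best_set


if __name__ == "__main__":
    for t in range(1, HORIZON + 1):
        print(t, [solve(t, n) for n in range(t)])
```

In the last period the continuation value is zero, so the recursion reduces to single-shot MNL assortment optimization; earlier periods trade immediate revenue against the effect of a purchase on the chance that the customer comes back.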
Author Information
Mika Sumida (USC)
Angela Zhou (Cornell University)