Skip to yearly menu bar Skip to main content


Oral

Oral C5 Misc

Ballroom C

Moderator: Kamyar Azizzadenesheli

Abstract:

Chat is not available.

Thu 27 July 18:00 - 18:08 PDT

Learning Mixtures of Markov Chains and MDPs

Chinmaya Kausik · Kevin Tan · Ambuj Tewari

We present an algorithm for learning mixtures of Markov chains and Markov decision processes (MDPs) from short unlabeled trajectories. Specifically, our method handles mixtures of Markov chains with optional control input by going through a multi-step process, involving (1) a subspace estimation step, (2) spectral clustering of trajectories using "pairwise distance estimators," along with refinement using the EM algorithm, (3) a model estimation step, and (4) a classification step for predicting labels of new trajectories. We provide end-to-end performance guarantees, where we only explicitly require the length of trajectories to be linear in the number of states and the number of trajectories to be linear in a mixing time parameter. Experimental results support these guarantees, where we attain 96.6% average accuracy on a mixture of two MDPs in gridworld, outperforming the EM algorithm with random initialization (73.2% average accuracy). We also significantly outperform the EM algorithm on real data from the LastFM song dataset.

Thu 27 July 18:08 - 18:16 PDT

Uncertain Evidence in Probabilistic Models and Stochastic Simulators

Andreas Munk · Alexander Mead · Frank Wood

We consider the problem of performing Bayesian inference in probabilistic models where observations are accompanied by uncertainty, referred to as "uncertain evidence.'' We explore how to interpret uncertain evidence, and by extension the importance of proper interpretation as it pertains to inference about latent variables. We consider a recently-proposed method "distributional evidence'' as well as revisit two older methods: Jeffrey's rule and virtual evidence. We devise guidelines on how to account for uncertain evidence and we provide new insights, particularly regarding consistency. To showcase the impact of different interpretations of the same uncertain evidence, we carry out experiments in which one interpretation is defined as "correct.'' We then compare inference results from each different interpretation illustrating the importance of careful consideration of uncertain evidence.

Thu 27 July 18:16 - 18:24 PDT

How Bad is Top-$K$ Recommendation under Competing Content Creators?

Fan Yao · Chuanhao Li · Denis Nekipelov · Hongning Wang · Haifeng Xu

This study explores the impact of content creators' competition on user welfare in recommendation platforms, as well as the long-term dynamics of relevance-driven recommendations. We establish a model of creator competition, under the setting where the platform uses a top-$K$ recommendation policy, user decisions are guided by the Random Utility model, and creators, in absence of explicit utility functions, employ arbitrary no-regret learning algorithms for strategy updates. We study the user welfare guarantee through the lens of Price of Anarchy and show that the fraction of user welfare loss due to creator competition is always upper bounded by a small constant depending on $K$ and randomness in user decisions; we also prove the tightness of this bound. Our result discloses an intrinsic merit of the relevance-driven recommendation policy, as long as users' decisions involve randomness and the platform provides reasonably many alternatives to its users.

Thu 27 July 18:24 - 18:32 PDT

Weighted Flow Diffusion for Local Graph Clustering with Node Attributes: an Algorithm and Statistical Guarantees

Shenghao Yang · Kimon Fountoulakis

Local graph clustering methods aim to detect small clusters in very large graphs without the need to process the whole graph. They are fundamental and scalable tools for a wide range of tasks such as local community detection, node ranking and node embedding. While prior work on local graph clustering mainly focuses on graphs without node attributes, modern real-world graph datasets typically come with node attributes that provide valuable additional information. We present a simple local graph clustering algorithm for graphs with node attributes, based on the idea of diffusing mass locally in the graph while accounting for both structural and attribute proximities. Using high-dimensional concentration results, we provide statistical guarantees on the performance of the algorithm for the recovery of a target cluster with a single seed node. We give conditions under which a target cluster generated from a fairly general contextual random graph model, which includes both the stochastic block model and the planted cluster model as special cases, can be fully recovered with bounded false positives. Empirically, we validate all theoretical claims using synthetic data, and we show that incorporating node attributes leads to superior local clustering performances using real-world graph datasets.

Thu 27 July 18:32 - 18:40 PDT

Equivariant Polynomials for Graph Neural Networks

Omri Puny · Derek Lim · Bobak T Kiani · Haggai Maron · Yaron Lipman

Graph Neural Networks (GNN) are inherently limited in their expressive power. Recent seminal works (Xu et al., 2019; Morris et al., 2019b) introduced the Weisfeiler-Lehman (WL) hierarchy as a measure of expressive power. Although this hierarchy has propelled significant advances in GNN analysis and architecture developments, it suffers from several significant limitations. These include a complex definition that lacks direct guidance for model improvement and a WL hierarchy that is too coarse to study current GNNs. This paper introduces an alternative expressive power hierarchy based on the ability of GNNs to calculate equivariant polynomials of a certain degree. As a first step, we provide a full characterization of all equivariant graph polynomials by introducing a concrete basis, significantly generalizing previous results. Each basis element corresponds to a specific multi-graph, and its computation over some graph data input corresponds to a tensor contraction problem. Second, we propose algorithmic tools for evaluating the expressiveness of GNNs using tensor contraction sequences, and calculate the expressive power of popular GNNs. Finally, we enhance the expressivity of common GNN architectures by adding polynomial features or additional operations / aggregations inspired by our theory. These enhanced GNNs demonstrate state-of-the-art results in experiments across multiple graph learning benchmarks.

Thu 27 July 18:40 - 18:48 PDT

Taming graph kernels with random features

Krzysztof Choromanski

We introduce in this paper the mechanism of graph random features (GRFs). GRFs can be used to construct unbiased randomized estimators of several important kernels defined on graphs' nodes, in particular the regularized Laplacian kernel. As regular RFs for non-graph kernels, they provide means to scale up kernel methods defined on graphs to larger networks. Importantly, they give substantial computational gains also for smaller graphs, while applied in downstream applications. Consequently, GRFs address the notoriously difficult problem of cubic (in the number of the nodes of the graph) time complexity of graph kernels algorithms. We provide a detailed theoretical analysis of GRFs and an extensive empirical evaluation: from speed tests, through Frobenius relative error analysis to kmeans graph-clustering with graph kernels. We show that the computation of GRFs admits an embarrassingly simple distributed algorithm that can be applied if the graph under consideration needs to be split across several machines. We also introduce a (still unbiased) quasi Monte Carlo variant of GRFs, q-GRFs, relying on the so-called reinforced random walks that might be used to optimize the variance of GRFs. As a byproduct, we obtain a novel approach to solve certain classes of linear equations with positive and symmetric matrices.

Thu 27 July 18:48 - 18:56 PDT

Robust Budget Pacing with a Single Sample

Santiago Balseiro · Rachitesh Kumar · Vahab Mirrokni · Balasubramanian Sivan · Di Wang

Major Internet advertising platforms offer budget pacing tools as a standard service for advertisers to manage their ad campaigns. Given the inherent non-stationarity in an advertiser's value and also competing advertisers' values over time, a commonly used approach is to learn a target expenditure plan that specifies a target spend as a function of time, and then run a controller that tracks this plan. This raises the question: *how many historical samples are required to learn a good expenditure plan*? We study this question by considering an advertiser repeatedly participating in $T$ second-price auctions, where the tuple of her value and the highest competing bid is drawn from an unknown time-varying distribution. The advertiser seeks to maximize her total utility subject to her budget constraint. Prior work has shown the sufficiency of *$T\log T$ samples per distribution* to achieve the optimal $O(\sqrt{T})$-regret. We dramatically improve this state-of-the-art and show that *just one sample per distribution* is enough to achieve the near-optimal $\tilde O(\sqrt{T})$-regret, while still being robust to noise in the sampling distributions.

Thu 27 July 18:56 - 19:04 PDT

Multicalibration as Boosting for Regression

Ira Globus-Harris · Declan Harrison · Michael Kearns · Aaron Roth · Jessica Sorrell

We study the connection between multicalibration and boosting for squared error regression. First we prove a useful characterization of multicalibration in terms of a ``swap regret'' like condition on squared error. Using this characterization, we give an exceedingly simple algorithm that can be analyzed both as a boosting algorithm for regression and as a multicalibration algorithm for a class $\mathcal{H}$ that makes use only of a standard squared error regression oracle for $\mathcal{H}$. We give a weak learning assumption on $\mathcal{H}$ that ensures convergence to Bayes optimality without the need to make any realizability assumptions --- giving us an agnostic boosting algorithm for regression. We then show that our weak learning assumption on $\mathcal{H}$ is both necessary and sufficient for multicalibration with respect to $\mathcal{H}$ to imply Bayes optimality, answering an open question. We also show that if $\mathcal{H}$ satisfies our weak learning condition relative to another class $\mathcal{C}$ then multicalibration with respect to $\mathcal{H}$ implies multicalibration with respect to $\mathcal{C}$. Finally we investigate the empirical performance of our algorithm experimentally.