

Poster

Optimal bounds for $\ell_p$ sensitivity sampling via $\ell_2$ augmentation

Alexander Munteanu · Simon Omlor

Hall C 4-9 #2002

Abstract: Data subsampling is one of the most natural methods to approximate a massively large data set by a small representative proxy. In particular, sensitivity sampling has received a lot of attention: it samples points proportional to an individual importance measure called sensitivity. In very general settings, this framework reduces the size of the data to roughly the VC dimension $d$ times the total sensitivity $\mathfrak{S}$ while providing strong $(1\pm\varepsilon)$ guarantees on the quality of approximation. The recent work of Woodruff & Yasuda (2023c) improved substantially over the general $\tilde{O}(\varepsilon^{-2}\mathfrak{S}d)$ bound for the important problem of $\ell_p$ subspace embeddings to $\tilde{O}(\varepsilon^{-2}\mathfrak{S}^{2/p})$ for $p\in[1,2]$. Their result was subsumed by an earlier $\tilde{O}(\varepsilon^{-2}\mathfrak{S}d^{1-p/2})$ bound which was implicitly given in the work of Chen & Derezinski (2021). We show that their result is tight when sampling according to plain $\ell_p$ sensitivities. We observe that by augmenting the $\ell_p$ sensitivities with $\ell_2$ sensitivities, we obtain better bounds, improving over the aforementioned results to the optimal linear $\tilde{O}(\varepsilon^{-2}(\mathfrak{S}+d)) = \tilde{O}(\varepsilon^{-2}d)$ sampling complexity for all $p\in[1,2]$. In particular, this resolves an open question of Woodruff & Yasuda (2023c) in the affirmative for $p\in[1,2]$ and brings sensitivity subsampling into the regime that was previously only known to be possible using Lewis weights (Cohen & Peng, 2015). As an application of our main result, we also obtain an $\tilde{O}(\varepsilon^{-2}\mu d)$ sensitivity sampling bound for logistic regression, where $\mu$ is a natural complexity measure for this problem. This improves over the previous $\tilde{O}(\varepsilon^{-2}\mu^2 d)$ bound of Mai et al. (2021), which was based on Lewis weights subsampling.
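The sampling scheme is simple to sketch. For a matrix $A \in \mathbb{R}^{n \times d}$ with rows $a_i$, the $\ell_p$ sensitivity of row $i$ is $s_i^{(p)} = \sup_{x \neq 0} |\langle a_i, x\rangle|^p / \|Ax\|_p^p$, and the $\ell_2$ sensitivities are exactly the leverage scores $\tau_i$. Below is a minimal, illustrative Python sketch of $\ell_2$-augmented sensitivity sampling; the function names are ours, not the authors', and instead of exact $\ell_p$ sensitivities it uses the standard upper bound $s_i^{(p)} \le \tau_i^{p/2}$ for $p \in [1,2]$, which follows from $|\langle a_i, x\rangle| \le \sqrt{\tau_i}\,\|Ax\|_2$ and $\|Ax\|_2 \le \|Ax\|_p$.

```python
import numpy as np

def leverage_scores(A):
    """l2 sensitivities of the rows of A, i.e. their leverage scores."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)  # thin SVD
    return np.sum(U ** 2, axis=1)

def augmented_lp_sample(A, p=1.0, m=500, seed=0):
    """Row sampling with l2-augmented lp sensitivities (illustrative sketch).

    Samples m rows with probability proportional to tau_i**(p/2) + tau_i,
    where tau_i**(p/2) upper-bounds the lp sensitivity for p in [1, 2],
    then rescales the sampled rows so that, for any fixed x, the subsample
    gives an unbiased estimate of ||A x||_p^p.
    """
    rng = np.random.default_rng(seed)
    tau = leverage_scores(A)
    s = tau ** (p / 2) + tau          # l2-augmented sensitivity upper bounds
    q = s / s.sum()                   # sampling distribution
    idx = rng.choice(A.shape[0], size=m, replace=True, p=q)
    scale = (1.0 / (m * q[idx])) ** (1.0 / p)
    return scale[:, None] * A[idx]    # reweighted subsample
```

For any fixed $x$, the returned subsample satisfies $\mathbb{E}\,\|SAx\|_p^p = \|Ax\|_p^p$ by construction; the content of the paper is that with the augmented probabilities, $m = \tilde{O}(\varepsilon^{-2}d)$ samples suffice to get a $(1\pm\varepsilon)$ guarantee simultaneously for all $x$, whereas plain $\ell_p$ sensitivity sampling provably cannot reach this linear dependence on $d$.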
