Sketching Intersection Profiles: A Simple Proof and Three Applications
This paper settles the complexity of three sketching problems in graphs and distributions.
This paper provides new and improved bounds for sketching problems in graphs and distributions, and introduces a new proof technique.
Applications
- Database query optimization
- Machine learning model selection
To understand this paper, make sure you know these concepts first:
- Graph theory
- Probability theory
Abstract
In this work we settle the complexity of three sketching problems. (i) We show that sketching vertex neighborhood sizes in graphs requires $\Omega(n^2)$ bits, standing in sharp contrast to the $\tilde{O}(n)$ complexity of sketching edge cuts. (ii) We obtain tight lower and upper bounds of $\tilde{\Theta}(n^2)$ for sketching coverage functions with additive and multiplicative errors. (iii) We prove an $\Omega(n^2)$ lower bound for sketching Random Utility Models under the $\ell_\infty$-norm, improving upon the previous $\Omega(n \log n)$ bound and matching a known upper bound to within logarithmic factors. These bounds are obtained through a connection with the problem of sketching the intersection profile of a distribution $D$ on $2^{[n]}$. Specifically, we seek a succinct data structure that, for any query set $S \subseteq [n]$, approximates the quantity $\Pr_{T \sim D}[T \cap S \neq \varnothing]$ to within a small constant additive error. Lower bounds for this latter problem follow directly from known results on the itemset frequency estimation problem in databases, for which tight bounds are known. As an additional contribution, we also provide an alternative proof of the intersection profile sketching lower bound in the setting where the accuracy parameter is constant. This proof relies solely on elementary probability, avoiding the heavier machinery used in previous proofs.
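To make the central quantity concrete, here is a minimal Python sketch that evaluates the intersection profile $\Pr_{T \sim D}[T \cap S \neq \varnothing]$ exactly for a small, explicitly listed distribution. It also illustrates one way neighborhood sizes relate to this quantity: if $D$ is uniform over the neighborhoods $N(v)$ of an $n$-vertex graph, then $n \cdot \Pr_{T \sim D}[T \cap S \neq \varnothing]$ counts the vertices adjacent to at least one vertex of $S$. The function names and the example graph are assumptions made for illustration, and this encoding is not claimed to be the reduction used in the paper.

```python
from fractions import Fraction


def intersection_profile(dist, query):
    """Return Pr_{T ~ D}[T ∩ S != ∅] for an explicitly listed distribution.

    dist:  dict mapping frozenset (a subset T of [n]) to its probability.
    query: iterable of elements forming the query set S.
    """
    s = frozenset(query)
    # Sum the probability mass of every T that shares an element with S.
    return sum(p for t, p in dist.items() if t & s)


def neighborhood_distribution(adj):
    """Uniform distribution over the neighborhoods N(v) of a graph.

    adj: dict mapping each vertex to a list of its neighbors.
    (Illustrative helper, not taken from the paper.)
    """
    n = len(adj)
    dist = {}
    for nbrs in adj.values():
        t = frozenset(nbrs)
        # Vertices with identical neighborhoods pool their mass.
        dist[t] = dist.get(t, Fraction(0)) + Fraction(1, n)
    return dist


# Hypothetical example: the path graph 0 - 1 - 2 - 3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
dist = neighborhood_distribution(adj)
n = len(adj)

# Number of vertices adjacent to at least one vertex of S = {1, 2}.
neighborhood_size = n * intersection_profile(dist, {1, 2})
```

This exact evaluation stores the whole distribution. The sketching question in the abstract is how few bits suffice when every query only needs to be answered to within a small constant additive error.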