The distinct elements problem in streaming algorithms refers to estimating the number of unique elements in a data stream while using limited memory. A generalization of this problem is Klee's Measure Problem, where the goal is to estimate the size of the union of sets when the sets arrive in a data stream, all while using limited memory and having restricted access to the sets.
While the distinct elements problem is well-studied in streaming algorithms, with tight bounds already established, Klee's Measure Problem has remained a major open problem in the field for many years.
We will present the first-ever efficient streaming algorithm (from PODS '21) for Klee's Measure Problem. This algorithm also provides a remarkably simple streaming solution for the distinct elements estimation problem, which even caught the attention of Donald E. Knuth (https://www-cs-faculty.stanford.edu/~knuth/papers/cvm-note.pdf).
This talk is based on joint works with N. V. Vinodchandran and Kuldeep S. Meel across multiple articles, notable the following:
Estimating the Size of Union of Sets in Streaming Models. PODS 2021
Estimation of the Size of Union of Delphic Sets: Achieving Independence from Stream Size. PODS 2022
Distinct Elements in Streams: An Algorithm for the (Text) Book. ESA 2022
Improved Streaming Algorithm for the Klee's Measure Problem and Generalizations. (jointly with Mridul Nandi, Arijit Ghosh and Soumit Pal) APPROX 2024