feat: SQL COUNT(DISTINCT column) support via HyperLogLog Sketch #368

Merged
milindsrivastava1997 merged 3 commits into main from 367-feat-sql-countdistinct-column-support-via-hyperloglog-sketch on May 26, 2026
akanksha-akkihal (Collaborator) commented on May 25, 2026

Summary

Adds end-to-end support for COUNT(DISTINCT column) on the precompute streaming path using a HyperLogLog (HLL) sketch (asap_sketchlib::HllSketch). Enables NetFlow-style queries such as distinct dstip per srcip with ORDER BY / LIMIT, and supports benchmarking against ClickHouse on the same ingest window.

COUNT(DISTINCT col) is parsed as CARDINALITY, routed to Statistic::Cardinality, matched to AggregationType::HLL, updated via HllAccumulatorUpdater, stored with lossless msgpack serde, and answered at query time with HllSketch::estimate().
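To make the update, merge, and estimate steps concrete, here is a minimal, self-contained HyperLogLog sketch in Rust. It is a toy written for illustration only: `ToyHll` and its methods are hypothetical names, it is not `asap_sketchlib::HllSketch`, and the hash (the standard library's `DefaultHasher`) and the small-range correction are assumptions about one reasonable way to build such a sketch.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy HyperLogLog for illustration (hypothetical; not asap_sketchlib::HllSketch).
#[derive(Clone, Debug, PartialEq)]
pub struct ToyHll {
    precision: u8,      // p: number of hash bits used as the register index
    registers: Vec<u8>, // m = 2^p registers, each holding a maximum rank
}

impl ToyHll {
    pub fn new(precision: u8) -> Self {
        assert!((4..=18).contains(&precision), "precision must be in 4..=18");
        ToyHll { precision, registers: vec![0; 1usize << precision] }
    }

    /// Record one value: the top p hash bits select a register, which keeps
    /// the largest rank (leading zeros + 1) seen in the remaining bits.
    pub fn update<T: Hash>(&mut self, value: &T) {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        let h = hasher.finish();
        let idx = (h >> (64 - self.precision)) as usize;
        let rest = h << self.precision; // remaining 64 - p bits, left-aligned
        let rank = (rest.leading_zeros().min(64 - self.precision as u32) + 1) as u8;
        if rank > self.registers[idx] {
            self.registers[idx] = rank;
        }
    }

    /// Merge another sketch of the same precision by register-wise maximum.
    pub fn merge(&mut self, other: &ToyHll) {
        assert_eq!(self.precision, other.precision, "precision mismatch");
        for (a, b) in self.registers.iter_mut().zip(&other.registers) {
            *a = (*a).max(*b);
        }
    }

    /// Estimate the number of distinct values seen so far.
    pub fn estimate(&self) -> f64 {
        let m = self.registers.len() as f64;
        let alpha = match self.registers.len() {
            16 => 0.673,
            32 => 0.697,
            64 => 0.709,
            _ => 0.7213 / (1.0 + 1.079 / m),
        };
        let sum: f64 = self.registers.iter().map(|&r| 2f64.powi(-(r as i32))).sum();
        let raw = alpha * m * m / sum;
        let zeros = self.registers.iter().filter(|&&r| r == 0).count();
        if raw <= 2.5 * m && zeros > 0 {
            // Small-range correction: fall back to linear counting.
            m * (m / zeros as f64).ln()
        } else {
            raw
        }
    }
}
```

Because merging is a register-wise maximum, merging per-window sketches gives the same registers as feeding every value into a single sketch, and repeated values never change the result. That property is what makes a sketch of this kind suitable for a mergeable precompute accumulator.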

Code Changes

SQL Parsing & Validation (sql_utilities)

| File | Change |
| --- | --- |
| sqlpattern_parser.rs | COUNT(DISTINCT single_column) → aggregation name "CARDINALITY"; reject multi-column DISTINCT and DISTINCT on non-COUNT aggregates |
| sqlpattern_matcher.rs | Add CARDINALITY to legal_aggregations; for CARDINALITY only, accept distinct target in metadata_columns or value_columns |
| sqlparser_test.rs | Parser + matcher tests for accept/reject, pattern match, ORDER BY/LIMIT, SpatioTemporal classification |
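The accept/reject rule in the parser row can be sketched as a small standalone function. This is only an illustration of the rule as described above: `AggCall` and `aggregation_name` are hypothetical names, and the real sqlpattern_parser.rs works on a parsed SQL AST rather than this simplified struct.

```rust
/// Hypothetical, simplified view of one aggregate call after SQL parsing.
pub struct AggCall<'a> {
    pub func: &'a str,      // function name, e.g. "COUNT" or "SUM"
    pub distinct: bool,     // true when the DISTINCT keyword is present
    pub args: Vec<&'a str>, // argument column names
}

/// Map an aggregate call to an aggregation name, or reject it.
/// Only COUNT(DISTINCT single_column) becomes "CARDINALITY".
pub fn aggregation_name(call: &AggCall) -> Result<String, String> {
    let func = call.func.to_ascii_uppercase();
    if !call.distinct {
        // Non-DISTINCT aggregates keep their own name.
        return Ok(func);
    }
    if func != "COUNT" {
        return Err(format!("DISTINCT is not supported on {}", func));
    }
    if call.args.len() != 1 {
        return Err("COUNT(DISTINCT ...) takes exactly one column".to_string());
    }
    Ok("CARDINALITY".to_string())
}
```

Under these assumptions, `COUNT(DISTINCT dstip)` maps to `"CARDINALITY"`, while `COUNT(DISTINCT a, b)` and `SUM(DISTINCT a)` both return an error.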

Statistic Routing (promql_utilities)

| File | Change |
| --- | --- |
| enums.rs | AggregationOperator::Cardinality → Statistic::Cardinality (FromStr, to_statistics, as_str, is_approximate) |

Capability Matching (asap_types)

| File | Change |
| --- | --- |
| capability_matching.rs | compatible_agg_types(Cardinality) includes AggregationType::HLL (single-population; no paired key aggregation) |

Precompute Engine (query_engine_rust)

| File | Change |
| --- | --- |
| precompute_operators/hll_accumulator.rs | New: wraps asap_sketchlib::HllSketch; implements AggregateCore, SerializableToSink, SingleSubpopulationAggregate, MergeableAccumulator |
| precompute_operators/mod.rs | Export hll_accumulator |
| precompute_engine/accumulator_factory.rs | HllAccumulatorUpdater, hll_precision_param (default 14, clamp 4–18), AggregationType::HLL dispatch |
| engines/physical/accumulator_serde.rs | Deserialize SketchType::HLL in deserialize_accumulator and deserialize_single_subpopulation |
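For context on the precision parameter (default 14, clamped to 4–18), the sketch below shows one way the clamping could look, along with the register count and the textbook HyperLogLog relative standard error of roughly 1.04/sqrt(m) that each precision implies. The function names and the `Option<u64>` input are assumptions for illustration; the actual signature of hll_precision_param is not shown in this PR description.

```rust
pub const DEFAULT_HLL_PRECISION: u8 = 14;
pub const MIN_HLL_PRECISION: u8 = 4;
pub const MAX_HLL_PRECISION: u8 = 18;

/// Hypothetical helper: pick the HLL precision from an optional config value,
/// using the default when unset and clamping out-of-range requests.
pub fn clamp_hll_precision(requested: Option<u64>) -> u8 {
    match requested {
        None => DEFAULT_HLL_PRECISION,
        Some(p) => p.clamp(MIN_HLL_PRECISION as u64, MAX_HLL_PRECISION as u64) as u8,
    }
}

/// Number of registers m = 2^p for a given precision p.
pub fn register_count(precision: u8) -> usize {
    1usize << precision
}

/// Textbook HyperLogLog relative standard error, about 1.04 / sqrt(m).
pub fn relative_std_error(precision: u8) -> f64 {
    1.04 / (register_count(precision) as f64).sqrt()
}
```

At the default precision of 14 this gives 16,384 registers and a relative standard error of about 0.8%; each extra bit of precision doubles the register count and shrinks the error by a factor of sqrt(2).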

Tests

33 new unit tests across:

  • SQL parsing & matching
  • Statistic routing
  • Capability matching
  • HLL accumulator (update, merge, query, serde)
  • Factory + serde integration

@akanksha-akkihal akanksha-akkihal marked this pull request as ready for review May 26, 2026 04:07
@milindsrivastava1997 milindsrivastava1997 merged commit 79cdb62 into main May 26, 2026
10 checks passed
@milindsrivastava1997 milindsrivastava1997 deleted the 367-feat-sql-countdistinct-column-support-via-hyperloglog-sketch branch May 26, 2026 12:59
Development

Successfully merging this pull request may close these issues.

feat: SQL COUNT(DISTINCT column) support via HyperLogLog Sketch