Summary
Support SQL queries of the form:
SELECT srcip, COUNT(DISTINCT dstip) AS unique_peers FROM netflow_table WHERE time BETWEEN DATEADD(s, -11, NOW()) AND DATEADD(s, -10, NOW()) GROUP BY srcip
This should work end-to-end through ASAPQuery’s precompute/streaming engine using a HyperLogLog (HLL) sketch (asap_sketchlib::HllSketch).
The implementation should:
- Support approximate distinct counting via HLL
- Integrate with the precompute pipeline
- Be configurable via versioned inference + streaming configs
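
To make the target behaviour concrete, here is a minimal, std-only HyperLogLog sketch with a per-`srcip` grouping helper. It is an illustration under stated assumptions, not the `asap_sketchlib::HllSketch` API: `ToyHll`, `unique_peers`, the precision parameter `p`, and the use of `DefaultHasher` are all hypothetical names and choices made for this example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Illustrative HyperLogLog (hypothetical type, NOT asap_sketchlib::HllSketch).
pub struct ToyHll {
    p: u8,              // precision: the sketch keeps 2^p registers
    registers: Vec<u8>, // each register stores the maximum rank observed
}

impl ToyHll {
    pub fn new(p: u8) -> Self {
        assert!((4..=16).contains(&p), "precision must be in 4..=16");
        ToyHll { p, registers: vec![0; 1usize << p] }
    }

    /// Hash the item, use the top p bits as the register index and the
    /// position of the first 1-bit in the remaining bits as the rank.
    pub fn insert<T: Hash>(&mut self, item: &T) {
        let mut hasher = DefaultHasher::new();
        item.hash(&mut hasher);
        let x = hasher.finish();
        let idx = (x >> (64 - self.p)) as usize;
        let rest = x << self.p;
        let rank = (rest.leading_zeros().min(64 - self.p as u32) + 1) as u8;
        if rank > self.registers[idx] {
            self.registers[idx] = rank;
        }
    }

    /// Merging is a register-wise maximum, which is what makes the sketch
    /// suitable for combining precomputed windows.
    pub fn merge(&mut self, other: &ToyHll) {
        assert_eq!(self.p, other.p, "precision mismatch");
        for (a, b) in self.registers.iter_mut().zip(&other.registers) {
            *a = (*a).max(*b);
        }
    }

    /// Raw HLL estimate with the usual small-range (linear counting) correction.
    pub fn estimate(&self) -> f64 {
        let m = self.registers.len() as f64;
        let alpha = match self.registers.len() {
            16 => 0.673,
            32 => 0.697,
            64 => 0.709,
            _ => 0.7213 / (1.0 + 1.079 / m),
        };
        let sum: f64 = self.registers.iter().map(|&r| 2f64.powi(-(r as i32))).sum();
        let raw = alpha * m * m / sum;
        let zeros = self.registers.iter().filter(|&&r| r == 0).count();
        if raw <= 2.5 * m && zeros > 0 {
            m * (m / zeros as f64).ln()
        } else {
            raw
        }
    }
}

/// Hypothetical helper mirroring `GROUP BY srcip` with `COUNT(DISTINCT dstip)`:
/// one sketch per source address, fed with destination addresses.
pub fn unique_peers(rows: &[(&str, &str)], p: u8) -> HashMap<String, f64> {
    let mut sketches: HashMap<String, ToyHll> = HashMap::new();
    for &(src, dst) in rows {
        sketches
            .entry(src.to_string())
            .or_insert_with(|| ToyHll::new(p))
            .insert(&dst);
    }
    sketches.into_iter().map(|(k, s)| (k, s.estimate())).collect()
}
```

Two properties matter for the precompute path: inserting a duplicate never changes the registers, and `merge` of two sketches yields the same registers as a single sketch fed with both inputs. The standard error is roughly 1.04 / sqrt(2^p), so the precision is a natural candidate for a config knob.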
Current Status
COUNT(DISTINCT column) is currently not supported in the streaming/precompute path because of gaps across multiple layers:
| Layer | Gap |
| --- | --- |
| SQL parser | DISTINCT inside COUNT is ignored or not normalized; aggregation remains COUNT instead of cardinality |
| Statistic mapping | No mapping from AggregationOperator::Cardinality → Statistic::Cardinality |
| Capability matching | Statistic::Cardinality only maps to SetAggregator / DeltaSetAggregator, not HLL |
| Precompute | Missing HllAccumulator, factory dispatch, and serde support for SketchType::HLL |
| SQL matcher | SQLPatternMatcher rejects "CARDINALITY" and enforces distinct targets to be value_columns only |
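
The first three gaps form a chain: parser normalization, then statistic mapping, then capability matching. The sketch below shows one way that chain could look once closed. The enum and variant names echo the identifiers in the table, but the definitions, the `AggregatorKind` enum, and the functions `normalize`, `statistic_for`, and `candidate_aggregators` are hypothetical stand-ins; the real ASAPQuery types and signatures may differ.

```rust
// Hypothetical stand-ins for the ASAPQuery types named in the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AggregationOperator {
    Count,
    Cardinality,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Statistic {
    Count,
    Cardinality,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AggregatorKind {
    Counter,
    SetAggregator,
    DeltaSetAggregator,
    Hll, // assumed new entry, backed by an HLL sketch
}

const COUNT_AGGREGATORS: &[AggregatorKind] = &[AggregatorKind::Counter];
const CARDINALITY_AGGREGATORS: &[AggregatorKind] = &[
    AggregatorKind::SetAggregator,
    AggregatorKind::DeltaSetAggregator,
    AggregatorKind::Hll,
];

/// Parser layer: COUNT(DISTINCT x) is normalized to Cardinality instead of
/// silently staying COUNT. Unknown functions are left to other code paths.
pub fn normalize(func: &str, distinct: bool) -> Option<AggregationOperator> {
    match (func.to_ascii_uppercase().as_str(), distinct) {
        ("COUNT", true) => Some(AggregationOperator::Cardinality),
        ("COUNT", false) => Some(AggregationOperator::Count),
        _ => None,
    }
}

/// Statistic mapping layer: the currently missing Cardinality arm.
pub fn statistic_for(op: AggregationOperator) -> Statistic {
    match op {
        AggregationOperator::Count => Statistic::Count,
        AggregationOperator::Cardinality => Statistic::Cardinality,
    }
}

/// Capability matching layer: Cardinality can also be served by an HLL sketch.
pub fn candidate_aggregators(stat: Statistic) -> &'static [AggregatorKind] {
    match stat {
        Statistic::Count => COUNT_AGGREGATORS,
        Statistic::Cardinality => CARDINALITY_AGGREGATORS,
    }
}
```

Keeping the exact set-based aggregators in the candidate list alongside the HLL entry leaves room for a config-driven choice between exact and approximate counting.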