Summary
Extend asap_planner to generate streaming and inference configs for SQL top-k queries (COUNT/SUM … GROUP BY … ORDER BY <alias> DESC LIMIT k).
Today the planner treats these as plain COUNT/SUM and emits DeltaSetAggregator + CountMinSketch (two aggregations). The query engine expects a single self-keyed CountMinSketchWithHeap for top-k.
This change closes that gap so top-k precompute configs can be produced from planner workload YAML, consistent with quantile, HLL, and MinMax workloads.
Motivation
SQL top-k is already supported at query time: the engine detects the pattern via detect_sql_topk, promotes to Statistic::Topk, and resolves a CountMinSketchWithHeap with the right count_events weighting (COUNT vs SUM).
The planner had no equivalent path. For any SQL query with ORDER BY … DESC LIMIT k, it still planned a standard approximate COUNT/SUM pipeline:
- Two streaming aggs (DeltaSetAggregator + CountMinSketch)
- Subpopulation label layout (aggregated: [group key], with a separate delta set)
- No heapsize, no count_events
That does not match what the engine needs for top-k execution.
Sharing detection logic in sql_utilities keeps planner output aligned with engine behavior and avoids two implementations drifting apart.
Proposed behavior
When the planner parses a flat SQL query of this shape:
SELECT <key>, COUNT(<col>) AS <alias> -- or SUM(<col>) AS <alias>
FROM <table>
WHERE <time window>
GROUP BY <key>
ORDER BY <alias> DESC
LIMIT k
It should:
- Detect top-k via shared detect_sql_topk():
  - Requires LIMIT and ORDER BY on the aggregate alias (DESC)
  - Only supports COUNT or SUM
- Plan Statistic::Topk instead of Count/Sum
- Emit a single CountMinSketchWithHeap per query
  - No DeltaSetAggregator
- Use heap-only label layout:
  - grouping: []
  - aggregated: []
  - rollup: [metadata columns not in GROUP BY]
- Set sketch parameters:
  - aggregationSubType: topk
  - heapsize: k × heap_multiplier (default multiplier = 4)
  - count_events: true for COUNT, false for SUM
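As a rough illustration of the sketch-parameter rules above, the derivation could look like the following. This is a minimal sketch, not the planner's actual code: `TopkAggregate`, `TopkSketchParams`, `DEFAULT_HEAP_MULTIPLIER`, and `topk_sketch_params` are hypothetical names introduced here. Only the field meanings (sub-type `topk`, heapsize = k × multiplier with a default of 4, count_events true for COUNT and false for SUM) come from this description.

```rust
/// Aggregate function of a detected top-k query.
/// Hypothetical type for this sketch; the planner's real types may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TopkAggregate {
    Count,
    Sum,
}

/// The three sketch parameters listed in the proposal (hypothetical struct).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TopkSketchParams {
    pub aggregation_sub_type: &'static str,
    pub heapsize: u64,
    pub count_events: bool,
}

/// Default multiplier stated in the proposal.
pub const DEFAULT_HEAP_MULTIPLIER: u64 = 4;

/// Derive sketch parameters from LIMIT k and the aggregate kind.
pub fn topk_sketch_params(k: u64, agg: TopkAggregate, heap_multiplier: u64) -> TopkSketchParams {
    TopkSketchParams {
        aggregation_sub_type: "topk",
        // heapsize = k × heap_multiplier; saturate rather than overflow on huge inputs.
        heapsize: k.saturating_mul(heap_multiplier),
        // COUNT weights every event as 1; SUM uses the summed column instead.
        count_events: matches!(agg, TopkAggregate::Count),
    }
}
```

With the default multiplier, `LIMIT 10` over `COUNT` would give heapsize 40 and count_events true under these assumptions.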
Behavior Summary
| Property | COUNT top-k | SUM top-k |
| --- | --- | --- |
| Agg type | CountMinSketchWithHeap | CountMinSketchWithHeap |
| Sub-type | topk | topk |
| Streaming aggs | 1 | 1 |
| count_events | true | false |
| Labels | heap-only self-keyed | heap-only self-keyed |
Notes
- Queries without ORDER BY + LIMIT continue using the existing CMS + DeltaSet path unchanged.
- PromQL topk(k, …) is unchanged.
- Not supported:
  - Nested SQL
  - Multiple aggregates in SELECT
  - Non-COUNT/SUM aggregates for top-k planning
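The detection rules and exclusions listed above can be sketched over a simplified, already-parsed query. This is not the signature or implementation of detect_sql_topk. `FlatQuery`, `Agg`, `Plan`, and `plan_for` are hypothetical names for illustration, and routing unsupported shapes to the standard path (rather than rejecting them) is an assumption of this sketch.

```rust
/// Aggregate function in SELECT (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Agg {
    Count,
    Sum,
    Other,
}

/// Simplified view of a parsed SQL query (hypothetical; real parsing is elided).
pub struct FlatQuery {
    /// (aggregate function, alias) pairs from SELECT.
    pub aggregates: Vec<(Agg, String)>,
    /// (column or alias, is_descending) from ORDER BY, if present.
    pub order_by: Option<(String, bool)>,
    /// LIMIT k, if present.
    pub limit: Option<u64>,
    /// True if the query contains a nested subquery.
    pub nested: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Plan {
    /// Single CountMinSketchWithHeap.
    Topk { k: u64, count_events: bool },
    /// Existing CMS + DeltaSet path.
    Standard,
}

pub fn plan_for(q: &FlatQuery) -> Plan {
    // Nested SQL and multiple aggregates are outside top-k planning.
    if q.nested || q.aggregates.len() != 1 {
        return Plan::Standard;
    }
    let (agg, alias) = &q.aggregates[0];
    // Only COUNT and SUM qualify; count_events is true for COUNT only.
    let count_events = match agg {
        Agg::Count => true,
        Agg::Sum => false,
        Agg::Other => return Plan::Standard,
    };
    // Require ORDER BY <alias> DESC together with LIMIT k.
    match (&q.order_by, q.limit) {
        (Some((col, true)), Some(k)) if col == alias => Plan::Topk { k, count_events },
        _ => Plan::Standard,
    }
}
```

A query with one COUNT aliased as `cnt`, `ORDER BY cnt DESC`, and `LIMIT 10` maps to `Plan::Topk { k: 10, count_events: true }` here; dropping the LIMIT sends it back to `Plan::Standard`.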