
feat(asap-planner): Add support for SQL top-k config generation to asap_planner #394

@akanksha-akkihal


Summary

Extend asap_planner to generate streaming and inference configs for SQL top-k queries (COUNT/SUM … GROUP BY … ORDER BY <alias> DESC LIMIT k).

Today the planner treats these queries as plain COUNT/SUM and emits two aggregations (DeltaSetAggregator + CountMinSketch), whereas the query engine expects a single self-keyed CountMinSketchWithHeap for top-k.

This change closes that gap so top-k precompute configs can be produced from planner workload YAML, consistent with quantile, HLL, and MinMax workloads.


Motivation

SQL top-k is already supported at query time: the engine detects the pattern via detect_sql_topk, promotes the query to Statistic::Topk, and resolves a CountMinSketchWithHeap with the right count_events weighting (COUNT vs SUM).

The planner had no equivalent path. For any SQL query with ORDER BY … DESC LIMIT k, it still planned a standard approximate COUNT/SUM pipeline:

  • Two streaming aggs (DeltaSetAggregator + CountMinSketch)
  • Subpopulation label layout (aggregated: [group key], with a separate delta set)
  • No heapsize, no count_events

That does not match what the engine needs for top-k execution.

Sharing the detection logic in sql_utilities between planner and engine keeps planner output aligned with engine behavior and prevents two separate implementations from drifting apart.


Proposed behavior

When the planner parses a flat SQL query of this shape:

SELECT <key>, COUNT(<col>) AS <alias>   -- or SUM(<col>) AS <alias>
FROM <table>
WHERE <time window>
GROUP BY <key>
ORDER BY <alias> DESC
LIMIT k

It should:

  1. Detect top-k via the shared detect_sql_topk():

    • Requires a LIMIT clause
    • Requires ORDER BY on the aggregate alias, DESC
    • Supports only COUNT or SUM

  2. Plan Statistic::Topk instead of Count/Sum.

  3. Emit a single CountMinSketchWithHeap per query:

    • No DeltaSetAggregator

  4. Use a heap-only label layout:

    • grouping: []
    • aggregated: []
    • rollup: [metadata columns not in GROUP BY]

  5. Set sketch parameters:

    • aggregationSubType: topk
    • heapsize: k × heap_multiplier (default multiplier = 4)
    • count_events:
      • true for COUNT
      • false for SUM
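The detection rules in step 1 can be sketched as follows. This is a minimal Rust sketch, assuming a hypothetical, simplified `SqlQuery` struct in place of the real parsed AST; `AggFunc`, `TopkSpec` and `detect_topk` are illustrative names, not the actual `detect_sql_topk()` signature in sql_utilities.

```rust
/// Aggregate functions that may appear in a flat SQL SELECT
/// (hypothetical, simplified stand-in for the real parsed AST).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AggFunc {
    Count,
    Sum,
    Avg,
    Min,
    Max,
}

/// Hypothetical, minimal view of a parsed flat SQL query.
#[derive(Debug, Clone)]
pub struct SqlQuery {
    /// (function, argument column, alias) for every aggregate in SELECT.
    pub aggregates: Vec<(AggFunc, String, String)>,
    /// ORDER BY target and whether it is DESC.
    pub order_by: Option<(String, bool)>,
    /// LIMIT value, if present.
    pub limit: Option<u64>,
}

/// What the planner needs once top-k is detected (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TopkSpec {
    pub k: u64,
    pub agg: AggFunc,
    pub value_column: String,
}

/// Returns Some(spec) only for `AGG(col) AS alias ... ORDER BY alias DESC LIMIT k`
/// with AGG in {COUNT, SUM}; anything else returns None.
pub fn detect_topk(q: &SqlQuery) -> Option<TopkSpec> {
    // A LIMIT clause is required.
    let k = q.limit?;
    // Exactly one aggregate in SELECT (multiple aggregates are out of scope).
    let [(agg, col, alias)] = q.aggregates.as_slice() else {
        return None;
    };
    // Only COUNT or SUM are eligible.
    if !matches!(agg, AggFunc::Count | AggFunc::Sum) {
        return None;
    }
    // ORDER BY must target the aggregate alias, descending.
    match &q.order_by {
        Some((target, true)) if target == alias => {}
        _ => return None,
    }
    Some(TopkSpec {
        k,
        agg: *agg,
        value_column: col.clone(),
    })
}
```

Returning `None` instead of an error fits the fallback described in the Notes: a query that fails any rule is simply planned as plain COUNT/SUM.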

Behavior Summary

Property         COUNT top-k              SUM top-k
Agg type         CountMinSketchWithHeap   CountMinSketchWithHeap
Sub-type         topk                     topk
Streaming aggs   1                        1
count_events     true                     false
Labels           heap-only, self-keyed    heap-only, self-keyed
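The heapsize, count_events and label-layout rules from steps 4 and 5 can be sketched as a small config builder. This is a sketch under stated assumptions: `TopkAggConfig` and `build_topk_config` are hypothetical names, "metadata columns" is treated as a plain list of the table's label columns, and the snake_case fields only mirror the config keys discussed here; they are not the planner's actual types.

```rust
/// Default heap multiplier from the proposal (heapsize = k × multiplier).
pub const DEFAULT_HEAP_MULTIPLIER: u64 = 4;

/// Hypothetical in-memory form of one top-k streaming aggregation config.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TopkAggConfig {
    pub aggregation_type: &'static str,
    pub aggregation_sub_type: &'static str,
    pub heapsize: u64,
    pub count_events: bool,
    pub grouping: Vec<String>,
    pub aggregated: Vec<String>,
    pub rollup: Vec<String>,
}

/// Builds the single CountMinSketchWithHeap config for a detected top-k query.
/// `metadata_columns` is assumed to be the table's label/metadata columns.
pub fn build_topk_config(
    k: u64,
    is_count: bool,
    heap_multiplier: u64,
    metadata_columns: &[&str],
    group_by: &[&str],
) -> TopkAggConfig {
    // Heap-only layout: metadata columns not in GROUP BY go to rollup.
    let rollup = metadata_columns
        .iter()
        .copied()
        .filter(|c| !group_by.contains(c))
        .map(String::from)
        .collect();
    TopkAggConfig {
        aggregation_type: "CountMinSketchWithHeap",
        aggregation_sub_type: "topk",
        // heapsize = k × heap_multiplier (saturating to avoid u64 overflow).
        heapsize: k.saturating_mul(heap_multiplier),
        // Per the proposal: true for COUNT, false for SUM.
        count_events: is_count,
        grouping: Vec::new(),
        aggregated: Vec::new(),
        rollup,
    }
}
```

For example, a COUNT top-k query with LIMIT 10 gets heapsize 40 and count_events = true under the default multiplier, while the same query with SUM differs only in count_events = false.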

Notes

  • Queries without ORDER BY + LIMIT continue to use the existing CountMinSketch + DeltaSetAggregator path, unchanged.
  • PromQL topk(k, …) is unchanged.
  • Not supported:
    • Nested SQL
    • Multiple aggregates in SELECT
    • Non-COUNT/SUM aggregates for top-k planning
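The routing in the first note can be sketched as a two-step choice: pick the statistic, then the streaming aggregations it needs. A minimal sketch with a hypothetical `choose_statistic`/`streaming_aggs` pair, assuming the aggregate is already known to be COUNT or SUM and that the "ORDER BY alias DESC" check has been done upstream.

```rust
/// Hypothetical planner statistic (names mirror those used in this issue).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Statistic {
    Count,
    Sum,
    Topk,
}

/// Top-k only when both ORDER BY <alias> DESC and LIMIT are present;
/// otherwise fall back to plain Count/Sum.
pub fn choose_statistic(is_count: bool, order_by_alias_desc: bool, limit: Option<u64>) -> Statistic {
    match (order_by_alias_desc, limit) {
        (true, Some(_)) => Statistic::Topk,
        _ if is_count => Statistic::Count,
        _ => Statistic::Sum,
    }
}

/// Streaming aggregations emitted per statistic.
pub fn streaming_aggs(stat: Statistic) -> Vec<&'static str> {
    match stat {
        // New path: one self-keyed sketch, no delta set.
        Statistic::Topk => vec!["CountMinSketchWithHeap"],
        // Existing path, unchanged.
        Statistic::Count | Statistic::Sum => vec!["DeltaSetAggregator", "CountMinSketch"],
    }
}
```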
