Skip to content

feat(asap-planner): Support to generate SQL top-k CountMinSketchWithHeap configs#395

Merged
milindsrivastava1997 merged 3 commits into
mainfrom
394-feat-asap-planner-add-support-for-sql-top-k-config-generation
Jun 9, 2026
Merged

feat(asap-planner): Support to generate SQL top-k CountMinSketchWithHeap configs#395
milindsrivastava1997 merged 3 commits into
mainfrom
394-feat-asap-planner-add-support-for-sql-top-k-config-generation

Conversation

@akanksha-akkihal

Copy link
Copy Markdown
Collaborator

Summary

Adds SQL top-k config generation to asap_planner and hoists shared detection into sql_utilities, so the planner and query engine use the same rules for COUNT/SUM … GROUP BY … ORDER BY <alias> DESC LIMIT k.

Previously the planner treated these queries as plain COUNT/SUM and emitted DeltaSetAggregator + CountMinSketch. The engine expects a single heap-only CountMinSketchWithHeap.

Changes

Shared detection (sql_utilities)

  • Added TopkWeighting, SqlTopk, and detect_sql_topk() to sqlhelper.rs
  • Re-exported from ast_matching/mod.rs
  • Single-layer detection only; nested-pattern gating stays with callers

Planner (asap-planner-rs)

  • Detects top-k via shared detect_sql_topk() and plans Statistic::Topk
  • Emits one CountMinSketchWithHeap per query (no DeltaSetAggregator)
  • Heap-only labels: grouping: [], aggregated: [group key], rollup: [other metadata]
  • Sketch params: aggregationSubType: topk, heapsize = k × 4, count_events: true/false for COUNT/SUM
  • Extended build_sketch_parameters with optional topk_count_events (PromQL unchanged)
  • Added PlannerOutput::aggregation_parameter() for tests
  • Integration tests: COUNT top-k, SUM top-k, non-top-k fallback

Query engine (asap-query-engine)

  • Removed duplicate detect_sql_topk / types; imports shared implementation from sql_utilities
  • Existing detect_topk_tests unchanged in behavior

Behavior

Before After
Flat COUNT/SUM + ORDER BY + LIMIT CMS + DeltaSet (2 aggs) Single CountMinSketchWithHeap
count_events absent true (COUNT) / false (SUM)
Nested SQL planner error (n != 1) same (still unsupported)

PromQL topk(k, …) is unchanged. Plain COUNT/SUM without ORDER BY/LIMIT still uses the CMS + DeltaSet path.

Planner integration tests (asap-planner-rs/tests/sql_integration.rs)

  • spatial_count_topk_heap — COUNT + ORDER BY DESC + LIMIT → 1 agg, CountMinSketchWithHeap (topk); no CMS/DeltaSet; labels + heapsize: 40, count_events: true
  • spatial_sum_topk_heap — SUM + ORDER BY DESC + LIMIT → same heap path; count_events: false
  • spatial_count_without_order_by_is_not_topk — COUNT only → CMS + DeltaSet (2 aggs); no heap

@milindsrivastava1997 milindsrivastava1997 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. detect_sql_topk(&qdata) vs &sql_query.query_data[0] — sql.rs:108

let sql_topk = detect_sql_topk(&qdata); // ← from SQLPatternParser
let agg_info = &sql_query.query_data[0].aggregation_info; // ← from SQLPatternMatcher
let labels = &sql_query.query_data[0].labels;

All other field reads come from sql_query.query_data[0] (pattern-matcher output), but detect_sql_topk runs on qdata (raw parser output). For n=1 spatial queries these should be structurally equivalent, but if SQLPatternMatcher ever strips limit/order_by (they're not
part of matches_sql_pattern), this silently diverges.

The choice of qdata is correct today (it retains limit/order_by reliably), but deserves a one-line comment explaining why — especially since the query engine passes query_data from the pattern matcher at its call site (line 575).


  1. spatial_sum_topk_heap test coverage gap — tests/sql_integration.rs

The COUNT test (spatial_count_topk_heap) verifies everything: heapsize, count_events, grouping/aggregated/rollup labels, window size, no DeltaSet/CMS. The SUM test only checks count_events = false and the aggregation type. At minimum it should verify:

assert_eq!(out.aggregation_parameter("CountMinSketchWithHeap", "heapsize").and_then(|v| v.as_u64()), Some(40));
assert_eq!(out.aggregation_labels("CountMinSketchWithHeap", "grouping"), Vec::::new());
assert_eq!(out.aggregation_labels("CountMinSketchWithHeap", "aggregated"), vec!["srcip".to_string()]);


  1. Missing negative test: ORDER BY ... ASC LIMIT k

detect_sql_topk correctly returns None when primary.ascending, but there's no integration test confirming that ORDER BY alias ASC LIMIT k falls through to the CMS + DeltaSet path. Worth adding alongside the existing spatial_count_without_order_by_is_not_topk case.


@milindsrivastava1997 milindsrivastava1997 merged commit 4bc9bf7 into main Jun 9, 2026
11 checks passed
@milindsrivastava1997 milindsrivastava1997 deleted the 394-feat-asap-planner-add-support-for-sql-top-k-config-generation branch June 9, 2026 02:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(asap-planner): Add support for SQL top-k config generation to asap_planner

2 participants