feat(asap-planner): Support to generate SQL top-k CountMinSketchWithHeap configs#395
Conversation
milindsrivastava1997
left a comment
There was a problem hiding this comment.
- detect_sql_topk(&qdata) vs &sql_query.query_data[0] — sql.rs:108
let sql_topk = detect_sql_topk(&qdata); // ← from SQLPatternParser
let agg_info = &sql_query.query_data[0].aggregation_info; // ← from SQLPatternMatcher
let labels = &sql_query.query_data[0].labels;
All other field reads come from sql_query.query_data[0] (pattern-matcher output), but detect_sql_topk runs on qdata (raw parser output). For n=1 spatial queries these should be structurally equivalent, but if SQLPatternMatcher ever strips limit/order_by (they're not
part of matches_sql_pattern), this silently diverges.
The choice of qdata is correct today (it retains limit/order_by reliably), but deserves a one-line comment explaining why — especially since the query engine passes query_data from the pattern matcher at its call site (line 575).
- spatial_sum_topk_heap test coverage gap — tests/sql_integration.rs
The COUNT test (spatial_count_topk_heap) verifies everything: heapsize, count_events, grouping/aggregated/rollup labels, window size, no DeltaSet/CMS. The SUM test only checks count_events = false and the aggregation type. At minimum it should verify:
assert_eq!(out.aggregation_parameter("CountMinSketchWithHeap", "heapsize").and_then(|v| v.as_u64()), Some(40));
assert_eq!(out.aggregation_labels("CountMinSketchWithHeap", "grouping"), Vec::::new());
assert_eq!(out.aggregation_labels("CountMinSketchWithHeap", "aggregated"), vec!["srcip".to_string()]);
- Missing negative test: ORDER BY ... ASC LIMIT k
detect_sql_topk correctly returns None when primary.ascending, but there's no integration test confirming that ORDER BY alias ASC LIMIT k falls through to the CMS + DeltaSet path. Worth adding alongside the existing spatial_count_without_order_by_is_not_topk case.
Summary
Adds SQL top-k config generation to
asap_plannerand hoists shared detection intosql_utilities, so the planner and query engine use the same rules forCOUNT/SUM … GROUP BY … ORDER BY <alias> DESC LIMIT k.Previously the planner treated these queries as plain COUNT/SUM and emitted
DeltaSetAggregator+CountMinSketch. The engine expects a single heap-onlyCountMinSketchWithHeap.Changes
Shared detection (
sql_utilities)TopkWeighting,SqlTopk, anddetect_sql_topk()tosqlhelper.rsast_matching/mod.rsPlanner (
asap-planner-rs)detect_sql_topk()and plansStatistic::TopkCountMinSketchWithHeapper query (noDeltaSetAggregator)grouping: [],aggregated: [group key],rollup: [other metadata]aggregationSubType: topk,heapsize = k × 4,count_events: true/falsefor COUNT/SUMbuild_sketch_parameterswith optionaltopk_count_events(PromQL unchanged)PlannerOutput::aggregation_parameter()for testsQuery engine (
asap-query-engine)detect_sql_topk/ types; imports shared implementation fromsql_utilitiesdetect_topk_testsunchanged in behaviorBehavior
CountMinSketchWithHeapcount_eventstrue(COUNT) /false(SUM)n != 1)PromQL
topk(k, …)is unchanged. Plain COUNT/SUM without ORDER BY/LIMIT still uses the CMS + DeltaSet path.Planner integration tests (
asap-planner-rs/tests/sql_integration.rs)spatial_count_topk_heap— COUNT + ORDER BY DESC + LIMIT → 1 agg,CountMinSketchWithHeap(topk); no CMS/DeltaSet; labels +heapsize: 40,count_events: truespatial_sum_topk_heap— SUM + ORDER BY DESC + LIMIT → same heap path;count_events: falsespatial_count_without_order_by_is_not_topk— COUNT only → CMS + DeltaSet (2 aggs); no heap