[bot] Merge master/0b20d26c into rel/dev by yenkins-admin · Pull Request #1651 · gooddata/gooddata-python-sdk

yenkins-admin · 2026-06-08T09:23:17Z

🚀 Automated PR to perform merge from master into rel/dev with changes up to 0b20d26 (created by https://github.com/gooddata/gooddata-python-sdk/actions/runs/27128109813).

Implements Path B: evaluate the dashboard-summary feature through the dedicated synchronous endpoint (POST /api/v1/ai/workspaces/{ws}/summary), which executes AFM server-side, with no SSE or client-side result_id wrangling.

- SummaryClient: posts summary_input, maps the JSON summary into ChatResult
- DashboardSummaryEvaluator: rubric-based LLM judge (must_include / must_not_include / rubric), scored per-criterion so quality_score is the fraction satisfied
- SummaryInput dataset field; dashboard_id is the only required input
- runner ChatBackend now receives the DatasetItem; CLI routes summary items to SummaryClient and everything else to ChatClient
- registered as a lazy [llm-judge] evaluator; docs + example dataset + tests

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
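The per-criterion scoring described in this commit can be sketched as follows. This is a minimal illustration, not the code from the PR: `CriterionVerdict`, `quality_score`, and `summary_path` are hypothetical names, and the choice to return 0.0 for an empty criterion list is an assumption. Only the endpoint path and the "fraction satisfied" rule come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class CriterionVerdict:
    """One judged criterion (hypothetical shape, for illustration only)."""
    name: str
    satisfied: bool


def summary_path(workspace_id: str) -> str:
    # Path quoted in the commit message; the workspace id fills the {ws} slot.
    return f"/api/v1/ai/workspaces/{workspace_id}/summary"


def quality_score(verdicts: list[CriterionVerdict]) -> float:
    """Return the fraction of criteria that were satisfied."""
    if not verdicts:
        # Assumption: with nothing to judge, report 0.0 rather than divide by zero.
        return 0.0
    return sum(v.satisfied for v in verdicts) / len(verdicts)


verdicts = [
    CriterionVerdict("must_include: revenue trend", True),
    CriterionVerdict("must_not_include: raw SQL", True),
    CriterionVerdict("rubric: concise", False),
    CriterionVerdict("rubric: mentions top region", True),
]
score = quality_score(verdicts)
```

Scoring each criterion separately makes a partial pass visible: three of four satisfied criteria yield 0.75 rather than a single pass/fail verdict.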

The workspace ACTIVE_LLM_PROVIDER setting is keyed by type on the backend and may exist under any id (e.g. UI-generated). Reading it by the hardcoded id "activeLlmProvider" missed existing settings (-> "no active LLM provider"), and activating re-created a second setting of the same type (-> HTTP 409). Now look it up by setting_type via list_workspace_settings, and reuse the existing setting's id on activate so create_or_update performs an UPDATE.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
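The lookup-by-type fix can be illustrated with plain data structures. This sketch does not call the SDK: `Setting`, `find_setting_by_type`, and `resolve_setting_id` are hypothetical stand-ins, and the list passed in plays the role of whatever list_workspace_settings returns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Setting:
    """Hypothetical stand-in for a workspace setting record."""
    id: str
    setting_type: str


def find_setting_by_type(settings: list[Setting], setting_type: str) -> Optional[Setting]:
    # Match on the type, not the id: the id may be UI-generated.
    for setting in settings:
        if setting.setting_type == setting_type:
            return setting
    return None


def resolve_setting_id(settings: list[Setting], setting_type: str, default_id: str) -> str:
    """Reuse the existing setting's id so a later write updates instead of creating."""
    existing = find_setting_by_type(settings, setting_type)
    return existing.id if existing is not None else default_id


settings = [
    Setting(id="timezone", setting_type="TIMEZONE"),
    Setting(id="a1b2c3-ui-generated", setting_type="ACTIVE_LLM_PROVIDER"),
]
target_id = resolve_setting_id(settings, "ACTIVE_LLM_PROVIDER", "activeLlmProvider")
```

Falling back to the default id only when no setting of that type exists keeps the first activation working while avoiding a duplicate of the same type afterwards.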

- evaluator: score must_not_include via plain presence-detection then invert, instead of asking the judge to reason about avoidance under an "EXPECTED OUTPUT" label (which inverted verdicts on negative criteria)
- reporting: make the FAIL note evaluator-agnostic (list whichever boolean checks are False) instead of the visualization-only "no visualization created"
- examples: replace the single template with three self-describing cases: full-dashboard, selected-visualizations (scoping), and format-hint-brief (format adherence). Each has a small gating set and a rubric aligned to the endpoint's actual output

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
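The first two bullets above can be sketched together. This is an illustrative reading of the commit message, not the evaluator's code: `score_must_not_include` and `fail_note` are hypothetical names, and the `detected` mapping stands in for the judge's plain "is this phrase present?" answers.

```python
def score_must_not_include(detected: dict[str, bool]) -> dict[str, bool]:
    """Turn presence detections into pass/fail verdicts for negative criteria.

    The judge only answers whether each item is present; the inversion happens
    here, so the judge never has to reason about avoidance.
    """
    return {item: not present for item, present in detected.items()}


def fail_note(checks: dict[str, object]) -> str:
    """List every boolean check that is False, regardless of which evaluator set it."""
    failed = [name for name, value in checks.items() if value is False]
    return "failed checks: " + ", ".join(failed) if failed else ""


verdicts = score_must_not_include({"raw SQL": False, "internal ids": True})
note = fail_note({"summary_returned": True, "gating_passed": False, "quality_score": 0.5})
```

Non-boolean values such as a numeric score are ignored by the note, so the same reporting code works for any evaluator that exposes boolean checks.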

…-eval

# Conflicts:
#	packages/gooddata-eval/README.md
#	packages/gooddata-eval/src/gooddata_eval/cli/main.py
#	packages/gooddata-eval/src/gooddata_eval/core/workspace.py
#	packages/gooddata-eval/tests/test_workspace.py

Running without --model takes the default branch, which set provider_name but never provider_type, causing UnboundLocalError when building ResolvedModel. Initialize it to "" alongside provider_name.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
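The failure mode is easy to reproduce in isolation. In this sketch the function names, the "type/name" model format, and the "workspace-default" value are all hypothetical; only the pattern (a variable assigned in one branch and read after both) mirrors the commit message.

```python
from typing import Optional


def resolve_before_fix(model: Optional[str]) -> tuple[str, str]:
    provider_name = ""
    if model is not None:
        # Hypothetical "type/name" format, used only for this illustration.
        provider_type, provider_name = model.split("/", 1)
    else:
        provider_name = "workspace-default"  # provider_type is never bound here
    return provider_name, provider_type


def resolve_after_fix(model: Optional[str]) -> tuple[str, str]:
    provider_name = ""
    provider_type = ""  # initialized alongside provider_name, as in the fix
    if model is not None:
        provider_type, provider_name = model.split("/", 1)
    else:
        provider_name = "workspace-default"
    return provider_name, provider_type


try:
    resolve_before_fix(None)
    raised = False
except UnboundLocalError:
    raised = True

fixed = resolve_after_fix(None)
```

Because Python decides at compile time that `provider_type` is local to the function, reading it on the path that never assigned it raises UnboundLocalError instead of falling back to any outer name.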

dashboard_summary items sourced from a Langfuse dataset previously lost summary_input (SummaryClient then failed with "missing summary_input"). Map it from the item input object, metadata, or expectedOutput so summary datasets round-trip through Langfuse. The --langfuse/--langfuse-dataset contract is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
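One way to read the fallback order is shown below. This is a sketch under assumptions: the item is modelled as a plain dict, `extract_summary_input` is a hypothetical helper, and the assumption that each container holds the value under a `summary_input` key is mine. The search order (input, metadata, expectedOutput) follows the commit message.

```python
from typing import Any, Optional


def extract_summary_input(item: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the first summary_input found in input, metadata, or expectedOutput."""
    for container_key in ("input", "metadata", "expectedOutput"):
        container = item.get(container_key)
        # The input may be a plain string (a question), so only inspect objects.
        if isinstance(container, dict):
            candidate = container.get("summary_input")
            if isinstance(candidate, dict):
                return candidate
    return None


item = {
    "input": "Summarize the dashboard",
    "metadata": {"summary_input": {"dashboard_id": "sales-overview"}},
    "expectedOutput": {"must_include": ["revenue"]},
}
summary_input = extract_summary_input(item)
```

Returning None when nothing is found leaves the caller free to raise its own "missing summary_input" error.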

Per PR review: the dashboard_summary example items were uploaded to a Langfuse dataset, so the local examples/summary_dataset files are no longer needed in the repo. The README still documents the format inline.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

ty rejected passing dict|None where SummaryInput|None is expected (pydantic coerces at runtime, but the type checker does not model that coercion). Build the SummaryInput via model_validate so the static type matches DatasetItem.summary_input.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

Trace metadata is not a Langfuse dashboard breakdown dimension, so per-model charts/filters were impossible. Expose the model on the first-class trace `version` field so dashboards can break down / filter by "Version".

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

feat(eval): evaluate the dashboard-summary skill via the /summary endpoint

codecov · 2026-06-08T09:27:01Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.18%. Comparing base (73a34c6) to head (0b20d26).
⚠️ Report is 521 commits behind head on rel/dev.

Additional details and impacted files

@@           Coverage Diff            @@
##           rel/dev    #1651   +/-   ##
========================================
  Coverage    79.18%   79.18%           
========================================
  Files          232      232           
  Lines        15791    15791           
========================================
  Hits         12504    12504           
  Misses        3287     3287


Roman Rakus and others added 10 commits June 4, 2026 17:26

Merge pull request #1646 from gooddata/rr/summary-endpoint-eval

0b20d26

feat(eval): evaluate the dashboard-summary skill via the /summary endpoint

yenkins-admin requested review from hkad98, lupko and pcerny as code owners June 8, 2026 09:23

yenkins-admin merged commit 24c41b4 into rel/dev Jun 8, 2026
1 check passed

yenkins-admin deleted the snapshot-master-0b20d26c-to-rel/dev branch June 8, 2026 09:23

yenkins-admin merged 10 commits into rel/dev from snapshot-master-0b20d26c-to-rel/dev
