feat(eval): add --concurrency flag for parallel item evaluation#1648

Merged
zdenekmusil-gd merged 1 commit into master from zmu/gdai-1832-parallel-runs
Jun 5, 2026

@zdenekmusil-gd

Contributor

--concurrency K sends up to K dataset items to the agent simultaneously, which load-tests the agent under concurrent requests. The default is 1, which keeps evaluation sequential, so existing behaviour is unchanged.

When K > 1, run_items() dispatches items to a ThreadPoolExecutor(max_workers=K). Each item still runs --runs times sequentially (pass@K); parallelism is across items, not within a single item's repeated runs. Results are collected in input order regardless of completion order. All progress callbacks and Langfuse logging are invoked from worker threads (Console and per-call httpx clients are thread-safe).
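
The dispatch pattern described above can be illustrated with a minimal sketch. It is not the code from this PR: `run_items_sketch`, `evaluate_once`, and `fake_agent` are hypothetical names, progress callbacks and Langfuse logging are left out, and `Executor.map` is just one way to get results back in input order (the real `run_items()` may instead submit futures and index them).

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

Item = TypeVar("Item")
Result = TypeVar("Result")


def run_items_sketch(
    items: Sequence[Item],
    evaluate_once: Callable[[Item], Result],  # hypothetical: one agent call
    runs: int = 1,
    concurrency: int = 1,
) -> List[List[Result]]:
    """Evaluate each item `runs` times, with up to `concurrency` items in flight."""
    if concurrency < 1:
        raise ValueError("concurrency must be >= 1")

    def run_one_item(item: Item) -> List[Result]:
        # Repeated runs of a single item stay sequential.
        return [evaluate_once(item) for _ in range(runs)]

    if concurrency == 1:
        # Sequential path: no thread pool involved.
        return [run_one_item(item) for item in items]

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Executor.map yields results in input order,
        # regardless of which worker finishes first.
        return list(pool.map(run_one_item, items))


def fake_agent(item: int) -> int:
    # Hypothetical stand-in for the agent: earlier items take longer,
    # so completion order differs from input order.
    time.sleep(0.01 * (5 - item))
    return item * 2


results = run_items_sketch([1, 2, 3, 4], fake_agent, runs=2, concurrency=4)
```

In this sketch the last item finishes first, yet `results` still lines up with the input list, which is the ordering property the description relies on.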

JIRA: GDAI-1832

@codecov

codecov Bot commented Jun 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.18%. Comparing base (d0a608b) to head (360f274).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1648   +/-   ##
=======================================
  Coverage   79.18%   79.18%           
=======================================
  Files         232      232           
  Lines       15791    15791           
=======================================
  Hits        12504    12504           
  Misses       3287     3287           

@zdenekmusil-gd force-pushed the zmu/gdai-1832-parallel-runs branch from fde42cb to 360f274 on June 5, 2026 08:02
@zdenekmusil-gd merged commit de012d6 into master on Jun 5, 2026
13 checks passed
@zdenekmusil-gd deleted the zmu/gdai-1832-parallel-runs branch on June 5, 2026 09:20
@zdenekmusil-gd restored the zmu/gdai-1832-parallel-runs branch on June 8, 2026 09:18

2 participants