
ci(nightly): fetch API keys from Key Vault via OIDC #120

Open

miguelgfierro wants to merge 9 commits into main from fix/actions-kv-secrets

Conversation

@miguelgfierro
Contributor

Summary

  • Removes direct use of ANTHROPIC_API_KEY, EMBEDDING_BINDING_HOST, and EMBEDDING_BINDING_API_KEY as GitHub repo secrets in nightly.yml
  • Replaces them with Azure Key Vault lookups (kv-firefly-signature) after OIDC login, with ::add-mask:: applied before writing to $GITHUB_ENV (see the sketch after this list)
  • Moves to per-job permissions blocks — benchmark no longer inherits issues:write from the test job
  • Simplifies the benchmark job: removes the "check secrets available" gate since the KV fetch either succeeds or fails the job explicitly
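
Roughly, the replacement flow in nightly.yml could look like the sketch below. The step name and the loop over name pairs are illustrative; the vault name, Key Vault secret names, and environment variable names are taken from this PR, and the `azure/login` inputs follow the repository-secret list under "Infrastructure required".

```yaml
permissions:
  id-token: write   # required so azure/login can use OIDC federation
  contents: read

steps:
  - uses: azure/login@v2
    with:
      client-id: ${{ secrets.AZURE_CLIENT_ID }}
      tenant-id: ${{ secrets.AZURE_TENANT_ID }}
      subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

  - name: Fetch API keys from Key Vault
    run: |
      for pair in "ANTHROPIC_API_KEY=ANTHROPIC-API-KEY" \
                  "EMBEDDING_BINDING_HOST=EMBEDDING-BINDING-HOST" \
                  "EMBEDDING_BINDING_API_KEY=EMBEDDING-BINDING-API-KEY"; do
        env_name="${pair%%=*}"
        kv_name="${pair#*=}"
        value=$(az keyvault secret show --vault-name kv-firefly-signature \
                  --name "$kv_name" --query value -o tsv)
        echo "::add-mask::$value"                 # mask before it can appear in logs
        echo "${env_name}=${value}" >> "$GITHUB_ENV"
      done
```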

Infrastructure required (see PR description)

Before this workflow runs successfully, the following must be in place:

  1. Key Vault secrets (in kv-firefly-signature):

    • ANTHROPIC-API-KEY
    • EMBEDDING-BINDING-HOST
    • EMBEDDING-BINDING-API-KEY
  2. OIDC app registration (firefly-github-actions):

    • Federated credential subject: repo:fireflyframework/fireflyframework-agentic:ref:refs/heads/main
    • Roles: Key Vault Secrets User on the KV, AcrPush on fireflysignature, Contributor on firefly-mcp
  3. GitHub repository secrets (Actions → Secrets):

    • AZURE_CLIENT_ID → client ID of firefly-github-actions
    • AZURE_TENANT_ID → cda419af-f2d8-444c-9248-4ada4f168c8a
    • AZURE_SUBSCRIPTION_ID → e8b8063e-f842-4a59-9754-427ddb7bfb63
  4. Delete old GitHub secrets once the workflow is confirmed working:

    • ANTHROPIC_API_KEY, EMBEDDING_BINDING_HOST, EMBEDDING_BINDING_API_KEY

Replace GitHub repo secrets (ANTHROPIC_API_KEY, EMBEDDING_BINDING_HOST,
EMBEDDING_BINDING_API_KEY) with Azure Key Vault lookups. Add OIDC login
and kv-firefly-signature fetch to both the test and benchmark jobs.

Move from a single top-level permissions block to per-job permissions so
the benchmark job no longer inherits issues:write from the test job.
Separate the single shared service principal into two:
- AZURE_CI_CLIENT_ID (nightly/benchmark): Key Vault Secrets User only
- AZURE_DEPLOY_CLIENT_ID (deploy-mcp): AcrPush + Container App write only

Also hardcode tenant and subscription IDs as plain values — they are
not credentials and do not belong in the secrets store.
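
A rough sketch of the per-job layout this describes; the job names (test, benchmark, deploy-mcp) and the two client IDs come from the text above, while the exact permission sets beyond issues:write and id-token:write are assumptions:

```yaml
jobs:
  test:
    permissions:
      id-token: write   # OIDC token for azure/login
      contents: read
      issues: write     # kept on the test job only
    steps:
      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CI_CLIENT_ID }}   # Key Vault Secrets User only
          tenant-id: cda419af-f2d8-444c-9248-4ada4f168c8a
          subscription-id: e8b8063e-f842-4a59-9754-427ddb7bfb63
      # ... Key Vault fetch and test steps ...

  benchmark:
    permissions:
      id-token: write
      contents: read    # no issues:write inherited from test
    steps:
      - uses: azure/login@v2   # same CI principal and Key Vault fetch as the test job
        with:
          client-id: ${{ secrets.AZURE_CI_CLIENT_ID }}
          tenant-id: cda419af-f2d8-444c-9248-4ada4f168c8a
          subscription-id: e8b8063e-f842-4a59-9754-427ddb7bfb63

  deploy-mcp:
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_DEPLOY_CLIENT_ID }}   # AcrPush + Container App write only
          tenant-id: cda419af-f2d8-444c-9248-4ada4f168c8a
          subscription-id: e8b8063e-f842-4a59-9754-427ddb7bfb63
```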
@javier-alvarez
Contributor

Nightly does not work with this branch https://github.com/fireflyframework/fireflyframework-agentic/actions/runs/25543811013

Service principal error

The 25,870-row CSV converted to 24,399 chunks via MarkdownChunker
(one flat table, no heading boundaries). Embedding and storing that
many vectors flooded the SQLite WAL and corrupted the database, causing
"database disk image is malformed" on the xlsx ingest that follows.

None of the 32 benchmark queries ask about billing data; the CSV is
purely a distractor.  200 rows (190 chunks) is enough to test that
billing-ledger content is not surfaced for company-fact queries, without
overwhelming the in-memory SQLite corpus.
paraphrase_ceo_idiomatic now returns the correct doc at rank 1 but a
different chunk (distractor docs shifted relative vector scores). The
answer is still correct; only substring_match drops to 27/28 = 0.9643.
shutil.copyfile on a WAL-mode SQLite database copies only the main
file, leaving committed WAL frames behind. When two DatabaseStore
instances share the same backend path (e.g. SqliteCorpus +
SqliteVecVectorStore both pointing to corpus.sqlite), each upload
produces an etag change that triggers the other store to re-download.
The download overwrites the shared cache and leaves the WAL stale,
wiping all data written in that round.

Fix: detect SQLite files by their 16-byte magic header and use
sqlite3.Connection.backup() instead of shutil.copyfile. The backup API
walks both the main file and any unflushed WAL frames, so the uploaded
file is always a complete, consistent snapshot.
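
A sketch of that helper (the name follows the _copy_for_upload mentioned below; treat it as an approximation of this intermediate fix, which the next paragraph shows was not sufficient):

```python
import shutil
import sqlite3
from pathlib import Path

SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of every SQLite main database file


def _copy_for_upload(src: Path, dst: Path) -> None:
    """Snapshot src into dst, using the SQLite backup API for SQLite files."""
    with open(src, "rb") as f:
        is_sqlite = f.read(16) == SQLITE_MAGIC

    if not is_sqlite:
        shutil.copyfile(src, dst)
        return

    # The backup API walks the main file plus any unflushed WAL frames,
    # so dst is a complete, consistent snapshot even in WAL mode.
    source = sqlite3.connect(src)
    dest = sqlite3.connect(dst)
    try:
        source.backup(dest)
    finally:
        dest.close()
        source.close()
```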
sqlite3.Connection.backup() opens the source without loading the
sqlite_vec extension, which makes schema parsing fail when the database
has a vec0 virtual table — producing a malformed copy.

The correct fix is to issue PRAGMA wal_checkpoint(FULL) on the existing
connection (which already has sqlite_vec loaded) immediately after each
COMMIT, so the main database file is fully up-to-date before
shutil.copyfile captures it for upload.

Reverts the _copy_for_upload / backup() helper added in the previous
attempt and restores plain shutil.copyfile in LocalBackend._upload_sync.
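
For reference, a minimal sketch of the checkpoint-then-copy approach described above (function names are illustrative; in the actual change the checkpoint runs on the store's existing connection and the copy stays in LocalBackend._upload_sync):

```python
import shutil
import sqlite3
from pathlib import Path


def commit_and_checkpoint(conn: sqlite3.Connection) -> None:
    """Commit, then fold every WAL frame back into the main database file.

    The existing connection already has sqlite_vec loaded, so vec0 virtual
    tables are no obstacle; after the checkpoint the main file on disk is a
    complete snapshot on its own.
    """
    conn.commit()
    conn.execute("PRAGMA wal_checkpoint(FULL)")


def _upload_sync(local_path: Path, remote_path: Path) -> None:
    # With the WAL checkpointed after every commit, a plain copy of the
    # main database file is already consistent.
    shutil.copyfile(local_path, remote_path)
```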