
feat(core,cli): credit DeepSeek prompt-cache hits in cost tracking #160

Merged
oratis merged 1 commit into main from feat/prompt-cache-cost on Jun 4, 2026

Owner

@oratis oratis commented Jun 4, 2026

DeepSeek's prompt caching is automatic and server-side: the API returns prompt_cache_hit_tokens, which is already parsed into ProviderUsage.cacheReadTokens. That field was dropped everywhere downstream, so cost tracking ignored it: /cost billed every input token at the full ¥1/M cache-miss price, even though cache hits bill at ¥0.1/M (one tenth of the miss price).

Changes

  • estimateCost(usage, model): new pure core util (pricing per docs/design/effort-levels.md §2.4). Splits input into cache-miss (¥1/M) and cache-hit (¥0.1/M) tokens, prices output (¥2/M, or ¥16/M for the reasoner) and reasoning (¥4/M), and returns a breakdown plus cacheHitRate and cacheSavingsYuan. Exported from @deepcode/core.
  • Propagate cacheReadTokens through RunAgentResult.usage (agent loop), SessionContext.usage, and the REPL's per-turn accumulator — it was silently lost at each hop.
  • /cost now shows the cache-hit count + rate, a miss/cache input split, and the ¥ saved by caching.
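
The propagation fix in the second bullet amounts to carrying one extra counter through every place usage is summed. A minimal sketch of such an accumulator follows; the `Usage` shape and the `addUsage` helper are hypothetical stand-ins, not the actual RunAgentResult.usage or SessionContext.usage types.

```typescript
// Hypothetical usage shape; the real ProviderUsage type may carry more fields.
interface Usage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number; // optional: a turn may not report cache hits
}

// Add one turn's usage into a running total without dropping cacheReadTokens.
function addUsage(total: Required<Usage>, turn: Usage): Required<Usage> {
  return {
    inputTokens: total.inputTokens + turn.inputTokens,
    outputTokens: total.outputTokens + turn.outputTokens,
    // A missing field counts as zero hits instead of resetting the counter.
    cacheReadTokens: total.cacheReadTokens + (turn.cacheReadTokens ?? 0),
  };
}

const zeroUsage: Required<Usage> = { inputTokens: 0, outputTokens: 0, cacheReadTokens: 0 };

// Two illustrative turns; the second reports no cache information.
const turns: Usage[] = [
  { inputTokens: 1000, outputTokens: 200, cacheReadTokens: 800 },
  { inputTokens: 500, outputTokens: 100 },
];

const sessionUsage = turns.reduce(addUsage, zeroUsage);
```

Defaulting a missing cacheReadTokens to zero keeps the total conservative: a turn with no cache data is billed as if every input token missed.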

inputTokens (= prompt_tokens) is inclusive of the cache hits, so cache-miss = input − hits.
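
The split described above can be sketched as follows. This is an illustration under stated assumptions, not the code in @deepcode/core: the function name `estimateCostSketch`, the model keys, the field names other than cacheHitRate and cacheSavingsYuan, the choice of fallback rates, and the treatment of reasoning tokens as a counter separate from output are all assumed. Only the ¥/M rates come from the description.

```typescript
// Rates in yuan per million tokens, as quoted in the PR description.
interface Rates { inputMiss: number; inputHit: number; output: number; reasoning: number }

// ASSUMPTION: model keys and the fallback choice are illustrative only.
const DEFAULT_RATES: Rates = { inputMiss: 1, inputHit: 0.1, output: 2, reasoning: 4 };
const RATES = new Map<string, Rates>([
  ["deepseek-chat", DEFAULT_RATES],
  ["deepseek-reasoner", { inputMiss: 1, inputHit: 0.1, output: 16, reasoning: 4 }],
]);

// ASSUMPTION: reasoningTokens is a separate counter, not a subset of outputTokens.
interface Usage {
  inputTokens: number; // inclusive of cache hits (= prompt_tokens)
  outputTokens: number;
  reasoningTokens?: number;
  cacheReadTokens?: number;
}

interface CostEstimate {
  inputMissYuan: number;
  inputHitYuan: number;
  outputYuan: number;
  reasoningYuan: number;
  totalYuan: number;
  cacheHitRate: number;
  cacheSavingsYuan: number;
}

const PER_MILLION = 1_000_000;

function estimateCostSketch(usage: Usage, model: string): CostEstimate {
  const rates = RATES.get(model) ?? DEFAULT_RATES; // unknown model: fall back
  const input = Math.max(0, usage.inputTokens);
  // Clamp hits into [0, input] so a bad counter cannot yield negative misses.
  const hits = Math.min(Math.max(0, usage.cacheReadTokens ?? 0), input);
  const misses = input - hits; // cache-miss = input - hits

  const inputMissYuan = (misses * rates.inputMiss) / PER_MILLION;
  const inputHitYuan = (hits * rates.inputHit) / PER_MILLION;
  const outputYuan = (Math.max(0, usage.outputTokens) * rates.output) / PER_MILLION;
  const reasoningYuan =
    (Math.max(0, usage.reasoningTokens ?? 0) * rates.reasoning) / PER_MILLION;

  return {
    inputMissYuan,
    inputHitYuan,
    outputYuan,
    reasoningYuan,
    totalYuan: inputMissYuan + inputHitYuan + outputYuan + reasoningYuan,
    cacheHitRate: input > 0 ? hits / input : 0, // empty session: rate is 0
    // What the hit tokens would have cost at the miss rate, minus what they did cost.
    cacheSavingsYuan: (hits * (rates.inputMiss - rates.inputHit)) / PER_MILLION,
  };
}

// 80% of a 1M-token prompt served from cache, no output.
const eightyPercent = estimateCostSketch(
  { inputTokens: 1_000_000, outputTokens: 0, cacheReadTokens: 800_000 },
  "deepseek-chat",
);
```

Under these assumed rates, the 80%-hit example costs about ¥0.28 for input (¥0.20 for misses plus ¥0.08 for hits) instead of ¥1.00, a saving of about ¥0.72.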

Tests

  • pricing.test.ts — 6 cases: no-cache · 80%-hit credit (verifies the ¥0.1/M rate + savings) · reasoner output/reasoning pricing · clamp + unknown-model fallback · empty session.
  • /cost test asserts the cache-hit line + rate + savings render.
  • Test counts: core 639 · cli 135. Typecheck is clean across all workspaces, and the repo-wide format:check is clean.

Follow-up (not in scope)

  • The desktop inspector "Spend" figure is still cache-blind — it can adopt estimateCost now that it's exported.

🤖 Generated with Claude Code

DeepSeek prompt caching is automatic server-side; the API returns
prompt_cache_hit_tokens (already parsed into ProviderUsage.cacheReadTokens) —
but it was dropped everywhere downstream (RunAgentResult.usage, the REPL
accumulator, SessionContext.usage), so /cost billed every input token at the
full ¥1/M miss price.

- New core util estimateCost(usage, model) — pricing per effort-levels.md §2.4:
  input cache-miss ¥1/M + cache-hit ¥0.1/M, output ¥2/16/M, reasoning ¥4/M;
  returns breakdown + cacheHitRate + cacheSavingsYuan. Exported from @deepcode/core.
- Propagate cacheReadTokens through RunAgentResult.usage, SessionContext.usage,
  and the REPL accumulator.
- /cost now shows cache-hit count + rate, a miss/cache input split, and ¥ saved.

Tests: pricing.test.ts (6 cases) + /cost asserts the cache display. core 639 · cli 135.

Follow-up: desktop inspector "Spend" could use estimateCost too (cache-blind today).

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
oratis merged commit 3ecf9af into main on Jun 4, 2026
3 checks passed
oratis deleted the feat/prompt-cache-cost branch on June 4, 2026 at 15:38
oratis added a commit that referenced this pull request Jun 4, 2026
The desktop inspector's "Spend" hardcoded ¥1/M input + ¥2/M output and ignored
reasoning tokens — so it ignored prompt-cache savings AND mispriced the reasoner
(output ¥16/M, reasoning ¥4/M). Switch it to the shared estimateCost util (#160),
which credits cache-hit input tokens and applies per-model rates.

- AgentEvent 'usage' now carries cacheReadTokens (emitted from both the per-turn
  and post-compaction sites); the desktop's local AgentEvt mirrors it.
- Repl.tsx computes per-turn cost via estimateCost, reading the live model via a
  ref (the onEvent subscription is created once on mount, so it would otherwise
  bill at a stale model's rate).
- Export ./dist/providers/pricing.js from @deepcode/core (leaf module, no
  node:fs) so the renderer can import estimateCost without pulling in the index.
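
The stale-model problem noted for Repl.tsx is the usual closure-capture issue with a subscription that is created once. A framework-free sketch of the idea, assuming a plain mutable box in place of a React ref; the `Ref` type, the model names, the rates, and both `subscribe*` helpers are hypothetical and exist only for illustration.

```typescript
// Stand-in for a React ref: a mutable box with a `current` field.
interface Ref<T> { current: T }

type UsageListener = (inputTokens: number) => void;

// Hypothetical per-model input rates in yuan per million tokens.
const INPUT_RATE = new Map<string, number>([
  ["model-a", 1],
  ["model-b", 2],
]);

function costOf(tokens: number, model: string): number {
  return (tokens * (INPUT_RATE.get(model) ?? 0)) / 1_000_000;
}

// Stale version: the listener captures the model value at subscription time.
function subscribeStale(model: string, sink: number[]): UsageListener {
  return (tokens) => {
    sink.push(costOf(tokens, model));
  };
}

// Ref version: the listener reads the box on every event, so it sees updates.
function subscribeWithRef(modelRef: Ref<string>, sink: number[]): UsageListener {
  return (tokens) => {
    sink.push(costOf(tokens, modelRef.current));
  };
}

const modelRef: Ref<string> = { current: "model-a" };
const staleCosts: number[] = [];
const liveCosts: number[] = [];

// Both listeners are created once, while the model is still "model-a".
const staleListener = subscribeStale(modelRef.current, staleCosts);
const liveListener = subscribeWithRef(modelRef, liveCosts);

// The user switches models after the subscription exists.
modelRef.current = "model-b";
staleListener(1_000_000); // still billed at model-a's rate
liveListener(1_000_000); // billed at model-b's rate
```

The ref version trades a small indirection for correctness: the subscription does not need to be torn down and recreated whenever the model changes.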

typecheck clean across all workspaces; core 639 · desktop 54.

Co-authored-by: t <t@t>
Co-authored-by: Claude Opus 4.8 (1M context) <[email protected]>