feat(core,cli): credit DeepSeek prompt-cache hits in cost tracking #160
Merged
DeepSeek prompt caching is automatic server-side; the API returns prompt_cache_hit_tokens (already parsed into ProviderUsage.cacheReadTokens), but the value was dropped everywhere downstream (RunAgentResult.usage, the REPL accumulator, SessionContext.usage), so /cost billed every input token at the full ¥1/M miss price.

- New core util estimateCost(usage, model), with pricing per effort-levels.md §2.4: input cache-miss ¥1/M + cache-hit ¥0.1/M, output ¥2/M or ¥16/M (reasoner), reasoning ¥4/M. Returns a breakdown + cacheHitRate + cacheSavingsYuan. Exported from @deepcode/core.
- Propagate cacheReadTokens through RunAgentResult.usage, SessionContext.usage, and the REPL accumulator.
- /cost now shows cache-hit count + rate, a miss/cache input split, and ¥ saved.

Tests: pricing.test.ts (6 cases) + /cost asserts the cache display. core 639 · cli 135.

Follow-up: desktop inspector "Spend" could use estimateCost too (cache-blind today).

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
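The pricing split described in this commit can be sketched as follows. This is a minimal illustration, not the code that shipped: the interface shapes, the model ids "deepseek-chat" and "deepseek-reasoner", the name estimateCostSketch, and the choice to fall back to the ¥2/M output rate for unknown models are all assumptions. Only the per-million rates and the returned field names cacheHitRate and cacheSavingsYuan come from the commit message.

```typescript
// Hypothetical usage shape; the real ProviderUsage in @deepcode/core may differ.
interface Usage {
  inputTokens: number; // inclusive of cache hits (= prompt_tokens)
  outputTokens: number;
  reasoningTokens?: number; // assumed to be reported separately from output
  cacheReadTokens?: number; // = prompt_cache_hit_tokens
}

interface CostEstimate {
  inputMissYuan: number;
  inputHitYuan: number;
  outputYuan: number;
  reasoningYuan: number;
  totalYuan: number;
  cacheHitRate: number;
  cacheSavingsYuan: number;
}

const PER_MILLION = 1_000_000;
const INPUT_MISS_RATE = 1; // ¥ per million cache-miss input tokens
const INPUT_HIT_RATE = 0.1; // ¥ per million cache-hit input tokens
const REASONING_RATE = 4; // ¥ per million reasoning tokens
const DEFAULT_OUTPUT_RATE = 2; // assumed fallback for unknown models

// Assumed model ids; the commit only says output is ¥2/M or ¥16/M (reasoner).
const OUTPUT_RATE: Record<string, number> = {
  "deepseek-chat": 2,
  "deepseek-reasoner": 16,
};

function estimateCostSketch(usage: Usage, model: string): CostEstimate {
  const input = Math.max(0, usage.inputTokens);
  // Clamp hits into [0, input] so a bad report can never yield negative misses.
  const hits = Math.min(Math.max(0, usage.cacheReadTokens ?? 0), input);
  const misses = input - hits;
  const outputRate = OUTPUT_RATE[model] ?? DEFAULT_OUTPUT_RATE;

  const inputMissYuan = (misses / PER_MILLION) * INPUT_MISS_RATE;
  const inputHitYuan = (hits / PER_MILLION) * INPUT_HIT_RATE;
  const outputYuan = (Math.max(0, usage.outputTokens) / PER_MILLION) * outputRate;
  const reasoningYuan =
    (Math.max(0, usage.reasoningTokens ?? 0) / PER_MILLION) * REASONING_RATE;

  return {
    inputMissYuan,
    inputHitYuan,
    outputYuan,
    reasoningYuan,
    totalYuan: inputMissYuan + inputHitYuan + outputYuan + reasoningYuan,
    cacheHitRate: input > 0 ? hits / input : 0,
    // What the hit tokens would have cost at the miss rate, minus what they did cost.
    cacheSavingsYuan: (hits / PER_MILLION) * (INPUT_MISS_RATE - INPUT_HIT_RATE),
  };
}
```

With 1,000,000 input tokens of which 800,000 are cache hits, plus 100,000 output tokens on the ¥2/M model, this sketch yields a total of about ¥0.48, where cache-blind billing would have reported about ¥1.20.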
oratis added a commit that referenced this pull request on Jun 4, 2026
The desktop inspector's "Spend" hardcoded ¥1/M input + ¥2/M output and ignored reasoning tokens, so it ignored prompt-cache savings AND mispriced the reasoner (output ¥16/M, reasoning ¥4/M). Switch it to the shared estimateCost util (#160), which credits cache-hit input tokens and applies per-model rates.

- AgentEvent 'usage' now carries cacheReadTokens (emitted from both the per-turn and post-compaction sites); the desktop's local AgentEvt mirrors it.
- Repl.tsx computes per-turn cost via estimateCost, reading the live model via a ref (the onEvent subscription is created once on mount, so it would otherwise bill at a stale model's rate).
- Export ./dist/providers/pricing.js from @deepcode/core (leaf module, no node:fs) so the renderer can import estimateCost without pulling in the index.

typecheck clean across all workspaces; core 639 · desktop 54.

Co-authored-by: t <t@t>
Co-authored-by: Claude Opus 4.8 (1M context) <[email protected]>
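The stale-model problem mentioned for Repl.tsx can be illustrated without React. In the sketch below, a subscription is created once and later events must be priced at whatever model is selected at that moment. The UsageEvents class, the short model keys, and the rate table are hypothetical stand-ins; the mutable { current } box plays the role that a React ref plays in the commit.

```typescript
type Listener = (outputTokens: number) => void;

// Hypothetical stand-in for the agent event stream the REPL subscribes to.
class UsageEvents {
  private listeners: Listener[] = [];
  subscribe(listener: Listener): void {
    this.listeners.push(listener);
  }
  emit(outputTokens: number): void {
    for (const listener of this.listeners) listener(outputTokens);
  }
}

// Assumed keys; ¥ per million output tokens, as quoted in the commit message.
const OUTPUT_RATE: Record<string, number> = { chat: 2, reasoner: 16 };

function demoStaleVersusLive(): { staleYuan: number; liveYuan: number } {
  const events = new UsageEvents();
  let model = "chat"; // model selected when the subscription is created
  const modelRef = { current: model }; // mutable box, analogous to a React ref

  let staleYuan = 0;
  let liveYuan = 0;

  // A callback created once captures the value that existed at creation time.
  const capturedModel = model;
  events.subscribe((tokens) => {
    staleYuan += (tokens / 1_000_000) * (OUTPUT_RATE[capturedModel] ?? 0);
  });

  // Reading through the box at event time always sees the latest model.
  events.subscribe((tokens) => {
    liveYuan += (tokens / 1_000_000) * (OUTPUT_RATE[modelRef.current] ?? 0);
  });

  // The user switches models after the subscription already exists.
  model = "reasoner";
  modelRef.current = model;

  events.emit(1_000_000);
  return { staleYuan, liveYuan };
}
```

Here the captured-value listener bills one million output tokens at ¥2 while the ref-based listener bills them at ¥16, which is the discrepancy the commit avoids by reading the model through a ref.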
DeepSeek's prompt caching is automatic server-side: the API returns prompt_cache_hit_tokens, already parsed into ProviderUsage.cacheReadTokens. But that field was dropped everywhere downstream, so cost tracking ignored it: /cost billed every input token at the full ¥1/M miss price, even though cache hits bill at ¥0.1/M (about 10× cheaper).

Changes

- estimateCost(usage, model): new pure core util (pricing per docs/design/effort-levels.md §2.4). Splits input into cache-miss (¥1/M) + cache-hit (¥0.1/M), prices output (¥2/M or ¥16/M) + reasoning (¥4/M), and returns a breakdown plus cacheHitRate and cacheSavingsYuan. Exported from @deepcode/core.
- Propagate cacheReadTokens through RunAgentResult.usage (agent loop), SessionContext.usage, and the REPL's per-turn accumulator; it was silently lost at each hop.
- /cost now shows the cache-hit count + rate, a miss/cache input split, and the ¥ saved by caching.

Note: inputTokens (= prompt_tokens) is inclusive of the cache hits, so cache-miss = input − hits.

Tests

- pricing.test.ts, 6 cases: no-cache · 80%-hit credit (verifies the ¥0.1/M rate + savings) · reasoner output/reasoning pricing · clamp + unknown-model fallback · empty session.
- The /cost test asserts that the cache-hit line + rate + savings render.
- format:check clean.

Follow-up (not in scope)

- The desktop inspector's "Spend" is cache-blind today; it could use estimateCost now that it's exported.

🤖 Generated with Claude Code
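The propagation fix listed under Changes amounts to carrying one extra field through every place that sums usage. A minimal sketch of a per-turn accumulator follows; the type and function names are hypothetical and the real shapes of RunAgentResult.usage and SessionContext.usage are not shown in this PR. Omitting the cacheReadTokens line in addUsage reproduces the original bug, because the session total then reports zero hits regardless of what the API returned.

```typescript
// Hypothetical per-turn usage, as it might arrive from the agent loop.
interface TurnUsage {
  inputTokens: number; // inclusive of cache hits
  outputTokens: number;
  cacheReadTokens?: number; // absent when the provider reports no cache data
}

// Hypothetical session-level running total.
interface SessionUsage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
}

function emptyUsage(): SessionUsage {
  return { inputTokens: 0, outputTokens: 0, cacheReadTokens: 0 };
}

// Returns a new total; every field, including the cache count, is carried forward.
function addUsage(total: SessionUsage, turn: TurnUsage): SessionUsage {
  return {
    inputTokens: total.inputTokens + turn.inputTokens,
    outputTokens: total.outputTokens + turn.outputTokens,
    cacheReadTokens: total.cacheReadTokens + (turn.cacheReadTokens ?? 0),
  };
}

// Share of input tokens served from cache, clamped so it never exceeds 1.
function sessionCacheHitRate(usage: SessionUsage): number {
  if (usage.inputTokens <= 0) return 0;
  return Math.min(usage.cacheReadTokens, usage.inputTokens) / usage.inputTokens;
}

function sumTurns(turns: TurnUsage[]): SessionUsage {
  return turns.reduce(addUsage, emptyUsage());
}
```

For example, a first turn with 1,000 input tokens and no hits followed by a second turn with 2,000 input tokens and 1,500 hits gives a session total of 3,000 input tokens, 1,500 cache reads, and a hit rate of 0.5.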