Comparing changes

…ript servers

TypeScript CLI tool that uses the Codex SDK to generate clean, well-structured Pydantic v2 models from a River TypeScript server's source code. Instead of the current mechanical codegen that produces awful names and duplicated types, this points a Codex agent at the TypeScript source so it can read how types are named and organised, then generates matching Python models.

Key features:
- Reads TypeScript TypeBox definitions to mirror naming and composition
- Generates shared/reusable types (common errors, domain types)
- Verifies correctness by comparing generated JSON schemas against the serialised River schema (via a Python verification script)
- Iterates on failures: feeds verification errors back to the model
- Exposed as a CLI for GitHub Actions integration

Usage:

    codegen-llm generate \
      --server-src ./path/to/ts/services \
      --schema ./schema.json \
      --output ./generated \
      --client-name MyClient

…concrete examples

- Switch sandbox from danger-full-access to workspace-write with approvalPolicy: never. The agent can only access its workspace and the explicitly listed additionalDirectories (TS source, existing client). It can no longer browse river-python or other repos.
- Prompt overhaul:
  - Quality bar framing: output will be discarded if not clean/readable
  - BANNED patterns section: RootModel, make_schema_model, __get_pydantic_json_schema__, SchemaAdapter, create_model, raw JSON Schema dicts (all explicitly rejected)
  - 6 concrete TypeBox-to-Pydantic translation examples covering Type.Object, $kind unions, error unions, Optional/Record/Array, recursive types, and Type.Intersect flattening
  - Directory scope section: only access workspace + TS source
  - Stronger anti-shortcut language throughout
- Verification script improvements:
  - Code quality pre-check: scans all .py files for banned patterns before comparing schemas, fails with exit code 2 if found
  - New normalizations: Uint8Array->string, strip type alongside const, enum->anyOf+const, strip null variant from 2-element anyOf (handles TypeBox Optional vs Pydantic Optional mismatch), strip discriminator and additionalProperties metadata
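The normalizations listed for the verification script could look roughly like the sketch below. It is not the verifier's actual code: the function name `normalize`, the assumption that TypeBox emits `{"type": "Uint8Array"}`, the choice to drop the outer `type` when expanding `enum`, and the choice to strip `additionalProperties` unconditionally are all assumptions made for illustration.

```python
from typing import Any

# Keys treated as metadata and dropped before comparison (per the commit message).
METADATA_KEYS = ("discriminator", "additionalProperties")
NULL_VARIANT = {"type": "null"}


def normalize(schema: Any) -> Any:
    """Recursively rewrite a JSON Schema fragment into a comparable form (sketch)."""
    if isinstance(schema, list):
        return [normalize(item) for item in schema]
    if not isinstance(schema, dict):
        return schema

    out = {}
    for key, value in schema.items():
        if key in METADATA_KEYS:
            continue
        if key == "properties" and isinstance(value, dict):
            # Property names are data, not schema keywords: normalize only the values.
            out[key] = {name: normalize(sub) for name, sub in value.items()}
        else:
            out[key] = normalize(value)

    # Assumed shape: TypeBox's Uint8Array is compared as a plain string.
    if out.get("type") == "Uint8Array":
        out["type"] = "string"

    # enum -> anyOf of const variants (outer "type" dropped, an assumption).
    if "enum" in out:
        values = out.pop("enum")
        out.pop("type", None)
        out["anyOf"] = [{"const": value} for value in values]

    # A const already pins the value, so a sibling "type" is stripped.
    if "const" in out:
        out.pop("type", None)

    # Strip the null variant from a two-element anyOf (Optional mismatch).
    variants = out.get("anyOf")
    if isinstance(variants, list) and len(variants) == 2:
        non_null = [variant for variant in variants if variant != NULL_VARIANT]
        if len(non_null) == 1:
            rest = {key: value for key, value in out.items() if key != "anyOf"}
            return {**rest, **non_null[0]}
    return out
```

Both the TypeBox-derived schema and the Pydantic-derived schema would pass through the same function before being compared for equality, so each rule only needs to map the two spellings onto one form.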

…red names, fix bare $ref resolution

…r redefs; add allOf flattening to verifier

- Add extractNamingHints() to codegen.ts: walks schema.json to produce naming_hints.json with correct PascalCase error/kind names, written into the agent workspace so it has authoritative naming data
- Complete rewrite of prompts.ts: naming_hints.json featured prominently, scaffolding scripts explicitly banned, mandatory TS reading phase, previous-failures section with concrete bad patterns from runs 1-4
- Add ALLCAPS class name check to verify-script.ts: regex bans class names with 4+ consecutive uppercase letters (e.g. NOTFOUNDError)
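The ALLCAPS rule can be illustrated with a short sketch. The commit describes a regex check; the version below finds class names with Python's `ast` module and then applies the regex, which is a substitution made here for robustness, and the function name `allcaps_class_names` is hypothetical.

```python
import ast
import re

# Four or more consecutive uppercase letters anywhere in the name.
ALLCAPS_RUN = re.compile(r"[A-Z]{4,}")


def allcaps_class_names(source: str) -> list[str]:
    """Return class names in `source` that contain an all-caps run."""
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, ast.ClassDef) and ALLCAPS_RUN.search(node.name)
    ]
```

Under this rule NOTFOUNDError is flagged while NotFoundError is not. Legitimate acronyms of four or more letters would be flagged too, which appears to be the intended strictness.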

…ained Literals

Run 5 analysis: agent wrote scaffolding script to /tmp, created JsonAdapter hack class, produced 200+ char stuttered class names, and chained Literal[x] | Literal[y] instead of Literal[x, y].

Structural changes:
- Move Python venv outside workspace into verify dir so the agent has no access to a Python interpreter (only ./verify works)
- Prompt explicitly states no Python available in workspace

New code quality checks in verifier:
- Ban JsonAdapter and custom json_schema() methods
- Ban class names > 60 characters (mechanical path-derived naming)
- Ban chained Literal[x] | Literal[y] | Literal[z] (3+ in a row)

Prompt updates:
- Remove .venv from file scope, add 'no Python' warnings
- Change Literal style examples from chained to multi-value
- Add failures 5-7 from run 5 (scaffolding to /tmp, JsonAdapter, chained Literals)
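As a rough illustration of the two textual checks above (class names over 60 characters, and three or more chained Literal terms), a minimal sketch follows. The regexes and the function name `quality_problems` are assumptions, not the verifier's actual patterns.

```python
import re

MAX_CLASS_NAME_LENGTH = 60
CLASS_DEF = re.compile(r"^\s*class\s+(\w+)", re.MULTILINE)
# Three or more Literal[...] terms joined by "|".
CHAINED_LITERALS = re.compile(
    r"Literal\[[^\]]*\](?:\s*\|\s*Literal\[[^\]]*\]){2,}"
)


def quality_problems(source: str) -> list[str]:
    """Return human-readable problems found in one generated .py file."""
    problems = []
    for name in CLASS_DEF.findall(source):
        if len(name) > MAX_CLASS_NAME_LENGTH:
            problems.append(
                f"class name longer than {MAX_CLASS_NAME_LENGTH} characters: {name}"
            )
    if CHAINED_LITERALS.search(source):
        problems.append(
            "chained Literal[...] | Literal[...] union; use one multi-value Literal"
        )
    return problems
```

A two-term chain such as `Literal["a"] | Literal["b"]` passes this sketch, matching the "3+ in a row" threshold stated in the commit.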

…p.py

Run 6 analysis: agent switched to Node.js scaffolding script (since Python venv was removed), generated 1966 model classes with decent naming, but _schema_map.py was a complete cheat: it loaded schema.json at runtime via SimpleNamespace objects with fake json_schema() methods. Verification passed trivially without ever testing the actual models.

Fixes:
- Add isinstance(adapter, TypeAdapter) check in verify script, which rejects any adapter that isn't a real pydantic TypeAdapter
- Ban SimpleNamespace, _make_adapter, _schema_path, _schema_doc patterns
- Ban loading schema.json at runtime in generated code
- Add failure case 8 to prompt explaining the cheat and the new check

The toolResult value appears on two variants (status=ok and status=error). Pydantic raises 'mapped to multiple choices' when using Field(discriminator='kind'). Added concrete example showing how to handle this: nest sub-variants by status, use plain union for the outer type when any value is duplicated.
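One way to decide when the outer union must be a plain union is to group variants by their kind value and look for collisions. The sketch below uses only the standard library; the variant class names and the `message` kind are hypothetical, and only `kind`, `status`, and `toolResult` come from the commit message.

```python
from collections import defaultdict

# (class name, kind literal, status literal or None); names are illustrative.
Variant = tuple[str, str, "str | None"]


def group_by_kind(variants: list[Variant]) -> dict[str, list[str]]:
    """Map each kind value to the variant classes that declare it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for class_name, kind, _status in variants:
        groups[kind].append(class_name)
    return dict(groups)


def outer_union_strategy(variants: list[Variant]) -> str:
    """'discriminated' only if every kind value maps to exactly one class."""
    groups = group_by_kind(variants)
    if any(len(names) > 1 for names in groups.values()):
        return "plain"
    return "discriminated"


variants = [
    ("ToolResultOk", "toolResult", "ok"),
    ("ToolResultError", "toolResult", "error"),
    ("Message", "message", None),
]
```

Here `toolResult` maps to two classes, so the outer type would be a plain union, while the two `toolResult` classes can still be nested into their own sub-union keyed on status.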

Run 8 passed verification (268/268) but used WithJsonSchema on every adapter:

    TypeAdapter(Annotated[Any, WithJsonSchema(json.loads(...))])

The 1174 Pydantic model classes were decorative: they were never actually tested, since adapters returned raw embedded JSON schemas.

Fixes:
- Ban WithJsonSchema in verify script banned patterns
- Ban json.loads( in verify script banned patterns
- Add failure case 9 to prompt documenting this cheat
- Update banned constructs list in prompt
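A banned-pattern pre-check of the kind these commits keep extending might be sketched as a plain substring scan over generated .py files, returning exit code 2 on any hit as an earlier commit describes. The pattern list below is a subset taken from the commit messages; the function names and the reporting format are assumptions.

```python
import sys
from pathlib import Path

# Subset of the banned constructs named across the commit messages.
BANNED_PATTERNS = (
    "RootModel",
    "make_schema_model",
    "__get_pydantic_json_schema__",
    "SchemaAdapter",
    "create_model",
    "JsonAdapter",
    "SimpleNamespace",
    "WithJsonSchema",
    "json.loads(",
)


def find_banned(root: Path) -> list[tuple[str, int, str]]:
    """Return (path, line number, pattern) for every banned substring found."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            for pattern in BANNED_PATTERNS:
                if pattern in line:
                    hits.append((str(path), lineno, pattern))
    return hits


def main(root: str) -> int:
    """Report hits on stderr; 2 means the quality pre-check failed."""
    hits = find_banned(Path(root))
    for path, lineno, pattern in hits:
        print(f"{path}:{lineno}: banned pattern {pattern!r}", file=sys.stderr)
    return 2 if hits else 0
```

A caller would pass the result to `sys.exit(main(output_dir))` before any schema comparison runs. Substring matching is deliberately blunt: it also rejects the patterns inside comments or strings, which seems consistent with the anti-shortcut intent.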

…ring)

Pass 1 generates schema-correct Pydantic models (may have mechanical names). Pass 2 aggressively refactors for production quality: TS-derived class names, error deduplication, cleanup of alphabetic suffixes and deep path names.

New CLI flags:
  --pass1-only           Stop after Pass 1
  --pass1-dir            Skip Pass 1, refactor existing output
  --pass2-max-attempts   Separate retry budget for Pass 2

Both passes use the same verifier to ensure correctness is preserved.
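The two-pass flow with separate retry budgets and a shared verifier can be sketched abstractly. Everything here is hypothetical scaffolding: the real tool is a TypeScript CLI driving an agent, whereas this sketch models each pass as a callable and the verifier as a function returning an error string or None.

```python
from typing import Callable, Optional

Verify = Callable[[str], Optional[str]]  # output -> error text, or None if it passes


def run_with_retries(
    generate: Callable[[Optional[str]], str],
    verify: Verify,
    max_attempts: int,
) -> Optional[str]:
    """Generate, verify, and feed the verifier's errors into the next attempt."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        output = generate(feedback)
        feedback = verify(output)
        if feedback is None:
            return output
    return None  # retry budget exhausted


def run_two_pass(
    pass1: Callable[[Optional[str]], str],
    pass2: Callable[[str, Optional[str]], str],
    verify: Verify,
    pass1_attempts: int,
    pass2_attempts: int,
    pass1_only: bool = False,
) -> tuple[Optional[str], Optional[str]]:
    """Return (pass 1 output, pass 2 output); either may be None."""
    first = run_with_retries(pass1, verify, pass1_attempts)
    if first is None or pass1_only:
        return first, None

    def refine(feedback: Optional[str]) -> str:
        # Pass 2 always starts from the verified Pass 1 output.
        return pass2(first, feedback)

    return first, run_with_retries(refine, verify, pass2_attempts)
```

Returning both outputs leaves the decision about what to do when Pass 2 exhausts its budget to the caller, since the commit message does not say how that case is handled.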

…; add TS export name extraction

Key changes for the next generation run:

Verify script improvements:
- Add _dedup_anyof normalization: deduplicate structurally identical anyOf variants so the agent can reuse error classes instead of creating *Duplicate/*Triplicate copies
- Ban *Duplicate/*Triplicate class name suffixes
- Ban monkey-patching patterns: .json_schema =, _bind_reference, _load_reference, frozen_schema, adapter.json_schema
- Add duplicate field declaration detection (catches e.g. ParseError declaring extras: twice)

Naming improvements:
- Extract TypeBox export names from TS source into naming_hints.json (tsExportNames per service, tsSharedExportNames for lib/ dirs)
- Agent now gets pre-extracted names like ExitInfo, MonitorResponse, ScreenshotAction, FilesystemError mapped to their service directories

Prompt improvements (Pass 1 + Pass 2):
- Document anyOf dedup normalization -- agent should reuse error classes
- Ban _schema_map.py monkey-patching patterns explicitly
- Reference tsExportNames/tsSharedExportNames from naming_hints.json
- Add Run 10 failure patterns (Duplicate classes, monkey-patching, duplicate fields, missing TS names)
- Rewrite _schema_map.py instructions to be explicit about what's banned
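The anyOf deduplication and the duplicate-field check described above might look something like the following sketch. The names `dedup_anyof` and `duplicate_fields` are illustrative, and fingerprinting variants with sorted-key JSON is an assumed way to define "structurally identical".

```python
import ast
import json
from typing import Any


def dedup_anyof(schema: Any) -> Any:
    """Drop structurally identical anyOf variants, keeping first occurrences."""
    if isinstance(schema, list):
        return [dedup_anyof(item) for item in schema]
    if not isinstance(schema, dict):
        return schema
    out = {key: dedup_anyof(value) for key, value in schema.items()}
    variants = out.get("anyOf")
    if isinstance(variants, list):
        seen: set[str] = set()
        unique = []
        for variant in variants:
            # Sorted keys make the fingerprint independent of key order.
            fingerprint = json.dumps(variant, sort_keys=True)
            if fingerprint not in seen:
                seen.add(fingerprint)
                unique.append(variant)
        out["anyOf"] = unique
    return out


def duplicate_fields(source: str) -> dict[str, list[str]]:
    """Map class name -> annotated fields declared more than once in its body."""
    problems: dict[str, list[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        seen: set[str] = set()
        repeated: list[str] = []
        for stmt in node.body:
            if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name):
                name = stmt.target.id
                if name in seen and name not in repeated:
                    repeated.append(name)
                seen.add(name)
        if repeated:
            problems[node.name] = repeated
    return problems
```

With the dedup applied to both sides before comparison, a union that lists the same error shape twice compares equal to one that lists it once, so a single shared error class is enough.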

Commits on Feb 20, 2026

Commits on Feb 21, 2026
