Tags: replit/river-python
codegen: emit guards for every anyOf variant to fix mypy union-attr on array-containing unions (#182)

Why
===

The encoder generator for non-discriminated anyOf unions emits a chain of ternary expressions. Historically it rendered the last variant as the unguarded `else` branch. That works for simple unions like `object | str | list`, because mypy can negative-narrow `x` to `list` in the final branch. It breaks for deeper unions where the array variant comes last, e.g.

```
str | float | bool | list[scalar] | None
```

Sometimes mypy fails to fully narrow `x` to `list[...]` through the prior `isinstance` checks; `isinstance(x, (int, float))`, plus `bool` subclassing `int`, make this tricky. When that happens, it complains that the scalar members of the union have no `__iter__` attribute:

```
error: Item "float" of "str | float | bool | None | list[...]" has no attribute "__iter__" (not iterable) [union-attr]
error: Item "bool" of ... has no attribute "__iter__" [union-attr]
error: Item "object" of ... has no attribute "__iter__" [union-attr]
```

This is the exact failure that has been blocking ai-infra's `codegen-latest-pid2-schema.yml` auto-update workflow since 2026-05-04. On that date repl-it-web#78355 widened `agentToolPostgreSQL.executeSqlCommand.params` from a flat scalar union to `array<scalar | array<scalar>>`. Every run since then has failed on the regenerated `executeSqlCommand.py`, at the `for y in x` iteration inside `encode_ExecutesqlcommandInputParams`. The committed pid2 client in ai-infra has been kept current by hand (see replit/ai-infra#12813), but the bot has been red for ~2.5 weeks.

What changed
============

`src/replit_river/codegen/client.py`: in the non-discriminated-anyOf branch of `encode_type`, emit an explicit `isinstance` / `is None` guard for every entry in `encoder_parts`, including the last one, and append a `cast(Any, x)` fallback. mypy no longer has to negative-narrow into the iterating branch, so deep unions with an array variant lint cleanly.
`Any` and `cast` are already part of `FILE_HEADER`, so no import bookkeeping changes. Concretely, for the failing executeSqlCommand schema, the encoder now ends with:

```python
return (
    x if isinstance(x, str)
    else x if isinstance(x, (int, float))
    else x if isinstance(x, bool)
    else None if x is None
    else [encode_..._AnyOf_4(y) for y in x] if isinstance(x, list)
    else cast(Any, x)
)
```

Test plan
=========

- Existing `tests/v1/codegen/snapshot/test_anyof_mixed.py` snapshot updated to show the new `if isinstance(x, list) else cast(Any, x)` tail on its `obj | str | list[str]` encoder. The change is additive and runtime behavior is unchanged.
- New snapshot test `tests/v1/codegen/snapshot/test_anyof_array_in_union.py` added with a schema that mirrors `executeSqlCommand.params` (`array<scalar | array<scalar>>`) and locks in the fixed output. This is the regression test for ai-infra's CI failure.
- `uv run pytest` is green (67 passed, including all v1 and v2 codegen tests).
- `make lint` is clean apart from a pre-existing `pyright` `grpc` import error in `tests/v1/test_communication.py` that also fails on `main` (unrelated).
- End-to-end verification against ai-infra: pointed ai-infra's `./pkgs/pid2_client/scripts/generate.sh` at this branch via `RIVER_CODEGEN_PATH=/tmp/opencode/river-python` and reran the full lint pipeline that the auto-update workflow runs in CI. The log showed `[mypy] completed in 15.19s` and the script exited `OK.` instead of hitting the historical `union-attr` failure.

Once this is released (e.g. `v0.17.20`), ai-infra can bump `replit-river` in `pkgs/pid2_client/pyproject.toml` and the auto-update workflow will start producing green PRs again.

~ written by Zerg 👾 ([ascendant-goliath-6d2f](https://zerg.zergrush.dev/chat?id=ascendant-goliath-6d2f))
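The guard-every-variant rendering can be sketched as a small string builder. This is a minimal illustration, not the actual `encode_type` code in `client.py`. `render_anyof_encoder` and `encode_item` are hypothetical names, and `parts` stands in for the `encoder_parts` list described above. The sketch renders the chain, then compiles it into a function to confirm the output is valid Python.

```python
import textwrap
from typing import Any, cast


def render_anyof_encoder(parts: list[tuple[str, str]]) -> str:
    """Render a ternary chain that guards every variant (assumes parts is non-empty).

    `parts` holds (type_check, encoder_expr) pairs. Both are Python source
    fragments that refer to a variable named `x`. Every pair, including the
    last, gets its own `if` guard, and `cast(Any, x)` is the fallback.
    """
    branches = [f"{expr} if {check}" for check, expr in parts]
    body = "\n    else ".join(branches)
    return f"return (\n    {body}\n    else cast(Any, x)\n)"


source = render_anyof_encoder(
    [
        ("isinstance(x, str)", "x"),
        ("x is None", "None"),
        ("isinstance(x, list)", "[encode_item(y) for y in x]"),
    ]
)
print(source)

# Compile the rendered body into a function to check that it is valid Python.
# `encode_item` is a hypothetical stand-in for a generated item encoder.
namespace: dict[str, Any] = {"Any": Any, "cast": cast, "encode_item": str.upper}
exec("def encode(x):\n" + textwrap.indent(source, "    "), namespace)
encode = namespace["encode"]
print(encode(["a", "b"]))  # ['A', 'B']
```

The final `cast(Any, x)` branch is reached only by values that match no guard. As a result, the type checker never has to prove by elimination that `x` is a list before the comprehension iterates over it.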
Fix codegen with `from` keyword (#177)

Why
===

Some schemas have `from` and `to` properties. `from` is a Python reserved keyword, and the river-python codegen was generating invalid Python like:

```
class Rewrites(TypedDict):
    from: NotRequired[str | None]  # SyntaxError!
```

What changed
============

- `src/replit_river/codegen/typing.py`: added `import keyword` and extended `normalize_special_chars` to append `_` to Python keywords (e.g., `from` -> `from_`). The existing alias logic in `client.py` already sets `Field(alias="from")` for `BaseModel` when the field name is normalized, so no changes were needed there.
- `tests/v1/codegen/test_input_special_chars.py`: added two new tests, `test_python_keyword_field_names_basemodel` and `test_python_keyword_field_names_typeddict`. They verify that the codegen produces valid Python when schema fields use reserved keywords like `from`, `class`, and `import`.

Test plan
=========

Added new tests.
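Here is a minimal sketch of the keyword step, using the standard-library `keyword` module. `normalize_keyword` is a hypothetical helper that isolates only this rule; the real `normalize_special_chars` also rewrites other characters.

```python
import keyword


def normalize_keyword(name: str) -> str:
    """Return `name` with "_" appended if it is a Python reserved keyword."""
    return f"{name}_" if keyword.iskeyword(name) else name


for field in ["from", "to", "class", "import"]:
    print(f"{field} -> {normalize_keyword(field)}")
# from -> from_
# to -> to
# class -> class_
# import -> import_
```

`keyword.iskeyword` does not report soft keywords such as `match` and `type`. That is fine here, because soft keywords are already valid identifiers.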
Use .get() for discriminator access in generated union encoders (#176)

Why
===

Follow-up to #175. The discriminator field in a discriminated union may be `NotRequired` in the TypedDict. Direct key access (`x["shapeType"]`) triggers pyright's `reportTypedDictNotRequiredAccess` error. This broke the pid2 codegen CI when the scribe schema added discriminated union variants whose discriminator field is optional.

What changed
============

Use `x.get("key")` instead of `x["key"]` for discriminator checks in the generated ternary chain. This is safe because a missing key returns `None`. `None` won't match any discriminator value, so evaluation falls through to the next branch.

Test plan
=========

- All 64 tests pass
- Updated snapshot for `test_unknown_enum`
- Tested end-to-end against the pid2 schema from ai-infra: codegen, mypy, and pyright all pass

~ written by Zerg 👾
Fix mypy arg-type errors in generated discriminated union encoders (#175)

Why
===

The codegen for discriminated union TypedDict encoders produces ternary chains like:

```python
encode_Foo(x) if x["kind"] == "foo" else encode_Bar(x)
```

mypy can't narrow union types through these ternary conditions, so it flags every encoder call as receiving the wrong type (`arg-type`). This broke the pid2 codegen CI when new discriminated union variants were added to a schema.

What changed
============

Use `cast()` to explicitly narrow the type to the correct variant after the discriminator check, instead of suppressing the error with `# type: ignore[arg-type]`. This preserves type safety in the generated code.

Before:

```python
encode_Foo(x)  # type: ignore[arg-type]
if x["kind"] == "foo"
else encode_Bar(x)  # type: ignore[arg-type]
```

After:

```python
encode_Foo(cast('Foo', x))
if x["kind"] == "foo"
else encode_Bar(cast('Bar', x))
```

Affects both the single-variant and multi-variant discriminator code paths.

Test plan
=========

CI
feat: Propagate OTel context via WebSocket HTTP upgrade headers (#174)

Why
===

River's WebSocket connections don't carry any OTel context (traceparent, tracestate, baggage) from client to server. As a result, distributed tracing and baggage propagation break at the WebSocket boundary: the server has no way to inherit the caller's trace context or read OTel baggage entries.

What changed
============

Uses the standard W3C HTTP header approach, the same mechanism any HTTP service uses for OTel propagation, applied to the WebSocket upgrade request.

**Client side (`client_transport.py`, `v2/session.py`)**

- Before calling `websockets.connect()`, inject the current OTel context into a headers dict via `propagate.inject()`.
- Pass those headers to the connect call as `extra_headers` (v1 legacy API) or `additional_headers` (v2 asyncio API).
- This automatically includes `traceparent`, `tracestate`, and `baggage` headers if the corresponding propagators are configured in the global textmap.

**Server side (`server.py`)**

- In `Server.serve()`, extract the OTel context from `websocket.request_headers` via `propagate.extract()`.
- Attach the extracted context as the ambient OTel context for the lifetime of the connection using `context.attach()` / `context.detach()`.
- Any handler code running within the connection can now read baggage via `baggage.get_all()` and inherits the caller's trace context.

**Tests (`tests/v1/test_opentelemetry.py`)**

- `test_baggage_propagated_via_ws_headers`: sets two baggage entries on the client and verifies the server handler can read them.
- `test_no_baggage_when_none_set`: verifies clean behavior when no baggage is set.
- `test_traceparent_propagated_via_ws_headers`: sets both an active span and baggage on the client and verifies both propagate.

Test plan
=========

```
$ uv run pytest tests/ -v
64 passed in 8.46s
```

All existing tests pass unchanged. The 3 new tests verify end-to-end OTel context propagation through the WebSocket connection.
## Revertibility

Safe to revert. The change only adds new `extra_headers`/`additional_headers` to `websockets.connect()` and a `propagate.extract()` + `context.attach()` wrapper on the server. No wire protocol changes, no schema changes, no data mutations.

~ written by Zerg 👾
Fix codegen crashes for intersection types and complex list inner types (#172)

Why
===

Codegen fails when a schema contains intersection types (`allOf`) or lists with complex inner types (e.g. `list[dict[str, Any]]`). Both cases crash with either `Complex type must be put through render_type_expr!` or `Unexpected expression when expecting a type name: DictTypeExpr(...)`. The cause is that `TypeName` objects are used directly in f-strings or passed to `ensure_literal_type`, which only accepts simple `TypeName` values.

What changed
============

- Fix the `TypeName.__str__()` crash in the `RiverIntersectionType` encoder by wrapping `encoder_name` with `render_literal_type()`. This matches the existing pattern used by `RiverUnionType` (line 625) and `RiverConcreteType` (line 654).
- Fix the `ensure_literal_type` crash when a list's inner type is a complex expression (e.g. `list[dict[str, Any]]`). The `ListTypeExpr` match is now guarded so it only enters the encoding branch for `TypeName` inner types. Composite types that don't need encoding fall through to `list(x)`.

Test plan
=========

_Describe what you did to test this change to a level of detail that allows your reviewer to test it_
Fix codegen for non-discriminated anyOf unions with mixed types (#171)

Why
===

The encoder generation for TypedDict inputs produces malformed Python code when handling `anyOf` unions containing mixed types like `[object, string, array]`.

Before:

```python
return (
    encode_...AnyOf_0(x)
    x if isinstance(x, str) else encode_str(x)
)
```

After:

```python
return (
    encode_...AnyOf_0(x) if isinstance(x, dict)
    else x if isinstance(x, str)
    else list(x)
)
```

What changed
============

- Collect `(type_check, encoder_expr)` pairs for each union member
- Build a proper ternary chain with `isinstance` checks
- Handle primitive array items by returning `list(x)` instead of calling undefined encoders

Test plan
=========

CI
feat: recursive types (#170)

Why
===

Recursive types weren't supported in River codegen. When a type referenced itself, like a tree node with children of the same type, codegen emitted `list[Any]` instead of a proper forward reference.

What changed
============

Added support for JSON Schema's `$id`/`$ref` to handle recursive types. Codegen now emits proper forward references like `list["TreeNode"]` instead of `list[Any]`.

Test plan
=========

Added a test with a recursive schema (tree node with children). All existing tests pass.
Upgrade pydantic version (#168)

Why
===

Our pydantic version is getting a little out of date, and we were pinning to a specific version.

What changed
============

Made the package less prescriptive about the pydantic version, and changed the minimum Python version to 3.12.

Test plan
=========

CI/CD
Properly close streams on exception (#167)

Why
===

We've had persistent timeout errors in AI-Infra, and I suspect they're related to not handling bumps in the connection correctly.

What changed
============

- WebSocket drops or send failures left `_streams` populated, so any in-flight RPC hung until the session fully shut down. Clients didn't see an abort signal and could block indefinitely even though the transport was already defunct.
- Added `_abort_all_streams()` in src/replit_river/session.py#L289. It is called from both `client_session.serve()` and `server_session.serve()` on `ConnectionClosed`, `FailedSendingMessageException`, or any other unexpected exception (src/replit_river/client_session.py#L95, src/replit_river/server_session.py#L82). This immediately closes every active channel and clears `_streams`. Callers are notified as soon as the socket dies, so they can retry or surface an error.

Test plan
=========

CI/CD; ran against an internal branch 3x with no issues and no flakes.
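This toy asyncio sketch shows the abort-on-failure pattern. Only the name `_abort_all_streams` comes from the change above. `Session`, `StreamAborted`, `read_one`, and the queue-per-stream layout are hypothetical simplifications. When `serve()` sees any exception, every waiting caller is woken with the error and `_streams` is emptied, so callers no longer hang until shutdown.

```python
import asyncio
from collections.abc import Awaitable, Callable


class StreamAborted(Exception):
    """Raised to a caller whose in-flight stream was torn down."""


class Session:
    """Toy session; `_streams` maps a stream id to the queue a caller awaits."""

    def __init__(self) -> None:
        self._streams: dict[str, asyncio.Queue[object]] = {}

    def open_stream(self, stream_id: str) -> asyncio.Queue[object]:
        queue: asyncio.Queue[object] = asyncio.Queue()
        self._streams[stream_id] = queue
        return queue

    def _abort_all_streams(self, exc: BaseException) -> None:
        # Wake every waiting caller with the error, then forget the streams.
        for queue in self._streams.values():
            queue.put_nowait(exc)
        self._streams.clear()

    async def serve(self, recv: Callable[[], Awaitable[object]]) -> None:
        try:
            while True:
                await recv()  # dispatching received messages is elided
        except Exception as exc:  # connection closed, send failure, etc.
            self._abort_all_streams(exc)


async def read_one(queue: asyncio.Queue[object]) -> object:
    item = await queue.get()
    if isinstance(item, BaseException):
        raise StreamAborted("transport died") from item
    return item


async def main() -> tuple[bool, int]:
    session = Session()
    reader = asyncio.create_task(read_one(session.open_stream("rpc-1")))

    async def broken_recv() -> object:
        raise ConnectionError("socket closed")

    await session.serve(broken_recv)
    try:
        await reader
    except StreamAborted:
        return True, len(session._streams)
    return False, len(session._streams)


print(asyncio.run(main()))  # (True, 0)
```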