Enable strict mypy and add typed search models (#100)#103
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughEnables strict mypy and CI mypy invocation, adds TypedDict search contracts, extracts search logic into services.search (global, legacy, CLI), rewires api/search to orchestrate sources, and applies typing broadly across APIs, services, utils, models, and tests. ChangesStrict MyPy, Typed Contracts, and CI
Search Service and Tests
API orchestration and route typing
Workspace/service typing and DB helpers
Models, utils, and miscellaneous typing
Sequence Diagram sequenceDiagram
participant Client
participant ApiSearch as api.search
participant ParseWarnings as ParseWarningCollector
participant SearchSvc as services.search
participant GlobalDB as globalStorage/state.vscdb
participant WorkspaceDB as workspaceStorage/state.vscdb
participant CliStore as cli_chats/store.db
Client->>ApiSearch: HTTP GET /api/search?q=...
ApiSearch->>ParseWarnings: create collector
ApiSearch->>SearchSvc: search_global_storage(...) when type != "chat"
SearchSvc->>GlobalDB: query composerData and bubbleId rows
ApiSearch->>SearchSvc: search_legacy_workspaces(...)
SearchSvc->>WorkspaceDB: query ItemTable chat payloads
ApiSearch->>SearchSvc: search_cli_sessions(...) when type == "all"
SearchSvc->>CliStore: traverse session blobs
SearchSvc-->>ApiSearch: results lists
ApiSearch->>SearchSvc: rank_results(merged)
ApiSearch->>ParseWarnings: attach_to(payload)
ApiSearch-->>Client: jsonify(results + warnings)
Estimated code review effort 🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (2)
models/search.py (1)
17-27: ⚡ Quick winConstrain
SearchResultdiscriminator fields withLiteral.
type: strandsource: NotRequired[str]are broader than the actual contract, which reduces strict-mypy value for call sites.Proposed diff
-from typing import Any, NotRequired, TypedDict +from typing import Any, Literal, NotRequired, TypedDict @@ class SearchResult(TypedDict): @@ - type: str # "composer" | "chat" | "cli_agent" - source: NotRequired[str] # "cli" for CLI agent sessions + type: Literal["composer", "chat", "cli_agent"] + source: NotRequired[Literal["cli"]]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@models/search.py` around lines 17 - 27, The SearchResult TypedDict currently uses broad types for discriminators; change the type annotation for SearchResult.type from str to Literal["composer", "chat", "cli_agent"] and change SearchResult.source from NotRequired[str] to NotRequired[Literal["cli"]] (import Literal if not already), so call sites get precise typing; update any imports to include Literal and adjust any code that constructs SearchResult entries to use those exact string literals where applicable.services/workspace_listing.py (1)
141-150: 💤 Low valueExtract duplicated
_safe_fetchallhelper to a shared location. The_safe_fetchallhelper withcast(list[sqlite3.Row], ...)is defined four times as a local function: once inservices/workspace_listing.py(line 141) and three times inservices/workspace_tabs.py(lines 551, 712, 851). All implementations are identical. Extracting this to a module-level helper inservices/workspace_db.pywould eliminate the duplication and centralize the SQLite error-handling pattern.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/workspace_listing.py` around lines 141 - 150, Extract the duplicated local helper _safe_fetchall into a single module-level function in services/workspace_db.py (e.g., def safe_fetchall(query: str, params: tuple[Any, ...] = ()) -> list[sqlite3.Row]) that performs the same cast(list[sqlite3.Row], global_db.execute(...).fetchall()) and except sqlite3.Error: return [] behavior; then remove the four local definitions of _safe_fetchall in services/workspace_listing.py and services/workspace_tabs.py and replace their usages with calls to the new safe_fetchall (importing it from services.workspace_db). Ensure the function name and signature match callers or update the call sites accordingly to avoid type errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/search.py`:
- Around line 38-48: The current logic always calls search_global_storage, so
requests with search_type == "chat" still include composer/global results;
update the caller around results.extend(search_global_storage(...)) to only
invoke search_global_storage when search_type is not "chat" (e.g., if
search_type != "chat" or when search_type == "all"/"composer"), leaving calls to
search_legacy_workspaces and search_cli_sessions unchanged; ensure the
conditional references the search_type variable and the search_global_storage
function so the chat-only path excludes global/composer results.
In `@CHANGELOG.md`:
- Around line 10-17: The changelog contains duplicate subsection headings "###
Added" and "### Changed" under the same release section; merge the new bullets
about strict mypy (`strict = true` in `pyproject.toml`, TypedDict models
`SearchResult`, `ConversationSummary`, and full annotations on API routes and
`utils/`) and the CI typecheck change (CI job runs `mypy .` with pyproject
config and per-module overrides) into the existing single "### Added" and "###
Changed" entries respectively so there is only one "### Added" and one "###
Changed" block under `[Unreleased]`.
In `@services/search.py`:
- Around line 398-399: The loop over tabs currently uses tab.get("tabId")
directly which can be missing or non-string and violates the SearchResult
contract; change the code that builds the chatId (alongside ct =
tab.get("chatTitle") or "") to coerce tab.get("tabId") into a stable string
(e.g., use str(...) or default to "" when falsy) so SearchResult.chatId is
always a string; apply the same fix in the other occurrence that builds results
(the block around lines 434-439) so all created SearchResult instances have
string chatId values.
- Around line 561-568: _ts is normalizing timestamps inconsistently (ISO strings
are converted to epoch-seconds while integer epochs may be in milliseconds),
causing incorrect ordering; update the _ts(SearchResult) function to always
return epoch-seconds as a float: parse ISO strings with
datetime.fromisoformat(...) as before, and for numeric values (int/float) detect
millisecond epochs (e.g., values > 1e12 or > 1e11) and divide by 1000.0 when
appropriate, handling zero/None safely and returning 0.0 for unparseable inputs
so all sources are compared on the same seconds scale.
---
Nitpick comments:
In `@models/search.py`:
- Around line 17-27: The SearchResult TypedDict currently uses broad types for
discriminators; change the type annotation for SearchResult.type from str to
Literal["composer", "chat", "cli_agent"] and change SearchResult.source from
NotRequired[str] to NotRequired[Literal["cli"]] (import Literal if not already),
so call sites get precise typing; update any imports to include Literal and
adjust any code that constructs SearchResult entries to use those exact string
literals where applicable.
In `@services/workspace_listing.py`:
- Around line 141-150: Extract the duplicated local helper _safe_fetchall into a
single module-level function in services/workspace_db.py (e.g., def
safe_fetchall(query: str, params: tuple[Any, ...] = ()) -> list[sqlite3.Row])
that performs the same cast(list[sqlite3.Row],
global_db.execute(...).fetchall()) and except sqlite3.Error: return [] behavior;
then remove the four local definitions of _safe_fetchall in
services/workspace_listing.py and services/workspace_tabs.py and replace their
usages with calls to the new safe_fetchall (importing it from
services.workspace_db). Ensure the function name and signature match callers or
update the call sites accordingly to avoid type errors.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ced4d7c8-578e-4a3d-a1db-72d7960b4433
📒 Files selected for processing (35)
.github/workflows/tests.ymlCHANGELOG.mdapi/composers.pyapi/config_api.pyapi/export_api.pyapi/flask_config.pyapi/logs.pyapi/pdf.pyapi/search.pyapi/workspaces.pyapp.pymodels/__init__.pymodels/conversation.pymodels/parse_warnings.pymodels/search.pypyproject.tomlservices/cli_tabs.pyservices/search.pyservices/summary_cache.pyservices/workspace_context.pyservices/workspace_db.pyservices/workspace_listing.pyservices/workspace_resolver.pyservices/workspace_tabs.pytests/test_models.pytests/test_models_wired_at_read_sites.pytests/test_search_helpers.pyutils/cli_chat_reader.pyutils/cursor_md_exporter.pyutils/debug_flag.pyutils/exclusion_rules.pyutils/path_helpers.pyutils/text_extract.pyutils/tool_parser.pyutils/workspace_descriptor.py
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
services/search.py (1)
317-321:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon’t promote missing timestamps to “now”.
rank_results()now sinks unknown timestamps to0.0, but the three builders still replace missing values with the current time. That makes untimestamped chats outrank genuinely recent hits once/api/searchmerges and sorts the combined list. Use a stable sentinel here instead so missing metadata stays at the bottom.Suggested fix
results.append({ "workspaceId": ws_id, "workspaceFolder": ws_name, "chatId": composer_id, "chatTitle": title, "timestamp": ( to_epoch_ms(composer.last_updated_at) or to_epoch_ms(composer.created_at) - or int(datetime.now().timestamp() * 1000) + or 0 ), "matchingText": matching_text, "type": "composer", }) @@ results.append({ "workspaceId": name, "workspaceFolder": workspace_folder, "chatId": tab_id, "chatTitle": ct or f"Chat {tab_id[:8]}", - "timestamp": tab.get("lastSendTime") or datetime.now().isoformat(), + "timestamp": tab.get("lastSendTime") or 0, "matchingText": matching_text, "type": "chat", }) @@ - created_ms: int = ( - meta.get("createdAt") or int(datetime.now().timestamp() * 1000) - ) + raw_created_ms = meta.get("createdAt") + created_ms = raw_created_ms if isinstance(raw_created_ms, int) else 0Also applies to: 440-440, 485-487, 562-575
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/search.py` around lines 317 - 321, The builders are promoting missing timestamps to the current time by using to_epoch_ms(composer.last_updated_at) or to_epoch_ms(composer.created_at) or int(datetime.now().timestamp() * 1000); change this to use a stable sentinel (e.g., 0 or a named constant) instead of datetime.now() so untimestamped items remain low-ranked. Update each place that builds the "timestamp" field (the shown block and the other instances noted) to fallback to the sentinel when to_epoch_ms returns falsy, ensuring consistency with rank_results() which treats unknown timestamps as 0.0; refer to to_epoch_ms, composer.last_updated_at, composer.created_at, and rank_results when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@services/search.py`:
- Around line 317-321: The builders are promoting missing timestamps to the
current time by using to_epoch_ms(composer.last_updated_at) or
to_epoch_ms(composer.created_at) or int(datetime.now().timestamp() * 1000);
change this to use a stable sentinel (e.g., 0 or a named constant) instead of
datetime.now() so untimestamped items remain low-ranked. Update each place that
builds the "timestamp" field (the shown block and the other instances noted) to
fallback to the sentinel when to_epoch_ms returns falsy, ensuring consistency
with rank_results() which treats unknown timestamps as 0.0; refer to
to_epoch_ms, composer.last_updated_at, composer.created_at, and rank_results
when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6e9be71e-69dc-40c1-942d-c7279622e372
📒 Files selected for processing (7)
CHANGELOG.mdapi/search.pymodels/search.pyservices/search.pyservices/workspace_db.pyservices/workspace_listing.pyservices/workspace_tabs.py
✅ Files skipped from review due to trivial changes (2)
- CHANGELOG.md
- services/workspace_db.py
🚧 Files skipped from review as they are similar to previous changes (3)
- api/search.py
- services/workspace_listing.py
- services/workspace_tabs.py
6006c25 to
88dbe83
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
models/parse_warnings.py (1)
29-29: ⚡ Quick winNarrow exception type to
Exception.The parameter accepts
BaseException, but all call sites (e.g.,services/search.pyline 343) catch and passException.BaseExceptionincludes system exits and keyboard interrupts, which are never logged as source failures in practice.♻️ Tighten the type
- def record_source_failure(self, exc: BaseException, source: str) -> None: + def record_source_failure(self, exc: Exception, source: str) -> None:🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@models/parse_warnings.py` at line 29, Change the record_source_failure signature to accept Exception rather than BaseException: update the method record_source_failure(self, exc: Exception, source: str) in models/parse_warnings.py (and adjust any type imports) so it matches callers that pass Exception and avoids catching system-level BaseException like SystemExit/KeyboardInterrupt; update any type hints or tests referring to the old signature accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@services/search.py`:
- Line 494: The current assignment created_ms: int =
to_epoch_ms(meta.get("createdAt")) is correct because to_epoch_ms returns 0 for
None/invalid inputs; remove any added fallback like "or
_UNKNOWN_SEARCH_TIMESTAMP" unless this codebase intentionally treats 0 as an
unknown sentinel. If you do need a non-zero sentinel, explicitly state that
intent and change the line to use "or _UNKNOWN_SEARCH_TIMESTAMP" and document
why; otherwise leave the single call to to_epoch_ms as-is.
---
Nitpick comments:
In `@models/parse_warnings.py`:
- Line 29: Change the record_source_failure signature to accept Exception rather
than BaseException: update the method record_source_failure(self, exc:
Exception, source: str) in models/parse_warnings.py (and adjust any type
imports) so it matches callers that pass Exception and avoids catching
system-level BaseException like SystemExit/KeyboardInterrupt; update any type
hints or tests referring to the old signature accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 24c15480-d369-4c7d-ae96-751477063a31
📒 Files selected for processing (34)
.github/workflows/tests.ymlCHANGELOG.mdapi/composers.pyapi/config_api.pyapi/export_api.pyapi/flask_config.pyapi/logs.pyapi/pdf.pyapi/search.pyapi/workspaces.pyapp.pymodels/__init__.pymodels/conversation.pymodels/parse_warnings.pymodels/search.pypyproject.tomlservices/cli_tabs.pyservices/search.pyservices/summary_cache.pyservices/workspace_context.pyservices/workspace_db.pyservices/workspace_listing.pyservices/workspace_resolver.pyservices/workspace_tabs.pytests/test_models.pytests/test_search_helpers.pyutils/cli_chat_reader.pyutils/cursor_md_exporter.pyutils/debug_flag.pyutils/exclusion_rules.pyutils/path_helpers.pyutils/text_extract.pyutils/tool_parser.pyutils/workspace_descriptor.py
✅ Files skipped from review due to trivial changes (15)
- utils/text_extract.py
- utils/workspace_descriptor.py
- api/flask_config.py
- api/pdf.py
- api/config_api.py
- api/logs.py
- utils/cursor_md_exporter.py
- api/workspaces.py
- services/workspace_context.py
- services/cli_tabs.py
- CHANGELOG.md
- api/export_api.py
- utils/cli_chat_reader.py
- services/summary_cache.py
- api/composers.py
🚧 Files skipped from review as they are similar to previous changes (15)
- models/search.py
- utils/debug_flag.py
- tests/test_models.py
- models/init.py
- utils/path_helpers.py
- utils/tool_parser.py
- models/conversation.py
- services/workspace_resolver.py
- utils/exclusion_rules.py
- tests/test_search_helpers.py
- pyproject.toml
- api/search.py
- services/workspace_listing.py
- services/workspace_tabs.py
- services/workspace_db.py
Must fixCI typecheck will fail — mypy . reports 20 errors (api/search.py:54, api/workspaces.py:69,83,99,140, api/composers.py:116,186,225, api/config_api.py:42,55,64,71,81,125,147,156, api/logs.py:152, api/export_api.py:75, services/cli_tabs.py:141, api/pdf.py:55/161) (Issue #100 requires zero mypy errors on annotated modules; jsonify() is typed as Any under strict mode, and FPDF lacks proper stubs — installing types-Flask alone does not clear the no-any-return errors.) api/pdf.py:161 — pdf.output(buf) fails strict mypy with call-overload once fpdf stubs are present (Will surface as soon as the FPDF misc subclass error is addressed; needs dest="S" / bytes return or an explicit cast.) Should fixIssue #100 AC gap — no Conversation model (models/conversation.py defines Composer, not Conversation) (Acceptance criteria explicitly list Bubble, Workspace, Conversation, SearchResult; Composer covers the domain but the named type is missing — add a Conversation alias/TypedDict or document the mapping in the issue close comment.) services/workspace_db.py:load_bubble_map vs services/search.py:_build_search_bubble_map — duplicate bubble KV loaders with different validation (G1/G3) (Search path validates via Bubble.from_dict and records ParseWarningCollector skips; load_bubble_map still stores raw dicts without schema checks — a drifted bubble can pass tabs/export but fail search, or vice versa.) services/search.py:552-553 — search_cli_sessions logs top-level failures but does not call parse_warnings.record_source_failure (search_global_storage records source_failure for operator-visible incomplete results; CLI search failures remain invisible in the API warnings payload — amplifies the silent-failure compound.) Three bubble shapes still not converged (models/conversation.py:Bubble raw-dict accessor vs utils/cli_chat_reader.py:236 messages_to_bubbles dict shape vs export flat keys) (Issue #100 implementation notes call for a single TypedDict or discriminated union; deferred work leaves shape drift uncaught at compile time for CLI/export paths.) Nice to havepyproject.toml dev extras — add types-Flask (and fpdf stubs if kept) alongside mypy (Improves local/CI parity and may reduce some stub friction once return casts are in place.) models/conversation.py:43-48 — Composer.createdAt rejects ISO strings (Intentional and documented; if Cursor migrates timestamp format, composers vanish from list/search with only drift warnings — consider relaxing when real data appears.) |
|
@CodeRabbit. Except my findings, Can you also review in more depth and find more additional bugs or issues? |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
cat services/search.pyLength of output: 22085 🏁 Script executed: #!/bin/bash
cat models/search.py
cat models/parse_warnings.pyLength of output: 4683 🏁 Script executed: #!/bin/bash
cat utils/path_helpers.py
cat utils/exclusion_rules.pyLength of output: 13576 🏁 Script executed: #!/bin/bash
cat api/search.pyLength of output: 1921 🏁 Script executed: #!/bin/bash
cat services/workspace_db.pyLength of output: 16129 🏁 Script executed: #!/bin/bash
cat tests/test_search_helpers.pyLength of output: 21015 🏁 Script executed: #!/bin/bash
cat models/conversation.pyLength of output: 12233 🏁 Script executed: #!/bin/bash
cat utils/cli_chat_reader.pyLength of output: 18945
🔴 Must Fix1.
|
| Source | workspaceFolder value |
Example |
|---|---|---|
search_global_storage |
Display name leaf (from ws_id_to_name.get(ws_id)) |
"myrepo" |
search_legacy_workspaces |
Raw workspace.json folder value |
"file:///home/user/projects/myrepo" |
search_cli_sessions |
Full workspace path from CLI meta | "/home/user/projects/myrepo" |
The field is named workspaceFolder throughout the codebase and SearchResult declares it str | None, so callers expect a consistent path. A front-end consuming merged rank_results output would get three different shapes for the same semantic field. The test in test_search_helpers.py line ~354 even asserts the display name "myrepo" for the global-storage case, which would pass, but hides the inconsistency at the integration level.
Root cause: search_global_storage (line ~297 in services/search.py) sets:
"workspaceFolder": ws_name, # ← display name ("myrepo"), not pathwhereas search_legacy_workspaces sets:
"workspaceFolder": workspace_folder, # ← raw "file:///..." or "/home/..."Either normalise all three to a resolved filesystem path (same as get_workspace_folder_paths produces) or to a bare display name—but pick one contract and document it in SearchResult.
2. search_global_storage re-scans all per-workspace DBs on every request (no cache)
services/search.py line ~234:
composer_id_to_ws = build_composer_id_to_workspace_id(
workspace_path, workspace_entries
)build_composer_id_to_workspace_id_cached exists in services/workspace_db.py for exactly this purpose (fingerprint + disk cache). The uncached variant opens and reads every {workspace_id}/state.vscdb on each GET /api/search, which is O(number-of-workspaces) extra DB opens for every keystroke in typeahead search scenarios. Replace with:
composer_id_to_ws = build_composer_id_to_workspace_id_cached(
workspace_path, workspace_entries, rules
)3. traverse_blobs and iter_sessions disagree on CLI session meta validity
utils/cli_chat_reader.py:
iter_sessions→_read_metauses a plainjson.loads + cast, returning{}on any error.traverse_blobs→ internally callsCliSessionMeta.from_dict, raisingSchemaErroron missing/wrong-typed fields, and returns[]on failure.
In search_cli_sessions:
meta = session.get("meta", {}) # from _read_meta — may succeed
...
messages = traverse_blobs(session["db_path"]) # may return [] for same session
if not bubbles:
continue # session silently droppedIf CliSessionMeta.from_dict is stricter than _read_meta, a session can appear valid in iter_sessions (has name, createdAt, etc.) but be silently discarded by traverse_blobs. No warning is emitted. This should at minimum log a warning when traverse_blobs returns empty for a session whose meta decoded successfully—or _read_meta should use the same schema validation path.
🟡 Should Fix
4. search_global_storage serialises the entire raw composer dict into exclusion text
services/search.py line ~286:
metadata_parts=[
...
_json_dump_safe(cd), # ← entire raw Composer.raw dict
"\n".join(bubble_meta), # ← raw JSON of every bubble
],cd is composer.raw which already contains conversationSummary, usage, requestMetadata, etc. Those are also serialised individually just above it. For a 1000-bubble conversation, this double-serialises every bubble's raw JSON into a single exclusion-rule string that can easily exceed several megabytes. This runs on every composer row on every search request. The _json_dump_safe(cd) entry should be removed since the individual parts already cover all the relevant metadata fields.
5. search_legacy_workspaces per-workspace exceptions lose the stack trace
services/search.py line ~357:
except Exception as exc:
_logger.warning("Failed to search legacy workspace %s: %s", name, exc)Missing exc_info=True (or _logger.exception(...)). Contrast with search_global_storage which correctly uses _logger.exception("Error searching global storage"). Without the stack trace, diagnosing workspace-specific parse failures is very difficult in production.
6. _extract_snippet produces a phantom snippet when query is the empty string
services/search.py:
idx = text.lower().find(query_lower) # find("") always returns 0str.find("") returns 0 for any non-empty string. With query = "", _extract_snippet would return the first ~200 chars of every bubble text as a "match". The /api/search handler already rejects blank queries with a 400, so the current blast radius is zero—but _extract_snippet is a public-ish helper (imported by tests) and this edge case is silent. Add a guard at the top:
if not query_lower:
return ""7. rank_results _ts silently accepts bool as a valid timestamp
services/search.py:
if isinstance(t, bool) or not isinstance(t, (int, float)):
return 0.0This correctly rejects booleans. But the upstream to_epoch_ms in utils/path_helpers.py does not guard booleans before the isinstance(value, (int, float)) check:
if isinstance(value, (int, float)): # bool is a subclass of int
if value > 1e12:
return int(value)
if value > 0:
return int(value * 1000) # True → 1000 msto_epoch_ms(True) silently returns 1000 (1 second since epoch). The guard that exists in Composer._optional_counter and Bubble.thinking_duration_ms (check isinstance(value, bool) first) should be replicated in to_epoch_ms.
8. search_legacy_workspaces timestamp falsy-coalescing drops 0 values correctly but the type contract is not enforced
"timestamp": tab.get("lastSendTime") or _UNKNOWN_SEARCH_TIMESTAMP,If lastSendTime is a non-zero float (e.g. 1715000000.5), rank_results._ts receives a float but SearchResult.timestamp is typed int | str. Mypy won't catch this at the dict-literal level because the value comes from tab.get(...) which returns Any. The timestamp field in SearchResult should be widened to int | float | str, or to_epoch_ms should be applied here just as in the other two search functions.
🔵 Nice to Have
9. test_search_helpers.py imports private helpers (_extract_snippet, _find_match)
These are single-underscore-prefixed functions. Testing them via direct import creates tight coupling to implementation details. They should either be made public (renamed without leading underscore) if they are part of the module's tested surface, or the tests should be updated to verify observable behaviour through the public functions (search_global_storage, etc.).
10. ParseWarningCollector has no coverage path for legacy or CLI per-item parse failures
search_global_storage records record_bubble_skipped / record_composer_skipped / record_composer_processing_failure. search_legacy_workspaces and search_cli_sessions do not accept a ParseWarningCollector parameter at all. Malformed legacy chat JSON or CLI blob parse errors are logged server-side but are never surfaced to the API caller via the warnings field. This is a smaller version of the "CLI source failure" gap already in your list.
(ᐧωᐧ)🐇
|
Verified each CodeRabbit finding against current code. Summary: 🔴 Must fix
🟡 Should fix
🔵 Nice to have — skipped
Validation |
2ad9d31 to
cc24f11
Compare
c661ce2 to
cc24f11
Compare
|
services/workspace_db.py:load_bubble_map vs services/search.py:_build_search_bubble_map — duplicate bubble KV loaders with different validation (G1/G3) (Search path validates via Bubble.from_dict and records ParseWarningCollector skips; load_bubble_map still stores raw dicts without schema checks — a drifted bubble can pass tabs/export but fail search, or vice versa.) Three bubble shapes still not converged (models/conversation.py:Bubble raw-dict accessor vs utils/cli_chat_reader.py:236 messages_to_bubbles dict shape vs export flat keys) (Issue #100 implementation notes call for a single TypedDict or discriminated union; deferred work leaves shape drift uncaught at compile time for CLI/export paths.) not addressed yet |
Route all bubbleId:* loads through workspace_db._parse_bubble_kv_row and load_bubble_map (including workspace tabs). Introduce DisplayBubble/BubbleMetadata and utils/display_bubble builders shared by tabs, CLI, and IDE markdown export. Co-authored-by: Cursor <[email protected]>
|
@coderabbitai check @clean6378-max-it 's findings are fully addressed now. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
echo "=== services/workspace_db.py: load_bubble_map ==="
rg -n "load_bubble_map" services/workspace_db.py -A 20
echo ""
echo "=== services/search.py: _build_search_bubble_map or bubble loading ==="
rg -n "_build_search_bubble_map\|bubble_map\|load_bubble_map" services/search.py -B2 -A5
echo ""
echo "=== Bubble shapes: models/conversation.py Bubble class ==="
rg -n "class Bubble" models/conversation.py -A 30
echo ""
echo "=== utils/cli_chat_reader.py messages_to_bubbles ==="
rg -n "def messages_to_bubbles" utils/cli_chat_reader.py -A 30
echo ""
echo "=== Conversation model/alias in models/ ==="
rg -rn "Conversation" models/ -B1 -A2
echo ""
echo "=== search_cli_sessions: record_source_failure ==="
rg -n "record_source_failure\|parse_warnings" services/search.py -B2 -A3
echo ""
echo "=== mypy strict errors: check api/search.py, api/workspaces.py, etc. ==="
rg -n "def search\|def list_workspaces\|def get_workspace\|def list_composers\|def detect_environment\|def get_logs\|def export_chats\|def generate_pdf" api/search.py api/workspaces.py api/composers.py api/config_api.py api/logs.py api/export_api.py api/pdf.py 2>/dev/null
echo ""
echo "=== api/pdf.py: output call and return type ==="
rg -n "output\|return\|cast" api/pdf.py -B1 -A2
echo ""
echo "=== ParseWarningCollector: record_source_failure usage in search.py ==="
grep -n "record_source_failure" services/search.pyLength of output: 9230 🏁 Script executed: #!/bin/bash
echo "=== api/pdf.py full content ==="
cat api/pdf.py
echo ""
echo "=== models/__init__.py full content ==="
cat models/__init__.py
echo ""
echo "=== models/search.py full content ==="
cat models/search.py
echo ""
echo "=== models/conversation.py: Conversation alias lines ==="
grep -n "^n\b\|^Conversation\b\|# is the\|^n =\|ConversationSummary\|ConvSummary" models/conversation.py
echo ""
echo "=== services/search.py: bubble loading section (lines 200-250) ==="
sed -n '200,260p' services/search.py
echo ""
echo "=== services/search.py: CLI section around line 537 ==="
sed -n '520,560p' services/search.pyLength of output: 14132
✅ Must Fix
✅ Should Fix
Additionally, the |
Summary
strict = trueinpyproject.tomland update CI to runmypy .from project configSearchResultandConversationSummary, and wireSearchResultthrough the search service and/api/searchapi/route handlers, allutils/modules, and allservices/modulesPer-module mypy overrides remain for
scripts/export.pyandtests.*until those surfaces are fully annotated.Closes #100.
Acceptance criteria
mypyconfigured with[tool.mypy]andstrict = trueBubble,Workspace,Composerdataclasses;SearchResult,ConversationSummaryTypedDicts)utils/functions fully annotatedmypypasses with zero errors on production modulesTest plan
mypy .— zero errors (86 source files)pytest— 447 passed, 4 skippedSummary by CodeRabbit
Refactor
Chores
Tests
Documentation