Report HTML document-loader failures clearly#913
Conversation
Reject HTML and XHTML document-loader responses that do not advertise an ActivityPub alternate document before they reach the JSON parser. This surfaces remote HTML pages as FetchError failures with response context instead of generic SyntaxError crashes. Add regression coverage for HTML responses without ActivityPub alternates and keep the ReDoS regression test aligned with the new error path. Fixes fedify-dev#912 Assisted-by: Codex:gpt-5.5
✅ Deploy Preview for fedify-json-schema canceled.
|
|
@codex review |
There was a problem hiding this comment.
Code Review
This pull request modifies getDocumentLoader() in @fedify/vocab-runtime to throw a FetchError when HTML/XHTML responses are too large or lack an ActivityPub alternate link, rather than attempting to parse the HTML as JSON. This improves error handling by surfacing document loading failures with descriptive messages instead of generic JSON parser crashes. Tests have been updated to assert this new behavior. There are no review comments, so I have no feedback to provide.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4c6750f718
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (2)
💤 Files with no reviewable changes (1)
📝 WalkthroughWalkthrough
ChangesHTML document loader error handling
Estimated code review effort: 3 (Moderate) | ~25 minutes Suggested reviewers: 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/vocab-runtime/src/docloader.ts`:
- Around line 205-217: The HTML size guard in docloader.ts is applied too late
because alternate-link scanning in the response handling path still buffers the
full body via response.text() after cloning the stream. Update the document
fetch logic in the relevant docloader flow (around the same response-processing
code used for the FetchError path) so MAX_HTML_SIZE is enforced before
buffering, either by short-circuiting on Content-Length or by streaming/reading
incrementally with an early stop, and keep the existing logger.warn and
FetchError behavior intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9dc5289d-6a6e-423f-a8da-b8aae6a2bb84
📒 Files selected for processing (3)
CHANGES.mdpackages/vocab-runtime/src/docloader.test.tspackages/vocab-runtime/src/docloader.ts
Codecov Report❌ Patch coverage is
... and 1 file with indirect coverage changes 🚀 New features to boost your workflow:
|
Keep the HTML document-loader fallback narrow by parsing responses that look like JSON even when they are served with an HTML Content-Type. This preserves compatibility with misconfigured ActivityPub endpoints while still reporting real HTML pages without alternates as FetchError failures. Read HTML responses through a bounded stream instead of buffering the full body before checking the scan limit. Oversized responses now short-circuit from Content-Length when available, or while reading when the stream crosses the limit. fedify-dev#913 (comment) fedify-dev#913 (comment) Assisted-by: Codex:gpt-5.5
|
/gemini review |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b00ea874a6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Code Review
This pull request updates getDocumentLoader() to reject HTML and XHTML responses that do not advertise an ActivityPub alternate document with a FetchError instead of attempting to parse them as JSON. It introduces a helper function readBoundedText to safely read response bodies up to a 1MB limit to prevent ReDoS attacks and excessive memory usage. The feedback suggests adding a defensive fallback to response.text() in readBoundedText to ensure compatibility with environments or test mocks where response.body.getReader is not available.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
Cancel oversized response bodies when Content-Length already exceeds the HTML scan limit, so document-loader failures do not leave the fetch stream open while returning early. Fall back to response.text() when a custom or non-standard response body lacks getReader(). This keeps the bounded reader usable with fetch mocks and older runtimes while preserving the streaming path where available. fedify-dev#913 (comment) fedify-dev#913 (comment) Assisted-by: Codex:gpt-5.5
|
@codex review |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request updates @fedify/vocab-runtime to reject HTML and XHTML responses that do not advertise an ActivityPub alternate document with a FetchError instead of attempting to parse the HTML as JSON. It introduces a MAX_HTML_SIZE limit of 1MB and helper functions like readBoundedText to safely stream and read HTML responses up to this limit, preventing JSON parser crashes on remote HTML error pages. Additionally, corresponding unit tests have been added to verify these behaviors, including handling oversized HTML and fallback scenarios. There are no review comments, and I have no feedback to provide.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
|
Codex Review: Didn't find any major issues. Breezy! Reviewed commit: ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
The bounded HTML path can rely on the standard Response body type instead of defensive assertions for non-standard mocks. The parser now preserves HTML-labelled JSON compatibility by attempting JSON.parse() directly and converts syntax failures into the same FetchError used for ordinary HTML. fedify-dev#913 (comment) fedify-dev#913 (comment) fedify-dev#913 (comment) fedify-dev#913 (comment) fedify-dev#913 (comment) Assisted-by: Codex:gpt-5.5
|
@codex review |
|
/gemini review |
|
Codex Review: Didn't find any major issues. Already looking forward to the next diff. Reviewed commit: ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
There was a problem hiding this comment.
Code Review
This pull request updates @fedify/vocab-runtime to reject HTML and XHTML responses that do not advertise an ActivityPub alternate document with a FetchError instead of attempting to parse the HTML as JSON. It introduces a MAX_HTML_SIZE limit of 1MB and implements bounded text reading (readBoundedText) to safely cancel response bodies that exceed this limit, preventing potential ReDoS or memory issues. Additionally, unit tests have been updated and expanded to cover these new error conditions and body cancellation behaviors. There are no review comments provided, so I have no feedback to address.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
Fedify already treats HTML as a possible discovery page, not as an immediate failure. That is useful for servers that expose an ActivityPub representation through an alternate link, so this PR keeps that behavior. The problem is the fallback after discovery fails: once the loader has seen an HTML/XHTML response and found no usable ActivityPub alternate, passing the same body to the JSON parser turns a remote HTML page into a generic
SyntaxError.This moves the failure to the point where the document loader still has the right context. In packages/vocab-runtime/src/docloader.ts, the HTML/XHTML branch now clones the response before reading the body, keeps the existing header and in-document alternate discovery, then throws
FetchErrorif no distinct ActivityPub or JSON-LD alternate can be found. Oversized HTML follows the same typed error path because Fedify cannot scan it safely. In both cases the error keeps the document URL, the response, and the Content-Type in the message.The tests in packages/vocab-runtime/src/docloader.test.ts cover a real HTML page with no ActivityPub alternate and assert that the error is a
FetchErrorwith the original response metadata. The ReDoS regression test still checks that hostile HTML completes quickly, but now expects the same document-loader error instead of relying on the JSON parser. I also narrowed the malformed Link-header test to a JSON-LD response, so it continues to cover Link parser recovery without depending on the old HTML fallback.CHANGES.md records this under
@fedify/vocab-runtimefor the next minor release.Fixes #912.