Skip to content

fix: disable debug=True / Werkzeug debugger exposure (closes #9)#10

Closed
timon0305 wants to merge 2 commits into
cppalliance:masterfrom
timon0305:fix/disable-debug-werkzeug-9
Closed

fix: disable debug=True / Werkzeug debugger exposure (closes #9)#10
timon0305 wants to merge 2 commits into
cppalliance:masterfrom
timon0305:fix/disable-debug-werkzeug-9

Conversation

@timon0305

@timon0305 timon0305 commented May 4, 2026

Copy link
Copy Markdown
Collaborator

Problem

app.py was hard-coding debug=True in its app.run(...) call. The Werkzeug debugger that ships with Flask debug mode is a documented remote-code-execution primitive — anyone reaching the listening port (and guessing the PIN, which is generated from machine-stable inputs) can execute arbitrary Python in the server process. The default 127.0.0.1 bind is the only thing standing between the operator and RCE; a misconfigured --host, an SSH tunnel, or a careless reverse proxy is enough to expose it.

There was also no way to opt out — the literal was unconditional.

Change

Three files:

  • app.py — removed debug=True. Default debug now off. Opt-in via either --debug CLI flag or FLASK_DEBUG=1 env var. When debug is enabled, prints a stderr WARNING naming the RCE risk + reminding the operator to bind only to loopback. Auto-reloader gated on the same flag.
  • utils/debug_flag.py — new tiny module with resolve_debug_flag(env_value, cli_flag). Lives outside app.py so it can be unit-tested without importing Flask (matching the existing _build_app_parser mirror convention in the test suite). Truthy env values: "1", "true", "yes" — case-insensitive, whitespace-tolerant. CLI flag wins.
  • tests/test_cli_args.py — new TestDebugFlagGating class: 8 cases covering env-truthy / env-falsey / env-unset / CLI-override + argparse --debug default & explicit + a source-level guard that fails if debug=True is ever re-introduced as a literal in app.py.

Test plan

Unit: python3 -m unittest tests.test_cli_args42/42 OK (was 34, +8 new).

Live smoke (all four matrix cells, on a real flask>=3.0 install):

Trial Setting Debug mode WARNING Debugger is active!
A (default — no env, no flag) off
B --debug on
C FLASK_DEBUG=1 on
D FLASK_DEBUG=0 off

Bogus paths in trials A/D return a plain Flask 404 — not the orange Werkzeug debugger console. The WARNING prints once per process (parent + reloader child = 2× when the reloader is on, which is correct).

Severity

Critical — Werkzeug debugger exposure is a documented RCE pathway. Listed as a Critical / 1pt item in the eval week-1 plan for cppa-cursor-browser.

Closes #9.

Summary by CodeRabbit

  • New Features

    • Added a --debug CLI flag to opt in to debug mode
    • Displays a security warning when debug mode is enabled
  • Improvements

    • Debug mode now defaults to off instead of always on
    • Auto-reloader behavior respects platform-specific requirements
  • Tests

    • Added tests covering debug-flag resolution across environment values and CLI overrides

…ppalliance#9)

The Werkzeug debugger is a documented remote-code-execution primitive.
app.py was hard-coding `debug=True`, which exposed RCE to anyone who
could reach the listening port — a misconfigured `--host`, an SSH
tunnel, or a careless reverse proxy was enough.

- Remove the `debug=True` literal from app.py.
- Default debug OFF. Opt-in via either `--debug` CLI flag or
  `FLASK_DEBUG=1` env var (truthy = "1" / "true" / "yes",
  case-insensitive, whitespace-tolerant).
- Print a stderr WARNING when debug is enabled, naming the RCE risk
  and reminding the operator to bind only to loopback.
- Gate the auto-reloader on the same flag.

Live-tested all four matrix cells: (default off / --debug / FLASK_DEBUG=1
/ FLASK_DEBUG=0). Bogus paths under debug-off return a plain Flask 404,
not the Werkzeug debugger console.

Helper `resolve_debug_flag(env_value, cli_flag)` lives in
`utils/debug_flag.py` so it can be unit-tested without importing Flask
(matching the existing test convention in tests/test_cli_args.py).

Regression coverage in tests/test_cli_args.py adds 8 cases:
- default-off, env-truthy, env-falsey, CLI override
- argparse `--debug` default + explicit
- source-level guard that fails if `debug=True` is reintroduced
@coderabbitai

coderabbitai Bot commented May 4, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 48e40c46-12c3-4cee-8e25-f550d09f8e12

📥 Commits

Reviewing files that changed from the base of the PR and between 26258c0 and c2b4a21.

📒 Files selected for processing (1)
  • tests/test_cli_args.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_cli_args.py

📝 Walkthrough

Walkthrough

The PR makes Flask debug opt-in by adding a CLI --debug flag and a resolve_debug_flag utility that combines FLASK_DEBUG and the CLI flag. The app warns to stderr when debug is enabled and passes debug_enabled and a gated use_reloader into app.run. Tests verify behavior and forbid a literal debug=True.

Changes

Debug Mode Opt-in and Safety

Layer / File(s) Summary
Data / Decision Shape
utils/debug_flag.py
New resolve_debug_flag(env_value, cli_flag) -> bool: CLI forces true; env_value is None -> false; otherwise truthy strings ("1", "true", "yes") enable debug.
CLI Surface
app.py
Adds --debug argparse boolean (action="store_true") to the app's CLI.
Integration / Wiring
app.py
Imports resolve_debug_flag, computes debug_enabled = resolve_debug_flag(os.environ.get("FLASK_DEBUG"), args.debug), prints a stderr RCE warning when enabled.
Runtime Behavior
app.py
Replaces literal debug=True with debug=debug_enabled and use_reloader=debug_enabled and (sys.platform != "win32") in app.run(...).
Tests / Regression
tests/test_cli_args.py
Adds TestDebugFlagGating and updates _build_app_parser() to include --debug. Tests cover env unset/empty/falsey/truthy, CLI precedence, parser defaults when --debug absent/present, and AST check forbidding literal debug=True in app.py.
sequenceDiagram
    participant User
    participant CLI_Parser
    participant ResolveDebug
    participant App_Run
    User->>CLI_Parser: run app.py [--debug]
    CLI_Parser->>ResolveDebug: env_value=FLASK_DEBUG, cli_flag=args.debug
    ResolveDebug->>ResolveDebug: apply opt-in logic (CLI OR truthy env)
    ResolveDebug->>App_Run: return debug_enabled (bool)
    alt debug_enabled == True
        App_Run->>User: write RCE warning to stderr (print)
    end
    App_Run->>App_Run: set debug=debug_enabled<br/>set use_reloader accordingly
    App_Run->>User: start Flask server
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I nibble flags and nibble code,

FLASK_DEBUG whispers on the road.
--debug hops in, but only by choice,
I warn of sparks with a cautious voice.
Safe starts, small drumbeat — a rabbit's rejoice.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and accurately describes the main change: disabling the hardcoded debug=True that exposed the Werkzeug debugger vulnerability, and it directly references the linked issue #9.
Linked Issues check ✅ Passed All core requirements from issue #9 are implemented: debug=True removed, default disabled, --debug flag added, FLASK_DEBUG env var support, stderr warning on debug enabled, auto-reloader gated, regression tests included.
Out of Scope Changes check ✅ Passed All changes directly address the security vulnerability in issue #9. The PR adds necessary files (debug_flag.py), updates app.py to use opt-in debug, and adds comprehensive test coverage—all aligned with objectives.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/test_cli_args.py (1)

37-47: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Exercise the real app parser instead of a test-local clone.

Lines 289-295 currently prove the behavior of _build_app_parser(), not the parser that app.py actually uses. That means a future regression in the real --debug definition can stay green here as long as this copy is not changed in lockstep. Please move parser construction into a Flask-free helper and import that from both app.py and this test so the security-sensitive flag is covered by one source of truth.

Also applies to: 289-295

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_cli_args.py` around lines 37 - 47, Extract the argparse
construction currently duplicated in _build_app_parser into a shared, Flask-free
helper function (e.g., create_app_parser) in a module that neither imports Flask
nor app runtime; update app.py to import and use create_app_parser instead of
building its own parser, and change tests/test_cli_args.py to import and call
create_app_parser (removing the local _build_app_parser) so the real parser
(including the --debug flag) is exercised from one source of truth.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_cli_args.py`:
- Around line 300-304: Replace the brittle substring check in
test_app_py_does_not_hardcode_debug_true with an AST-based check: parse the file
contents (src) with ast.parse, walk all ast.Call nodes and inspect their
.keywords for any keyword with arg == "debug" whose value is a literal True
(handle both ast.Constant(value=True) and older ast.NameConstant/ast.Name
patterns), and assert that no such call exists; keep the function name
test_app_py_does_not_hardcode_debug_true and fail the test if any debug=True
literal is found in a call site.

---

Outside diff comments:
In `@tests/test_cli_args.py`:
- Around line 37-47: Extract the argparse construction currently duplicated in
_build_app_parser into a shared, Flask-free helper function (e.g.,
create_app_parser) in a module that neither imports Flask nor app runtime;
update app.py to import and use create_app_parser instead of building its own
parser, and change tests/test_cli_args.py to import and call create_app_parser
(removing the local _build_app_parser) so the real parser (including the --debug
flag) is exercised from one source of truth.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6e397f53-d775-411f-befc-0b9ce51f68a7

📥 Commits

Reviewing files that changed from the base of the PR and between f8b3cb3 and 26258c0.

📒 Files selected for processing (3)
  • app.py
  • tests/test_cli_args.py
  • utils/debug_flag.py

Comment thread tests/test_cli_args.py Outdated
…lliance#10)

Old guard: `self.assertNotIn("debug=True", src)` — substring match.
That misses cosmetic variants like `debug = True` (with spaces),
multi-line `debug=\n    True`, or any other form that produces the
same runtime semantics. CodeRabbit correctly flagged it as evadable.

Replaced with an `ast.walk(tree)` over the parsed app.py: find any
`ast.Call` whose keywords contain `debug=True` as a literal Constant.
Catches every cosmetic variant by definition.

Failure message includes the offending line number(s) and the
rationale (issue cppalliance#9), so a future CI break is immediately
debuggable.

Verified by injecting `debug = True` (with spaces — the form the
old check missed) into app.py:
  - Old check: would have passed (false negative).
  - New check: failed with `[136]` and the issue-cppalliance#9 message.
Then reverted the inject; test passes again.

42/42 tests still pass on the actual app.py.
@timon0305 timon0305 self-assigned this May 7, 2026
@timon0305

Copy link
Copy Markdown
Collaborator Author

Closed in favour of the in-repo equivalent so CI can run.

@timon0305 timon0305 closed this May 7, 2026
wpak-ai pushed a commit that referenced this pull request May 8, 2026
* fix: gate Flask debug / Werkzeug debugger behind opt-in flag (closes #9)

The Werkzeug debugger is a documented remote-code-execution primitive.
app.py was hard-coding `debug=True`, which exposed RCE to anyone who
could reach the listening port — a misconfigured `--host`, an SSH
tunnel, or a careless reverse proxy was enough.

- Remove the `debug=True` literal from app.py.
- Default debug OFF. Opt-in via either `--debug` CLI flag or
  `FLASK_DEBUG=1` env var (truthy = "1" / "true" / "yes",
  case-insensitive, whitespace-tolerant).
- Print a stderr WARNING when debug is enabled, naming the RCE risk
  and reminding the operator to bind only to loopback.
- Gate the auto-reloader on the same flag.

Live-tested all four matrix cells: (default off / --debug / FLASK_DEBUG=1
/ FLASK_DEBUG=0). Bogus paths under debug-off return a plain Flask 404,
not the Werkzeug debugger console.

Helper `resolve_debug_flag(env_value, cli_flag)` lives in
`utils/debug_flag.py` so it can be unit-tested without importing Flask
(matching the existing test convention in tests/test_cli_args.py).

Regression coverage in tests/test_cli_args.py adds 8 cases:
- default-off, env-truthy, env-falsey, CLI override
- argparse `--debug` default + explicit
- source-level guard that fails if `debug=True` is reintroduced

* test: AST-walk the debug=True regression guard (CodeRabbit on PR #10)

Old guard: `self.assertNotIn("debug=True", src)` — substring match.
That misses cosmetic variants like `debug = True` (with spaces),
multi-line `debug=\n    True`, or any other form that produces the
same runtime semantics. CodeRabbit correctly flagged it as evadable.

Replaced with an `ast.walk(tree)` over the parsed app.py: find any
`ast.Call` whose keywords contain `debug=True` as a literal Constant.
Catches every cosmetic variant by definition.

Failure message includes the offending line number(s) and the
rationale (issue #9), so a future CI break is immediately
debuggable.

Verified by injecting `debug = True` (with spaces — the form the
old check missed) into app.py:
  - Old check: would have passed (false negative).
  - New check: failed with `[136]` and the issue-#9 message.
Then reverted the inject; test passes again.

42/42 tests still pass on the actual app.py.

* review: address PR #20 nits — broaden debug=True guard + FLASK_DEBUG note

- AST guard now handles ast.NameConstant (Py3.7) and **{"debug":True}
  dict-spread bypass; helper extracted for unit testing.
- README: opt-in note for the Werkzeug debugger, including that
  FLASK_ENV=development is NOT consulted (only FLASK_DEBUG=1).
- Replace em dashes in app.py comments with ASCII to silence GitHub's
  non-ASCII banner on review.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Security: disable debug=True / Werkzeug debugger exposure in app.py

1 participant