perf: reduce memory baseline for busybox and sensor by davdhacs · Pull Request #19999 · stackrox/stackrox

davdhacs · 2026-04-14T21:31:27Z

Description

Umbrella PR for the memory baseline optimization series. This tracks the overall effort — individual changes are in separate PRs for focused review.

Merged

perf: combine logger zap sampling #19997 — Shared zap sampler (-46 MB sensor, -170 MB central). Replaces per-logger sampling with a single shared sampler core. Same flood protection, 1/100th the memory.

Open — Schema Lazy Loading

perf: reduce busybox init-time memory allocation #20024 — Schema lazy loading via sync.OnceValue (-20% busybox init heap, -29% mallocs). Defers schema construction to first access. 340+ files but mechanical — template change + regenerated output + caller updates.

Open — Init-Time Reductions

perf: reduce init-time memory via lazy loading and build tags #20047 — Lazy loading and build tags for booleanpolicy, branding, cloudproviders, probeupload, printers, fake workloads. Depends on perf: reduce busybox init-time memory allocation #20024.

Open — Logging Improvements

perf: share single lumberjack writer across loggers #20018 — Shared lumberjack writer (-29 goroutines)
feat: add ROX_LOGGING_TO_FILE to disable file logging #20019 — ROX_LOGGING_TO_FILE env var (disable file logging in containers)

Not Yet Created

Process enricher cache scaling with ROX_MEMLIMIT
Additional PRs from the full roadmap (13 PRs total)

Combined Measurements (live GKE cluster)

| Component | Master | After zap fix (#19997) | Delta |
|---|---|---|---|
| Central | 267 Mi | 231 Mi | -36 Mi |
| Sensor | 142 Mi | 123 Mi | -19 Mi |
| Admission-control | 40 Mi | 36 Mi | -4 Mi |
| Total | 449 Mi | 390 Mi | -59 Mi |

Note: These measurements are from the zap sampler change only. Additional savings expected from schema lazy loading and init-time reductions once deployed.

User-facing documentation

CHANGELOG.md is updated OR update is not needed
documentation PR is created and is linked above OR is not needed

Testing and quality

the change is production ready
CI results are inspected

How I validated my change

This is a tracking PR. Individual PRs have their own validation.

AI-assisted.

openshift-ci · 2026-04-14T21:31:32Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

sourcery-ai

Sorry @github-actions[bot], your pull request is larger than the review limit of 150000 diff characters

github-actions · 2026-04-14T21:45:24Z

🚀 Build Images Ready

Images are ready for commit 10e59b0. To use with deploy scripts:

export MAIN_IMAGE_TAG=4.11.x-661-g10e59b0d79

Reduce init-time memory for the busybox binary by eliminating unnecessary imports, deferring allocations with sync.OnceValue, and breaking heavy transitive dependency chains.

Results (Linux amd64):
- Busybox: 16.1 MB -> 12.9 MB heap (-20%), 245K -> 173K mallocs (-29%)
- AC standalone: 9.1 MB -> 7.2 MB heap (-21%), 87K -> 51K mallocs (-41%)
- Binary size: 205 MB -> 194 MB (-5%)

Generated with assistance from AI

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Each logger that writes to a file spawns a lumberjack goroutine for log rotation. With ~30 loggers writing to /var/log/stackrox/log.txt, that's 30 idle goroutines + 30 independent file handles to the same file.

In container environments, logs go to stdout and are collected by the container runtime, so file logging is unnecessary overhead. Set ROX_LOGGING_TO_FILE=false to disable file logging, saving:
- 30 goroutines and their stacks
- File I/O overhead
- lumberjack rotation processing

Default is true (unchanged behavior) for backward compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Each CreateLogger call created an independent lumberjack.Logger for the same log file, spawning its own rotation goroutine. With ~30 loggers, that's 30 goroutines + 30 file handles to the same file.

Share a single writer per path via a map. This reduces log rotation goroutines from 30 to 1 and eliminates potential corruption from concurrent uncoordinated writes to the same file.

GC sweet spot experiment findings (included in commit message for context):
- 128Mi: GC thrashing (84 GC/min, 200m CPU)
- 160Mi: Sweet spot (2 GC/min, 4m CPU)
- 192Mi: Comfortable (0 GC/min, 3m CPU)
- Rule: set limit to 1.3-1.5x natural heap size

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Process enrichment LRU cache was hardcoded at 100K entries, designed for large enterprise clusters with thousands of containers. On a 50-container edge cluster, this is 2000x oversized.

Use pkg/sensor/queue.ScaleSize to scale based on ROX_MEMLIMIT:
- 128Mi limit → ~3K entries (sufficient for 50 containers)
- 4Gi limit → 100K entries (unchanged behavior)
- Minimum: 100 entries

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

davdhacs added the ai-assisted label Apr 14, 2026

openshift-ci bot added the do-not-merge/work-in-progress label Apr 14, 2026

github-actions bot added area/roxctl area/central area/helm area/sensor area/postgres ai-review coderabbit-review labels Apr 14, 2026

sourcery-ai bot reviewed Apr 14, 2026


davdhacs and others added 4 commits April 15, 2026 06:53

davdhacs force-pushed the davdhacs/pr1-memory-baseline branch from 696081a to 10e59b0 on April 15, 2026 12:53

davdhacs wants to merge 4 commits into master from davdhacs/pr1-memory-baseline.
