Skip to content

ROX-29577: VM4VM e2e tests#20042

Draft
vikin91 wants to merge 13 commits intomasterfrom
piotr-ROX-29577-VM4VM-e2e-tests
Draft

ROX-29577: VM4VM e2e tests#20042
vikin91 wants to merge 13 commits intomasterfrom
piotr-ROX-29577-VM4VM-e2e-tests

Conversation

@vikin91
Copy link
Copy Markdown
Contributor

@vikin91 vikin91 commented Apr 16, 2026

Description

User request (feature): "Design the end-to-end tests for the VM scanning feature."

Add a Go test_e2e VM-scanning suite that provisions RHEL 9 and RHEL 10 KubeVirt VMs, installs and runs roxagent, validates Central scan visibility, and captures progressive diagnostics for flaky-path triage.
Wire a dedicated OpenShift CI job and e2e scripts with CNV/VSOCK prerequisite checks and activation/repo2cpe guardrails.

AI-assisted: this commit was partially generated with AI assistance.
Made-with: Cursor

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Apr 16, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@vikin91 vikin91 changed the title Piotr rox 29577 vm4 vm e2e tests ROX-29577: VM4VM e2e tests Apr 16, 2026
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @github-actions[bot], your pull request is larger than the review limit of 150000 diff characters

@vikin91 vikin91 mentioned this pull request Apr 16, 2026
9 tasks
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 16, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added comprehensive end-to-end VM scanning test framework for KubeVirt virtual machines on OpenShift clusters.
    • Integrated CNV (OpenShift Virtualization) operator installation and configuration into CI pipeline.
    • Added VM lifecycle management and guest preparation helpers (SSH, cloud-init, sudo verification, roxagent deployment).
  • Documentation

    • Added detailed design specification and implementation plan for VM scanning e2e testing.
  • Chores

    • Extended CI job configuration for VM scanning test execution on OpenShift.
    • Added Makefile targets and shell scripts to orchestrate VM scanning tests.

Walkthrough

This pull request introduces a comprehensive end-to-end test infrastructure for VM scanning on OpenShift clusters. It includes CI pipeline updates, Kubernetes CNV operator setup, a full Go-based E2E test suite with supporting helpers, Prometheus observability utilities, and shell scripts for test orchestration and custom VM image building.

Changes

Cohort / File(s) Summary
CI/OpenShift Configuration
.openshift-ci/begin.sh, .openshift-ci/ci_tests.py, .openshift-ci/dispatch.sh, scripts/ci/jobs/ocp_vm_scanning_e2e_tests.py
Added conditional WORKER_NODE_TYPE assignment, new VMScanningE2e test class, extended job pattern matching, and a new OpenShift CI runner with environment variable configuration for CNV/KubeVirt and VM image setup.
Documentation & Planning
docs/superpowers/plans/2026-03-27-vm-scanning-e2e.md, docs/superpowers/specs/2026-03-26-vm-scanning-e2e-design.md
Added comprehensive E2E test plan describing implementation blueprint including helper suites, guest preparation, and observability/escalation diagnostics; added detailed design spec defining suite architecture, scenarios, CI lane shape, and parameterization behavior.
Test Orchestration & Bootstrap
test.sh, tests/Makefile, tests/e2e/run-vm-scanning.sh, tests/e2e/vm-scanning-lib.sh
Added bootstrap script exporting deployment/runtime configuration, new Make targets for VM scanning unit/E2E tests, test orchestration script, and preflight helper functions for cluster prerequisites (KUBECONFIG, VM images, KVM availability, virtctl, roxagent).
CNV Operator Installation & Kubernetes Manifests
tests/e2e/lib.sh, tests/e2e/yaml/cnv-operator/*
Extended E2E library with install_the_cnv_operator() function for CNV/KubeVirt setup with VSOCK feature gate configuration; added namespace, operator-group, subscription, and hyperconverged custom resource manifests.
Core E2E Test Suite
tests/vm_scanning_suite_test.go, tests/vm_scanning_test.go, tests/vm_scanning_e2e_utils_test.go
Implemented main test suite orchestrating VM provisioning, guest preparation, roxagent invocation, and Central/Sensor validation; added test methods for scan pipeline validation with rescan verification; added configuration loading and VSOCK/cluster readiness utilities.
VM Helpers Package: Core Functionality
tests/vmhelpers/vm.go, tests/vmhelpers/central.go, tests/vmhelpers/roxagent.go, tests/vmhelpers/virtctl.go, tests/vmhelpers/wait.go
Implemented KubeVirt VM lifecycle operations (creation, status polling, deletion), Central API polling for VM/scan readiness with multiple condition variants, roxagent binary deployment and execution with fallback URL handling, virtctl SSH/SCP wrapper with configurable timeouts/logging, and shared polling utilities with authentication expiration detection.
VM Helpers Package: Guest & SSH Operations
tests/vmhelpers/guest.go, tests/vmhelpers/guest_ssh_internal.go, tests/vmhelpers/guest_ssh_policy.go, tests/vmhelpers/virtctl_ssh.go, tests/vmhelpers/virtctl_process_*.go
Implemented guest preparation (SSH reachability, cloud-init completion, sudo verification, RHEL activation), SSH failure classification with terminal/retryable distinction, SSH reachability policy with configurable thresholds, SSH command construction/execution, and platform-specific process cancellation (Unix/non-Unix).
VM Helpers Package: Observability & Tests
tests/vmhelpers/observability.go, tests/vmhelpers/*_test.go
Added Prometheus metric constants for compliance relay and sensor VM index; included comprehensive unit tests for VM helpers covering activation status, SSH error handling, DNF/RHSM operations, roxagent behavior, virtctl command construction, cloud-init rendering, VM lifecycle, polling logic, and metric verification.
Test Metrics Package
tests/testmetrics/parse.go, tests/testmetrics/scrape.go, tests/testdata/vm-scanning/*, tests/vm_scanning_e2e_utils_preflight_test.go
Implemented Prometheus exposition text parsing with stability polling, Kubernetes-backed metrics scraping (via proxy/port-forward), component aggregation with per-pod snapshots, service port discovery, cloud-init template for VM guest configuration, and vmscanning embedded asset package.
VM Image Building
vm-image.sh
Added Bash script for building customized RHEL-based ContainerDisk images with subscription registration, package installation, and qcow2 conversion via podman/loop-device operations.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 58.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The PR description addresses the main objective and explains the change well, but the 'How I validated my change' section is incomplete ('change me!' placeholder remains). Complete the 'How I validated my change' section with details on manual and automated validation performed, or explain why validation was deferred.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title 'ROX-29577: VM4VM e2e tests' clearly and concisely describes the main change: adding end-to-end tests for VM scanning feature.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch piotr-ROX-29577-VM4VM-e2e-tests

Warning

Review ran into problems

🔥 Problems

Timed out fetching pipeline failures after 30000ms


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

🧹 Nitpick comments (3)
vm-image.sh (2)

51-52: Non-deterministic file selection with find | head -1.

If the source image contains multiple .qcow2 or .img files, the selected file depends on filesystem ordering. Consider using find ... -print -quit for explicit first-match behavior, or adding validation that exactly one disk file exists.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@vm-image.sh` around lines 51 - 52, The DISK_NAME assignment uses a
non-deterministic pipeline ("find ... | head -1") which can pick different
files; update the logic around DISK_NAME (the line that constructs DISK_NAME
using WORKDIR) to use a deterministic first-match find invocation (e.g., find
... -print -quit) or, better, enumerate matches and validate there is exactly
one .qcow2/.img file and error if zero or more than one; ensure the echo "==>
Disk: $DISK_NAME" only runs after successful validation.

60-64: Credentials written to temporary script file.

The subscription credentials in REGISTER_ARGS are interpolated into customize.sh which is written to disk. While the trap 'rm -rf "$WORKDIR"' EXIT provides cleanup, the credentials exist in plaintext on the filesystem during execution.

For CI/developer tooling this may be acceptable, but consider:

  • The script could read credentials from environment variables inside the container instead of baking them into the script
  • Process listing could expose the credentials when subscription-manager register runs
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@vm-image.sh` around lines 60 - 64, The script currently bakes plaintext
credentials into customize.sh by interpolating REGISTER_ARGS; instead stop
expanding REGISTER_ARGS into the file and have the script read credentials from
environment variables at runtime (e.g. REGISTER_USERNAME and REGISTER_PASSWORD)
and invoke subscription-manager without placing secrets on the command line (for
example pipe or heredoc to subscription-manager or otherwise provide password
via stdin), then explicitly unset those env vars and remove WORKDIR after use;
update the code that writes customize.sh (and the script body) to reference
REGISTER_USERNAME/REGISTER_PASSWORD and to avoid passing secrets as command-line
args to subscription-manager (use stdin/heredoc/piping) and ensure cleanup/unset
of the env vars inside the customize.sh script.
tests/testmetrics/parse.go (1)

123-143: Prefix matching may match unintended metrics.

strings.HasPrefix(line, metricPrefix) can match metrics that share a common prefix. For example, querying rox_sensor_vm would match both rox_sensor_vm_total and rox_sensor_vm_reports_total.

Consider validating that the character after the metric name is either { (labels) or whitespace to ensure exact name matching.

♻️ Proposed fix for exact metric name matching
 func parseCounter(body, metricPrefix string) (float64, bool) {
 	for _, line := range strings.Split(body, "\n") {
 		line = strings.TrimSpace(line)
 		if line == "" || strings.HasPrefix(line, "#") {
 			continue
 		}
 		if !strings.HasPrefix(line, metricPrefix) {
 			continue
 		}
+		// Ensure exact metric name match (next char must be '{', ' ', or end of name)
+		rest := line[len(metricPrefix):]
+		if len(rest) > 0 && rest[0] != '{' && rest[0] != ' ' {
+			continue
+		}
 		fields := strings.Fields(line)

Apply the same fix to parseCounterWithLabel.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/testmetrics/parse.go` around lines 123 - 143, parseCounter currently
uses strings.HasPrefix(line, metricPrefix) which can incorrectly match longer
metric names; update parseCounter to ensure the character immediately after
metricPrefix in the line (if any) is either '{' (labels), whitespace (space or
tab), or end-of-line so you only match the exact metric name, and apply the
identical validation to parseCounterWithLabel; locate the checks using
metricPrefix inside parseCounter and parseCounterWithLabel and replace the
simple HasPrefix acceptance with this exact-name post-character check before
parsing fields and values.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/e2e/vm-scanning-lib.sh`:
- Around line 31-35: The verify_kvm_available function currently aggregates
devices.kubevirt.io/kvm across all nodes and uses a regex that can miss
multi-token all-zero results (e.g. "0 0") and doesn't limit to worker nodes;
update it to query only worker nodes (e.g., use a node selector/label like
node-role.kubernetes.io/worker or exclude control-plane/master labels) and then
parse the returned tokens individually to ensure at least one node has a
capacity > 0; produce a list/array from the kubectl jsonpath output for the
symbol verify_kvm_available and iterate over each token, failing (die) only if
every token is zero/empty or if no worker nodes are found.

In `@tests/testmetrics/scrape.go`:
- Around line 234-242: The ScrapeComponent function builds PodCollectOptions
without populating RestConfig, which breaks port-forward transport; update
ScrapeComponent signature to accept a *rest.Config parameter and pass it into
the PodCollectOptions.RestConfig field so that when transport ==
TransportPortForward the port-forward logic receives the required rest
configuration (modify references in ScrapeComponent and the PodCollectOptions
construction to include RestConfig).

In `@tests/vm_scanning_e2e_utils_test.go`:
- Around line 194-195: The test currently treats PodPending as ready via the
phaseReady boolean; change the readiness check to require pod.Status.Phase ==
coreV1.PodRunning AND the PodReady condition equals coreV1.ConditionTrue instead
of allowing PodPending. Update the phaseReady assignment (and the equivalent
block later) to inspect pod.Status.Conditions for coreV1.PodReady and compare
its Status to coreV1.ConditionTrue, and use that combined check when gating
hostPath evidence collection (replace references to coreV1.PodPending with the
PodReady-condition check).

In `@tests/vm_scanning_suite_test.go`:
- Around line 1223-1225: The assertion requiring acks_received > 0 contradicts
the known-gap note; update the test around requirePositive(comp, cq[5], ...) so
it is conditional: if the ACK-propagation fix/feature is enabled (use your
existing feature toggle or flag check) then assert > 0 via requirePositive(comp,
cq[5], ...), otherwise relax or skip the check (e.g., assert >= 0 or skip the
assertion) and keep the explanatory message; reference the existing call site
requirePositive, the comp variable and cq[5] context when making the conditional
change.
- Around line 762-765: When Create returns IsAlreadyExists, fetch the current
Secret via s.k8sClient.CoreV1().Secrets(s.namespace).Get(ctx, secret.Name,
metaV1.GetOptions{}), copy the existing Secret's ResourceVersion (and any other
server-set metadata you need) onto the local secret object, then call Update
with that updated secret; this ensures the Update on the secret variable
succeeds instead of being rejected for a missing/incorrect resourceVersion.

In `@tests/vm_scanning_utils_test.go`:
- Around line 60-79: Reorder loadVMScanConfig so environment-only prerequisites
(e.g., requireEnv for RHEL_ACTIVATION_ORG and RHEL_ACTIVATION_KEY) are validated
before any host-binary probes; specifically, move the requireEnv checks for
activation-related variables to occur prior to calling discoverVirtctlPath and
discoverRoxagentBinaryPath in loadVMScanConfig (and make the same change in the
similar block around lines 106-123), so missing activation env vars fail fast
and are not masked by local binary discovery errors.
- Around line 25-28: The two constants defaultRepo2CPEPrimaryURL and
defaultRepo2CPEFallbackURL are identical, so on primary fetch failure the code
just retries the same endpoint; change the fallback to be a true fallback by
either (A) requiring the ROXAGENT_REPO2CPE_FALLBACK_URL env var be set and use
that for defaultRepo2CPEFallbackURL, or (B) derive a local file path under the
repo/artifacts directory and set defaultRepo2CPEFallbackURL to that path; update
the constant(s) in tests/vm_scanning_utils_test.go (and any test setup that
depends on defaultRepo2CPEFallbackURL) so the fallback is distinct from
defaultRepo2CPEPrimaryURL and tests assert using the fallback path when the
primary URL is unreachable.

In `@tests/vmhelpers/guest.go`:
- Around line 62-68: The current branch treats any non-transport pgrep error as
"idle"; change it to only treat the pgrep exit-code-1 (no matches) as idle and
propagate all other command failures. In the block handling err (after
errors.Is(err, errSSHTransport)), detect whether err is an *exec.ExitError and
extract the exit status (use ExitError.ExitCode() or
syscall.WaitStatus.ExitStatus()); if the exit code == 1 return (true, "no RHSM
uploader processes", nil), otherwise return (false, "", err) so real pgrep
failures are surfaced. Update the logic around the err handling in the function
that invokes pgrep in tests/vmhelpers/guest.go accordingly.

In `@tests/vmhelpers/ssh_command_framework.go`:
- Around line 104-126: The retry loop in retryOnSSHTransport currently retries
any error matching errSSHTransport, causing terminal failures to spin; introduce
and use a distinct sentinel errRetryableSSHTransport for only retryable
transport glitches and update runSSHCommandWithFramework so it returns
errRetryableSSHTransport for transient/socket/connection issues but returns the
original errSSHTransport (or a wrapped error not matching
errRetryableSSHTransport) for terminal auth failures (bad key/user); then change
callers to pass the same fn into retryOnSSHTransport unchanged (so
retryOnSSHTransport continues to check errors.Is(..., errRetryableSSHTransport)
rather than errSSHTransport) ensuring terminal auth errors are returned
immediately while only genuine retryable transport errors are retried.

In `@tests/vmhelpers/virtctl_ssh.go`:
- Around line 29-39: The buildVirtctlSSHCommand function currently uses
strconv.Quote which emits Go string literals unsafe for POSIX shells; update
buildVirtctlSSHCommand to produce shell-safe single-quoted arguments (escape any
single quote by closing, inserting '"'"', and reopening) or call a known library
such as github.com/alessio/shellescape.Quote for each arg instead of
strconv.Quote so remote --command receives correct POSIX-quoted arguments; keep
the existing join behavior (strings.Join with spaces) and replace references to
strconv.Quote in buildVirtctlSSHCommand accordingly.

In `@tests/vmhelpers/virtctl.go`:
- Around line 136-157: The current formatRemoteCommandStreamsForInlineLog only
truncates stdout but not stderr, allowing unbounded stderr into logs; modify the
function to apply truncateMiddleLines(stderr, inlineLogMaxHeadTailLines) (after
trimming) before writing the "stderr:\n" block so both stderr and stdout are
subject to the same large-output guard; keep use of strings.TrimSpace, preserve
labels ("stderr:" then "stdout:"), and reuse the existing truncateMiddleLines
and inlineLogMaxHeadTailLines symbols.

---

Nitpick comments:
In `@tests/testmetrics/parse.go`:
- Around line 123-143: parseCounter currently uses strings.HasPrefix(line,
metricPrefix) which can incorrectly match longer metric names; update
parseCounter to ensure the character immediately after metricPrefix in the line
(if any) is either '{' (labels), whitespace (space or tab), or end-of-line so
you only match the exact metric name, and apply the identical validation to
parseCounterWithLabel; locate the checks using metricPrefix inside parseCounter
and parseCounterWithLabel and replace the simple HasPrefix acceptance with this
exact-name post-character check before parsing fields and values.

In `@vm-image.sh`:
- Around line 51-52: The DISK_NAME assignment uses a non-deterministic pipeline
("find ... | head -1") which can pick different files; update the logic around
DISK_NAME (the line that constructs DISK_NAME using WORKDIR) to use a
deterministic first-match find invocation (e.g., find ... -print -quit) or,
better, enumerate matches and validate there is exactly one .qcow2/.img file and
error if zero or more than one; ensure the echo "==> Disk: $DISK_NAME" only runs
after successful validation.
- Around line 60-64: The script currently bakes plaintext credentials into
customize.sh by interpolating REGISTER_ARGS; instead stop expanding
REGISTER_ARGS into the file and have the script read credentials from
environment variables at runtime (e.g. REGISTER_USERNAME and REGISTER_PASSWORD)
and invoke subscription-manager without placing secrets on the command line (for
example pipe or heredoc to subscription-manager or otherwise provide password
via stdin), then explicitly unset those env vars and remove WORKDIR after use;
update the code that writes customize.sh (and the script body) to reference
REGISTER_USERNAME/REGISTER_PASSWORD and to avoid passing secrets as command-line
args to subscription-manager (use stdin/heredoc/piping) and ensure cleanup/unset
of the env vars inside the customize.sh script.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 0e87f1b1-3de3-46b6-b925-e5a38a23e7b9

📥 Commits

Reviewing files that changed from the base of the PR and between c1f1ddc and bed608a.

📒 Files selected for processing (47)
  • .openshift-ci/begin.sh
  • .openshift-ci/ci_tests.py
  • .openshift-ci/dispatch.sh
  • docs/superpowers/plans/2026-03-27-vm-scanning-e2e.md
  • docs/superpowers/specs/2026-03-26-vm-scanning-e2e-design.md
  • scripts/ci/jobs/ocp_vm_scanning_e2e_tests.py
  • test.sh
  • tests/Makefile
  • tests/e2e/lib.sh
  • tests/e2e/run-vm-scanning.sh
  • tests/e2e/vm-scanning-lib.sh
  • tests/e2e/yaml/cnv-operator/hyperconverged.yaml
  • tests/e2e/yaml/cnv-operator/namespace.yaml
  • tests/e2e/yaml/cnv-operator/operator-group.yaml
  • tests/e2e/yaml/cnv-operator/subscription.yaml
  • tests/testdata/vm-scanning/cloud-init.yaml.tmpl
  • tests/testdata/vm-scanning/vmscanning.go
  • tests/testmetrics/parse.go
  • tests/testmetrics/parse_test.go
  • tests/testmetrics/scrape.go
  • tests/testmetrics/scrape_test.go
  • tests/vm_scanning_e2e_utils_preflight_test.go
  • tests/vm_scanning_e2e_utils_test.go
  • tests/vm_scanning_suite_test.go
  • tests/vm_scanning_test.go
  • tests/vm_scanning_utils_test.go
  • tests/vmhelpers/central.go
  • tests/vmhelpers/central_test.go
  • tests/vmhelpers/guest.go
  • tests/vmhelpers/guest_ssh_internal.go
  • tests/vmhelpers/guest_ssh_policy.go
  • tests/vmhelpers/guest_test.go
  • tests/vmhelpers/observability.go
  • tests/vmhelpers/observability_test.go
  • tests/vmhelpers/roxagent.go
  • tests/vmhelpers/roxagent_test.go
  • tests/vmhelpers/ssh_command_framework.go
  • tests/vmhelpers/virtctl.go
  • tests/vmhelpers/virtctl_process_nonunix.go
  • tests/vmhelpers/virtctl_process_unix.go
  • tests/vmhelpers/virtctl_ssh.go
  • tests/vmhelpers/virtctl_test.go
  • tests/vmhelpers/vm.go
  • tests/vmhelpers/vm_test.go
  • tests/vmhelpers/wait.go
  • tests/vmhelpers/wait_test.go
  • vm-image.sh

Comment on lines +31 to +35
verify_kvm_available() {
local kvm_capacity
kvm_capacity="$(kubectl get nodes -o jsonpath='{.items[*].status.capacity.devices\.kubevirt\.io/kvm}' 2>/dev/null || true)"
if [[ -z "$kvm_capacity" ]] || [[ "$kvm_capacity" =~ ^[[:space:]]*0*[[:space:]]*$ ]]; then
die "No worker nodes with devices.kubevirt.io/kvm > 0. Nested virtualization may not be enabled on this cluster."
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

KVM preflight can false-pass on multi-node zero capacity and ignores worker scoping.

Line 33 collects capacities across all nodes, and Line 34’s regex can miss the all-zero multi-token case (e.g., 0 0). This can let the suite proceed on unsupported clusters.

Suggested fix
 verify_kvm_available() {
     local kvm_capacity
-    kvm_capacity="$(kubectl get nodes -o jsonpath='{.items[*].status.capacity.devices\.kubevirt\.io/kvm}' 2>/dev/null || true)"
-    if [[ -z "$kvm_capacity" ]] || [[ "$kvm_capacity" =~ ^[[:space:]]*0*[[:space:]]*$ ]]; then
+    kvm_capacity="$(kubectl get nodes -l 'node-role.kubernetes.io/worker' \
+        -o jsonpath='{range .items[*]}{.status.capacity.devices\.kubevirt\.io/kvm}{"\n"}{end}' 2>/dev/null || true)"
+    if ! awk 'NF && ($1 + 0) > 0 { found=1 } END { exit(found ? 0 : 1) }' <<<"$kvm_capacity"; then
         die "No worker nodes with devices.kubevirt.io/kvm > 0. Nested virtualization may not be enabled on this cluster."
     fi
     info "KVM device capacity on nodes: ${kvm_capacity}"
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
verify_kvm_available() {
local kvm_capacity
kvm_capacity="$(kubectl get nodes -o jsonpath='{.items[*].status.capacity.devices\.kubevirt\.io/kvm}' 2>/dev/null || true)"
if [[ -z "$kvm_capacity" ]] || [[ "$kvm_capacity" =~ ^[[:space:]]*0*[[:space:]]*$ ]]; then
die "No worker nodes with devices.kubevirt.io/kvm > 0. Nested virtualization may not be enabled on this cluster."
verify_kvm_available() {
local kvm_capacity
kvm_capacity="$(kubectl get nodes -l 'node-role.kubernetes.io/worker' \
-o jsonpath='{range .items[*]}{.status.capacity.devices\.kubevirt\.io/kvm}{"\n"}{end}' 2>/dev/null || true)"
if ! awk 'NF && ($1 + 0) > 0 { found=1 } END { exit(found ? 0 : 1) }' <<<"$kvm_capacity"; then
die "No worker nodes with devices.kubevirt.io/kvm > 0. Nested virtualization may not be enabled on this cluster."
fi
info "KVM device capacity on nodes: ${kvm_capacity}"
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/vm-scanning-lib.sh` around lines 31 - 35, The verify_kvm_available
function currently aggregates devices.kubevirt.io/kvm across all nodes and uses
a regex that can miss multi-token all-zero results (e.g. "0 0") and doesn't
limit to worker nodes; update it to query only worker nodes (e.g., use a node
selector/label like node-role.kubernetes.io/worker or exclude
control-plane/master labels) and then parse the returned tokens individually to
ensure at least one node has a capacity > 0; produce a list/array from the
kubectl jsonpath output for the symbol verify_kvm_available and iterate over
each token, failing (die) only if every token is zero/empty or if no worker
nodes are found.

Comment on lines +234 to +242
func ScrapeComponent(ctx context.Context, clientset kubernetes.Interface, target ScrapeTarget, transport Transport, queries []Query) (map[string]Value, error) {
snaps, err := CollectFromPods(ctx, clientset, PodCollectOptions{
Namespace: target.Namespace,
LabelSelector: target.LabelSelector,
FieldSelector: target.FieldSelector,
Port: target.MetricsPort,
MetricsPath: target.MetricsPath,
Transport: transport,
})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Missing RestConfig in ScrapeComponent breaks port-forward transport.

ScrapeComponent constructs PodCollectOptions without setting RestConfig. When transport == TransportPortForward, this will always fail with the error at line 182-184 since RestConfig is required for port-forward transport.

🐛 Proposed fix

The function signature needs to accept *rest.Config and pass it through:

-func ScrapeComponent(ctx context.Context, clientset kubernetes.Interface, target ScrapeTarget, transport Transport, queries []Query) (map[string]Value, error) {
+func ScrapeComponent(ctx context.Context, clientset kubernetes.Interface, restCfg *rest.Config, target ScrapeTarget, transport Transport, queries []Query) (map[string]Value, error) {
 	snaps, err := CollectFromPods(ctx, clientset, PodCollectOptions{
 		Namespace:     target.Namespace,
 		LabelSelector: target.LabelSelector,
 		FieldSelector: target.FieldSelector,
 		Port:          target.MetricsPort,
 		MetricsPath:   target.MetricsPath,
 		Transport:     transport,
+		RestConfig:    restCfg,
 	})
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func ScrapeComponent(ctx context.Context, clientset kubernetes.Interface, target ScrapeTarget, transport Transport, queries []Query) (map[string]Value, error) {
snaps, err := CollectFromPods(ctx, clientset, PodCollectOptions{
Namespace: target.Namespace,
LabelSelector: target.LabelSelector,
FieldSelector: target.FieldSelector,
Port: target.MetricsPort,
MetricsPath: target.MetricsPath,
Transport: transport,
})
func ScrapeComponent(ctx context.Context, clientset kubernetes.Interface, restCfg *rest.Config, target ScrapeTarget, transport Transport, queries []Query) (map[string]Value, error) {
snaps, err := CollectFromPods(ctx, clientset, PodCollectOptions{
Namespace: target.Namespace,
LabelSelector: target.LabelSelector,
FieldSelector: target.FieldSelector,
Port: target.MetricsPort,
MetricsPath: target.MetricsPath,
Transport: transport,
RestConfig: restCfg,
})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/testmetrics/scrape.go` around lines 234 - 242, The ScrapeComponent
function builds PodCollectOptions without populating RestConfig, which breaks
port-forward transport; update ScrapeComponent signature to accept a
*rest.Config parameter and pass it into the PodCollectOptions.RestConfig field
so that when transport == TransportPortForward the port-forward logic receives
the required rest configuration (modify references in ScrapeComponent and the
PodCollectOptions construction to include RestConfig).

Comment on lines +194 to +195
phaseReady := pod.Status.Phase == coreV1.PodRunning || pod.Status.Phase == coreV1.PodPending
if !phaseReady {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Do not treat PodPending as ready in VSOCK preflight.

Using pending pods as “ready” can false-pass preflight and shift failures downstream. Gate on running + PodReady=True before accepting hostPath evidence.

Suggested fix
-			phaseReady := pod.Status.Phase == coreV1.PodRunning || pod.Status.Phase == coreV1.PodPending
+			phaseReady := pod.Status.Phase == coreV1.PodRunning
+			podReady := false
+			for _, c := range pod.Status.Conditions {
+				if c.Type == coreV1.PodReady && c.Status == coreV1.ConditionTrue {
+					podReady = true
+					break
+				}
+			}
+			phaseReady = phaseReady && podReady
 			if !phaseReady {
 				fmt.Fprintf(&diag, "namespace %q pod %q: phase=%q\n", ns, pod.Name, pod.Status.Phase)
 			}

Also applies to: 213-223

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/vm_scanning_e2e_utils_test.go` around lines 194 - 195, The test
currently treats PodPending as ready via the phaseReady boolean; change the
readiness check to require pod.Status.Phase == coreV1.PodRunning AND the
PodReady condition equals coreV1.ConditionTrue instead of allowing PodPending.
Update the phaseReady assignment (and the equivalent block later) to inspect
pod.Status.Conditions for coreV1.PodReady and compare its Status to
coreV1.ConditionTrue, and use that combined check when gating hostPath evidence
collection (replace references to coreV1.PodPending with the PodReady-condition
check).

Comment on lines +762 to +765
_, err = s.k8sClient.CoreV1().Secrets(s.namespace).Create(ctx, secret, metaV1.CreateOptions{})
if apierrors.IsAlreadyExists(err) {
_, err = s.k8sClient.CoreV1().Secrets(s.namespace).Update(ctx, secret, metaV1.UpdateOptions{})
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fetch the existing Secret before updating it.

On the reuse path this Update sends a brand-new object without the current resourceVersion, so Kubernetes will reject it. That turns namespace reuse into a hard failure instead of idempotent setup.

Suggested fix
  _, err = s.k8sClient.CoreV1().Secrets(s.namespace).Create(ctx, secret, metaV1.CreateOptions{})
  if apierrors.IsAlreadyExists(err) {
-     _, err = s.k8sClient.CoreV1().Secrets(s.namespace).Update(ctx, secret, metaV1.UpdateOptions{})
+     existing, getErr := s.k8sClient.CoreV1().Secrets(s.namespace).Get(ctx, vmImagePullSecretName, metaV1.GetOptions{})
+     require.NoError(s.T(), getErr, "get existing image pull secret %q in namespace %q", vmImagePullSecretName, s.namespace)
+     existing.Type = coreV1.SecretTypeDockerConfigJson
+     if existing.Data == nil {
+         existing.Data = map[string][]byte{}
+     }
+     existing.Data[coreV1.DockerConfigJsonKey] = dockerCfg
+     _, err = s.k8sClient.CoreV1().Secrets(s.namespace).Update(ctx, existing, metaV1.UpdateOptions{})
  }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
_, err = s.k8sClient.CoreV1().Secrets(s.namespace).Create(ctx, secret, metaV1.CreateOptions{})
if apierrors.IsAlreadyExists(err) {
_, err = s.k8sClient.CoreV1().Secrets(s.namespace).Update(ctx, secret, metaV1.UpdateOptions{})
}
_, err = s.k8sClient.CoreV1().Secrets(s.namespace).Create(ctx, secret, metaV1.CreateOptions{})
if apierrors.IsAlreadyExists(err) {
existing, getErr := s.k8sClient.CoreV1().Secrets(s.namespace).Get(ctx, vmImagePullSecretName, metaV1.GetOptions{})
require.NoError(s.T(), getErr, "get existing image pull secret %q in namespace %q", vmImagePullSecretName, s.namespace)
existing.Type = coreV1.SecretTypeDockerConfigJson
if existing.Data == nil {
existing.Data = map[string][]byte{}
}
existing.Data[coreV1.DockerConfigJsonKey] = dockerCfg
_, err = s.k8sClient.CoreV1().Secrets(s.namespace).Update(ctx, existing, metaV1.UpdateOptions{})
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/vm_scanning_suite_test.go` around lines 762 - 765, When Create returns
IsAlreadyExists, fetch the current Secret via
s.k8sClient.CoreV1().Secrets(s.namespace).Get(ctx, secret.Name,
metaV1.GetOptions{}), copy the existing Secret's ResourceVersion (and any other
server-set metadata you need) onto the local secret object, then call Update
with that updated secret; this ensures the Update on the secret variable
succeeds instead of being rejected for a missing/incorrect resourceVersion.

Comment on lines +1223 to +1225
requirePositive(comp, cq[5],
"compliance relay acks_received should be > 0 when Central ACKs an index report; "+
"known gap: Sensor does not propagate ACKs back to the compliance relay yet")
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

This assertion contradicts the known-gap note.

The test currently requires acks_received > 0, but the message says ACKs are not propagated back to the compliance relay yet. As written, the suite will fail until that product gap is closed. Gate this assertion behind the feature/fix, or relax it for now.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/vm_scanning_suite_test.go` around lines 1223 - 1225, The assertion
requiring acks_received > 0 contradicts the known-gap note; update the test
around requirePositive(comp, cq[5], ...) so it is conditional: if the
ACK-propagation fix/feature is enabled (use your existing feature toggle or flag
check) then assert > 0 via requirePositive(comp, cq[5], ...), otherwise relax or
skip the check (e.g., assert >= 0 or skip the assertion) and keep the
explanatory message; reference the existing call site requirePositive, the comp
variable and cq[5] context when making the conditional change.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is known and okay like this. We are working in a parallel track to add the support for ACKs

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vikin91, understood — the assertion is intentionally failing-until-fixed, acting as a built-in gate that will start passing once the ACK-propagation work in the parallel track lands. I'll note this for future reviews.


📝 Learnings were identified and not saved (knowledge base disabled). Enable

Comment on lines +60 to +79
func loadVMScanConfig() (*vmScanConfig, error) {
cfg := &vmScanConfig{}

var err error
if cfg.ImageRHEL9, err = requireEnv("VM_IMAGE_RHEL9"); err != nil {
return nil, err
}
if cfg.ImageRHEL10, err = requireEnv("VM_IMAGE_RHEL10"); err != nil {
return nil, err
}

cfg.GuestUserRHEL9 = envOrDefault("VM_GUEST_USER_RHEL9", defaultGuestUser)
cfg.GuestUserRHEL10 = envOrDefault("VM_GUEST_USER_RHEL10", defaultGuestUser)

if cfg.VirtctlPath, err = discoverVirtctlPath(); err != nil {
return nil, err
}
if cfg.RoxagentBinaryPath, err = discoverRoxagentBinaryPath(); err != nil {
return nil, err
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate env-only prerequisites before probing local binaries.

loadVMScanConfig looks up virtctl and roxagent before it checks activation requirements. That means a missing RHEL_ACTIVATION_ORG/RHEL_ACTIVATION_KEY is masked by host-specific tool discovery failures, and the config tests in this file stop being hermetic.

💡 Proposed reordering
-	if cfg.VirtctlPath, err = discoverVirtctlPath(); err != nil {
-		return nil, err
-	}
-	if cfg.RoxagentBinaryPath, err = discoverRoxagentBinaryPath(); err != nil {
-		return nil, err
-	}
-
 	cfg.Repo2CPEPrimaryURL = envOrDefault("ROXAGENT_REPO2CPE_PRIMARY_URL", defaultRepo2CPEPrimaryURL)
 	cfg.Repo2CPEFallbackURL = envOrDefault("ROXAGENT_REPO2CPE_FALLBACK_URL", defaultRepo2CPEFallbackURL)
 	cfg.Repo2CPEPrimaryAttempts = envIntOrDefault("ROXAGENT_REPO2CPE_PRIMARY_ATTEMPTS", defaultRepo2CPEAttempts)
@@
 	if cfg.RequireActivation {
 		if cfg.ActivationOrg == "" {
 			return nil, errors.New("activation is required but RHEL_ACTIVATION_ORG is not set")
 		}
 		if cfg.ActivationKey == "" {
 			return nil, errors.New("activation is required but RHEL_ACTIVATION_KEY is not set")
 		}
 	}
+
+	if cfg.VirtctlPath, err = discoverVirtctlPath(); err != nil {
+		return nil, err
+	}
+	if cfg.RoxagentBinaryPath, err = discoverRoxagentBinaryPath(); err != nil {
+		return nil, err
+	}

Also applies to: 106-123

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/vm_scanning_utils_test.go` around lines 60 - 79, Reorder
loadVMScanConfig so environment-only prerequisites (e.g., requireEnv for
RHEL_ACTIVATION_ORG and RHEL_ACTIVATION_KEY) are validated before any
host-binary probes; specifically, move the requireEnv checks for
activation-related variables to occur prior to calling discoverVirtctlPath and
discoverRoxagentBinaryPath in loadVMScanConfig (and make the same change in the
similar block around lines 106-123), so missing activation env vars fail fast
and are not masked by local binary discovery errors.

Comment thread tests/vmhelpers/guest.go
Comment on lines +62 to +68
if err != nil {
if errors.Is(err, errSSHTransport) {
return false, "", err
}
// pgrep exits 1 when no processes match.
return true, "no RHSM uploader processes", nil
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Only treat pgrep “no matches” as idle.

This branch converts every non-transport pgrep failure into “idle”. If pgrep itself fails for any other reason, the precheck silently succeeds and the later dnf flow runs on bad assumptions, which will make the suite flaky and much harder to diagnose. Please distinguish exit-code-1/no-match from real command failures and propagate the latter.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/vmhelpers/guest.go` around lines 62 - 68, The current branch treats any
non-transport pgrep error as "idle"; change it to only treat the pgrep
exit-code-1 (no matches) as idle and propagate all other command failures. In
the block handling err (after errors.Is(err, errSSHTransport)), detect whether
err is an *exec.ExitError and extract the exit status (use ExitError.ExitCode()
or syscall.WaitStatus.ExitStatus()); if the exit code == 1 return (true, "no
RHSM uploader processes", nil), otherwise return (false, "", err) so real pgrep
failures are surfaced. Update the logic around the err handling in the function
that invokes pgrep in tests/vmhelpers/guest.go accordingly.

Comment on lines +104 to +126
// retryOnSSHTransport retries fn whenever it returns an errSSHTransport error.
// Non-transport errors and nil are returned immediately. The retry loop is
// bounded by ctx — callers should set an appropriate deadline/timeout.
func retryOnSSHTransport(ctx context.Context, logf func(string, ...any), desc string, fn func(ctx context.Context) error) error {
var lastErr error
for attempt := 1; ; attempt++ {
lastErr = fn(ctx)
if lastErr == nil || !errors.Is(lastErr, errSSHTransport) {
return lastErr
}
if logf != nil {
logf("%s: SSH transport error (attempt %d), retrying in %s: %v",
desc, attempt, sshTransportRetryInterval, lastErr)
}
timer := time.NewTimer(sshTransportRetryInterval)
select {
case <-ctx.Done():
timer.Stop()
return fmt.Errorf("%s: context expired while retrying SSH transport error: %w (last: %v)",
desc, ctx.Err(), lastErr)
case <-timer.C:
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don’t retry terminal SSH transport failures.

retryOnSSHTransport retries any error that matches errSSHTransport, but runSSHCommandWithFramework uses that same sentinel for both retryable transport glitches and terminal failures like SSH authentication errors. A bad key/user will therefore spin until the outer context times out instead of failing immediately.

💡 Suggested direction
-var errSSHTransport = errors.New("SSH transport error")
+var (
+	errSSHTransport          = errors.New("SSH transport error")
+	errRetryableSSHTransport = errors.New("retryable SSH transport error")
+)
 func retryOnSSHTransport(ctx context.Context, logf func(string, ...any), desc string, fn func(ctx context.Context) error) error {
 	var lastErr error
 	for attempt := 1; ; attempt++ {
 		lastErr = fn(ctx)
-		if lastErr == nil || !errors.Is(lastErr, errSSHTransport) {
+		if lastErr == nil || !errors.Is(lastErr, errRetryableSSHTransport) {
 			return lastErr
 		}

Then wrap only the retryable branch in runSSHCommandWithFramework with errRetryableSSHTransport, while leaving terminal auth failures as plain errSSHTransport.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/vmhelpers/ssh_command_framework.go` around lines 104 - 126, The retry
loop in retryOnSSHTransport currently retries any error matching
errSSHTransport, causing terminal failures to spin; introduce and use a distinct
sentinel errRetryableSSHTransport for only retryable transport glitches and
update runSSHCommandWithFramework so it returns errRetryableSSHTransport for
transient/socket/connection issues but returns the original errSSHTransport (or
a wrapped error not matching errRetryableSSHTransport) for terminal auth
failures (bad key/user); then change callers to pass the same fn into
retryOnSSHTransport unchanged (so retryOnSSHTransport continues to check
errors.Is(..., errRetryableSSHTransport) rather than errSSHTransport) ensuring
terminal auth errors are returned immediately while only genuine retryable
transport errors are retried.

Comment on lines +29 to +39
// buildVirtctlSSHCommand joins shell command parts into one string with per-argument strconv.Quote quoting.
func buildVirtctlSSHCommand(command ...string) string {
if len(command) == 0 {
return ""
}
quoted := make([]string, 0, len(command))
for _, arg := range command {
quoted = append(quoted, strconv.Quote(arg))
}
return strings.Join(quoted, " ")
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In Go, is strconv.Quote appropriate for POSIX shell argument escaping when building a single SSH remote command string (e.g., --command "...")? If not, what escaping strategy is recommended?

💡 Result:

No, strconv.Quote is not appropriate for POSIX shell argument escaping when building a single SSH remote command string (e.g., ssh host --command "..."). strconv.Quote produces Go-syntax double-quoted string literals with escapes like \t, \n, \xFF that are not valid in POSIX shell syntax, leading to syntax errors or incorrect interpretation when executed remotely via the shell (SSH runs the command via $SHELL -c "..."). For example, strconv.Quote(Fran &amp; Freddie's Diner ☺) produces ""Fran &amp; Freddie's Diner\t☺"", where \t is a literal backslash-t, not a tab, causing shell parsing failures. The recommended strategy is to use a POSIX shell quoting library to escape arguments safely for sh -c execution: 1. Use github.com/alessio/shellescape.Quote(arg) for single arguments, which wraps in single quotes and escapes ' as '' (POSIX-compatible). Example: import "github.com/alessio/shellescape" quoted := shellescape.Quote("O'Reilly &amp; Sons") // "'O''Reilly &amp; Sons'" cmd := exec.Command("ssh", "host", "--", "echo", quoted) For full commands: shellescape.QuoteCommand([]string{"echo", arg}) joins with spaces. 2. Alternatives: github.com/apparentlymart/go-shquot/shquot.POSIXShell([]string{"prog", ...args}) or github.com/chhongzh/shlex.Join(args). If the remote command requires shell features (pipes, redirects), build the full command string with quoted args and pass as single argument to sh -c: fullCmd := fmt.Sprintf(echo %s, shellescape.Quote(arg)) cmd := exec.Command("ssh", "host", "sh", "-c", fullCmd) This prevents injection while handling arbitrary strings containing spaces, ', ", $, ;, etc. Always quote to preserve arguments even if safe. Prefer exec.Command with separate args when possible to avoid shell entirely, but for SSH --command, shell-quoted single string is required.

Citations:


Use shell-safe quoting for remote --command, not strconv.Quote.

strconv.Quote produces Go-literal syntax with escapes like \t and \n that are not valid in POSIX shell, causing parsing failures. For example, strconv.Quote("foo\tbar") produces "foo\tbar" where \t is a literal backslash-t, not a tab, breaking argument semantics when passed to the remote shell via SSH.

Replace with proper POSIX shell quoting (single quotes with ' escaped as '"'"'), or use an established library like github.com/alessio/shellescape.Quote.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/vmhelpers/virtctl_ssh.go` around lines 29 - 39, The
buildVirtctlSSHCommand function currently uses strconv.Quote which emits Go
string literals unsafe for POSIX shells; update buildVirtctlSSHCommand to
produce shell-safe single-quoted arguments (escape any single quote by closing,
inserting '"'"', and reopening) or call a known library such as
github.com/alessio/shellescape.Quote for each arg instead of strconv.Quote so
remote --command receives correct POSIX-quoted arguments; keep the existing join
behavior (strings.Join with spaces) and replace references to strconv.Quote in
buildVirtctlSSHCommand accordingly.

Comment on lines +136 to +157
// formatRemoteCommandStreamsForInlineLog formats stdout and stderr for multi-line completion logs.
// Large outputs are truncated to the first and last inlineLogMaxHeadTailLines lines.
func formatRemoteCommandStreamsForInlineLog(stdout, stderr string) string {
var b strings.Builder
stderr = strings.TrimSpace(stderr)
stdout = strings.TrimSpace(stdout)
if stderr != "" {
b.WriteString("stderr:\n")
b.WriteString(stderr)
}
if stdout != "" {
if b.Len() > 0 {
b.WriteString("\n")
}
b.WriteString("stdout:\n")
b.WriteString(truncateMiddleLines(stdout, inlineLogMaxHeadTailLines))
}
if b.Len() == 0 {
return "output: <empty stdout/stderr>"
}
return b.String()
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Truncate stderr too, not just stdout.

The large-output guard only applies to stdout right now. A noisy failure path can still dump unbounded stderr into CI logs, which is the common case for command failures and defeats the “large outputs are truncated” contract.

💡 Proposed fix
 func formatRemoteCommandStreamsForInlineLog(stdout, stderr string) string {
 	var b strings.Builder
 	stderr = strings.TrimSpace(stderr)
 	stdout = strings.TrimSpace(stdout)
 	if stderr != "" {
 		b.WriteString("stderr:\n")
-		b.WriteString(stderr)
+		b.WriteString(truncateMiddleLines(stderr, inlineLogMaxHeadTailLines))
 	}
 	if stdout != "" {
 		if b.Len() > 0 {
 			b.WriteString("\n")
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/vmhelpers/virtctl.go` around lines 136 - 157, The current
formatRemoteCommandStreamsForInlineLog only truncates stdout but not stderr,
allowing unbounded stderr into logs; modify the function to apply
truncateMiddleLines(stderr, inlineLogMaxHeadTailLines) (after trimming) before
writing the "stderr:\n" block so both stderr and stdout are subject to the same
large-output guard; keep use of strings.TrimSpace, preserve labels ("stderr:"
then "stdout:"), and reuse the existing truncateMiddleLines and
inlineLogMaxHeadTailLines symbols.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 16, 2026

🚀 Build Images Ready

Images are ready for commit dffe479. To use with deploy scripts:

export MAIN_IMAGE_TAG=4.11.x-689-gdffe479fcf

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 16, 2026

Codecov Report

❌ Patch coverage is 31.33199% with 1361 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.50%. Comparing base (68b4ebd) to head (dffe479).
⚠️ Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
tests/vmhelpers/vm.go 25.81% 263 Missing and 10 partials ⚠️
tests/vmhelpers/guest.go 18.68% 247 Missing and 1 partial ⚠️
tests/testmetrics/scrape.go 10.76% 174 Missing ⚠️
tests/vmhelpers/central.go 45.05% 121 Missing and 18 partials ⚠️
tests/vmhelpers/virtctl.go 40.40% 109 Missing and 9 partials ⚠️
tests/vmhelpers/roxagent.go 24.81% 102 Missing and 1 partial ⚠️
tests/vmhelpers/guest_ssh_internal.go 30.76% 90 Missing ⚠️
tests/vmhelpers/ssh_command_framework.go 36.36% 69 Missing and 1 partial ⚠️
tests/vmhelpers/guest_ssh_policy.go 31.64% 53 Missing and 1 partial ⚠️
tests/testmetrics/parse.go 47.91% 44 Missing and 6 partials ⚠️
... and 3 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #20042      +/-   ##
==========================================
- Coverage   49.67%   49.50%   -0.18%     
==========================================
  Files        2765     2778      +13     
  Lines      209049   211031    +1982     
==========================================
+ Hits       103855   104476     +621     
- Misses      97517    98824    +1307     
- Partials     7677     7731      +54     
Flag Coverage Δ
go-unit-tests 49.50% <31.33%> (-0.18%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

vikin91 added 13 commits April 17, 2026 09:40
User request (feature): "design the end-to-end tests for the VM scanning feature."
User request (branch): "add the changes to the piotr/ROX-29577-VM4VM-e2e-tests branch."

Add spec and impl plan (Superpower artifacts)

Add a Go test_e2e VM-scanning suite that provisions RHEL 9 and RHEL 10 KubeVirt VMs, installs and runs roxagent, validates Central scan visibility, and captures progressive diagnostics for flaky-path triage.
Wire a dedicated OpenShift CI job and e2e scripts with CNV/VSOCK prerequisite checks and activation/repo2cpe guardrails.

AI-assisted: this commit was partially generated with AI assistance.
Made-with: Cursor
@vikin91
Copy link
Copy Markdown
Contributor Author

vikin91 commented Apr 17, 2026

This change is part of the following stack:

Change managed by git-spice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant