go.d/powerstore: add Dell PowerStore storage array collector (V2 framework) #21929

Open: ktsaou wants to merge 7 commits into master from go.d/powerstore-collector

ktsaou commented Mar 10, 2026

Summary

Add a new Go collector for Dell PowerStore storage arrays, built on the V2 collector framework (metrix.CollectorStore + YAML chart templates).

What it monitors

  • Cluster: physical/logical space usage, efficiency ratios (dedup, compression, thin savings)
  • Appliances: IOPS, bandwidth, latency, CPU utilization, space usage, logical space, efficiency ratios
  • Volumes: IOPS, bandwidth, latency, logical space, thin savings
  • Nodes: IOPS, bandwidth, latency, current logins
  • FC Ports: IOPS, bandwidth, latency, link status
  • Ethernet Ports: bytes/packets rate, errors, link status
  • File Systems: IOPS, bandwidth, latency
  • Hardware Health: fan, PSU, drive, battery, node component status
  • Alerts: active alerts by severity (critical, major, minor, info)
  • Drives: endurance remaining percentage
  • NAS Servers: operational status (started, stopped, degraded)
  • Replication: data remaining, transferred, transfer rate

Key implementation details

  • V2 framework: CreateV2, MetricStore(), ChartTemplateYAML(), Collect() error
  • SnapshotMeter + SnapshotGaugeVec for periodic API polls (PowerStore returns pre-computed rates)
  • YAML chart templates with proper float: true, multiplier: -1, algorithm: absolute
  • Concurrent API collection with sync.WaitGroup + semaphore (10 concurrent requests)
  • Atomic discovery: all API calls must succeed before topology state is swapped
  • Pagination support for large deployments
  • Volume selector pattern matching for filtering
  • CSRF token and session management for PowerStore REST API
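
The bounded-concurrency bullet above can be sketched roughly as follows. This is a minimal illustration, not the collector's actual code: `collectAll`, `fetch`, and the peak counter are hypothetical names introduced here, and only the `sync.WaitGroup` plus buffered-channel semaphore pattern is taken from the description.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// collectAll calls fetch once per entity ID with at most limit calls in flight.
// The returned peak is instrumentation for this sketch only: it reports the
// highest number of fetch calls that were running at the same time.
func collectAll(ids []string, limit int, fetch func(id string)) int32 {
	var (
		wg       sync.WaitGroup
		inFlight atomic.Int32
		peak     atomic.Int32
	)
	sem := make(chan struct{}, limit) // buffered channel used as a counting semaphore

	for _, id := range ids {
		sem <- struct{}{} // acquire a slot; blocks while limit calls are running
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot (runs before wg.Done)

			// Record the peak number of concurrent calls.
			n := inFlight.Add(1)
			for p := peak.Load(); n > p; p = peak.Load() {
				if peak.CompareAndSwap(p, n) {
					break
				}
			}
			fetch(id)
			inFlight.Add(-1)
		}(id)
	}
	wg.Wait()
	return peak.Load()
}

func main() {
	// Hypothetical entity IDs; a real collector would take them from discovery.
	ids := []string{"vol-1", "vol-2", "vol-3", "vol-4"}
	peak := collectAll(ids, 2, func(id string) { fmt.Println("collected", id) })
	fmt.Println("peak concurrency:", peak)
}
```

The PR uses a limit of 10; the sketch takes the limit as a parameter so the bound is easy to exercise in a test.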

Charts (28 total)

Cluster (3), Hardware Health (5), Alerts (1), NAS (1), Replication (2), Appliance (7), Volume (5), Node (4), FC Port (4), Ethernet Port (4), File System (3), Drive (1)

Test plan

  • go build succeeds
  • gofmt clean
  • go vet clean
  • All 6 tests pass (Init, Check, Collect, CollectWithVolumeSelector, Cleanup, ChartTemplateYAML)
  • Race detector passes (go test -race)
  • AssertChartCoverage validates all chart template dimensions are materialized
  • Chart template schema validation passes
  • Chart template compiles without errors

Add a new Go collector for Dell PowerStore storage arrays via the
PowerStore REST API. Implements a custom REST client (no external SDK)
with Basic Auth, cookie-based sessions, and CSRF token handling.

Monitors: cluster capacity/efficiency, appliance IOPS/bandwidth/latency/CPU,
volume metrics with filtering, node performance, FC/Ethernet port metrics,
hardware health (fans/PSUs/drives/batteries/nodes), active alerts by severity,
drive endurance, NAS server status, and replication metrics.

Key design decisions:
- 30-second default polling interval (matches API Five_Mins granularity)
- Discovery separated from collection (runs every 5 cycles)
- Session retry on 403 Forbidden (PowerStore uses 403, not 401, for stale sessions)
- Volume filtering via SimplePatternsMatcher (include/exclude patterns)
- Hardware data cached during discovery to avoid duplicate API calls
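
A rough sketch of the 403 retry described in the design decisions. The function name is hypothetical, and the request and login steps are passed in as closures, so nothing here reflects the real client's signatures. The only behavior taken from the PR is a re-login on 403 Forbidden followed by one retry.

```go
package main

import (
	"fmt"
	"net/http"
)

// doWithRelogin runs request once. If the API answers 403 Forbidden (treated
// here as a stale session), it calls login and retries the request one time.
func doWithRelogin(request func() (int, error), login func() error) (int, error) {
	status, err := request()
	if err != nil || status != http.StatusForbidden {
		return status, err
	}
	if err := login(); err != nil {
		return status, fmt.Errorf("re-login after 403: %w", err)
	}
	return request() // single retry; a second 403 is returned to the caller
}

func main() {
	// Simulated session state standing in for the cookie and CSRF token.
	sessionValid := false
	request := func() (int, error) {
		if !sessionValid {
			return http.StatusForbidden, nil
		}
		return http.StatusOK, nil
	}
	login := func() error {
		sessionValid = true
		return nil
	}

	status, err := doWithRelogin(request, login)
	fmt.Println(status, err) // prints "200 <nil>"
}
```

Retrying only once keeps a genuine permission error (a 403 that survives a fresh login) from looping.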

ktsaou requested a review from ilyam8 as a code owner on March 10, 2026, 08:17
github-actions bot added the area/collectors, collectors/go.d, area/metadata, and area/go labels on Mar 10, 2026
ktsaou requested a review from Copilot on March 10, 2026, 08:20

@cubic-dev-ai cubic-dev-ai bot left a comment

No issues found across 47 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

Architecture diagram
sequenceDiagram
    participant Netdata as Netdata Agent (go.d.plugin)
    participant Coll as PowerStore Collector
    participant Client as REST Client
    participant Auth as CSRF & Session Cache
    participant API as PowerStore REST API

    Note over Netdata, API: NEW: Collector Initialization
    Netdata->>Coll: Init()
    Coll->>Coll: NEW: Initialize volume_selector Matcher
    Coll->>Client: NEW: Create HTTP client (cookie jar enabled)

    Note over Netdata, API: NEW: Collection Cycle (30s interval)
    Netdata->>Coll: Collect()
    
    opt NEW: First run OR every 5th cycle (Discovery)
        Coll->>Client: discovery()
        Client->>API: GET /cluster, /appliance, /volume, etc.
        API-->>Client: Resource IDs and Metadata
        Client-->>Coll: List of entities
        Coll->>Coll: NEW: Filter volumes via volume_selector
        Coll->>Coll: NEW: Cache Hardware/Node/Drive mapping
    end

    Note over Coll, API: NEW: Metric Gathering Flow
    Coll->>Coll: collectClusterSpace()
    Coll->>Coll: collectAppliances()
    Coll->>Coll: collectVolumes()
    Coll->>Coll: collectHardwareHealth() (uses cached discovery data)

    loop For each entity (Volume, Appliance, etc.)
        Coll->>Client: POST /metrics/generate (entity_id)
        
        Note over Client, API: NEW: Auth & CSRF Management
        Client->>API: Request with DELL-EMC-TOKEN header
        
        alt Success (2xx)
            API-->>Client: Timeseries data
        else Session Expired (403 Forbidden)
            Client->>API: NEW: POST /login_session (Basic Auth)
            API-->>Client: New Session Cookie & CSRF Token
            Client->>Auth: NEW: Update cached DELL-EMC-TOKEN
            Client->>API: Retry original metrics request
            API-->>Client: Timeseries data
        end
        Client-->>Coll: Metrics JSON
    end

    Coll->>Coll: NEW: Update dynamic charts
    Coll-->>Netdata: map[string]int64 (stm.ToMap)

    Note over Netdata, API: NEW: Cleanup
    Netdata->>Coll: Cleanup()
    Coll->>Client: Logout()
    Client->>Auth: Clear CSRF state

Copilot AI left a comment

Pull request overview

Adds a new powerstore go.d collector module to monitor Dell PowerStore arrays via the PowerStore REST API (capacity, performance, health, alerts, NAS, replication), including config/metadata/schema and unit tests with mock API fixtures.

Changes:

  • Introduces the powerstore collector implementation (discovery, collection, charts, REST client with session/CSRF handling).
  • Adds module metadata + config schema + example go.d config for PowerStore.
  • Adds unit tests and mock REST API JSON fixtures for the collector.

Reviewed changes

Copilot reviewed 47 out of 47 changed files in this pull request and generated 4 comments.

File Description
src/go/plugin/go.d/config/go.d/powerstore.conf New example go.d job configuration for PowerStore.
src/go/plugin/go.d/collector/init.go Registers the new collector via blank import.
src/go/plugin/go.d/collector/powerstore/metrics.go Defines the collector’s internal metrics model and stm tags.
src/go/plugin/go.d/collector/powerstore/metadata.yaml Documents configuration options and exposed charts/metrics.
src/go/plugin/go.d/collector/powerstore/config_schema.json JSON schema + UI schema for validating/rendering job configuration.
src/go/plugin/go.d/collector/powerstore/collector.go Collector wiring, init/check/collect/cleanup, config and state.
src/go/plugin/go.d/collector/powerstore/collect.go Discovery loop + orchestrates per-component collection and chart updates.
src/go/plugin/go.d/collector/powerstore/collect_cluster.go Collects cluster capacity/efficiency metrics.
src/go/plugin/go.d/collector/powerstore/collect_appliance.go Collects appliance performance/capacity metrics.
src/go/plugin/go.d/collector/powerstore/collect_volume.go Collects per-volume performance/capacity metrics.
src/go/plugin/go.d/collector/powerstore/collect_node.go Collects per-node performance/logins metrics.
src/go/plugin/go.d/collector/powerstore/collect_ports.go Collects FC/Ethernet port performance + link state.
src/go/plugin/go.d/collector/powerstore/collect_filesystem.go Collects file system performance metrics.
src/go/plugin/go.d/collector/powerstore/collect_hardware.go Aggregates hardware lifecycle state health counts.
src/go/plugin/go.d/collector/powerstore/collect_alerts.go Counts active alerts by severity.
src/go/plugin/go.d/collector/powerstore/collect_drives.go Collects SSD endurance remaining metrics.
src/go/plugin/go.d/collector/powerstore/collect_nas.go Counts NAS server operational status states.
src/go/plugin/go.d/collector/powerstore/collect_replication.go Collects/aggregates replication (copy) metrics.
src/go/plugin/go.d/collector/powerstore/charts.go Static + dynamic chart templates and dynamic chart creation.
src/go/plugin/go.d/collector/powerstore/client/client.go REST client with cookie session + CSRF token caching and 403 retry login.
src/go/plugin/go.d/collector/powerstore/client/types.go REST API response/request types for entities and metrics.
src/go/plugin/go.d/collector/powerstore/collector_test.go Unit tests using a mock PowerStore API server.
src/go/plugin/go.d/collector/powerstore/testdata/cluster.json Mock API fixture: clusters.
src/go/plugin/go.d/collector/powerstore/testdata/appliance.json Mock API fixture: appliances.
src/go/plugin/go.d/collector/powerstore/testdata/volume.json Mock API fixture: volumes.
src/go/plugin/go.d/collector/powerstore/testdata/hardware_all.json Mock API fixture: all hardware components.
src/go/plugin/go.d/collector/powerstore/testdata/hardware_fan.json Mock API fixture: fan hardware.
src/go/plugin/go.d/collector/powerstore/testdata/hardware_psu.json Mock API fixture: PSU hardware.
src/go/plugin/go.d/collector/powerstore/testdata/hardware_drive.json Mock API fixture: drive hardware.
src/go/plugin/go.d/collector/powerstore/testdata/hardware_battery.json Mock API fixture: battery hardware.
src/go/plugin/go.d/collector/powerstore/testdata/hardware_node.json Mock API fixture: node hardware.
src/go/plugin/go.d/collector/powerstore/testdata/alert.json Mock API fixture: alerts.
src/go/plugin/go.d/collector/powerstore/testdata/fc_port.json Mock API fixture: FC ports.
src/go/plugin/go.d/collector/powerstore/testdata/eth_port.json Mock API fixture: Ethernet ports.
src/go/plugin/go.d/collector/powerstore/testdata/file_system.json Mock API fixture: file systems.
src/go/plugin/go.d/collector/powerstore/testdata/nas_server.json Mock API fixture: NAS servers.
src/go/plugin/go.d/collector/powerstore/testdata/metrics_space_cluster.json Mock API fixture: cluster space metrics.
src/go/plugin/go.d/collector/powerstore/testdata/metrics_space_appliance.json Mock API fixture: appliance space metrics.
src/go/plugin/go.d/collector/powerstore/testdata/metrics_space_volume.json Mock API fixture: volume space metrics.
src/go/plugin/go.d/collector/powerstore/testdata/metrics_perf_appliance.json Mock API fixture: appliance perf metrics.
src/go/plugin/go.d/collector/powerstore/testdata/metrics_perf_volume.json Mock API fixture: volume perf metrics.
src/go/plugin/go.d/collector/powerstore/testdata/metrics_perf_node.json Mock API fixture: node perf metrics.
src/go/plugin/go.d/collector/powerstore/testdata/metrics_perf_fc_port.json Mock API fixture: FC port perf metrics.
src/go/plugin/go.d/collector/powerstore/testdata/metrics_perf_eth_port.json Mock API fixture: Ethernet port perf metrics.
src/go/plugin/go.d/collector/powerstore/testdata/metrics_perf_filesystem.json Mock API fixture: file system perf metrics.
src/go/plugin/go.d/collector/powerstore/testdata/metrics_wear_drive.json Mock API fixture: drive wear metrics.
src/go/plugin/go.d/collector/powerstore/testdata/metrics_copy_appliance.json Mock API fixture: replication/copy metrics.

ktsaou added 2 commits March 10, 2026 10:54
- Apply gofmt to fix struct alignment in metrics.go, client.go, types.go
- Rename naServers to nasServers for clarity (Copilot review)
- Add autodetection_retry and vnode to config_schema.json (required since
  additionalProperties is false — configs using these would fail validation)
- Allow null type for headers in config_schema.json (matches other collectors)
- Add bounded concurrency (10 concurrent API calls) using semaphore
  pattern, matching Dell's own csm-metrics-powerstore approach
- Fix GET endpoint pagination: server caps limit at 2000 (was 5000),
  handle HTTP 206 Partial Content for proper pagination
- Replace doGetWithRetry with generic doGetAllPages that fetches all
  pages automatically via offset-based pagination
- All entity-level collection functions (volumes, appliances, nodes,
  ports, filesystems, drives, replication) now run in parallel with
  per-entity goroutines bounded by the shared semaphore
- Race detector tests pass cleanly
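
The offset-based pagination described above might look roughly like this. Everything named here (`getAllPages`, `pageFunc`, `fakeArray`) is hypothetical. The sketch also assumes the server answers 206 Partial Content while more items remain and 200 OK on the final page; a real client may need to inspect the response headers instead.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// pageFunc fetches one page starting at offset and returns the items together
// with the HTTP status code of the response.
type pageFunc func(offset, limit int) (items []string, status int, err error)

// getAllPages keeps requesting pages until the server stops answering 206.
// The offset advances by the number of items actually returned, so a server
// that caps the page size below the requested limit is handled correctly.
func getAllPages(fetch pageFunc, limit int) ([]string, error) {
	var all []string
	offset := 0
	for {
		items, status, err := fetch(offset, limit)
		if err != nil {
			return nil, err
		}
		all = append(all, items...)
		switch status {
		case http.StatusOK: // assumption: 200 marks the last page
			return all, nil
		case http.StatusPartialContent:
			if len(items) == 0 {
				return nil, errors.New("partial response with no items")
			}
			offset += len(items)
		default:
			return nil, fmt.Errorf("unexpected HTTP status %d", status)
		}
	}
}

// fakeArray simulates a server holding total items that caps every page at serverCap.
func fakeArray(total, serverCap int) pageFunc {
	return func(offset, limit int) ([]string, int, error) {
		if limit > serverCap {
			limit = serverCap
		}
		end, status := offset+limit, http.StatusPartialContent
		if end >= total {
			end, status = total, http.StatusOK
		}
		items := make([]string, 0, end-offset)
		for i := offset; i < end; i++ {
			items = append(items, fmt.Sprintf("vol-%d", i))
		}
		return items, status, nil
	}
}

func main() {
	// The client asks for 5000 per page while the simulated server caps at 2000.
	all, err := getAllPages(fakeArray(4500, 2000), 5000)
	fmt.Println(len(all), err) // prints "4500 <nil>"
}
```

Advancing by `len(items)` rather than by the requested limit is what avoids skipping entries when the server silently lowers the page size.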

ilyam8 commented Mar 10, 2026

Why is this implemented using an old framework?

ktsaou commented Mar 10, 2026

@ilyam8 I am porting this to V2.

ktsaou commented Mar 10, 2026

And I will also add Dell PowerVault which is similar.

Replace V1 chart definitions (charts.go) with YAML-based chart
templates (charts.yaml) and metrix.CollectorStore for metric
registration and collection.

Key changes:
- Use SnapshotMeter/SnapshotGaugeVec for periodic API poll metrics
- Chart definitions in charts.yaml with proper float, multiplier,
  and algorithm settings
- Atomic discovery: build temp state, swap only on full success
- Remove redundant totalIops/totalBandwidth aggregates (read+write
  sums computed by API, not useful as separate metrics)
- Add appliance logical space and efficiency ratio charts
- Add FC port latency and volume thin savings charts
- Stacked visualization for eth port errors
- Tests updated with metrix cycle control and chart coverage
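
The "build temp state, swap only on full success" idea can be sketched as below. The types and names (`inventory`, `collector`, `lister`, `errUnavailable`) are hypothetical stand-ins, not the collector's real structures.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnavailable is a hypothetical error used to simulate a failed API call.
var errUnavailable = errors.New("API unavailable")

// inventory is a simplified topology snapshot.
type inventory struct {
	appliances []string
	volumes    []string
}

type collector struct {
	inv *inventory // last complete snapshot, nil before the first discovery
}

// lister stands in for one discovery API call.
type lister func() ([]string, error)

// discover fills a temporary snapshot and publishes it only if every call
// succeeded, so a partial failure never leaves mixed old and new state.
func (c *collector) discover(listAppliances, listVolumes lister) error {
	next := &inventory{}
	var err error
	if next.appliances, err = listAppliances(); err != nil {
		return fmt.Errorf("list appliances: %w", err)
	}
	if next.volumes, err = listVolumes(); err != nil {
		return fmt.Errorf("list volumes: %w", err)
	}
	c.inv = next // swap in the new snapshot in one assignment
	return nil
}

func main() {
	ok := func() ([]string, error) { return []string{"id-1"}, nil }
	bad := func() ([]string, error) { return nil, errUnavailable }

	c := &collector{}
	fmt.Println(c.discover(ok, ok))  // first discovery succeeds
	fmt.Println(c.discover(ok, bad)) // second fails; c.inv still holds the first snapshot
	fmt.Println(len(c.inv.volumes))
}
```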

ktsaou changed the title from "go.d/powerstore: add Dell PowerStore storage array collector" to "go.d/powerstore: add Dell PowerStore storage array collector (V2 framework)" on Mar 10, 2026

ilyam8 commented Mar 10, 2026

@ktsaou Don’t use float: true. Float support has been merged, but @stelfrag mentioned that we shouldn’t use it yet. I’m not sure about the details or the reason, or when it will be ready to use.

ktsaou commented Mar 10, 2026

I think it is OK for new collectors. Old collectors are the problem.

Why should we not use it on old collectors?

float is not backwards compatible with older parents. A new child connected to an old parent will have precision loss. It works, but the streaming protocol is negotiated and the parent will receive int values, not float.

We agreed that we will ship a stable version capable of float without using it, and then we will switch collectors to float, so that users upgrading children before parents will not see a difference.

But this is a new collector. There is no past data, and most likely its users will have a compatible parent too.

Copilot AI left a comment

Pull request overview

Copilot reviewed 47 out of 47 changed files in this pull request and generated 3 comments.


ktsaou added 3 commits March 11, 2026 01:08
Fill all empty sections with detailed end-user documentation:
- method_description: REST API, Basic Auth, CSRF token, metrics generation API
- additional_permissions, auto_detection, limits, performance_impact
- Add labels to all 7 per-entity metric scopes
- Add 3 missing charts: appliance_space_logical/efficiency, volume_space_savings, fc_port_latency
- Add 4 troubleshooting entries: auth, connection, TLS, HTTP 403

Align powerstore.conf with the established pattern used by scaleio and
other collectors. Remove verbose inline documentation, priority field,
and DURATION notation that could confuse users.

When a scheduled discovery refresh (every 5th cycle) fails, continue
collecting metrics using the previously cached inventory instead of
aborting the entire collection cycle. Initial discovery must still
succeed.
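
One way to express the refresh fallback described in the last commit message, as a sketch under assumptions. `poller`, `snapshot`, `errDown`, and `discoveryEvery` are hypothetical names. The fallback decision here is based on whether a cached snapshot exists rather than on a status flag that discovery itself might reset.

```go
package main

import (
	"errors"
	"fmt"
)

// discoveryEvery mirrors the "every 5th cycle" refresh described in the PR.
const discoveryEvery = 5

// errDown is a hypothetical error used to simulate an unreachable array.
var errDown = errors.New("array unreachable")

// snapshot is a simplified cached inventory.
type snapshot struct {
	volumes []string
}

type poller struct {
	cycle    int
	cached   *snapshot // nil until the first successful discovery
	discover func() (*snapshot, error)
}

// collect returns the inventory to use for this cycle. A failed refresh falls
// back to the cached snapshot; a failed initial discovery is an error.
func (p *poller) collect() (*snapshot, error) {
	if p.cached == nil || p.cycle%discoveryEvery == 0 {
		snap, err := p.discover()
		switch {
		case err == nil:
			p.cached = snap
		case p.cached == nil:
			return nil, fmt.Errorf("initial discovery: %w", err)
		default:
			fmt.Println("discovery refresh failed, using cached inventory:", err)
		}
	}
	p.cycle++
	return p.cached, nil
}

func main() {
	fail := false
	p := &poller{discover: func() (*snapshot, error) {
		if fail {
			return nil, errDown
		}
		return &snapshot{volumes: []string{"vol-1"}}, nil
	}}

	for i := 0; i < discoveryEvery; i++ {
		p.collect() // cycle 0 discovers, cycles 1 to 4 reuse the cache
	}
	fail = true
	snap, err := p.collect() // cycle 5: refresh fails, cached snapshot is used
	fmt.Println(snap.volumes, err)
}
```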

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/go/plugin/go.d/collector/powerstore/collect.go">

<violation number="1" location="src/go/plugin/go.d/collector/powerstore/collect.go:18">
P1: This condition uses `lastDiscoveryOK` after `discovery()` has already reset it on error, so refresh failures still abort collection instead of falling back to the previous snapshot.

(Based on your team's feedback about continuing on discovery refresh failures when a prior snapshot exists.) [FEEDBACK_USED]</violation>
</file>
