go.d/powerstore: add Dell PowerStore storage array collector (V2 framework) #21929
Add a new Go collector for Dell PowerStore storage arrays via the PowerStore REST API. It implements a custom REST client (no external SDK) with Basic Auth, cookie-based sessions, and CSRF token handling.
Monitors: cluster capacity/efficiency, appliance IOPS/bandwidth/latency/CPU, volume metrics with filtering, node performance, FC/Ethernet port metrics, hardware health (fans/PSUs/drives/batteries/nodes), active alerts by severity, drive endurance, NAS server status, and replication metrics.
Key design decisions:
- 30-second default polling interval (matches API Five_Mins granularity)
- Discovery separated from collection (runs every 5 cycles)
- Session retry on 403 Forbidden (PowerStore uses 403, not 401, for stale sessions)
- Volume filtering via SimplePatternsMatcher (include/exclude patterns)
- Hardware data cached during discovery to avoid duplicate API calls
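The "discovery runs every 5 cycles" decision can be modeled as a modulo check on a cycle counter. This is a minimal sketch, not the PR's code: the `collector` type, its `cycle` field, and the `collect` method are hypothetical and model only the scheduling. With the 30-second default interval, this cadence refreshes the inventory every 5 × 30 s = 150 s.

```go
package main

import "fmt"

// discoveryEvery is the refresh cadence described in the PR: discovery runs on
// the first cycle and then on every 5th cycle after it.
const discoveryEvery = 5

// collector is a hypothetical stand-in that models only the cycle counter.
type collector struct {
	cycle int // number of Collect calls made so far
}

// collect advances one cycle and reports whether this cycle ran discovery.
func (c *collector) collect() bool {
	discover := c.cycle%discoveryEvery == 0 // true on cycles 0, 5, 10, ...
	c.cycle++
	return discover
}

func main() {
	c := &collector{}
	for i := 0; i < 11; i++ {
		fmt.Print(c.collect(), " ")
	}
	fmt.Println()
	// prints: true false false false false true false false false false true
}
```

Between discoveries, every cycle reuses the cached inventory (including the hardware data), which is what keeps per-cycle API traffic limited to metrics requests.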
No issues found across 47 files
Confidence score: 5/5
- Automated review surfaced no issues in the provided summaries.
- No files require special attention.
Architecture diagram
```mermaid
sequenceDiagram
    participant Netdata as Netdata Agent (go.d.plugin)
    participant Coll as PowerStore Collector
    participant Client as REST Client
    participant Auth as CSRF & Session Cache
    participant API as PowerStore REST API

    Note over Netdata, API: NEW: Collector Initialization
    Netdata->>Coll: Init()
    Coll->>Coll: NEW: Initialize volume_selector Matcher
    Coll->>Client: NEW: Create HTTP client (cookie jar enabled)

    Note over Netdata, API: NEW: Collection Cycle (30s interval)
    Netdata->>Coll: Collect()
    opt NEW: First run OR every 5th cycle (Discovery)
        Coll->>Client: discovery()
        Client->>API: GET /cluster, /appliance, /volume, etc.
        API-->>Client: Resource IDs and Metadata
        Client-->>Coll: List of entities
        Coll->>Coll: NEW: Filter volumes via volume_selector
        Coll->>Coll: NEW: Cache Hardware/Node/Drive mapping
    end

    Note over Coll, API: NEW: Metric Gathering Flow
    Coll->>Coll: collectClusterSpace()
    Coll->>Coll: collectAppliances()
    Coll->>Coll: collectVolumes()
    Coll->>Coll: collectHardwareHealth() (uses cached discovery data)
    loop For each entity (Volume, Appliance, etc.)
        Coll->>Client: POST /metrics/generate (entity_id)
        Note over Client, API: NEW: Auth & CSRF Management
        Client->>API: Request with DELL-EMC-TOKEN header
        alt Success (2xx)
            API-->>Client: Timeseries data
        else Session Expired (403 Forbidden)
            Client->>API: NEW: POST /login_session (Basic Auth)
            API-->>Client: New Session Cookie & CSRF Token
            Client->>Auth: NEW: Update cached DELL-EMC-TOKEN
            Client->>API: Retry original metrics request
            API-->>Client: Timeseries data
        end
        Client-->>Coll: Metrics JSON
    end

    Coll->>Coll: NEW: Update dynamic charts
    Coll-->>Netdata: map[string]int64 (stm.ToMap)

    Note over Netdata, API: NEW: Cleanup
    Netdata->>Coll: Cleanup()
    Coll->>Client: Logout()
    Client->>Auth: Clear CSRF state
```
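The 403 branch of the diagram (re-login, refresh the cached `DELL-EMC-TOKEN`, retry once) can be sketched with the standard library alone. This is a hypothetical minimal client, not the PR's `client.go`: the `client` type, the mock server, the token values, and the `/api/rest/login_session` path are assumptions. The diagram draws the login as a POST while this sketch uses GET, so the method should be checked against Dell's PowerStore REST API reference.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/cookiejar"
	"net/http/httptest"
	"sync/atomic"
)

const (
	tokenHeader = "DELL-EMC-TOKEN"          // CSRF header named in the diagram
	loginPath   = "/api/rest/login_session" // assumed path
)

// client is hypothetical: the cookie jar keeps the session, token caches the CSRF value.
type client struct {
	base, user, pass string
	hc               *http.Client
	token            string
}

func newClient(base, user, pass string) *client {
	jar, _ := cookiejar.New(nil) // New with nil options does not return an error
	return &client{base: base, user: user, pass: pass, hc: &http.Client{Jar: jar}}
}

// login authenticates with Basic Auth and caches the CSRF token from the response header.
func (c *client) login() error {
	req, err := http.NewRequest(http.MethodGet, c.base+loginPath, nil)
	if err != nil {
		return err
	}
	req.SetBasicAuth(c.user, c.pass)
	resp, err := c.hc.Do(req)
	if err != nil {
		return err
	}
	resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("login: %s", resp.Status)
	}
	c.token = resp.Header.Get(tokenHeader)
	return nil
}

// get sends the cached token; on 403 it assumes a stale session, logs in again,
// and retries exactly once.
func (c *client) get(path string) (int, error) {
	for attempt := 0; ; attempt++ {
		req, err := http.NewRequest(http.MethodGet, c.base+path, nil)
		if err != nil {
			return 0, err
		}
		req.Header.Set(tokenHeader, c.token)
		resp, err := c.hc.Do(req)
		if err != nil {
			return 0, err
		}
		resp.Body.Close()
		if resp.StatusCode != http.StatusForbidden || attempt > 0 {
			return resp.StatusCode, nil
		}
		if err := c.login(); err != nil {
			return 0, err
		}
	}
}

// newMockServer issues token "tok-2" on login and answers any other token with 403.
func newMockServer() (*httptest.Server, *atomic.Int32) {
	logins := new(atomic.Int32)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == loginPath {
			if u, p, ok := r.BasicAuth(); !ok || u != "admin" || p != "secret" {
				w.WriteHeader(http.StatusUnauthorized)
				return
			}
			logins.Add(1)
			http.SetCookie(w, &http.Cookie{Name: "session", Value: "s1"})
			w.Header().Set(tokenHeader, "tok-2")
			return
		}
		if r.Header.Get(tokenHeader) != "tok-2" {
			w.WriteHeader(http.StatusForbidden)
		}
	}))
	return srv, logins
}

func main() {
	srv, logins := newMockServer()
	defer srv.Close()
	c := newClient(srv.URL, "admin", "secret")
	c.token = "stale" // simulate an expired session
	status, err := c.get("/api/rest/cluster")
	fmt.Println(status, err, logins.Load()) // 200 <nil> 1
}
```

Capping the retry at one attempt keeps a request that is genuinely forbidden (rather than merely using a stale session) from looping on re-login.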
Pull request overview
Adds a new powerstore go.d collector module to monitor Dell PowerStore arrays via the PowerStore REST API (capacity, performance, health, alerts, NAS, replication), including config/metadata/schema and unit tests with mock API fixtures.
Changes:
- Introduces the `powerstore` collector implementation (discovery, collection, charts, REST client with session/CSRF handling).
- Adds module metadata, config schema, and an example go.d config for PowerStore.
- Adds unit tests and mock REST API JSON fixtures for the collector.
Reviewed changes
Copilot reviewed 47 out of 47 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/go/plugin/go.d/config/go.d/powerstore.conf | New example go.d job configuration for PowerStore. |
| src/go/plugin/go.d/collector/init.go | Registers the new collector via blank import. |
| src/go/plugin/go.d/collector/powerstore/metrics.go | Defines the collector’s internal metrics model and stm tags. |
| src/go/plugin/go.d/collector/powerstore/metadata.yaml | Documents configuration options and exposed charts/metrics. |
| src/go/plugin/go.d/collector/powerstore/config_schema.json | JSON schema + UI schema for validating/rendering job configuration. |
| src/go/plugin/go.d/collector/powerstore/collector.go | Collector wiring, init/check/collect/cleanup, config and state. |
| src/go/plugin/go.d/collector/powerstore/collect.go | Discovery loop + orchestrates per-component collection and chart updates. |
| src/go/plugin/go.d/collector/powerstore/collect_cluster.go | Collects cluster capacity/efficiency metrics. |
| src/go/plugin/go.d/collector/powerstore/collect_appliance.go | Collects appliance performance/capacity metrics. |
| src/go/plugin/go.d/collector/powerstore/collect_volume.go | Collects per-volume performance/capacity metrics. |
| src/go/plugin/go.d/collector/powerstore/collect_node.go | Collects per-node performance/logins metrics. |
| src/go/plugin/go.d/collector/powerstore/collect_ports.go | Collects FC/Ethernet port performance + link state. |
| src/go/plugin/go.d/collector/powerstore/collect_filesystem.go | Collects file system performance metrics. |
| src/go/plugin/go.d/collector/powerstore/collect_hardware.go | Aggregates hardware lifecycle state health counts. |
| src/go/plugin/go.d/collector/powerstore/collect_alerts.go | Counts active alerts by severity. |
| src/go/plugin/go.d/collector/powerstore/collect_drives.go | Collects SSD endurance remaining metrics. |
| src/go/plugin/go.d/collector/powerstore/collect_nas.go | Counts NAS server operational status states. |
| src/go/plugin/go.d/collector/powerstore/collect_replication.go | Collects/aggregates replication (copy) metrics. |
| src/go/plugin/go.d/collector/powerstore/charts.go | Static + dynamic chart templates and dynamic chart creation. |
| src/go/plugin/go.d/collector/powerstore/client/client.go | REST client with cookie session + CSRF token caching and 403 retry login. |
| src/go/plugin/go.d/collector/powerstore/client/types.go | REST API response/request types for entities and metrics. |
| src/go/plugin/go.d/collector/powerstore/collector_test.go | Unit tests using a mock PowerStore API server. |
| src/go/plugin/go.d/collector/powerstore/testdata/cluster.json | Mock API fixture: clusters. |
| src/go/plugin/go.d/collector/powerstore/testdata/appliance.json | Mock API fixture: appliances. |
| src/go/plugin/go.d/collector/powerstore/testdata/volume.json | Mock API fixture: volumes. |
| src/go/plugin/go.d/collector/powerstore/testdata/hardware_all.json | Mock API fixture: all hardware components. |
| src/go/plugin/go.d/collector/powerstore/testdata/hardware_fan.json | Mock API fixture: fan hardware. |
| src/go/plugin/go.d/collector/powerstore/testdata/hardware_psu.json | Mock API fixture: PSU hardware. |
| src/go/plugin/go.d/collector/powerstore/testdata/hardware_drive.json | Mock API fixture: drive hardware. |
| src/go/plugin/go.d/collector/powerstore/testdata/hardware_battery.json | Mock API fixture: battery hardware. |
| src/go/plugin/go.d/collector/powerstore/testdata/hardware_node.json | Mock API fixture: node hardware. |
| src/go/plugin/go.d/collector/powerstore/testdata/alert.json | Mock API fixture: alerts. |
| src/go/plugin/go.d/collector/powerstore/testdata/fc_port.json | Mock API fixture: FC ports. |
| src/go/plugin/go.d/collector/powerstore/testdata/eth_port.json | Mock API fixture: Ethernet ports. |
| src/go/plugin/go.d/collector/powerstore/testdata/file_system.json | Mock API fixture: file systems. |
| src/go/plugin/go.d/collector/powerstore/testdata/nas_server.json | Mock API fixture: NAS servers. |
| src/go/plugin/go.d/collector/powerstore/testdata/metrics_space_cluster.json | Mock API fixture: cluster space metrics. |
| src/go/plugin/go.d/collector/powerstore/testdata/metrics_space_appliance.json | Mock API fixture: appliance space metrics. |
| src/go/plugin/go.d/collector/powerstore/testdata/metrics_space_volume.json | Mock API fixture: volume space metrics. |
| src/go/plugin/go.d/collector/powerstore/testdata/metrics_perf_appliance.json | Mock API fixture: appliance perf metrics. |
| src/go/plugin/go.d/collector/powerstore/testdata/metrics_perf_volume.json | Mock API fixture: volume perf metrics. |
| src/go/plugin/go.d/collector/powerstore/testdata/metrics_perf_node.json | Mock API fixture: node perf metrics. |
| src/go/plugin/go.d/collector/powerstore/testdata/metrics_perf_fc_port.json | Mock API fixture: FC port perf metrics. |
| src/go/plugin/go.d/collector/powerstore/testdata/metrics_perf_eth_port.json | Mock API fixture: Ethernet port perf metrics. |
| src/go/plugin/go.d/collector/powerstore/testdata/metrics_perf_filesystem.json | Mock API fixture: file system perf metrics. |
| src/go/plugin/go.d/collector/powerstore/testdata/metrics_wear_drive.json | Mock API fixture: drive wear metrics. |
| src/go/plugin/go.d/collector/powerstore/testdata/metrics_copy_appliance.json | Mock API fixture: replication/copy metrics. |
- Apply gofmt to fix struct alignment in metrics.go, client.go, and types.go
- Rename naServers to nasServers for clarity (Copilot review)
- Add autodetection_retry and vnode to config_schema.json (required because additionalProperties is false, so configs using these options would fail validation)
- Allow null type for headers in config_schema.json (matches other collectors)
- Add bounded concurrency (10 concurrent API calls) using a semaphore pattern, matching Dell's own csm-metrics-powerstore approach
- Fix GET endpoint pagination: the server caps `limit` at 2000 (the collector previously requested 5000); handle HTTP 206 Partial Content for proper pagination
- Replace doGetWithRetry with a generic doGetAllPages that fetches all pages automatically via offset-based pagination
- All entity-level collection functions (volumes, appliances, nodes, ports, filesystems, drives, replication) now run in parallel, with per-entity goroutines bounded by the shared semaphore
- Race detector tests pass cleanly
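A sketch of the offset-based pagination described in this commit, assuming the list endpoint accepts `limit` and `offset` query parameters, returns a JSON array, and answers 206 Partial Content when more items may follow. `getAllPages`, `volume`, and the mock server are hypothetical, not the PR's `doGetAllPages`. To stay correct even if the final page is also a 206, the loop additionally stops on a short or empty page.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strconv"
)

// pageLimit mirrors the 2000-item server cap described in the commit.
const pageLimit = 2000

// getAllPages fetches every page of a list endpoint via offset/limit. A 206
// means more items may follow; a 200, a short page, or an empty page ends the loop.
func getAllPages[T any](hc *http.Client, endpoint string) ([]T, error) {
	var all []T
	for offset := 0; ; offset += pageLimit {
		q := url.Values{}
		q.Set("limit", strconv.Itoa(pageLimit))
		q.Set("offset", strconv.Itoa(offset))
		resp, err := hc.Get(endpoint + "?" + q.Encode())
		if err != nil {
			return nil, err
		}
		var page []T
		err = json.NewDecoder(resp.Body).Decode(&page)
		resp.Body.Close()
		if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusPartialContent {
			return nil, fmt.Errorf("GET %s: %s", endpoint, resp.Status)
		}
		if err != nil {
			return nil, err
		}
		all = append(all, page...)
		if resp.StatusCode == http.StatusOK || len(page) < pageLimit {
			return all, nil
		}
	}
}

type volume struct {
	ID string `json:"id"`
}

// newMockServer serves `total` volumes, capping limit at pageLimit and
// replying 206 while items remain beyond the returned page.
func newMockServer(total int) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		q := r.URL.Query()
		limit, _ := strconv.Atoi(q.Get("limit"))
		offset, _ := strconv.Atoi(q.Get("offset"))
		if limit <= 0 || limit > pageLimit {
			limit = pageLimit // server-side cap
		}
		items := []volume{}
		for i := offset; i < offset+limit && i < total; i++ {
			items = append(items, volume{ID: fmt.Sprintf("vol-%d", i)})
		}
		w.Header().Set("Content-Type", "application/json")
		if offset+len(items) < total {
			w.WriteHeader(http.StatusPartialContent)
		}
		json.NewEncoder(w).Encode(items)
	}))
}

func main() {
	srv := newMockServer(4500)
	defer srv.Close()
	vols, err := getAllPages[volume](srv.Client(), srv.URL+"/api/rest/volume")
	fmt.Println(len(vols), err) // 4500 <nil>
}
```

With 4500 volumes the loop makes three requests (offsets 0, 2000, 4000); the third returns 500 items, fewer than the limit, so collection stops without an extra empty request.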
Why is this implemented using an old framework?
@ilyam8 I am porting this to V2.
And I will also add Dell PowerVault, which is similar.
Replace V1 chart definitions (charts.go) with YAML-based chart templates (charts.yaml) and metrix.CollectorStore for metric registration and collection.
Key changes:
- Use SnapshotMeter/SnapshotGaugeVec for periodic API poll metrics
- Chart definitions in charts.yaml with proper float, multiplier, and algorithm settings
- Atomic discovery: build temp state, swap only on full success
- Remove redundant totalIops/totalBandwidth aggregates (read+write sums computed by API, not useful as separate metrics)
- Add appliance logical space and efficiency ratio charts
- Add FC port latency and volume thin savings charts
- Stacked visualization for eth port errors
- Tests updated with metrix cycle control and chart coverage
I think it is OK for new collectors; the old ones are the problem. Why should we not use it on old collectors? Because float is not backward compatible with older parents. A new child connected to an old parent will lose precision: it works, but the streaming protocol is negotiated, so the parent receives int values, not floats. We agreed to first ship a stable version capable of float without using it, and only then switch collectors to float, so that users who upgrade children before parents will not see a difference. But this is a new collector: there is no past data, and its users will most likely have a compatible parent too.
Pull request overview
Copilot reviewed 47 out of 47 changed files in this pull request and generated 3 comments.
Fill all empty sections with detailed end-user documentation:
- method_description: REST API, Basic Auth, CSRF token, metrics generation API
- additional_permissions, auto_detection, limits, performance_impact
- Add labels to all 7 per-entity metric scopes
- Add 3 missing charts: appliance_space_logical/efficiency, volume_space_savings, fc_port_latency
- Add 4 troubleshooting entries: auth, connection, TLS, HTTP 403
Align powerstore.conf with the established pattern used by scaleio and other collectors. Remove verbose inline documentation, priority field, and DURATION notation that could confuse users.
When a scheduled discovery refresh (every 5th cycle) fails, continue collecting metrics using the previously cached inventory instead of aborting the entire collection cycle. Initial discovery must still succeed.
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/go/plugin/go.d/collector/powerstore/collect.go">
<violation number="1" location="src/go/plugin/go.d/collector/powerstore/collect.go:18">
P1: This condition uses `lastDiscoveryOK` after `discovery()` has already reset it on error, so refresh failures still abort collection instead of falling back to the previous snapshot.
(Based on your team's feedback about continuing on discovery refresh failures when a prior snapshot exists.) [FEEDBACK_USED]</violation>
</file>
Summary
Add a new Go collector for Dell PowerStore storage arrays, built on the V2 collector framework (metrix.CollectorStore + YAML chart templates).
What it monitors
Key implementation details
- `CreateV2`, `MetricStore()`, `ChartTemplateYAML()`, `Collect() error`
- `SnapshotMeter` + `SnapshotGaugeVec` for periodic API polls (PowerStore returns pre-computed rates)
- `float: true`, `multiplier: -1`, `algorithm: absolute`
- `sync.WaitGroup` + semaphore (10 concurrent requests)

Charts (28 total)
Cluster (3), Hardware Health (5), Alerts (1), NAS (1), Replication (2), Appliance (7), Volume (5), Node (4), FC Port (4), Ethernet Port (4), File System (3), Drive (1)
Test plan
- `go build` succeeds
- `gofmt` clean
- `go vet` clean
- Tests pass with the race detector (`go test -race`)
- `AssertChartCoverage` validates that all chart template dimensions are materialized