DE-2447: ModelPack decoder schema + coffeecup parity coverage by sebastient · Pull Request #82 · EdgeFirstAI/hal

sebastient · 2026-05-26T20:10:34Z

Summary

Adds end-to-end parity coverage for HAL's ModelPack anchor-grid decoder against the canonical ModelPack Python reference, on a real 270x480 production detection model (coffeecup-mpk-det-relu-t-27d6).
Adds schema-build coverage for ModelPack 4.0 layouts the HAL hadn't been exercised against (det/seg/multitask logical, det smart) via four synthetic fixtures.

What's included

Parity tests — crates/decoder/tests/modelpack_coffeecup_parity.rs runs Decoder::decode on raw outputs captured from both the TFLite int8 smart export and the ONNX float export, then asserts the post-NMS detections match the Python reference embedded in each fixture. Tolerances: IoU >= 0.95 + score within 0.02 (int8), IoU >= 0.99 + score within 0.001 (float).
Schema tests — crates/decoder/tests/modelpack_decoder_schemas.rs asserts SchemaV2::parse_file + DecoderBuilder::with_schema(...).build() succeeds on four ModelPack 4.0 schema shapes (det/seg/multitask logical, det smart) with the expected output topology.
Fixture generator — scripts/decoder_generate_modelpack_fixture.py runs inference on either a TFLite (int8) or ONNX (float) ModelPack model, executes the canonical reference decode in NumPy, and packs raw + intermediate + reference into a .safetensors consumable by the existing PerScaleFixture loader.
.gitignore — adds *.onnx alongside the existing *.tflite rule so source models stay local-only (only the generated fixtures are committed, via LFS).
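
The tolerances quoted above can be read as a per-detection check. The following is a minimal sketch, assuming boxes are `(xmin, ymin, xmax, ymax)` tuples and that HAL and reference detections are compared pairwise in the same order. The helper names `iou` and `parity_failures` are hypothetical, and the Rust test in this PR may pair detections differently.

```python
def iou(a, b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0.0 else 0.0


def parity_failures(got, ref, min_iou, max_score_delta):
    """Return a list of mismatch descriptions; an empty list means parity.

    `got` and `ref` are lists of (box, score) pairs in the same order.
    """
    failures = []
    if len(got) != len(ref):
        failures.append(f"count: got {len(got)}, expected {len(ref)}")
    for i, ((gbox, gscore), (rbox, rscore)) in enumerate(zip(got, ref)):
        overlap = iou(gbox, rbox)
        if overlap < min_iou:
            failures.append(f"det {i}: IoU {overlap:.4f} < {min_iou}")
        if abs(gscore - rscore) > max_score_delta:
            failures.append(f"det {i}: score delta {abs(gscore - rscore):.4f}")
    return failures


# Tolerances quoted in this PR description.
INT8_TOL = {"min_iou": 0.95, "max_score_delta": 0.02}
FLOAT_TOL = {"min_iou": 0.99, "max_score_delta": 0.001}
```

The looser int8 bounds leave room for quantization error, while the float bounds only need to absorb rounding differences between the two implementations.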

Files

crates/decoder/tests/modelpack_coffeecup_parity.rs (new, +163)
crates/decoder/tests/modelpack_decoder_schemas.rs (new, +156)
scripts/decoder_generate_modelpack_fixture.py (new, +492)
testdata/decoder/modelpack_{det_logical,det_smart,seg_logical,multitask_logical}.json (new schema fixtures)
testdata/decoder/coffeecup-mpk-det-relu-t-27d6{,_quant-u8-i8_smart}.safetensors (new, LFS-tracked, ~947 KB total)
.gitignore (+*.onnx)

Why this matters

The existing modelpack tests only exercised synthetic 320x320 schemas. This change is the first proof that HAL's ModelPack runtime (dequant + sigmoid + anchor-grid decode + class-agnostic NMS) reproduces the canonical ModelPack reference on a real model with non-square input and per-tensor int8 quantization.
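
For readers unfamiliar with the stages named above, here is a rough sketch of three of them (per-tensor dequantization, sigmoid, class-agnostic NMS) in plain Python. It illustrates the general techniques only and is not HAL's or ModelPack's code. The anchor-grid box decode is left out because its formula is not given in this PR, and all function names are hypothetical.

```python
import math


def dequantize(q, scale, zero_point):
    """Per-tensor affine dequantization: real = scale * (q - zero_point)."""
    return [scale * (v - zero_point) for v in q]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def iou(a, b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0.0 else 0.0


def class_agnostic_nms(dets, score_threshold, iou_threshold, max_detections):
    """Greedy NMS that ignores class labels when suppressing.

    `dets` is a list of (box, score, class_id) tuples. A lower-scoring box
    is dropped when it overlaps any kept box, whatever its class.
    """
    candidates = sorted(
        (d for d in dets if d[1] >= score_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for det in candidates:
        if len(kept) >= max_detections:
            break
        if all(iou(det[0], k[0]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept
```

Because suppression is class-agnostic, two overlapping boxes with different class labels still compete, and only the higher-scoring one survives.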

Test plan

cargo test -p edgefirst-decoder --test modelpack_coffeecup_parity — 2/2 pass
cargo test -p edgefirst-decoder --test modelpack_decoder_schemas — 4/4 pass
cargo test -p edgefirst-decoder — full suite 480/480 (no regressions)
make format lint check clean

Regenerating the fixtures

source venv/bin/activate
python scripts/decoder_generate_modelpack_fixture.py \
  coffeecup-mpk-det-relu-t-27d6_quant-u8-i8_smart.tflite \
  testdata/coffeecup.jpg \
  --output testdata/decoder/coffeecup-mpk-det-relu-t-27d6_quant-u8-i8_smart.safetensors

python scripts/decoder_generate_modelpack_fixture.py \
  coffeecup-mpk-det-relu-t-27d6.onnx \
  testdata/coffeecup.jpg \
  --output testdata/decoder/coffeecup-mpk-det-relu-t-27d6.safetensors

The source .tflite/.onnx are intentionally not committed; they live at the repo root for reviewers who want to reproduce locally.
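
Reviewers who want to check what a regenerated fixture contains can read its header without extra dependencies. The sketch below relies only on the published safetensors layout (an 8-byte little-endian header length, followed by a UTF-8 JSON header). The function names are hypothetical, and nothing is assumed here about which tensor names or metadata keys the generator script writes.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian 64-bit header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))


def summarize(path):
    """Split the header into ({name: (dtype, shape)}, metadata)."""
    header = read_safetensors_header(path)
    # "__metadata__" is the optional free-form string-to-string map.
    metadata = header.pop("__metadata__", {})
    tensors = {name: (info["dtype"], info["shape"]) for name, info in header.items()}
    return tensors, metadata
```

Calling `summarize(...)` on one of the committed fixtures should list the stored tensors with their dtypes and shapes, plus any string metadata the generator embedded.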

Validate HAL's ModelPack anchor-grid runtime end-to-end against real production models on a non-square 270x480 input.

* Four synthetic schema fixtures (det/seg/multitask logical + det smart) with build-time tests asserting SchemaV2 parse and DecoderBuilder build for ModelPack 4.0 schemas.
* Two coffeecup parity fixtures (TFLite int8 smart + ONNX float) bundle raw model outputs, schema, and reference post-NMS detections from the canonical ModelPack Python decoder. Strict-parity tests assert HAL matches the reference: IoU>=0.95 + score within 0.02 (int8), IoU>=0.99 + score within 0.001 (float).
* New scripts/decoder_generate_modelpack_fixture.py regenerates both fixtures from source TFLite/ONNX models.
* .gitignore: add *.onnx alongside *.tflite to keep source models local-only.

Signed-off-by: Sébastien Taylor <[email protected]>

Copilot

Pull request overview

Adds ModelPack decoder coverage to edgefirst-decoder by introducing parity tests against a canonical Python reference (via LFS-tracked .safetensors fixtures) and adding schema-build tests that exercise additional ModelPack 4.0 output layouts.

Changes:

Add end-to-end parity tests for a real non-square ModelPack detection model using embedded reference outputs in .safetensors fixtures.
Add schema parsing/build tests plus four synthetic ModelPack v2 schema fixtures (det/seg/multitask logical, det smart).
Add a Python fixture generator script and update .gitignore to keep source .tflite/.onnx models local-only.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 4 comments.

File	Description
crates/decoder/tests/modelpack_coffeecup_parity.rs	Runs HAL decode on coffeecup fixtures and asserts parity vs embedded reference detections
crates/decoder/tests/modelpack_decoder_schemas.rs	Validates SchemaV2 parse + DecoderBuilder build for four ModelPack schema fixtures
scripts/decoder_generate_modelpack_fixture.py	Generates `.safetensors` fixtures by running inference + NumPy reference decode/NMS
testdata/decoder/modelpack_det_logical.json	Synthetic ModelPack detection logical schema fixture
testdata/decoder/modelpack_det_smart.json	Synthetic ModelPack detection “smart” (quantized, no nested outputs) schema fixture
testdata/decoder/modelpack_seg_logical.json	Synthetic ModelPack segmentation logical schema fixture
testdata/decoder/modelpack_multitask_logical.json	Synthetic ModelPack multitask (det+seg) logical schema fixture
testdata/decoder/coffeecup-mpk-det-relu-t-27d6.safetensors	LFS fixture for float ONNX export outputs + reference decode
testdata/decoder/coffeecup-mpk-det-relu-t-27d6_quant-u8-i8_smart.safetensors	LFS fixture for quantized TFLite export outputs + reference decode
.gitignore	Ignores `.onnx` alongside `.tflite` to keep source models out of the repo

sebastient · 2026-05-26T20:30:59Z

+def infer_tflite(model_path: Path, image_uint8_nhwc: np.ndarray) -> dict[tuple[int, ...], np.ndarray]:
+    """Run TFLite inference; return {shape: raw_int8_tensor}.
+
+    Binds outputs by shape (TFLite output names like ``PartitionedCall:0``
+    don't map to schema names like ``output_0``).
+    """
+    try:
+        from tflite_runtime.interpreter import Interpreter
+    except ImportError:
+        from tensorflow.lite.python.interpreter import Interpreter
+    interp = Interpreter(model_path=str(model_path))
+    interp.allocate_tensors()
+    in_det = interp.get_input_details()[0]
+    interp.set_tensor(in_det["index"], image_uint8_nhwc)
+    interp.invoke()
+    raw = {}
+    quants = {}
+    for od in interp.get_output_details():
+        t = interp.get_tensor(od["index"])
+        raw[tuple(int(x) for x in t.shape)] = t
+        quants[tuple(int(x) for x in t.shape)] = od["quantization"]
+    return raw, quants


Dropped the unused second return value from infer_tflite; type annotation now matches actual return.
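
A minimal sketch of the shape-keyed binding after this fix, with a single dict as the only return value. It is not the script's actual code: the helper name `bind_outputs_by_shape` is hypothetical, and the duplicate-shape check is an added assumption that the hunk above does not include.

```python
def bind_outputs_by_shape(tensors) -> dict[tuple[int, ...], object]:
    """Map each output tensor to its shape tuple.

    `tensors` is any iterable of objects exposing a `.shape` attribute
    (NumPy arrays qualify). Binding by shape is only unambiguous when
    every output has a distinct shape, so duplicates are rejected here
    rather than silently overwritten.
    """
    bound = {}
    for t in tensors:
        key = tuple(int(x) for x in t.shape)
        if key in bound:
            raise ValueError(f"two outputs share shape {key}; cannot bind by shape")
        bound[key] = t
    return bound
```

With the annotation and the return statement in agreement, callers can index the result directly, for example `raw[(1, 9, 15, 18)]` (an illustrative shape, not one taken from the coffeecup model).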

sebastient · 2026-05-26T20:31:01Z

+def infer_onnx(model_path: Path, image_uint8_nhwc: np.ndarray) -> dict[tuple[int, ...], np.ndarray]:
+    """Run ONNX inference; return {shape: float32_tensor} in NHWC layout.
+
+    ModelPack ONNX takes NCHW float32 input but the heads emit NHWC outputs.
+    """
+    import onnxruntime as ort
+    sess = ort.InferenceSession(str(model_path), providers=["CPUExecutionProvider"])
+    in_meta = sess.get_inputs()[0]
+    # NCHW float32 [1, 3, H, W]; normalize uint8/255.0
+    img_nchw = (image_uint8_nhwc.astype(np.float32) / 255.0).transpose(0, 3, 1, 2)
+    outs = sess.run(None, {in_meta.name: img_nchw})
+    raw = {tuple(int(x) for x in t.shape): t for t in outs}
+    return raw, {}


Same fix as #3306578774 — infer_onnx now returns only the raw dict.

sebastient · 2026-05-26T20:31:02Z

+        "mode": "class_agnostic",
+        "score_threshold": cfg.score_threshold,
+        "iou_threshold": cfg.iou_threshold,
+        "max_output": cfg.max_output,


Renamed JSON key to max_detections so PerScaleFixture::load() picks up the fixture-time cap. Regenerated both safetensors.
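
The failure mode behind this rename is easy to reproduce in isolation: a loader that falls back to a default for a missing key cannot tell a misspelled key from an absent one. The sketch below is a Python stand-in for that pattern, not the real `PerScaleFixture::load`. The default of 300 and both function names are hypothetical.

```python
import json

DEFAULT_MAX_DETECTIONS = 300  # hypothetical fallback, not HAL's actual value


def load_nms_config(metadata_json):
    """Lenient loader: a missing key silently falls back to the default."""
    cfg = json.loads(metadata_json)
    return {
        "score_threshold": cfg["score_threshold"],
        "iou_threshold": cfg["iou_threshold"],
        "max_detections": cfg.get("max_detections", DEFAULT_MAX_DETECTIONS),
    }


def load_nms_config_strict(metadata_json):
    """Strict loader: a missing key is an error instead of a silent default."""
    cfg = json.loads(metadata_json)
    required = ("score_threshold", "iou_threshold", "max_detections")
    missing = [k for k in required if k not in cfg]
    if missing:
        raise KeyError(f"fixture NMS config is missing keys: {missing}")
    return {k: cfg[k] for k in required}


# Metadata as originally written (wrong key) and after the rename.
old = json.dumps({"score_threshold": 0.25, "iou_threshold": 0.45, "max_output": 50})
new = json.dumps({"score_threshold": 0.25, "iou_threshold": 0.45, "max_detections": 50})
```

Here `load_nms_config(old)` quietly reports the default cap, while `load_nms_config(new)` reports 50. A strict loader would have surfaced the mismatch when the fixture was first loaded.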

sebastient · 2026-05-26T20:31:04Z

+    let nms = fix.nms_config();
+    let decoder = DecoderBuilder::default()
+        .with_schema(schema)
+        .with_iou_threshold(nms.iou_threshold)
+        .with_score_threshold(nms.score_threshold)
+        .build()
+        .expect("build decoder");


Added .with_max_det(nms.max_detections as usize) (and sized the Vec capacity from it) so the fixture's NMS settings flow into the decoder.

github-actions · 2026-05-26T20:25:59Z

Test Results (x86_64)

162 tests ±0: 150 ✅ passed ±0, 12 💤 skipped ±0, 0 ❌ failed ±0
1 suite ±0, 1 file ±0
⏱️ 1m 24s (+4s)

Results for commit c6a147b. ± Comparison against base commit 3363557.

♻️ This comment has been updated with latest results.

github-actions · 2026-05-26T20:26:32Z

Test Results (aarch64)

1 266 tests +6: 1 254 ✅ passed +6, 12 💤 skipped ±0, 0 ❌ failed ±0
2 suites ±0, 2 files ±0
⏱️ 41s (±0s)

Results for commit c6a147b. ± Comparison against base commit 3363557.

♻️ This comment has been updated with latest results.

Four review comments, all valid:

* infer_tflite / infer_onnx returned (raw, _unused) tuples but were annotated as returning a single dict. Drop the unused second return value so callers and type checkers agree.
* Fixture metadata wrote nms.max_output, but PerScaleFixture::load reads max_detections, so the loader silently ignored the fixture-time cap. Rename the key.
* The parity test read nms.max_detections but never threaded it through DecoderBuilder. Add with_max_det(nms.max_detections) and size the box buffer from it so the test honours the fixture config.

Regenerated both coffeecup fixtures to pick up the renamed JSON key.

Signed-off-by: Sébastien Taylor <[email protected]>

sonarqubecloud · 2026-05-26T20:58:13Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
97.4% Coverage on New Code
6.8% Duplication on New Code

See analysis details on SonarQube Cloud

Copilot AI review requested due to automatic review settings May 26, 2026 20:10

Copilot started reviewing on behalf of sebastient May 26, 2026 20:10

Copilot AI reviewed May 26, 2026

sebastient merged commit fe0a875 into main May 26, 2026
15 checks passed

sebastient deleted the feature/DE-2447-modelpack branch May 26, 2026 21:36

sebastient merged 2 commits into main from feature/DE-2447-modelpack
