DE-2447: ModelPack decoder schema + coffeecup parity coverage #82

Merged: sebastient merged 2 commits into main from feature/DE-2447-modelpack on May 26, 2026

@sebastient
Contributor

Summary

  • Adds end-to-end parity coverage for HAL's ModelPack anchor-grid decoder against the canonical ModelPack Python reference, on a real 270x480 production detection model (coffeecup-mpk-det-relu-t-27d6).
  • Adds schema-build coverage, via four synthetic fixtures, for ModelPack 4.0 layouts that HAL had not previously been exercised against (det/seg/multitask logical, det smart).

What's included

  • Parity tests: crates/decoder/tests/modelpack_coffeecup_parity.rs runs Decoder::decode on raw outputs captured from both the TFLite int8 smart export and the ONNX float export, then asserts that the post-NMS detections match the Python reference embedded in each fixture. Tolerances: IoU >= 0.95 and score within 0.02 (int8); IoU >= 0.99 and score within 0.001 (float).
  • Schema tests: crates/decoder/tests/modelpack_decoder_schemas.rs asserts that SchemaV2::parse_file + DecoderBuilder::with_schema(...).build() succeeds on four ModelPack 4.0 schema shapes (det/seg/multitask logical, det smart) with the expected output topology.
  • Fixture generator: scripts/decoder_generate_modelpack_fixture.py runs inference on either a TFLite (int8) or ONNX (float) ModelPack model, executes the canonical reference decode in NumPy, and packs the raw, intermediate, and reference data into a .safetensors file consumable by the existing PerScaleFixture loader.
  • .gitignore — adds *.onnx alongside the existing *.tflite rule so source models stay local-only (only the generated fixtures are committed, via LFS).
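
For readers unfamiliar with the tolerance check, the parity criterion above can be sketched in plain Python. This is an illustration only, not the Rust test code: the (box, score, label) tuple layout, the greedy one-to-one matching, and the inclusive comparisons are all assumptions, and `iou_xyxy` / `detections_match` are hypothetical names.

```python
def iou_xyxy(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0.0 else 0.0


def detections_match(got, want, min_iou, max_score_delta):
    """True if every reference detection has a distinct counterpart in `got`.

    Each detection is a (box, score, label) tuple (assumed layout).
    Matching is greedy: the first unused candidate within tolerance wins.
    """
    if len(got) != len(want):
        return False
    unused = list(got)
    for w_box, w_score, w_label in want:
        hit = next(
            (g for g in unused
             if g[2] == w_label
             and iou_xyxy(g[0], w_box) >= min_iou
             and abs(g[1] - w_score) <= max_score_delta),
            None,
        )
        if hit is None:
            return False
        unused.remove(hit)
    return True


# Tolerances quoted in the PR description.
INT8_TOL = dict(min_iou=0.95, max_score_delta=0.02)
FLOAT_TOL = dict(min_iou=0.99, max_score_delta=0.001)

# Made-up detections: the candidate box is slightly shorter than the reference.
reference = [((0.10, 0.20, 0.50, 0.80), 0.90, 0)]
candidate = [((0.10, 0.20, 0.50, 0.79), 0.91, 0)]

print(detections_match(candidate, reference, **INT8_TOL))
print(detections_match(candidate, reference, **FLOAT_TOL))
```

The made-up candidate has an IoU of roughly 0.983 against the reference and a score delta of 0.01, so it sits inside the int8 tolerance but outside the float one.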

Files

  • crates/decoder/tests/modelpack_coffeecup_parity.rs (new, +163)
  • crates/decoder/tests/modelpack_decoder_schemas.rs (new, +156)
  • scripts/decoder_generate_modelpack_fixture.py (new, +492)
  • testdata/decoder/modelpack_{det_logical,det_smart,seg_logical,multitask_logical}.json (new schema fixtures)
  • testdata/decoder/coffeecup-mpk-det-relu-t-27d6{,_quant-u8-i8_smart}.safetensors (new, LFS-tracked, ~947 KB total)
  • .gitignore (+*.onnx)

Why this matters

The existing ModelPack tests only exercised synthetic 320x320 schemas. This change is the first demonstration that HAL's ModelPack runtime (dequant + sigmoid + anchor-grid decode + class-agnostic NMS) reproduces the canonical ModelPack reference on a real model with non-square input and per-tensor int8 quantization.
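
As a rough illustration of three of the stages named above, here is a minimal pure-Python sketch of per-tensor dequantization, sigmoid, and class-agnostic NMS. It is not HAL's or ModelPack's implementation. The anchor-grid decode step is omitted because its details are not given in this PR, and the scale, zero point, thresholds, and comparison operators are assumptions chosen for the example.

```python
import math


def dequantize(q, scale, zero_point):
    """Per-tensor affine dequantization: real = scale * (q - zero_point)."""
    return scale * (q - zero_point)


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def iou_xyxy(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0.0 else 0.0


def nms_class_agnostic(dets, score_threshold, iou_threshold, max_detections):
    """Greedy NMS over (box, score, label) tuples, ignoring labels.

    A box is suppressed by any higher-scoring kept box, whatever its class.
    """
    cands = sorted((d for d in dets if d[1] >= score_threshold),
                   key=lambda d: d[1], reverse=True)
    kept = []
    for d in cands:
        if len(kept) >= max_detections:
            break
        if all(iou_xyxy(d[0], k[0]) <= iou_threshold for k in kept):
            kept.append(d)
    return kept


# Hypothetical int8 logits with an assumed scale and zero point.
scores = [sigmoid(dequantize(q, scale=0.1, zero_point=-10)) for q in (30, 20, 0)]
dets = [
    ((0, 0, 10, 10), scores[0], 0),
    ((1, 1, 11, 11), scores[1], 1),    # overlaps the first box, different label
    ((20, 20, 30, 30), scores[2], 1),
]
kept = nms_class_agnostic(dets, score_threshold=0.5, iou_threshold=0.5,
                          max_detections=10)
print([d[2] for d in kept])
```

Because the suppression is class-agnostic, the second box is dropped even though its label differs from the first; a per-class NMS would have kept it.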

Test plan

  • cargo test -p edgefirst-decoder --test modelpack_coffeecup_parity — 2/2 pass
  • cargo test -p edgefirst-decoder --test modelpack_decoder_schemas — 4/4 pass
  • cargo test -p edgefirst-decoder — full suite 480/480 (no regressions)
  • make format lint check: clean

Regenerating the fixtures

source venv/bin/activate
python scripts/decoder_generate_modelpack_fixture.py \
  coffeecup-mpk-det-relu-t-27d6_quant-u8-i8_smart.tflite \
  testdata/coffeecup.jpg \
  --output testdata/decoder/coffeecup-mpk-det-relu-t-27d6_quant-u8-i8_smart.safetensors

python scripts/decoder_generate_modelpack_fixture.py \
  coffeecup-mpk-det-relu-t-27d6.onnx \
  testdata/coffeecup.jpg \
  --output testdata/decoder/coffeecup-mpk-det-relu-t-27d6.safetensors

The source .tflite/.onnx models are intentionally not committed (both extensions are gitignored); reviewers who want to reproduce locally should place them at the repo root, where the commands above expect them.

Validate HAL's ModelPack anchor-grid runtime end-to-end against real
production models on a non-square 270x480 input.

* Four synthetic schema fixtures (det/seg/multitask logical + det smart)
  with build-time tests asserting SchemaV2 parse and DecoderBuilder
  build for ModelPack 4.0 schemas.
* Two coffeecup parity fixtures (TFLite int8 smart + ONNX float) bundle
  raw model outputs, schema, and reference post-NMS detections from the
  canonical ModelPack Python decoder. Strict-parity tests assert HAL
  matches the reference: IoU>=0.95 + score within 0.02 (int8),
  IoU>=0.99 + score within 0.001 (float).
* New scripts/decoder_generate_modelpack_fixture.py regenerates both
  fixtures from source TFLite/ONNX models.
* .gitignore: add *.onnx alongside *.tflite to keep source models
  local-only.

Signed-off-by: Sébastien Taylor <[email protected]>
Copilot AI review requested due to automatic review settings May 26, 2026 20:10
Contributor

Copilot AI left a comment

Pull request overview

Adds ModelPack decoder coverage to edgefirst-decoder by introducing parity tests against a canonical Python reference (via LFS-tracked .safetensors fixtures) and adding schema-build tests that exercise additional ModelPack 4.0 output layouts.

Changes:

  • Add end-to-end parity tests for a real non-square ModelPack detection model using embedded reference outputs in .safetensors fixtures.
  • Add schema parsing/build tests plus four synthetic ModelPack 4.0 schema fixtures (det/seg/multitask logical, det smart).
  • Add a Python fixture generator script and update .gitignore to keep source .tflite/.onnx models local-only.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| crates/decoder/tests/modelpack_coffeecup_parity.rs | Runs HAL decode on coffeecup fixtures and asserts parity vs embedded reference detections |
| crates/decoder/tests/modelpack_decoder_schemas.rs | Validates SchemaV2 parse + DecoderBuilder build for four ModelPack schema fixtures |
| scripts/decoder_generate_modelpack_fixture.py | Generates .safetensors fixtures by running inference + NumPy reference decode/NMS |
| testdata/decoder/modelpack_det_logical.json | Synthetic ModelPack detection logical schema fixture |
| testdata/decoder/modelpack_det_smart.json | Synthetic ModelPack detection “smart” (quantized, no nested outputs) schema fixture |
| testdata/decoder/modelpack_seg_logical.json | Synthetic ModelPack segmentation logical schema fixture |
| testdata/decoder/modelpack_multitask_logical.json | Synthetic ModelPack multitask (det+seg) logical schema fixture |
| testdata/decoder/coffeecup-mpk-det-relu-t-27d6.safetensors | LFS fixture for float ONNX export outputs + reference decode |
| testdata/decoder/coffeecup-mpk-det-relu-t-27d6_quant-u8-i8_smart.safetensors | LFS fixture for quantized TFLite export outputs + reference decode |
| .gitignore | Ignores *.onnx alongside *.tflite to keep source models out of the repo |

Comment on lines +163 to +184
def infer_tflite(model_path: Path, image_uint8_nhwc: np.ndarray) -> dict[tuple[int, ...], np.ndarray]:
    """Run TFLite inference; return {shape: raw_int8_tensor}.

    Binds outputs by shape (TFLite output names like ``PartitionedCall:0``
    don't map to schema names like ``output_0``).
    """
    try:
        from tflite_runtime.interpreter import Interpreter
    except ImportError:
        from tensorflow.lite.python.interpreter import Interpreter
    interp = Interpreter(model_path=str(model_path))
    interp.allocate_tensors()
    in_det = interp.get_input_details()[0]
    interp.set_tensor(in_det["index"], image_uint8_nhwc)
    interp.invoke()
    raw = {}
    quants = {}
    for od in interp.get_output_details():
        t = interp.get_tensor(od["index"])
        raw[tuple(int(x) for x in t.shape)] = t
        quants[tuple(int(x) for x in t.shape)] = od["quantization"]
    return raw, quants
Contributor Author

Dropped the unused second return value from infer_tflite; type annotation now matches actual return.
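
One caveat of binding outputs by shape is that a plain dict silently overwrites an entry when two outputs share a shape. The sketch below shows one way to guard against that while returning a single dict, as the annotation promises. It is not the code in the generator script: `outputs_by_shape` is a hypothetical helper and the tensor shapes are made up.

```python
import numpy as np


def outputs_by_shape(tensors) -> dict[tuple[int, ...], np.ndarray]:
    """Key each output tensor by its shape, refusing ambiguous shapes.

    Raises ValueError if two outputs share a shape, since a shape-keyed
    dict would otherwise keep only the last one.
    """
    by_shape: dict[tuple[int, ...], np.ndarray] = {}
    for t in tensors:
        key = tuple(int(d) for d in t.shape)
        if key in by_shape:
            raise ValueError(f"two outputs share shape {key}; bind them by name instead")
        by_shape[key] = t
    return by_shape


# Made-up output shapes for illustration.
outs = outputs_by_shape([
    np.zeros((1, 9, 15, 18), dtype=np.int8),
    np.zeros((1, 17, 30, 18), dtype=np.int8),
])
print(sorted(outs))
```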

Comment on lines +187 to +199
def infer_onnx(model_path: Path, image_uint8_nhwc: np.ndarray) -> dict[tuple[int, ...], np.ndarray]:
    """Run ONNX inference; return {shape: float32_tensor} in NHWC layout.

    ModelPack ONNX takes NCHW float32 input but the heads emit NHWC outputs.
    """
    import onnxruntime as ort
    sess = ort.InferenceSession(str(model_path), providers=["CPUExecutionProvider"])
    in_meta = sess.get_inputs()[0]
    # NCHW float32 [1, 3, H, W]; normalize uint8/255.0
    img_nchw = (image_uint8_nhwc.astype(np.float32) / 255.0).transpose(0, 3, 1, 2)
    outs = sess.run(None, {in_meta.name: img_nchw})
    raw = {tuple(int(x) for x in t.shape): t for t in outs}
    return raw, {}
Contributor Author

Same fix as #3306578774 — infer_onnx now returns only the raw dict.
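
The input preparation in infer_onnx (uint8 NHWC to normalized float32 NCHW) can be checked in isolation. The sketch below mirrors that single expression; `to_nchw_float` is a hypothetical helper name, and reading "270x480" as height 270 by width 480 is an assumption.

```python
import numpy as np


def to_nchw_float(image_uint8_nhwc: np.ndarray) -> np.ndarray:
    """Scale a uint8 NHWC batch to [0, 1] float32 and reorder it to NCHW."""
    if image_uint8_nhwc.dtype != np.uint8 or image_uint8_nhwc.ndim != 4:
        raise ValueError("expected a uint8 array shaped [N, H, W, C]")
    return (image_uint8_nhwc.astype(np.float32) / 255.0).transpose(0, 3, 1, 2)


# Assumed H=270, W=480; one pixel set so the axis reordering is visible.
img = np.zeros((1, 270, 480, 3), dtype=np.uint8)
img[0, 5, 7, 2] = 255
nchw = to_nchw_float(img)
print(nchw.shape, nchw.dtype, nchw[0, 2, 5, 7])
```

Note that transpose returns a strided view rather than a copy; np.ascontiguousarray can be applied if a consumer needs contiguous memory.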

"mode": "class_agnostic",
"score_threshold": cfg.score_threshold,
"iou_threshold": cfg.iou_threshold,
"max_output": cfg.max_output,
Contributor Author

Renamed JSON key to max_detections so PerScaleFixture::load() picks up the fixture-time cap. Regenerated both safetensors.
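
The underlying failure mode was a key mismatch that the loader tolerated silently. As an illustration only (this is Python, not the Rust PerScaleFixture loader, and `NmsConfig` / `parse_nms` are hypothetical names), a strict parser that rejects unknown or missing keys would surface such a rename immediately:

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class NmsConfig:
    # Field names follow the fixture keys quoted in this thread.
    mode: str
    score_threshold: float
    iou_threshold: float
    max_detections: int


def parse_nms(text: str) -> NmsConfig:
    """Parse NMS metadata, failing loudly on unknown or missing keys."""
    raw = json.loads(text)
    known = {f.name for f in fields(NmsConfig)}
    unknown = set(raw) - known
    missing = known - set(raw)
    if unknown or missing:
        raise ValueError(
            f"unknown keys {sorted(unknown)}, missing keys {sorted(missing)}"
        )
    return NmsConfig(**raw)


# Illustrative values; the real thresholds are not stated in this PR.
good = ('{"mode": "class_agnostic", "score_threshold": 0.25, '
        '"iou_threshold": 0.45, "max_detections": 100}')
stale = good.replace("max_detections", "max_output")

print(parse_nms(good).max_detections)
try:
    parse_nms(stale)
except ValueError as err:
    print(err)
```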

Comment on lines +64 to +70
let nms = fix.nms_config();
let decoder = DecoderBuilder::default()
    .with_schema(schema)
    .with_iou_threshold(nms.iou_threshold)
    .with_score_threshold(nms.score_threshold)
    .build()
    .expect("build decoder");
Contributor Author

Added .with_max_det(nms.max_detections as usize) (and sized the Vec capacity from it) so the fixture's NMS settings flow into the decoder.

@github-actions

github-actions Bot commented May 26, 2026

Test Results (x86_64)

162 tests  ±0   150 ✅ ±0   1m 24s ⏱️ +4s
  1 suites ±0    12 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit c6a147b. ± Comparison against base commit 3363557.

♻️ This comment has been updated with latest results.

@github-actions

github-actions Bot commented May 26, 2026

Test Results (aarch64)

1 266 tests  +6   1 254 ✅ +6   41s ⏱️ ±0s
    2 suites ±0      12 💤 ±0 
    2 files   ±0       0 ❌ ±0 

Results for commit c6a147b. ± Comparison against base commit 3363557.

♻️ This comment has been updated with latest results.

Four review comments, all valid:

* infer_tflite / infer_onnx returned (raw, _unused) tuples but were
  annotated as returning a single dict. Drop the unused second return
  value so callers and type checkers agree.
* Fixture metadata wrote nms.max_output, but PerScaleFixture::load
  reads max_detections — so the loader silently ignored the
  fixture-time cap. Rename the key.
* The parity test read nms.max_detections but never threaded it
  through DecoderBuilder. Add with_max_det(nms.max_detections) and
  size the box buffer from it so the test honours the fixture config.

Regenerated both coffeecup fixtures to pick up the renamed JSON key.

Signed-off-by: Sébastien Taylor <[email protected]>

@sebastient sebastient merged commit fe0a875 into main May 26, 2026
15 checks passed
@sebastient sebastient deleted the feature/DE-2447-modelpack branch May 26, 2026 21:36