Skip to content

feat: Support nested collection types (Array/Set of Array/Set) (#5947)#6132

Open
soooojinlee wants to merge 4 commits intofeast-dev:masterfrom
soooojinlee:feat/support-nested-collection-types
Open

feat: Support nested collection types (Array/Set of Array/Set) (#5947)#6132
soooojinlee wants to merge 4 commits intofeast-dev:masterfrom
soooojinlee:feat/support-nested-collection-types

Conversation

@soooojinlee
Copy link

@soooojinlee soooojinlee commented Mar 19, 2026

What this PR does / why we need it:

Add 2-level nested collection types: Array(Array(T)), Array(Set(T)), Set(Array(T)), Set(Set(T))

Changes

  • Proto: 4 new ValueType enums (LIST_LIST/LIST_SET/SET_LIST/SET_SET) with RepeatedValue fields
  • Type system: Allow Array/Set mutual nesting with 2-level depth limit, preserve inner type via feast:nested_inner_type tag
  • Serialization: proto conversion, JSON deserialization, PyArrow schema inference support
  • Remote online store: pass correct ValueType via feature_type_map instead of ValueType.UNKNOWN
  • Replace silent String fallback in _str_to_feast_type with ValueError
  • Fix mypy type error in nested collection proto construction

Which issue(s) this PR fixes:

Fixes #5947

Misc

The UUID type feature PR was developed first and uses ValueType enum values 36-41. This PR uses 36-39 for nested collection types. If this PR is merged first, the UUID branch will need to reassign its enum values to avoid conflicts.


Open with Devin

soooojinlee and others added 4 commits March 19, 2026 23:21
…-dev#5947)

Add support for 2-level nested collection types: Array(Array(T)),
Array(Set(T)), Set(Array(T)), and Set(Set(T)).

- Add 4 generic ValueType enums (LIST_LIST, LIST_SET, SET_LIST, SET_SET)
  backed by RepeatedValue proto messages
- Persist inner type info in Field tags (feast:nested_inner_type),
  following the existing Struct schema tag pattern
- Handle edge cases: empty inner collections, Set dedup at inner level,
  depth limit enforcement (2 levels max)
- Add proto/JSON/remote transport serialization support
- Add 25 unit tests covering all combinations and edge cases

Signed-off-by: Soojin Lee <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: soojin <[email protected]>
- Fix remote online store read path to use declared feature types from
  FeatureView instead of ValueType.UNKNOWN, which fails for nested
  collection types (LIST_LIST, LIST_SET, SET_LIST, SET_SET)
- Add Nested Collection Types section to type-system.md with type table,
  usage examples, and empty-inner-collection→None limitation docs

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: soojin <[email protected]>
…for nested collection types

- Add nested list handling in proto_json from_json_object (list of lists
  was raising ParseError since no branch matched list-typed elements)
- Fix pa_to_feast_value_type to recognize nested list PyArrow types
  (list<item: list<item: T>>) instead of crashing with KeyError
- Replace silent String fallback in _str_to_feast_type with ValueError
  to surface corrupted tag values instead of silently losing type info
- Strengthen test coverage: type str roundtrip, inner value verification,
  multi-value batch, proto JSON roundtrip, PyArrow nested type inference

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: soojin <[email protected]>
Use getattr/CopyFrom instead of **dict unpacking for ProtoValue
construction to satisfy mypy's strict type checking.

Signed-off-by: soojin <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: soojin <[email protected]>
@soooojinlee soooojinlee requested a review from a team as a code owner March 19, 2026 14:30
Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 potential issues.

View 6 additional findings in Devin Review.

Open in Devin Review

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 ComplexFeastType.eq treats all nested collection types with same outer shape as equal regardless of inner element type

All nested collection types that share the same outer collection shape (e.g., Array(Array(String)) vs Array(Array(Int32))) are considered equal by ComplexFeastType.__eq__ because it only compares self.to_value_type() == other.to_value_type(), and both map to ValueType.LIST_LIST. This was confirmed experimentally. The same issue affects __hash__.

This causes Field.__eq__ (sdk/python/feast/field.py:86-95) to consider two fields with different nested inner types as identical, which means schema changes (e.g., changing a field from Array(Array(String)) to Array(Array(Int32))) would not be detected during feast apply. The registry diff logic at sdk/python/feast/diff/registry_diff.py:144 uses object equality to determine if an update occurred, so this bug silently drops schema changes for nested collection types.

(Refers to lines 63-67)

Prompt for agents
In sdk/python/feast/types.py, the ComplexFeastType.__eq__ method (lines 63-67) compares types solely via to_value_type(). For nested collection types (Array(Array(T)), Array(Set(T)), Set(Array(T)), Set(Set(T))), this loses the inner element type, so Array(Array(String)) == Array(Array(Int32)) returns True.

The fix should override __eq__ and __hash__ on Array and Set classes to also compare the base_type recursively. For example:

In the Array class (around line 158), add:

    def __eq__(self, other):
        if isinstance(other, Array):
            return self.base_type == other.base_type
        return False

    def __hash__(self):
        return hash(("Array", hash(self.base_type)))

Similarly for the Set class (around line 203), add:

    def __eq__(self, other):
        if isinstance(other, Set):
            return self.base_type == other.base_type
        return False

    def __hash__(self):
        return hash(("Set", hash(self.base_type)))

This ensures that Array(Array(String)) != Array(Array(Int32)) while keeping Array(String) == Array(String) working correctly.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +95 to +103
elif isinstance(value[0], list):
# Nested collection (list of lists).
# Default to list_list_val since JSON transport loses the
# outer/inner set distinction.
rv = RepeatedValue()
for inner in value:
inner_val = rv.val.add()
from_json_object(parser, inner, inner_val)
message.list_list_val.CopyFrom(rv)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 JSON deserialization of nested collections fails when first inner element is None

In the from_json_object function, nested collections are detected by checking isinstance(value[0], list) at line 95. If a nested collection's first inner element is None (e.g., [None, [1, 2], [3]] — which is how an empty inner collection round-trips through proto as documented), this check returns False. The value then falls through to isinstance(value[0], (float, int, type(None))) at line 108 (True for None), which tries to process the entire list as a flat numeric array. When it attempts message.double_list_val.val.extend(...) with elements like [1, 2], a TypeError is raised because a list cannot be converted to a double.

This affects the feature server HTTP JSON API transport whenever a nested collection feature has None as its first inner element.

Suggested change
elif isinstance(value[0], list):
# Nested collection (list of lists).
# Default to list_list_val since JSON transport loses the
# outer/inner set distinction.
rv = RepeatedValue()
for inner in value:
inner_val = rv.val.add()
from_json_object(parser, inner, inner_val)
message.list_list_val.CopyFrom(rv)
elif isinstance(value[0], list) or (
any(isinstance(v, list) for v in value)
):
# Nested collection (list of lists).
# Default to list_list_val since JSON transport loses the
# outer/inner set distinction.
rv = RepeatedValue()
for inner in value:
inner_val = rv.val.add()
from_json_object(parser, inner, inner_val)
message.list_list_val.CopyFrom(rv)
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support nested set and list as types

1 participant