feat: Support nested collection types (Array/Set of Array/Set) (#5947)#6132
feat: Support nested collection types (Array/Set of Array/Set) (#5947)#6132soooojinlee wants to merge 4 commits intofeast-dev:masterfrom
Conversation
…-dev#5947) Add support for 2-level nested collection types: Array(Array(T)), Array(Set(T)), Set(Array(T)), and Set(Set(T)). - Add 4 generic ValueType enums (LIST_LIST, LIST_SET, SET_LIST, SET_SET) backed by RepeatedValue proto messages - Persist inner type info in Field tags (feast:nested_inner_type), following the existing Struct schema tag pattern - Handle edge cases: empty inner collections, Set dedup at inner level, depth limit enforcement (2 levels max) - Add proto/JSON/remote transport serialization support - Add 25 unit tests covering all combinations and edge cases Signed-off-by: Soojin Lee <[email protected]> Co-Authored-By: Claude Opus 4.6 <[email protected]> Signed-off-by: soojin <[email protected]>
- Fix remote online store read path to use declared feature types from FeatureView instead of ValueType.UNKNOWN, which fails for nested collection types (LIST_LIST, LIST_SET, SET_LIST, SET_SET) - Add Nested Collection Types section to type-system.md with type table, usage examples, and empty-inner-collection→None limitation docs Co-Authored-By: Claude Opus 4.6 <[email protected]> Signed-off-by: soojin <[email protected]>
…for nested collection types - Add nested list handling in proto_json from_json_object (list of lists was raising ParseError since no branch matched list-typed elements) - Fix pa_to_feast_value_type to recognize nested list PyArrow types (list<item: list<item: T>>) instead of crashing with KeyError - Replace silent String fallback in _str_to_feast_type with ValueError to surface corrupted tag values instead of silently losing type info - Strengthen test coverage: type str roundtrip, inner value verification, multi-value batch, proto JSON roundtrip, PyArrow nested type inference Co-Authored-By: Claude Opus 4.6 <[email protected]> Signed-off-by: soojin <[email protected]>
Use getattr/CopyFrom instead of **dict unpacking for ProtoValue construction to satisfy mypy's strict type checking. Signed-off-by: soojin <[email protected]> Co-Authored-By: Claude Opus 4.6 <[email protected]> Signed-off-by: soojin <[email protected]>
There was a problem hiding this comment.
🔴 ComplexFeastType.eq treats all nested collection types with same outer shape as equal regardless of inner element type
All nested collection types that share the same outer collection shape (e.g., Array(Array(String)) vs Array(Array(Int32))) are considered equal by ComplexFeastType.__eq__ because it only compares self.to_value_type() == other.to_value_type(), and both map to ValueType.LIST_LIST. This was confirmed experimentally. The same issue affects __hash__.
This causes Field.__eq__ (sdk/python/feast/field.py:86-95) to consider two fields with different nested inner types as identical, which means schema changes (e.g., changing a field from Array(Array(String)) to Array(Array(Int32))) would not be detected during feast apply. The registry diff logic at sdk/python/feast/diff/registry_diff.py:144 uses object equality to determine if an update occurred, so this bug silently drops schema changes for nested collection types.
(Refers to lines 63-67)
Prompt for agents
In sdk/python/feast/types.py, the ComplexFeastType.__eq__ method (lines 63-67) compares types solely via to_value_type(). For nested collection types (Array(Array(T)), Array(Set(T)), Set(Array(T)), Set(Set(T))), this loses the inner element type, so Array(Array(String)) == Array(Array(Int32)) returns True.
The fix should override __eq__ and __hash__ on Array and Set classes to also compare the base_type recursively. For example:
In the Array class (around line 158), add:
def __eq__(self, other):
if isinstance(other, Array):
return self.base_type == other.base_type
return False
def __hash__(self):
return hash(("Array", hash(self.base_type)))
Similarly for the Set class (around line 203), add:
def __eq__(self, other):
if isinstance(other, Set):
return self.base_type == other.base_type
return False
def __hash__(self):
return hash(("Set", hash(self.base_type)))
This ensures that Array(Array(String)) != Array(Array(Int32)) while keeping Array(String) == Array(String) working correctly.
Was this helpful? React with 👍 or 👎 to provide feedback.
| elif isinstance(value[0], list): | ||
| # Nested collection (list of lists). | ||
| # Default to list_list_val since JSON transport loses the | ||
| # outer/inner set distinction. | ||
| rv = RepeatedValue() | ||
| for inner in value: | ||
| inner_val = rv.val.add() | ||
| from_json_object(parser, inner, inner_val) | ||
| message.list_list_val.CopyFrom(rv) |
There was a problem hiding this comment.
🟡 JSON deserialization of nested collections fails when first inner element is None
In the from_json_object function, nested collections are detected by checking isinstance(value[0], list) at line 95. If a nested collection's first inner element is None (e.g., [None, [1, 2], [3]] — which is how an empty inner collection round-trips through proto as documented), this check returns False. The value then falls through to isinstance(value[0], (float, int, type(None))) at line 108 (True for None), which tries to process the entire list as a flat numeric array. When it attempts message.double_list_val.val.extend(...) with elements like [1, 2], a TypeError is raised because a list cannot be converted to a double.
This affects the feature server HTTP JSON API transport whenever a nested collection feature has None as its first inner element.
| elif isinstance(value[0], list): | |
| # Nested collection (list of lists). | |
| # Default to list_list_val since JSON transport loses the | |
| # outer/inner set distinction. | |
| rv = RepeatedValue() | |
| for inner in value: | |
| inner_val = rv.val.add() | |
| from_json_object(parser, inner, inner_val) | |
| message.list_list_val.CopyFrom(rv) | |
| elif isinstance(value[0], list) or ( | |
| any(isinstance(v, list) for v in value) | |
| ): | |
| # Nested collection (list of lists). | |
| # Default to list_list_val since JSON transport loses the | |
| # outer/inner set distinction. | |
| rv = RepeatedValue() | |
| for inner in value: | |
| inner_val = rv.val.add() | |
| from_json_object(parser, inner, inner_val) | |
| message.list_list_val.CopyFrom(rv) |
Was this helpful? React with 👍 or 👎 to provide feedback.
What this PR does / why we need it:
Add 2-level nested collection types:
Array(Array(T)),Array(Set(T)),Set(Array(T)),Set(Set(T))Changes
LIST_LIST/LIST_SET/SET_LIST/SET_SET) withRepeatedValuefieldsArray/Setmutual nesting with 2-level depth limit, preserve inner type viafeast:nested_inner_typetagfeature_type_mapinstead ofValueType.UNKNOWNStringfallback in_str_to_feast_typewithValueErrorWhich issue(s) this PR fixes:
Fixes #5947
Misc
The UUID type feature PR was developed first and uses ValueType enum values 36-41. This PR uses 36-39 for nested collection types. If this PR is merged first, the UUID branch will need to reassign its enum values to avoid conflicts.