feat: Add UUID and TIME_UUID as feature types (#5885)#5951
feat: Add UUID and TIME_UUID as feature types (#5885)#5951soooojinlee wants to merge 19 commits intofeast-dev:masterfrom
Conversation
1d4cd01 to
4a5c932
Compare
|
@soooojinlee , thanks so much for putting this together! Can you rebase to bring this PR up to date? |
2c56521 to
1fd106a
Compare
26198a6 to
525ac72
Compare
cb6dd44 to
54c2eae
Compare
2dd648a to
b280364
Compare
|
@soooojinlee , can you also add support for UUID_SET and TIME_UUID_SET? |
|
@soooojinlee , can you also update the docs with the newly supported types? |
180434b to
41cfe7e
Compare
I added UUID_SET / TIME_UUID_SET support and updated the type system documentation as requested in here 41cfe7e |
franciscojavierarceo
left a comment
There was a problem hiding this comment.
one small nit pick for readability please
aca5741 to
17aebe6
Compare
|
I addressed all feedback in 17aebe6 |
Signed-off-by: soojin <[email protected]> Co-Authored-By: Claude Opus 4.6 <[email protected]> Signed-off-by: soojin <[email protected]>
Signed-off-by: soojin <[email protected]> Co-Authored-By: Claude Opus 4.6 <[email protected]> Signed-off-by: soojin <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]> Signed-off-by: soojin <[email protected]>
Add uuid_val, time_uuid_val, uuid_list_val, time_uuid_list_val as dedicated oneof fields in the Value proto message, replacing the previous reuse of string_val/string_list_val. This allows UUID types to be identified from the proto field alone without requiring a feature_types side-channel. Backward compatibility is maintained for data previously stored as string_val. Co-Authored-By: Claude Opus 4.6 <[email protected]> Signed-off-by: soojin <[email protected]>
Signed-off-by: soojin <[email protected]>
Signed-off-by: soojin <[email protected]>
Signed-off-by: soojin <[email protected]>
Signed-off-by: soojin <[email protected]>
Add Set(Uuid) and Set(TimeUuid) as feature types with full roundtrip support, backward compatibility, and documentation for all UUID types. Signed-off-by: soojin <[email protected]> Co-Authored-By: Claude Opus 4.6 <[email protected]>
…ype mappings Keep PDF_BYTES=30 and IMAGE_BYTES=31 at their upstream values instead of renumbering them. Shift UUID types to 32-37 in both proto and Python enum. Also add missing SET type entries in _convert_value_type_str_to_value_type(), convert_array_column(), and _get_sample_values_by_type() for completeness. Signed-off-by: soojin <[email protected]> Co-Authored-By: Claude Opus 4.6 <[email protected]>
The comment claimed Sets do not support UUID/TimeUuid but the code intentionally allows them. Updated to reflect actual behavior. Signed-off-by: soojin <[email protected]> Co-Authored-By: Claude Opus 4.6 <[email protected]>
…o top Signed-off-by: soojin <[email protected]> Co-Authored-By: Claude Opus 4.6 <[email protected]>
17aebe6 to
cc94cba
Compare
Keep upstream JSON/STRUCT types at proto enum 32-35, renumber UUID types to 36-41. Merge JSON/Struct handling with UUID type handling across all type mapping files.
…rialization Return UUID proto fields as plain strings instead of falling through to feast_value_type_to_python_type which converts them to uuid.UUID objects that are not JSON-serializable, causing TypeError during HTTP transport. Signed-off-by: soojin <[email protected]>
…ture-types # Conflicts: # docs/getting-started/concepts/feast-types.md # docs/reference/type-system.md
| if val_attr in ("uuid_set_val", "time_uuid_set_val"): | ||
| return {uuid_module.UUID(v) if isinstance(v, str) else v for v in val} |
There was a problem hiding this comment.
🟡 pd.isnull() raises TypeError on uuid.UUID objects in scalar proto conversion
In _convert_scalar_values_to_proto, the generic scalar conversion path (line 896) calls pd.isnull(value) on each value. When a uuid.UUID object is passed directly (not as a string), pd.isnull(uuid.UUID(...)) returns False in current pandas versions, but this is fragile — in some pandas versions or with numpy interactions, calling pd.isnull() on non-standard types can raise a TypeError. More critically, the valid_scalar_types for ValueType.UUID is {str, uuid_module.UUID}, so UUID objects are accepted, but the assertion at line 876 (type(sample) in valid_scalar_types) will pass. The actual conversion func = lambda x: str(x) works fine. This path appears to function currently but is not a bug per se.
However, there IS a real issue: the _non_empty_value function at sdk/python/feast/type_map.py:1507 checks isinstance(value, Sized) — uuid.UUID is not Sized, so it passes the check correctly. No bug here either after verification.
Was this helpful? React with 👍 or 👎 to provide feedback.
| elif val_attr.endswith("_set_val") and val_attr != "unix_timestamp_set_val": | ||
| val = set(val) |
There was a problem hiding this comment.
🟡 UUID set values hit generic set() conversion before UUID-specific handling, producing set of strings instead of set of UUID objects
In feast_value_type_to_python_type at sdk/python/feast/type_map.py:152, the elif branch converts _set_val types to Python sets via val = set(val). For uuid_set_val and time_uuid_set_val, this converts the list of strings into a set of strings. Then at line 160-161, the UUID-specific code does {uuid_module.UUID(v) ... for v in val} which iterates over the set of strings and converts them to UUIDs — so the final result is correct.
However, this is NOT the case for the backward compatibility path at lines 170-173: when a UUID set is stored using the old string_set_val proto field, line 152 converts it to set(val) (a set of strings). Then line 170 checks isinstance(val, set) — this is True, so it enters the branch and converts each string to uuid.UUID. This path is also correct.
After careful trace-through, this is actually not a bug. Withdrawing.
Was this helpful? React with 👍 or 👎 to provide feedback.
What this PR does / why we need it:
Adds
UUIDandTIME_UUIDas native Feast feature types, resolving #5885. Currently UUID values must be stored as STRING, which loses type semantics, prevents backend-specific features (e.g. Cassandra timeuuid range queries), and makes PostgreSQLuuidcolumns infer as STRING. This PR enables users to declare UUID features withField(name="user_id", dtype=Uuid)and receiveuuid.UUIDobjects fromget_online_features().to_dict().Design Decisions
Why two types (UUID vs TIME_UUID)?
The issue author explicitly requested distinguishing time-based UUID (uuid1) and random UUID (uuid4). Both serialize
identically to
stringin proto, but separate types allow expressing intent in feature definitions and enable future backend-specific optimizations.Why dedicated proto fields (
uuid_val,time_uuid_val)?Following the pattern established by SET types (PR #5888) and UNIX_TIMESTAMP (which reuses
int64/Int64List), we add dedicated oneof fields that reuse existing proto scalar types (stringandStringList). This allowsWhichOneof("val")to identify UUID types directly from the proto message, without requiring a side-channel.Backward compatibility for data stored before this change:
OnlineResponseaccepts an optionalfeature_typesdict. When data was previously stored asstring_val, this metadata enablesfeast_value_type_to_python_type()to convert it touuid.UUID. New materializations useuuid_val/time_uuid_valand are identified automatically.Changes
Value.proto, generated*_pb2.py/*_pb2.pyiUUID=30,TIME_UUID=31,UUID_LIST=32,TIME_UUID_LIST=33toValueType.Enum; adduuid_val,time_uuid_val,uuid_list_val,time_uuid_list_valtoValue.oneofvalue_type.py,types.pyUUID,TIME_UUID,UUID_LIST,TIME_UUID_LISTenums andUuid/TimeUuidaliasestype_map.pystring_valtouuid_val; addPROTO_VALUE_TO_VALUE_TYPE_MAPentries for UUID fieldsonline_response.py,online_store.py,feature_store.py,utils.pyfeature_typesmetadata for backward-compatible deserializationon_demand_feature_view.pyBackward Compatibility
string_valstill deserializes correctly via thefeature_typesside-channeluuid_val/time_uuid_valproto fieldsfeast_value_type_to_python_type(v)withoutfeature_typenow returnsuuid.UUIDforuuid_valfields (previously returned plain string forstring_val)ValueType.UUID(previouslyValueType.STRING)Tests
test_types.py: Uuid/TimeUuid ↔ ValueType bidirectional conversion, Array typestest_type_map.py: Proto roundtrip withuuid_val,uuid.UUIDobject return, backward compatibility forstring_val, UUID list roundtrip, PostgreSQL mapping