Skip to content

feat: Add UUID and TIME_UUID as feature types (#5885)#5951

Open
soooojinlee wants to merge 19 commits intofeast-dev:masterfrom
soooojinlee:feat/add-uuid-feature-types
Open

feat: Add UUID and TIME_UUID as feature types (#5885)#5951
soooojinlee wants to merge 19 commits intofeast-dev:masterfrom
soooojinlee:feat/add-uuid-feature-types

Conversation

@soooojinlee
Copy link

@soooojinlee soooojinlee commented Feb 8, 2026

What this PR does / why we need it:

Adds UUID and TIME_UUID as native Feast feature types, resolving #5885. Currently UUID values must be stored as STRING, which loses type semantics, prevents backend-specific features (e.g. Cassandra timeuuid range queries), and makes PostgreSQL uuid columns infer as STRING. This PR enables users to declare UUID features with Field(name="user_id", dtype=Uuid) and receive uuid.UUID objects from get_online_features().to_dict().

Design Decisions

Why two types (UUID vs TIME_UUID)?
The issue author explicitly requested distinguishing time-based UUID (uuid1) and random UUID (uuid4). Both serialize
identically to string in proto, but separate types allow expressing intent in feature definitions and enable future backend-specific optimizations.

Why dedicated proto fields (uuid_val, time_uuid_val)?
Following the pattern established by SET types (PR #5888) and UNIX_TIMESTAMP (which reuses int64/Int64List), we add dedicated oneof fields that reuse existing proto scalar types (string and StringList). This allows WhichOneof("val") to identify UUID types directly from the proto message, without requiring a side-channel.

Backward compatibility for data stored before this change:
OnlineResponse accepts an optional feature_types dict. When data was previously stored as string_val, this metadata enables feast_value_type_to_python_type() to convert it to uuid.UUID. New materializations use uuid_val/time_uuid_val and are identified automatically.

Changes

Layer Files Description
Proto Value.proto, generated *_pb2.py/*_pb2.pyi Add UUID=30, TIME_UUID=31, UUID_LIST=32, TIME_UUID_LIST=33 to ValueType.Enum; add uuid_val, time_uuid_val, uuid_list_val, time_uuid_list_val to Value.oneof
Type system value_type.py, types.py Add UUID, TIME_UUID, UUID_LIST, TIME_UUID_LIST enums and Uuid/TimeUuid aliases
Type conversion type_map.py Add mappings to ~11 conversion dicts (proto, PyArrow, pandas, PostgreSQL, Couchbase, Snowflake); switch from string_val to uuid_val; add PROTO_VALUE_TO_VALUE_TYPE_MAP entries for UUID fields
Online response online_response.py, online_store.py, feature_store.py, utils.py Pass feature_types metadata for backward-compatible deserialization
ODFV on_demand_feature_view.py Add UUID/TIME_UUID sample values for schema inference

Backward Compatibility

  • Data previously stored as string_val still deserializes correctly via the feature_types side-channel
  • New materializations use dedicated uuid_val/time_uuid_val proto fields
  • feast_value_type_to_python_type(v) without feature_type now returns uuid.UUID for uuid_val fields (previously returned plain string for string_val)
  • PostgreSQL uuid columns now infer as ValueType.UUID (previously ValueType.STRING)
  • Go SDK: proto changes compile without errors; UUID handling logic is not implemented (out of scope)

Tests

  • test_types.py: Uuid/TimeUuid ↔ ValueType bidirectional conversion, Array types
  • test_type_map.py: Proto roundtrip with uuid_val, uuid.UUID object return, backward compatibility for string_val, UUID list roundtrip, PostgreSQL mapping
  • All 78 unit tests passing
  • ruff lint and format checks passing

@soooojinlee soooojinlee requested review from a team as code owners February 8, 2026 14:55
@soooojinlee soooojinlee requested review from ejscribner, robhowley and tokoko and removed request for a team February 8, 2026 14:56
devin-ai-integration[bot]

This comment was marked as resolved.

@soooojinlee soooojinlee force-pushed the feat/add-uuid-feature-types branch from 1d4cd01 to 4a5c932 Compare February 8, 2026 15:00
devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

@nquinn408
Copy link
Contributor

@soooojinlee , thanks so much for putting this together! Can you rebase to bring this PR up to date?

@ntkathole ntkathole force-pushed the feat/add-uuid-feature-types branch from 2c56521 to 1fd106a Compare February 11, 2026 04:44
@ntkathole ntkathole force-pushed the feat/add-uuid-feature-types branch 3 times, most recently from 26198a6 to 525ac72 Compare February 11, 2026 14:14
@soooojinlee soooojinlee force-pushed the feat/add-uuid-feature-types branch 3 times, most recently from cb6dd44 to 54c2eae Compare February 12, 2026 05:55
@ntkathole ntkathole force-pushed the feat/add-uuid-feature-types branch from 2dd648a to b280364 Compare February 16, 2026 06:44
@nquinn408
Copy link
Contributor

@soooojinlee , can you also add support for UUID_SET and TIME_UUID_SET?

@nquinn408
Copy link
Contributor

@soooojinlee , can you also update the docs with the newly supported types?

@soooojinlee soooojinlee force-pushed the feat/add-uuid-feature-types branch 2 times, most recently from 180434b to 41cfe7e Compare February 17, 2026 04:26
@soooojinlee
Copy link
Author

@soooojinlee , can you also add support for UUID_SET and TIME_UUID_SET?

@soooojinlee , can you also update the docs with the newly supported types?

I added UUID_SET / TIME_UUID_SET support and updated the type system documentation as requested in here 41cfe7e

Copy link
Member

@franciscojavierarceo franciscojavierarceo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one small nit pick for readability please

@soooojinlee soooojinlee force-pushed the feat/add-uuid-feature-types branch 2 times, most recently from aca5741 to 17aebe6 Compare February 18, 2026 15:22
@soooojinlee
Copy link
Author

I addressed all feedback in 17aebe6

devin-ai-integration[bot]

This comment was marked as resolved.

soooojinlee and others added 12 commits February 19, 2026 09:56
Signed-off-by: soojin <[email protected]>

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: soojin <[email protected]>
Signed-off-by: soojin <[email protected]>

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: soojin <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: soojin <[email protected]>
Add uuid_val, time_uuid_val, uuid_list_val, time_uuid_list_val as
dedicated oneof fields in the Value proto message, replacing the
previous reuse of string_val/string_list_val. This allows UUID types
to be identified from the proto field alone without requiring a
feature_types side-channel. Backward compatibility is maintained for
data previously stored as string_val.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: soojin <[email protected]>
Add Set(Uuid) and Set(TimeUuid) as feature types with full roundtrip
support, backward compatibility, and documentation for all UUID types.

Signed-off-by: soojin <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
…ype mappings

Keep PDF_BYTES=30 and IMAGE_BYTES=31 at their upstream values instead of
renumbering them. Shift UUID types to 32-37 in both proto and Python enum.

Also add missing SET type entries in _convert_value_type_str_to_value_type(),
convert_array_column(), and _get_sample_values_by_type() for completeness.

Signed-off-by: soojin <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
The comment claimed Sets do not support UUID/TimeUuid but the code
intentionally allows them. Updated to reflect actual behavior.

Signed-off-by: soojin <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
@ntkathole ntkathole force-pushed the feat/add-uuid-feature-types branch from 17aebe6 to cc94cba Compare February 19, 2026 04:26
Keep upstream JSON/STRUCT types at proto enum 32-35, renumber UUID types
to 36-41. Merge JSON/Struct handling with UUID type handling across all
type mapping files.
devin-ai-integration[bot]

This comment was marked as resolved.

…rialization

Return UUID proto fields as plain strings instead of falling through to
feast_value_type_to_python_type which converts them to uuid.UUID objects
that are not JSON-serializable, causing TypeError during HTTP transport.

Signed-off-by: soojin <[email protected]>
…ture-types

# Conflicts:
#	docs/getting-started/concepts/feast-types.md
#	docs/reference/type-system.md
Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 new potential issues.

View 38 additional findings in Devin Review.

Open in Devin Review

Comment on lines +160 to +161
if val_attr in ("uuid_set_val", "time_uuid_set_val"):
return {uuid_module.UUID(v) if isinstance(v, str) else v for v in val}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 pd.isnull() raises TypeError on uuid.UUID objects in scalar proto conversion

In _convert_scalar_values_to_proto, the generic scalar conversion path (line 896) calls pd.isnull(value) on each value. When a uuid.UUID object is passed directly (not as a string), pd.isnull(uuid.UUID(...)) returns False in current pandas versions, but this is fragile — in some pandas versions or with numpy interactions, calling pd.isnull() on non-standard types can raise a TypeError. More critically, the valid_scalar_types for ValueType.UUID is {str, uuid_module.UUID}, so UUID objects are accepted, but the assertion at line 876 (type(sample) in valid_scalar_types) will pass. The actual conversion func = lambda x: str(x) works fine. This path appears to function currently but is not a bug per se.

However, there IS a real issue: the _non_empty_value function at sdk/python/feast/type_map.py:1507 checks isinstance(value, Sized)uuid.UUID is not Sized, so it passes the check correctly. No bug here either after verification.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines 152 to 153
elif val_attr.endswith("_set_val") and val_attr != "unix_timestamp_set_val":
val = set(val)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 UUID set values hit generic set() conversion before UUID-specific handling, producing set of strings instead of set of UUID objects

In feast_value_type_to_python_type at sdk/python/feast/type_map.py:152, the elif branch converts _set_val types to Python sets via val = set(val). For uuid_set_val and time_uuid_set_val, this converts the list of strings into a set of strings. Then at line 160-161, the UUID-specific code does {uuid_module.UUID(v) ... for v in val} which iterates over the set of strings and converts them to UUIDs — so the final result is correct.

However, this is NOT the case for the backward compatibility path at lines 170-173: when a UUID set is stored using the old string_set_val proto field, line 152 converts it to set(val) (a set of strings). Then line 170 checks isinstance(val, set) — this is True, so it enters the branch and converts each string to uuid.UUID. This path is also correct.

After careful trace-through, this is actually not a bug. Withdrawing.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants