fix(postgres): Use end_date in synthetic entity_df for non-entity retrieval#6110
Open
YassinNouh21 wants to merge 3 commits intofeast-dev:masterfrom
Open
fix(postgres): Use end_date in synthetic entity_df for non-entity retrieval#6110YassinNouh21 wants to merge 3 commits intofeast-dev:masterfrom
YassinNouh21 wants to merge 3 commits intofeast-dev:masterfrom
Conversation
…_df for non-entity retrieval The non-entity retrieval path created a synthetic entity_df using pd.date_range(start=start_date, ...)[:1], which placed start_date as the event_timestamp. Since PIT joins use MAX(entity_timestamp) as the upper bound for feature data filtering, using start_date made end_date unreachable — no features after start_date would be returned. Fix: use [end_date] directly, matching the ClickHouse implementation (PR feast-dev#6066) and the Dask offline store behavior. Signed-off-by: yassinnouh21 <[email protected]>
The entity_df fix alone would cause min_event_timestamp to be computed as end_date - TTL (instead of start_date - TTL), clipping valid data from the query window. Override entity_df_event_timestamp_range to (start_date, end_date) in non-entity mode so the full range is used. Also fix ruff formatting in the test file. Signed-off-by: yassinnouh21 <[email protected]>
0f4df15 to
e82371d
Compare
Member
franciscojavierarceo
left a comment
There was a problem hiding this comment.
can you add an integration test for this so we can confirm the behavior?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
entity_df=None) created a synthetic entity_df usingpd.date_range(start=start_date, ...)[:1], which placedstart_dateas the event timestampMAX(entity_timestamp)as the upper bound, usingstart_datemadeend_dateunreachable — no features afterstart_datewould be returned[end_date]directly, matching the ClickHouse (feat: Add non-entity retrieval support for ClickHouse offline store #6066) and Dask implementationsTest plan
test_non_entity_entity_df_uses_end_datethat captures the synthetic entity_df and asserts its timestamp equalsend_dateTestNonEntityRetrievaltests passFixes the bug identified during review of #6066, referenced in #6057.