Add missing scalar functions #1470

Merged
timsaucer merged 10 commits into apache:main from timsaucer:feat/add-missing-scalar-fns
Apr 6, 2026

@timsaucer
Member

Which issue does this PR close?

Closes #1453

Rationale for this change

These functions exist upstream but were not exposed to Python.

What changes are included in this PR?

Expose functions to Python
Add unit tests

Are there any user-facing changes?

New addition only.

Contributor

Copilot AI left a comment

Pull request overview

This PR aims to close #1453 by exposing several DataFusion scalar functions that exist upstream but were not previously available in the Python API, along with adding Python unit tests for the new bindings.

Changes:

  • Added Python wrappers and exports for arrow_metadata, get_field, union_extract, union_tag, version, plus a Python-level row alias for struct.
  • Added unit tests covering the newly exposed functions (notably union functions and version).
  • Updated codespell skip paths formatting in pyproject.toml.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File | Description
python/datafusion/functions.py | Adds new Python-level function wrappers/exports (arrow_metadata, get_field, union_*, version, row).
crates/core/src/functions.rs | Exposes new functions from the Rust extension module to Python via pyo3 (arrow_metadata, get_field, union_extract, union_tag, version).
python/tests/test_functions.py | Adds tests for the newly exposed functions.
pyproject.toml | Normalizes codespell skip path entries (removes ./ prefixes).

@timsaucer timsaucer force-pushed the feat/add-missing-scalar-fns branch from 192593f to 2771621 April 3, 2026 19:38
timsaucer and others added 4 commits April 3, 2026 15:51
…row_metadata, version, row

Expose upstream DataFusion scalar functions that were not yet available
in the Python API. Closes apache#1453.

- get_field: extracts a field from a struct or map by name
- union_extract: extracts a value from a union type by field name
- union_tag: returns the active field name of a union type
- arrow_metadata: returns Arrow field metadata (all or by key)
- version: returns the DataFusion version string
- row: alias for the struct constructor

Note: arrow_try_cast was listed in the issue but does not exist in
DataFusion 53, so it is not included.
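
To make the two union helpers in the list above concrete, here is a plain-Python model of their semantics. It is only a sketch: `UnionValue` is a hypothetical stand-in for one row of an Arrow union column, and the two functions are local illustrations rather than the datafusion bindings. The behaviour modelled, where extracting a field that is not the active one yields null, follows upstream DataFusion's description of `union_extract`.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class UnionValue:
    """Hypothetical stand-in for one row of an Arrow union column."""

    tag: str    # name of the field that is active for this row
    value: Any  # payload stored under that field


def union_tag(u: UnionValue) -> str:
    """Return the name of the active field."""
    return u.tag


def union_extract(u: UnionValue, field: str) -> Optional[Any]:
    """Return the payload if `field` is active, otherwise None (SQL NULL)."""
    return u.value if u.tag == field else None


rows = [UnionValue("int_field", 7), UnionValue("str_field", "seven")]
tags = [union_tag(r) for r in rows]
ints = [union_extract(r, "int_field") for r in rows]
```

In this model the second row has `str_field` active, so asking for `int_field` gives `None` rather than raising.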

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Tests for get_field, arrow_metadata, version, row, union_tag, and
union_extract.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Allow arrow_cast, get_field, and union_extract to accept plain str
arguments instead of requiring Expr wrappers. Also improve
arrow_metadata test coverage and fix parameter shadowing.
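
The "accept plain str instead of requiring Expr wrappers" change can be sketched as a small normalisation step at the top of each wrapper. This is an illustration under assumptions, not the code in the PR: the `Expr` class here is a text-only stand-in for `datafusion.Expr`, and `_to_expr` is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Expr:
    """Text-only stand-in for datafusion.Expr (hypothetical)."""

    text: str

    @staticmethod
    def string_literal(value: str) -> "Expr":
        return Expr(f"'{value}'")

    @staticmethod
    def column(name: str) -> "Expr":
        return Expr(name)


def _to_expr(arg: Union[str, Expr]) -> Expr:
    """Wrap a plain str as a string literal; pass Expr values through."""
    return Expr.string_literal(arg) if isinstance(arg, str) else arg


def get_field(expr: Expr, name: Union[str, Expr]) -> Expr:
    """Hypothetical wrapper: normalise `name`, then build the call."""
    return Expr(f"get_field({expr.text}, {_to_expr(name).text})")


short = get_field(Expr.column("s"), "a")
explicit = get_field(Expr.column("s"), Expr.string_literal("a"))
```

Both call styles build the same expression, so callers that already pass an `Expr` keep working while new callers can pass a bare string.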

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

@timsaucer timsaucer force-pushed the feat/add-missing-scalar-fns branch from 4384c1f to df1ead1 April 3, 2026 19:52
@timsaucer timsaucer requested a review from Copilot April 3, 2026 20:01
@timsaucer timsaucer marked this pull request as ready for review April 3, 2026 20:01
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


Contributor

@ntjohnson1 ntjohnson1 left a comment

Claude didn't do as good a job of maintaining the existing structure as the last one. Not sure how pedantic we want to be about some of the formatting stuff, since there isn't a ruff rule around it. A Copilot setting or custom lint rule could help enforce it if desired.

import numpy as np
import pyarrow as pa
import pytest
from datafusion import SessionContext, column, literal, string_literal
Contributor

Love that this is no longer needed

timsaucer and others added 2 commits April 6, 2026 07:52
Replace Args/Returns sections with doctest Examples blocks for
arrow_metadata, get_field, union_extract, union_tag, and version to
match existing codebase conventions. Simplify row to alias-style
docstring with See Also reference. Document that arrow_cast accepts
both str and Expr for data_type.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

timsaucer and others added 4 commits April 6, 2026 08:11
Allow arrow_cast to accept a pyarrow DataType in addition to str and
Expr. The DataType is converted to its string representation before
being passed to DataFusion. Adds test coverage for the new input type.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Note that expr["field"] is a convenient alternative when the field
name is a static string, and get_field is needed for dynamic
expressions. Add a second doctest example showing the bracket syntax.
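
The relationship between the bracket syntax and get_field described in this commit message can be sketched as follows. The `Expr` class is again a text-only, hypothetical stand-in, and the delegation from `__getitem__` to `get_field` is an assumption made for illustration; the real `Expr.__getitem__` may accept more key types and may be implemented differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Text-only stand-in for datafusion.Expr (hypothetical)."""

    text: str

    def __getitem__(self, key: str) -> "Expr":
        # Bracket access: the field name is a fixed string known up front.
        return get_field(self, Expr(f"'{key}'"))


def get_field(expr: Expr, name: Expr) -> Expr:
    """Build a get_field call; `name` can be any expression."""
    return Expr(f"get_field({expr.text}, {name.text})")


record = Expr("record")
static = record["id"]                             # sugar for a fixed name
dynamic = get_field(record, Expr("key_column"))   # name taken from an expression
```

The bracket form reads better when the name is a constant, while the function form is the only one of the two that can take a name computed by another expression.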

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Use the existing Rust-side PyArrowType<DataType> conversion via
Expr.cast() instead of str() which produces pyarrow type names
that DataFusion does not recognize.
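
The mismatch this commit fixes can be seen by placing pyarrow's printed type names next to the Arrow type names that DataFusion parses. The sketch below avoids importing pyarrow: the left-hand strings are what `str()` returns for the pyarrow types named in the comments, the right-hand strings are the Rust-style Arrow names, and `naive_type_string` is a hypothetical name for the earlier str()-based approach.

```python
# pyarrow's str() output paired with the Arrow type name DataFusion expects.
PYARROW_STR_TO_ARROW_NAME = {
    "int64": "Int64",     # str(pa.int64())
    "double": "Float64",  # str(pa.float64())
    "string": "Utf8",     # str(pa.string())
}


def naive_type_string(pyarrow_str: str) -> str:
    """Model of the first attempt: forward pyarrow's spelling unchanged."""
    return pyarrow_str


def mismatches(table: dict) -> list:
    """List the pyarrow spellings that differ from the expected Arrow name."""
    return sorted(
        name for name, expected in table.items()
        if naive_type_string(name) != expected
    )


bad = mismatches(PYARROW_STR_TO_ARROW_NAME)
```

Every entry in this small table differs, which is why converting the `DataType` object itself on the Rust side is the more robust route than passing its string form.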

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@timsaucer
Member Author

Thanks @ntjohnson1 !

@timsaucer timsaucer merged commit d07fdb3 into apache:main Apr 6, 2026
21 checks passed
@timsaucer timsaucer deleted the feat/add-missing-scalar-fns branch April 6, 2026 12:54

Development

Successfully merging this pull request may close these issues.

Add missing scalar functions (union, arrow metadata, get_field, version, row)

3 participants