Enhance Expr.cast() to accept Python types #753

@timsaucer

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
To make the interface more user-friendly, Expr.cast() should accept a Python type and attempt to convert it to the appropriate PyArrow data type. This depends on pull request #750 being merged.
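A minimal sketch of how such a conversion could look, assuming a simple lookup table from Python built-in types to PyArrow type factories. The names `PYTHON_TO_ARROW_FACTORY`, `arrow_factory_name`, and `to_arrow_type` are hypothetical and are not part of datafusion or PyArrow; the chosen widths (`int64`, `float64`) are an assumption, not a decision taken in this issue.

```python
# Hypothetical mapping from Python built-in types to the name of the
# PyArrow factory function that creates the matching DataType.
PYTHON_TO_ARROW_FACTORY = {
    bool: "bool_",
    int: "int64",
    float: "float64",
    str: "string",
    bytes: "binary",
}


def arrow_factory_name(py_type):
    """Return the PyArrow factory name for a Python type, or raise TypeError."""
    try:
        return PYTHON_TO_ARROW_FACTORY[py_type]
    except (KeyError, TypeError):
        raise TypeError(f"no PyArrow equivalent known for {py_type!r}") from None


def to_arrow_type(py_type, pa=None):
    """Build the PyArrow DataType for a Python type.

    `pa` is the pyarrow module; it is imported lazily when not supplied.
    """
    if pa is None:
        import pyarrow as pa  # imported here so the lookup table works without it
    return getattr(pa, arrow_factory_name(py_type))()
```

With something like this in place, `Expr.cast(float)` could first check whether its argument is a Python type and, if so, pass `to_arrow_type(float)` to the existing cast path.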

Describe the solution you'd like
See the example below from @datapythonista.

Describe alternatives you've considered
The alternative is to leave the API as is, which remains workable.

Additional context
This example of requested enhancements includes the desired use case:

import datafusion
from datafusion import col, lit, functions as f
import pyarrow


# something like this would be implemented internally, so users can call `datafusion.read_*`
def _read_parquet(*args, **kwargs):
    ctx = datafusion.SessionContext()
    return ctx.read_parquet(*args, **kwargs)
datafusion.read_parquet = _read_parquet  # creating an alias of `read_*` functions so users don't need to know about `SessionContext` when the defaults are fine


df = (datafusion.read_parquet("buildings.parquet")
                .filter(  # `.filter()` accepting multiple conditions (which will be an AND) instead of having to use `&` with its operator precedence problems
                    col("is_offplan") == False,
                    col("rooms") >= 2,  # `.lit(2)` not being required, and Python literals working with operators
                )
                .aggregate(
                    [col("area_name_en")],
                    [f.mean(col("has_parking").cast(float))],  # `.cast()` accepting Python types, which would be internally converted to the PyArrow equivalent
                )
                .select(
                    col("area_name_en").alias("Area"),
                    col("AVG(has_parking)").alias("Percentage of buildings with parking"),  # removing the default `?table?` in column names, the column name was "AVG(?table?.has_parking)"
                )
     )
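The multi-condition `.filter()` shown in the example could be sketched as folding the conditions together with `&`. This is only an illustration, assuming each condition supports the `&` operator; `combine_with_and` is a hypothetical helper, not an existing datafusion function.

```python
from functools import reduce
import operator


def combine_with_and(*conditions):
    """Fold any number of conditions into one using the `&` operator."""
    if not conditions:
        raise ValueError("at least one condition is required")
    return reduce(operator.and_, conditions)


# Assumed usage with datafusion expressions (not executed here):
# df.filter(combine_with_and(col("is_offplan") == lit(False),
#                            col("rooms") >= lit(2)))
```

Folding inside the helper avoids the operator precedence problem mentioned in the example, since callers never write `&` between comparisons themselves.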

Labels: enhancement (New feature or request)
