Add DataFrame transform function #807

@timsaucer

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Add a function akin to DataFrame.transform from pyspark. This gives an easy-to-use way to chain DataFrame transformations.

Describe the solution you'd like

It is common to write a Python function that takes a DataFrame plus zero or more arguments as its input and returns a DataFrame. It is convenient to be able to write functions this way and to chain them. For example:

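# Assumed imports (the original snippet omits them); adjust to your setup.
from datafusion import DataFrame, lit

def add_something_cool(df: DataFrame) -> DataFrame:
    return df.with_column("the_answer", lit(42))

def add_another(df: DataFrame, col_name: str) -> DataFrame:
    return df.with_column(col_name, lit("another"))

# df_original is an existing DataFrame; transform is the method being requested.
df_original.transform(add_something_cool).transform(add_another, "second_col").show()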
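For illustration, here is a minimal sketch of how the requested method might behave: it calls the given function with the DataFrame as the first argument and forwards any extra arguments. `MiniFrame` is a hypothetical stand-in class invented for this sketch so that it runs without any library; it is not the real DataFrame, and the actual implementation may well differ.

```python
from __future__ import annotations

from typing import Any, Callable


class MiniFrame:
    """Hypothetical stand-in for a DataFrame: maps column names to constant values."""

    def __init__(self, columns: dict[str, Any] | None = None) -> None:
        self.columns = dict(columns or {})

    def with_column(self, name: str, value: Any) -> MiniFrame:
        # Return a new frame instead of mutating this one.
        return MiniFrame({**self.columns, name: value})

    def transform(self, func: Callable[..., MiniFrame], *args: Any) -> MiniFrame:
        # Call func with this frame first, followed by any extra arguments.
        return func(self, *args)


def add_something_cool(df: MiniFrame) -> MiniFrame:
    return df.with_column("the_answer", 42)


def add_another(df: MiniFrame, col_name: str) -> MiniFrame:
    return df.with_column(col_name, "another")


result = MiniFrame().transform(add_something_cool).transform(add_another, "second_col")
print(result.columns)  # {'the_answer': 42, 'second_col': 'another'}
```

Because `transform` returns whatever the function returns, each call can be chained directly onto the previous one.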

Describe alternatives you've considered

Without transform, I would probably write the above operation like this:

df = add_something_cool(df_original)
df = add_another(df, "second_col")
df.show()
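Another possible stopgap that needs no change to the library is a small free-standing helper that applies functions left to right. This is only a sketch: `pipe` is a hypothetical name, and plain dicts stand in for DataFrames so the example runs on its own. Extra arguments are bound with `functools.partial`.

```python
from functools import partial, reduce
from typing import Any, Callable


def pipe(df: Any, *steps: Callable[[Any], Any]) -> Any:
    # Feed the result of each step into the next one, left to right.
    return reduce(lambda acc, step: step(acc), steps, df)


# Plain dicts stand in for DataFrames here (an assumption for illustration).
def add_something_cool(df: dict) -> dict:
    return {**df, "the_answer": 42}


def add_another(df: dict, col_name: str) -> dict:
    return {**df, col_name: "another"}


result = pipe({}, add_something_cool, partial(add_another, col_name="second_col"))
print(result)  # {'the_answer': 42, 'second_col': 'another'}
```

This keeps the steps in reading order, but it is less discoverable than a method on the DataFrame itself, which is the point of the request.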

Additional context

Documentation via Databricks for pyspark

Labels: enhancement (New feature or request)
