add support for reading csv with variable number of columns #891

@djouallah

I am running an ETL benchmark that reads CSV files with a variable number of columns, applies some transformations, and writes the result back as Delta. I tested it with 7 Python engines; unfortunately, DataFusion only supports CSV files with a fixed schema.

FWIW, the notebook, which includes a reproducible data source, is here: https://github.com/djouallah/Fabric_Notebooks_Demo/blob/main/ETL/Light_ETL_Python_Notebook.ipynb
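
To make the request concrete, here is a minimal sketch of what "variable number of columns" means and of one possible pre-processing workaround. It uses only the Python standard library and pads every row to the width of the widest row, so that a reader expecting a fixed schema could then parse the result. The helper name `pad_ragged_csv` and the sample data are hypothetical and are not taken from the notebook or from DataFusion.

```python
import csv
import io


def pad_ragged_csv(text, fill=""):
    """Pad every row of a ragged CSV to the width of its widest row.

    Hypothetical helper for illustration only; it loads the whole
    input into memory, so it is not suited to very large files.
    """
    rows = list(csv.reader(io.StringIO(text)))
    # Width of the widest row; 0 if the input is empty.
    width = max((len(row) for row in rows), default=0)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        # Append filler values so each row has exactly `width` fields.
        writer.writerow(row + [fill] * (width - len(row)))
    return out.getvalue()


# Sample input whose rows have 3, 2, and 4 fields respectively.
sample = "a,b,c\n1,2\n3,4,5,6\n"
padded = pad_ragged_csv(sample)
print(padded)
```

This only shows the shape of the problem; native support in the reader would avoid the extra pass over the data.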


Labels: enhancement
