When using ClickHouse as the primary connection, I ran into an issue where the UUID data type does not seem to be supported.
Environment
- Version: 0.180.0
- Self-hosted web UI
Steps to reproduce
I'm using the Web UI and running this query:
select generateUUIDv4()
and I get this error:
("Could not convert UUID('00030514-4002-4723-904d-fd9d16c9cdaa') with type UUID: did not recognize Python value type when inferring an Arrow data type", 'Conversion failed for column generateUUIDv4() with type object')
But if I cast the value and query it like this (any of these variants):
select toString(generateUUIDv4())
-- select generateUUIDv4()::String
-- select toUInt128(generateUUIDv4())
-- select generateUUIDv4()::UInt128
then the query executes successfully and I see the results.
Where it comes from
It looks like pyarrow doesn't natively support converting UUID values (Python uuid.UUID objects in an object-dtype column) when building a pyarrow Table from a pandas DataFrame.
Linked issues (external library)
apache/arrow#44224
apache/arrow#43855
Code
This comes from this method:
https://github.com/TobikoData/sqlmesh/blob/e773b59a626cc7607930f014f045c78f60add72f/web/server/utils.py#L88-L97
And may be fixed like this:
import io
import uuid

import pandas as pd
import pyarrow as pa


def df_to_pyarrow_bytes(df: pd.DataFrame) -> io.BytesIO:
    """Convert a DataFrame to pyarrow bytes stream"""
    # get all columns that contain `uuid.UUID` values
    uuid_cols = [col for col in df.columns if any(isinstance(v, uuid.UUID) for v in df[col])]
    # convert all `uuid` to `uuid.bytes` on a copy, leaving nulls untouched
    df = df.copy()
    for col in uuid_cols:
        df[col] = df[col].apply(lambda x: x.bytes if isinstance(x, uuid.UUID) else x)
    # infer schema from pandas dataframe
    schema = pa.Schema.from_pandas(df, preserve_index=False)
    # replace each `uuid` field in place with `pa.uuid` so the column order is kept
    # (`pa.uuid()` is only available in recent pyarrow releases)
    for col in uuid_cols:
        idx = schema.get_field_index(col)
        schema = schema.set(idx, pa.field(col, pa.uuid()))
    # another method - merging 2 schemas
    # https://arrow.apache.org/cookbook/py/schema.html#merging-multiple-schemas
    # create table from pandas dataframe with new schema
    table = pa.Table.from_pandas(df, schema=schema, preserve_index=False)
    ...
Full error log (Traceback)
Traceback (most recent call last):
File "/home/sava/dev/sqlmesh/unitrade/.venv/lib/python3.13/site-packages/web/server/settings.py", line 102, in get_loaded_context
yield _get_loaded_context(settings.project_path, settings.config, settings.gateway)
File "/home/sava/dev/sqlmesh/unitrade/.venv/lib/python3.13/site-packages/fastapi/concurrency.py", line 27, in contextmanager_in_threadpool
yield await run_in_threadpool(cm.__enter__)
File "/home/sava/dev/sqlmesh/unitrade/.venv/lib/python3.13/site-packages/fastapi/routing.py", line 301, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<3 lines>...
)
^
File "/home/sava/dev/sqlmesh/unitrade/.venv/lib/python3.13/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
return await dependant.call(**values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sava/dev/sqlmesh/unitrade/.venv/lib/python3.13/site-packages/web/server/api/endpoints/commands.py", line 100, in fetchdf
return ArrowStreamingResponse(df_to_pyarrow_bytes(df))
~~~~~~~~~~~~~~~~~~~^^^^
File "/home/sava/dev/sqlmesh/unitrade/.venv/lib/python3.13/site-packages/web/server/utils.py", line 90, in df_to_pyarrow_bytes
table = pa.Table.from_pandas(df)
File "pyarrow/table.pxi", line 4793, in pyarrow.lib.Table.from_pandas
File "/home/sava/dev/sqlmesh/unitrade/.venv/lib/python3.13/site-packages/pyarrow/pandas_compat.py", line 639, in dataframe_to_arrays
arrays = [convert_column(c, f)
~~~~~~~~~~~~~~^^^^^^
File "/home/sava/dev/sqlmesh/unitrade/.venv/lib/python3.13/site-packages/pyarrow/pandas_compat.py", line 626, in convert_column
raise e
File "/home/sava/dev/sqlmesh/unitrade/.venv/lib/python3.13/site-packages/pyarrow/pandas_compat.py", line 620, in convert_column
result = pa.array(col, type=type_, from_pandas=True, safe=safe)
File "pyarrow/array.pxi", line 365, in pyarrow.lib.array
File "pyarrow/array.pxi", line 90, in pyarrow.lib._ndarray_to_array
File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: ("Could not convert UUID('0643d144-3ad2-4ce5-8e00-b0f7e754be1e') with type UUID: did not recognize Python value type when inferring an Arrow data type", 'Conversion failed for column generateUUIDv4() with type object')