fix(go/adbc/driver/snowflake): return arrow numeric type correctly when use_high_precision is false #3295
Conversation
…lse : When use_high_precision is false, NUMBER columns with non-zero scale are incorrectly returned as Int64 instead of Float64, causing data discrepancy. This fix checks the scale value to determine the appropriate Arrow type (Int64 vs Float64), matching the behavior documented at https://arrow.apache.org/adbc/main/driver/snowflake.html
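The mapping described above can be sketched as a small Go helper. This is an illustrative stand-in, not the driver's actual function: with use_high_precision=false, the documented behavior is Int64 for NUMBER(p, 0) and Float64 for NUMBER(p, s>0), whereas the buggy path returned Int64 in both cases.

```go
package main

import "fmt"

// numberToArrowType sketches the documented low-precision mapping:
// with use_high_precision=false, Snowflake NUMBER(p, s) should surface
// as Int64 when s == 0 and Float64 when s > 0. The bug returned Int64
// in both cases, so fractional values arrived as scaled integers.
// Hypothetical helper for illustration only.
func numberToArrowType(scale int) string {
	if scale == 0 {
		return "Int64"
	}
	return "Float64" // the buggy branch incorrectly returned "Int64" here
}

func main() {
	fmt.Println(numberToArrowType(0)) // NUMBER(38,0) -> Int64
	fmt.Println(numberToArrowType(2)) // NUMBER(38,2) -> Float64
}
```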
lidavidm
left a comment
Wouldn't it make more sense to fall back to decimal?
Can we add a unit test for this?
… when use_high_precision=false

Previously, when use_high_precision=false, NUMBER columns with scale>0 were returned as scaled Int64 from Snowflake. This mismatch caused data corruption at the client, with decimal data showing up as Int64. The fix changes the behavior to use Decimal128 for all non-integer NUMBER types (scale>0) even when use_high_precision=false, ensuring:

- Type consistency between schema and data
- Exact precision preservation (no floating-point approximation)
- The Int64 optimization for NUMBER(p,0) is preserved

Changes:

- record_reader.go: Use Decimal128 for NUMBER(p,s>0) when use_high_precision=false
- connection.go: Update schema inference to match record_reader logic
- driver_test.go: Add comprehensive tests for NUMBER type handling
- snowflake.rst: Update documentation to reflect the new behavior

This is a different issue from apache#1242 (fixed in PR apache#1267), which addressed the Int64→Decimal128 conversion for use_high_precision=true. This fix addresses the type mismatch in the use_high_precision=false code path.

Breaking change: Applications expecting Float64 for NUMBER(p,s>0) with use_high_precision=false will now receive Decimal128. While this is a breaking change, the previous behavior returned incorrect values (scaled Int64) to the client. The documentation is updated accordingly.

I don't think returning decimal data as float is right, since float/double are in a separate category. This follows the observation by @lidavidm at apache#3295
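The revised mapping from the commit message above can be sketched as follows. This is a hypothetical helper, not the driver's real API: scale 0 keeps the Int64 fast path, while any non-zero scale now yields Decimal128 instead of Float64, avoiding both scaled-Int64 corruption and floating-point rounding.

```go
package main

import "fmt"

// chooseArrowType sketches the revised mapping: even with
// use_high_precision=false, NUMBER(p, s>0) maps to Decimal128, while
// the Int64 optimization for integer NUMBER(p, 0) columns is kept.
// Illustrative stand-in for the record_reader.go / connection.go logic.
func chooseArrowType(precision, scale int) string {
	if scale == 0 {
		return "Int64" // NUMBER(p,0) fast path preserved
	}
	return fmt.Sprintf("Decimal128(%d,%d)", precision, scale)
}

func main() {
	fmt.Println(chooseArrowType(38, 0)) // Int64
	fmt.Println(chooseArrowType(12, 2)) // Decimal128(12,2)
}
```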
As I understand it, the premise behind the original implementation was that not all consumers are able to meaningfully use a decimal128 value. So the driver was using the "best possible non-decimal128" type to store the value -- with possible loss of precision but no loss of scale. If we assume that all consumers can work with decimal128, then I think the flag is effectively obsolete.
Precisely
Right, so given this I think that the original change which simply picked
That's what I thought about 128 vs 64, and it makes sense, but it has a bug as described below. As you may know, Snowflake doesn't have a good way to distinguish integers from decimals, as it doesn't retain aliases post table creation. I was looking to this flag as a way to work around that problem, but I am unable to use the flag=false setting because of the bug. Below is how the data shows up in DuckDB after querying Snowflake via ADBC.

D select c_custkey, c_name, c_acctbal from sf_db.tpch_sf1.customer order by c_custkey limit 5;

use_high_precision = true - inefficient type at client for integers (c_custkey)

Given this behavior, I am not sure whether any clients are using use_high_precision = false. Please advise on one of the below:
In my opinion this should be the behavior:
Force-pushed 1007db7 to 7687b15
I have updated the PR accordingly
@zeroshade any comments?
When use_high_precision is false, the type for NUMBER columns with non-zero scale is incorrectly returned as Int64 instead of Float64, causing data discrepancy.
This seems to be a corner case, affecting schema-only operations at the client, while the data path appears to be fine.
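The schema-only corner case can be sketched like this. Both helpers below are hypothetical stand-ins for the schema-inference path (connection.go) and the data path (record_reader.go): the point is that they disagreed for NUMBER(p, s>0), so clients doing schema-only operations saw a different type than the record batches actually carried.

```go
package main

import "fmt"

// schemaType mimics the buggy schema-inference path, which reported
// Int64 regardless of scale when use_high_precision=false.
func schemaType(scale int) string {
	return "Int64" // bug: ignores scale
}

// dataType mimics the data path, which correctly produced Float64
// for non-zero scale under use_high_precision=false.
func dataType(scale int) string {
	if scale > 0 {
		return "Float64"
	}
	return "Int64"
}

func main() {
	s := 2 // e.g. NUMBER(12,2)
	fmt.Printf("schema=%s data=%s mismatch=%v\n",
		schemaType(s), dataType(s), schemaType(s) != dataType(s))
}
```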