Conversation

@praveentandra
Contributor

@praveentandra praveentandra commented Aug 17, 2025

When use_high_precision is false, the type for NUMBER columns with non-zero scale is incorrectly returned as Int64 instead of Float64, causing a data discrepancy.

This appears to be a corner case affecting only schema operations at the client; the data path itself seems to be correct.

…lse :

When use_high_precision is false, NUMBER columns with non-zero scale are incorrectly returned as Int64 instead of Float64, causing a data discrepancy.

This fix checks the scale value to determine the appropriate Arrow type (Int64 vs. Float64), matching the behavior documented at https://arrow.apache.org/adbc/main/driver/snowflake.html
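The type-selection rule can be sketched as follows; `arrowTypeName` is a hypothetical helper for illustration, not the driver's actual function:

```go
package main

import "fmt"

// arrowTypeName illustrates the mapping described above for Snowflake
// NUMBER(p, s) columns. With use_high_precision=false, integer NUMBERs
// (scale == 0) map to int64 and fractional NUMBERs (scale > 0) map to
// float64; with use_high_precision=true the exact decimal128 is kept.
func arrowTypeName(scale int64, useHighPrecision bool) string {
	if useHighPrecision {
		return "decimal128"
	}
	if scale == 0 {
		return "int64"
	}
	return "float64"
}

func main() {
	fmt.Println(arrowTypeName(0, false)) // NUMBER(38,0) -> int64
	fmt.Println(arrowTypeName(2, false)) // NUMBER(12,2) -> float64
	fmt.Println(arrowTypeName(2, true))  // NUMBER(12,2) -> decimal128
}
```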
@github-actions github-actions bot added this to the ADBC Libraries 20 milestone Aug 17, 2025
@praveentandra praveentandra changed the title Fix for returning ARROW numeric type correctly when use_high_precision is false fix (go/adbc/driver/snowflake): returning ARROW numeric type correctly when use_high_precision is false Aug 17, 2025
@praveentandra praveentandra changed the title fix (go/adbc/driver/snowflake): returning ARROW numeric type correctly when use_high_precision is false fix (go/adbc/driver/snowflake): return arrow numeric type correctly when use_high_precision is false Aug 17, 2025
@praveentandra praveentandra changed the title fix (go/adbc/driver/snowflake): return arrow numeric type correctly when use_high_precision is false fix(go/adbc/driver/snowflake): return arrow numeric type correctly when use_high_precision is false Aug 17, 2025
Member

@lidavidm lidavidm left a comment

Wouldn't it make more sense to fall back to decimal?

@zeroshade
Member

Can we add a unit test for this?

praveentandra added a commit to praveentandra/arrow-adbc that referenced this pull request Aug 25, 2025
… when use_high_precision=false

  Previously, when use_high_precision=false, NUMBER columns with scale>0 were returned as scaled Int64 values from Snowflake.
  This mismatch caused data corruption at clients, with decimal data showing up as Int64.

  The fix changes the behavior to use Decimal128 for all non-integer NUMBER types
  (scale>0) even when use_high_precision=false, ensuring:
  - Type consistency between schema and data
  - Exact precision preservation (no floating-point approximation)
  - The Int64 optimization for NUMBER(p,0) is preserved

  Changes:
  - record_reader.go: Use Decimal128 for NUMBER(p,s>0) when use_high_precision=false
  - connection.go: Update schema inference to match record_reader logic
  - driver_test.go: Add comprehensive tests for NUMBER type handling
  - snowflake.rst: Update documentation to reflect new behavior

  This is a different issue from apache#1242 (fixed in PR apache#1267), which addressed the
  Int64→Decimal128 conversion for use_high_precision=true. This fix addresses the
  type mismatch in the use_high_precision=false code path.

  Breaking change: Applications expecting Float64 for NUMBER(p,s>0) with
  use_high_precision=false will now receive Decimal128. While this is a breaking change, the previous behavior returned incorrect values (scaled Int64) to the client, and the documentation has been updated accordingly. I don't think returning decimal data as float is right, since float/double are a separate category. This follows an observation by @lidavidm at apache#3295
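To make the "exact precision preservation" point concrete: a scaled integer can be rendered as exact decimal text with no floating-point step, which is the property Decimal128 preserves. A standalone sketch (`formatScaled` is a hypothetical helper, not driver code):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// formatScaled renders a Snowflake-style scaled integer (the true value
// multiplied by 10^scale) as exact decimal text. No float64 is involved,
// so the value survives the round-trip exactly -- which float64 cannot
// guarantee for all 38-digit NUMBER values.
func formatScaled(v int64, scale int) string {
	neg := v < 0
	if neg {
		v = -v
	}
	s := strconv.FormatInt(v, 10)
	if scale > 0 {
		if len(s) <= scale {
			// Pad so there is at least one digit before the point.
			s = strings.Repeat("0", scale-len(s)+1) + s
		}
		s = s[:len(s)-scale] + "." + s[len(s)-scale:]
	}
	if neg {
		s = "-" + s
	}
	return s
}

func main() {
	fmt.Println(formatScaled(71156, 2)) // 711.56, e.g. a NUMBER(12,2)
	fmt.Println(formatScaled(5, 2))     // 0.05
}
```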
@CurtHagenlocher
Contributor

Wouldn't it make more sense to fall back to decimal?

As I understand it, the premise behind the original implementation was that not all consumers are able to meaningfully use a decimal128 value. So the driver was using the "best possible non-decimal128" type to store the value -- with possible loss of precision but no loss of scale. If we assume that all consumers can work with decimal128 then I think the flag is effectively obsolete.

@zeroshade
Member

As I understand it, the premise behind the original implementation was that not all consumers are able to meaningfully use a decimal128 value. So the driver was using the "best possible non-decimal128" type to store the value -- with possible loss of precision but no loss of scale.

Precisely

@CurtHagenlocher
Contributor

As I understand it, the premise behind the original implementation was that not all consumers are able to meaningfully use a decimal128 value. So the driver was using the "best possible non-decimal128" type to store the value -- with possible loss of precision but no loss of scale.

Precisely

Right, so given this I think that the original change which simply picked float64 instead of int64 when the scale was nonzero is a better choice.

@praveentandra
Contributor Author

As I understand it, the premise behind the original implementation was that not all consumers are able to meaningfully use a decimal128 value. So the driver was using the "best possible non-decimal128" type to store the value -- with possible loss of precision but no loss of scale.

Precisely

Right, so given this I think that the original change which simply picked float64 instead of int64 when the scale was nonzero is a better choice.

That's what I thought about Decimal128 vs the 64-bit types, and it makes sense, but it has a bug, as described below. As you may know, Snowflake doesn't have a good way to distinguish integers from decimals, since it doesn't retain type aliases after table creation. I was looking at this flag as a way to work around that problem, but I'm unable to use the flag=false setting because of the bug. Below is how the data shows up in DuckDB after querying Snowflake via ADBC.

D select c_custkey, c_name, c_acctbal from sf_db.tpch_sf1.customer order by c_custkey limit 5;

use_high_precision = true - inefficient type at client for integers (c_custkey)
┌───────────────┬────────────────────┬───────────────┐
│   C_CUSTKEY   │       C_NAME       │   C_ACCTBAL   │
│ decimal(38,0) │      varchar       │ decimal(12,2) │
├───────────────┼────────────────────┼───────────────┤
│             1 │ Customer#000000001 │        711.56 │
│             2 │ Customer#000000002 │        121.65 │
│             3 │ Customer#000000003 │       7498.12 │
│             4 │ Customer#000000004 │       2866.83 │
│             5 │ Customer#000000005 │        794.47 │
└───────────────┴────────────────────┴───────────────┘
use_high_precision = false - incorrect type and value at client for decimals (c_acctbal)
┌───────────┬────────────────────┬─────────────────────┐
│ C_CUSTKEY │       C_NAME       │      C_ACCTBAL      │
│   int64   │      varchar       │        int64        │
├───────────┼────────────────────┼─────────────────────┤
│         1 │ Customer#000000001 │ 4649470163769863700 │
│         2 │ Customer#000000002 │ 4638260774666082714 │
│         3 │ Customer#000000003 │ 4664966284827552645 │
│         4 │ Customer#000000004 │ 4658522640913436508 │
│         5 │ Customer#000000005 │ 4650199447842334966 │
└───────────┴────────────────────┴─────────────────────┘

Given this behavior, I am not sure if any of the clients are using use_high_precision = false.

Please advise on one of the below:

  • Fix the issue for float64
  • Fix the issue and retain decimal128
  • Leave it as is; let's solve the decimal(p, s=0)->int conversion separately

@zeroshade
Member

In my opinion this should be the behavior:

  1. If useHighPrecision == true then just use the decimal type as-is that was given to us
  2. If useHighPrecision == false then: for scale == 0 use int64, otherwise use float64.
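On the data path, option 2 amounts to dividing the scaled integer Snowflake delivers back out into a float64. A minimal sketch (`scaledToFloat64` is a hypothetical name, not the driver's code):

```go
package main

import (
	"fmt"
	"math"
)

// scaledToFloat64 converts a Snowflake scaled-integer NUMBER(p, s>0)
// value (true value * 10^s) to float64, accepting the possible loss of
// precision that useHighPrecision=false trades away.
func scaledToFloat64(v int64, scale int) float64 {
	return float64(v) / math.Pow10(scale)
}

func main() {
	fmt.Println(scaledToFloat64(71156, 2)) // 711.56
}
```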

@praveentandra praveentandra force-pushed the fix-snowflake-number-scale branch from 1007db7 to 7687b15 on August 27, 2025 at 21:43
@praveentandra
Contributor Author

In my opinion this should be the behavior:

  1. If useHighPrecision == true then just use the decimal type as-is that was given to us
  2. If useHighPrecision == false then: for scale == 0 use int64, otherwise use float64.

I have updated the PR accordingly.

@lidavidm
Member

lidavidm commented Sep 1, 2025

@zeroshade any comments?

@lidavidm lidavidm merged commit 206e02f into apache:main Oct 7, 2025
41 of 42 checks passed