Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
My test fails because the cast data and the expected data have slightly different dtypes, such as `int32` vs. `int64`.
I don't want to use `assert_frame_equal(df1, df2, check_dtype=False)`, because it skips the dtype check entirely, which is too permissive.
```python
import pandas as pd

a = pd.DataFrame({'Int': [1, 2, 3], 'Float': [0.57, 0.179, 0.213]})  # dtypes are inferred automatically

# Force 32-bit
b = a.copy()
b['Int'] = b['Int'].astype('int32')
b['Float'] = b['Float'].astype('float32')

# Force 64-bit
c = a.copy()
c['Int'] = c['Int'].astype('int64')
c['Float'] = c['Float'].astype('float64')

try:
    pd.testing.assert_frame_equal(b, c)
    print('Success')
except AssertionError as err:
    print(err)
```
gives:

```
Attributes of DataFrame.iloc[:, 0] (column name="Int") are different
Attribute "dtype" are different
[left]: int32
[right]: int64
```
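To illustrate why `check_dtype=False` is too permissive, here is a minimal sketch (the frames `ints` and `floats` and the column name `x` are arbitrary). As far as I can tell on recent pandas versions, disabling the dtype check also lets an integer column match a float column holding the same values, which goes well beyond the `int32` vs. `int64` relaxation wanted here.

```python
import pandas as pd

# Same values, but one column holds integers and the other floats.
ints = pd.DataFrame({"x": [1, 2, 3]})          # integer dtype
floats = pd.DataFrame({"x": [1.0, 2.0, 3.0]})  # float64

# With check_dtype=False the dtype attribute is not compared at all,
# so an integer column is accepted as "equal" to a float column.
pd.testing.assert_frame_equal(ints, floats, check_dtype=False)
print("integer vs. float columns passed with check_dtype=False")
```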
Feature Description
Something like `assert_frame_equal(df1, df2, check_dtype='equiv')` would be handy, but it does not work because the function uses the strict check of `assert_attr_equal` under the hood.
Supporting it would mean changing the logic to either add a soft attribute check to `assert_attr_equal`, or call a new function when `check_dtype` is set to `'equiv'`.
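A minimal sketch of what an `'equiv'` dtype check could look like, built only on the public `pandas.api.types` and `pandas.testing` functions. The names `dtypes_equivalent` and `assert_frame_equal_equiv` are hypothetical (they are not part of pandas), and the equivalence rule (identical dtypes, both integer, or both float) is an assumption that mirrors the int/float grouping used in the workaround.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype


def dtypes_equivalent(left_dtype, right_dtype) -> bool:
    """Hypothetical 'equiv' rule: identical dtypes, both integer, or both float."""
    if left_dtype == right_dtype:
        return True
    both_int = is_integer_dtype(left_dtype) and is_integer_dtype(right_dtype)
    both_float = is_float_dtype(left_dtype) and is_float_dtype(right_dtype)
    return both_int or both_float


def assert_frame_equal_equiv(left: pd.DataFrame, right: pd.DataFrame) -> None:
    """Hypothetical helper emulating check_dtype='equiv' with the public API."""
    # Columns must match (same names, same order) before dtypes are compared.
    pd.testing.assert_index_equal(left.columns, right.columns)
    # Soft dtype check: fail only if a column pair is not equivalent.
    for col in left.columns:
        ldt, rdt = left[col].dtype, right[col].dtype
        if not dtypes_equivalent(ldt, rdt):
            raise AssertionError(
                f'Column "{col}": dtypes {ldt} and {rdt} are not equivalent'
            )
    # Dtypes were checked above, so compare values without the strict check.
    pd.testing.assert_frame_equal(left, right, check_dtype=False)
```

With the `b` and `c` frames from the example, `assert_frame_equal_equiv(b, c)` should pass, while an `int64` column compared against a `float64` column is still rejected by the soft check.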
Alternative Solutions
I added a workaround function to my unit tests that casts each column of one DataFrame to the dtype of the matching column in the other when both are of the same kind (both integer or both float).
```python
import pandas as pd


def assert_frame_equiv(left: pd.DataFrame, right: pd.DataFrame) -> None:
    """Convert equivalent data types to the same type before comparing.

    Parameters
    ----------
    left : DataFrame
        First DataFrame to compare.
    right : DataFrame
        Second DataFrame to compare.

    Raises
    ------
    AssertionError
        If the DataFrames are different.
    """
    # First, check that the columns are the same (in any order).
    pd.testing.assert_index_equal(left.columns, right.columns, check_order=False)
    # Work on a copy so the caller's DataFrame is not modified.
    left = left.copy()
    # Knowing the column names are the same, cast to the same dtype if equivalent.
    for col_name in left.columns:
        lcol = left[col_name]
        rcol = right[col_name]
        if (
            (pd.api.types.is_integer_dtype(lcol) and pd.api.types.is_integer_dtype(rcol))
            or (pd.api.types.is_float_dtype(lcol) and pd.api.types.is_float_dtype(rcol))
        ):
            left[col_name] = lcol.astype(rcol.dtype)
    pd.testing.assert_frame_equal(left, right, check_like=True)
```
Additional Context
Adapted from my answer on Stack Overflow.

Thanks for making pandas!