ENH: check_dtype='equiv' for assert_frame_equal in unit testing #59182

@levaphenyl

Description

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

My test fails because the cast data and the expected data have slightly different dtypes, e.g. int32 vs. int64.
I don't want to use assert_frame_equal(df1, df2, check_dtype=False) because it skips the dtype check entirely, so genuinely different types (e.g. int vs. float) would go unnoticed.

import pandas as pd


a = pd.DataFrame({'Int': [1, 2, 3], 'Float': [0.57, 0.179, 0.213]})  # Automatic type casting
# Force 32-bit
b = a.copy()
b['Int'] = b['Int'].astype('int32')
b['Float'] = b['Float'].astype('float32')
# Force 64-bit
c = a.copy()
c['Int'] = c['Int'].astype('int64')
c['Float'] = c['Float'].astype('float64')
try:
    pd.testing.assert_frame_equal(b, c)
    print('Success')
except AssertionError as err:
    print(err)

gives

Attributes of DataFrame.iloc[:, 0] (column name="Int") are different

Attribute "dtype" are different
[left]:  int32
[right]: int64
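
To illustrate why check_dtype=False is too permissive for this use case, here is a small sketch (the frame names are made up for the example). As far as I can tell, an integer column and a float column holding the same values compare as equal once the dtype check is disabled:

```python
import pandas as pd

# Same values, but one column is integer and the other is float.
ints = pd.DataFrame({'x': [1, 2, 3]})
floats = pd.DataFrame({'x': [1.0, 2.0, 3.0]})

# The default (check_dtype=True) rejects the pair because of the dtype mismatch.
try:
    pd.testing.assert_frame_equal(ints, floats)
    strict_passed = True
except AssertionError:
    strict_passed = False

# With check_dtype=False only the values are compared, so int vs. float slips through.
try:
    pd.testing.assert_frame_equal(ints, floats, check_dtype=False)
    loose_passed = True
except AssertionError:
    loose_passed = False

print(strict_passed, loose_passed)
```

So the strict mode is too strict for int32 vs. int64, and the loose mode gives up the int vs. float distinction as well.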

Feature Description

Something like assert_frame_equal(df1, df2, check_dtype='equiv') would be handy, but it does not work today because the function relies on the strict check in assert_attr_equal under the hood.

Supporting it would mean either adding a soft attribute check to assert_attr_equal, or calling a new function when check_dtype is set to 'equiv'.
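
A minimal sketch of what such a soft check might look like, assuming "equivalent" means "both integer dtypes or both float dtypes". The names dtypes_equivalent and assert_dtypes_equiv are hypothetical and not part of pandas; this is only meant to illustrate the idea, not how it should be wired into assert_attr_equal:

```python
import pandas as pd
from pandas.api import types as ptypes


def dtypes_equivalent(left_dtype, right_dtype) -> bool:
    """Hypothetical helper: True if the dtypes are equal or in the same numeric family."""
    if ptypes.is_dtype_equal(left_dtype, right_dtype):
        return True
    both_int = ptypes.is_integer_dtype(left_dtype) and ptypes.is_integer_dtype(right_dtype)
    both_float = ptypes.is_float_dtype(left_dtype) and ptypes.is_float_dtype(right_dtype)
    return both_int or both_float


def assert_dtypes_equiv(left: pd.DataFrame, right: pd.DataFrame) -> None:
    """Hypothetical soft dtype check, applied column by column.

    Assumes both frames have the same column names.
    """
    for col_name in left.columns:
        ldtype = left[col_name].dtype
        rdtype = right[col_name].dtype
        if not dtypes_equivalent(ldtype, rdtype):
            raise AssertionError(
                f'Column "{col_name}": dtype {ldtype} is not equivalent to {rdtype}'
            )
```

A test could then call assert_dtypes_equiv(df1, df2) followed by pd.testing.assert_frame_equal(df1, df2, check_dtype=False), so that values are compared loosely while int vs. float mismatches are still caught.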

Alternative Solutions

I added a workaround function to my unit tests. It casts each column of one DataFrame to the dtype of the matching column in the other when the two dtypes are of the same kind (both integer or both float).

def assert_frame_equiv(left: pd.DataFrame, right: pd.DataFrame) -> None:
    """Cast equivalent data types to the same dtype before comparing.

    Parameters
    ----------
    left : DataFrame
        First DataFrame to compare.
    right : DataFrame
        Second DataFrame to compare.

    Raises
    ------
    AssertionError
        If the DataFrames are different.
    """
    # First, check that the columns are the same.
    pd.testing.assert_index_equal(left.columns, right.columns, check_order=False)
    # Work on a copy so the caller's DataFrame is not modified by the cast.
    left = left.copy()
    # Knowing the column names are the same, cast to the same dtype if equivalent.
    for col_name in left.columns:
        lcol = left[col_name]
        rcol = right[col_name]
        if (
            (pd.api.types.is_integer_dtype(lcol) and pd.api.types.is_integer_dtype(rcol))
            or (pd.api.types.is_float_dtype(lcol) and pd.api.types.is_float_dtype(rcol))
        ):
            left[col_name] = lcol.astype(rcol.dtype)

    # check_like=True ignores the order of index and columns.
    pd.testing.assert_frame_equal(left, right, check_like=True)

Additional Context

Adapted from my answer on Stack Overflow.

Thanks for making pandas!

Metadata

Assignees

No one assigned

    Labels

    Enhancement
    Needs Triage: Issue that has not been reviewed by a pandas team member
    Testing: pandas testing functions or related to the test suite
