Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Dec 22, 2021

Which issue does this PR close?

Closes #1476

Rationale for this change

This is a regression introduced in #1455

The following code in Sort skips the field level metadata when when it constructs the output batch.

    let schema = Arc::new(Schema::new(
        schema
            .fields()
            .iter()
            .zip(batch.columns().iter().map(|col| col.data_type()))
            .map(|(field, ty)| Field::new(field.name(), ty.clone(), field.is_nullable()))
            .collect::<Vec<_>>(),
    ));

Thus the output schema is not correct.

It also seems to be unnecessary (at least all the tests pass without it, so maybe @maxburke can comment on what problem it was solving).

@hntd187 has also added some functions working with apache/arrow-rs#1033 PR to add some additional functions for creating fields and I will work on this as well

What changes are included in this PR?

  1. Remove offending code
  2. add a test

Are there any user-facing changes?

Bug fix

@alamb alamb added the bug Something isn't working label Dec 22, 2021
None,
)?;

let schema = Arc::new(Schema::new(
Copy link
Contributor Author

@alamb alamb Dec 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the fix is to delete this code, the rest of the PR is a test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice and clean 🧹🧼

@hntd187
Copy link
Contributor

hntd187 commented Dec 22, 2021

This is just to correct the bug correct? We will deal with carrying over metadata properly in another PR?

@alamb
Copy link
Contributor Author

alamb commented Dec 22, 2021

This is just to correct the bug correct? We will deal with carrying over metadata properly in another PR?

Yes -- I think DataFusion does carry over metadata now in many cases and this PR corrects a case where the metadata used to be carried over but is no longer

@Dandandan Dandandan merged commit 4012713 into apache:master Dec 23, 2021
@maxburke
Copy link
Contributor

maxburke commented Jan 3, 2022

@alamb If I recall the reason this change was made was because in our cases the schema was stripped of the timezone information but the underlying record batches preserved it.

@alamb alamb deleted the alamb/sort_exec_metadata branch January 4, 2022 11:46
@alamb
Copy link
Contributor Author

alamb commented Jan 4, 2022

@alamb If I recall the reason this change was made was because in our cases the schema was stripped of the timezone information but the underlying record batches preserved it.

I am sorry if I broke something for you @maxburke -- if you help me understand / write a reproducer for what want to do I am happy to help try and make it work

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sort discards field metadata on the output schema

5 participants