Skip to content

Conversation

@billonahill
Copy link

Parquet schemas are case sensitive but hive is not so camelCase parquet fields are not found when accessed as lowercase hive columns.

@billonahill
Copy link
Author

No, nested fields with caps are still broken (IQ-129).

@dain dain self-assigned this Mar 1, 2016
@dain
Copy link
Contributor

dain commented Mar 2, 2016

@zhenxiao, does this look good to you? Also, any ideas on how we can test for this?

@zhenxiao
Copy link
Collaborator

zhenxiao commented Mar 2, 2016

@dain @billonahill the fix looks good to me.
Kind of difficult to reproduce it in our current test case. Let me think of a way.

@billonahill
Copy link
Author

@dain @zhenxiao yeah, I also looked at the tests to see how to verify this in a test case but I came up blank. Let me know if you have any ideas.

Also, since submitting this patch I released that we need to further patch the twitter fork to handle hive reserved words. When a parquet field is a reserved Hive word (i.e., keyword, location), we append an '_' to the field name before adding to Hive. This works fine when looking up presto columns by position, but not by name. As a result we also need to check for a match after stripping the _ (see https://github.com/twitter-forks/presto/pull/32/files). This isn't something we want in the OSS master but I am curious how others deal with reserved Hive keywords.

@nezihyigitbasi
Copy link
Contributor

@dain @billonahill @zhenxiao I created a PR for a simple unit test.

@zhenxiao
Copy link
Collaborator

zhenxiao commented Mar 3, 2016

@dain @billonahill
@nezihyigitbasi 's PR looks good to me. What do you think?

@billonahill
Copy link
Author

@dain @zhenxiao unit tests look good to me. Thanks @nezihyigitbasi!

I just refactored this patch to be easier to further patch on a private branch, like we're doing in https://github.com/twitter-forks/presto/pull/32/files.

@dain dain added the accepted label Mar 3, 2016
@dain
Copy link
Contributor

dain commented Mar 5, 2016

Merged, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants