SPARK-1293 [SQL] Parquet support for nested types #360
Closed
Commits
48 commits
aa688fe Adding conversion of nested Parquet schemas (AndreSchumacher)
4d4892a First commit nested Parquet read converters (AndreSchumacher)
6125c75 First working nested Parquet record input (AndreSchumacher)
745a42b Completing testcase for nested data (Addressbook( (AndreSchumacher)
ddb40d2 Extending tests for nested Parquet data (AndreSchumacher)
1b1b3d6 Fixing one problem with nested arrays (AndreSchumacher)
5d80461 fixing one problem with nested structs and breaking up files (AndreSchumacher)
98219cf added struct converter (AndreSchumacher)
ee70125 fixing one problem with arrayconverter (AndreSchumacher)
b7fcc35 Documenting conversions, bugfix, wrappers of Rows (AndreSchumacher)
6dbc9b7 Fixing some problems intruduced during rebase (AndreSchumacher)
f8f8911 For primitive rows fall back to more efficient converter, code reorg (AndreSchumacher)
4e25fcb Adding resolution of complex ArrayTypes (AndreSchumacher)
a594aed Scalastyle (AndreSchumacher)
b539fde First commit for MapType (AndreSchumacher)
824500c Adding attribute resolution for MapType (AndreSchumacher)
f777b4b Scalastyle (AndreSchumacher)
d1911dc Simplifying ArrayType conversion (AndreSchumacher)
1dc5ac9 First version of WriteSupport for nested types (AndreSchumacher)
e99cc51 Fixing nested WriteSupport and adding tests (AndreSchumacher)
adc1258 Optimizing imports (AndreSchumacher)
f466ff0 Added ParquetAvro tests and revised Array conversion (AndreSchumacher)
79d81d5 Replacing field names for array and map in WriteSupport (AndreSchumacher)
619c397 Completing Map testcase (AndreSchumacher)
c52ff2c Adding native-array converter (AndreSchumacher)
431f00f Fixing problems introduced during rebase (AndreSchumacher)
a6b4f05 Cleaning up ArrayConverter, moving classTag to NativeType, adding Nat… (AndreSchumacher)
0ae9376 Doc strings and simplifying ParquetConverter.scala (AndreSchumacher)
32229c7 Removing Row nested values and placing by generic types (AndreSchumacher)
cbb5793 Code review feedback (AndreSchumacher)
191bc0d Changing to Seq for ArrayType, refactoring SQLParser for nested field… (AndreSchumacher)
2f5a805 Removing stripMargin from test schemas (AndreSchumacher)
de02538 Cleaning up ParquetTestData (AndreSchumacher)
31465d6 Scalastyle: fixing commented out bottom (AndreSchumacher)
3c6b25f Trying to reduce no-op changes wrt master (AndreSchumacher)
3104886 Nested Rows should be Rows, not Seqs. (marmbrus)
f7aeba3 [SPARK-1982] Support for ByteType and ShortType. (marmbrus)
3e1456c WIP: Directly serialize catalyst attributes. (marmbrus)
14c3fd8 Attempting to fix Spark-Parquet schema conversion (AndreSchumacher)
37e0a0a Cleaning up (AndreSchumacher)
88e6bdb Attempting to fix loss of schema (AndreSchumacher)
63d1b57 Cleaning up and Scalastyle (AndreSchumacher)
b8a8b9a More fixes to short and byte conversion (AndreSchumacher)
403061f Fixing some issues with tests and schema metadata (AndreSchumacher)
94eea3a Scalastyle (AndreSchumacher)
7eceb67 Review feedback (AndreSchumacher)
95c1367 Changes to ParquetRelation and its metadata (AndreSchumacher)
30708c8 Taking out AvroParquet test for now to remove Avro dependency (AndreSchumacher)
Cleaning up ArrayConverter, moving classTag to NativeType, adding NativeRow
commit a6b4f050c02e18409e052ae9c9e2489deac09b0d
Do we need this class? Arrays don't need to be Rows inside of the execution engine; they only need to be of type `Seq`, and even that requirement should probably be removed. Instead of NativeRow, can we just call `toSeq` on the Array?
@marmbrus Good question. I think I added that because GetField wants to get a Row when it calls `eval` on its children. I will have another look at that.