Changes from 1 commit (297 commits total)
39ccaba
[SPARK-3861][SQL] Avoid rebuilding hash tables for broadcast joins on…
rxin Oct 13, 2014
49bbdcb
[Spark] RDD take() method: overestimate too much
yingjieMiao Oct 13, 2014
46db277
[SPARK-3892][SQL] remove redundant type name
adrian-wang Oct 13, 2014
2ac40da
[SPARK-3407][SQL]Add Date type support
adrian-wang Oct 13, 2014
56102dc
[SPARK-2066][SQL] Adds checks for non-aggregate attributes with aggre…
liancheng Oct 13, 2014
d3cdf91
[SPARK-3529] [SQL] Delete the temp files after test exit
chenghao-intel Oct 13, 2014
73da9c2
[SPARK-3771][SQL] AppendingParquetOutputFormat should use reflection …
ueshin Oct 13, 2014
e10d71e
[SPARK-3559][SQL] Remove unnecessary columns from List of needed Colu…
gvramana Oct 13, 2014
371321c
[SQL] Add type checking debugging functions
marmbrus Oct 13, 2014
e6e3770
SPARK-3807: SparkSql does not work for tables created using custom serde
chiragaggarwal Oct 13, 2014
9d9ca91
[SQL]Small bug in unresolved.scala
Ishiihara Oct 13, 2014
9eb49d4
[SPARK-3809][SQL] Fixes test suites in hive-thriftserver
liancheng Oct 13, 2014
4d26aca
[SPARK-3912][Streaming] Fixed flakyFlumeStreamSuite
tdas Oct 14, 2014
186b497
[SPARK-3921] Fix CoarseGrainedExecutorBackend's arguments for Standal…
aarondav Oct 14, 2014
9b6de6f
SPARK-3178 setting SPARK_WORKER_MEMORY to a value without a label (m…
bbejeck Oct 14, 2014
7ced88b
[SPARK-3946] gitignore in /python includes wrong directory
tsudukim Oct 14, 2014
24b818b
[SPARK-3944][Core] Using Option[String] where value of String can be …
Shiti Oct 14, 2014
56096db
SPARK-3803 [MLLIB] ArrayIndexOutOfBoundsException found in executing …
srowen Oct 14, 2014
7b4f39f
[SPARK-3869] ./bin/spark-class miss Java version with _JAVA_OPTIONS set
cocoatomo Oct 14, 2014
66af8e2
[SPARK-3943] Some scripts bin\*.cmd pollutes environment variables in…
tsudukim Oct 15, 2014
18ab6bd
SPARK-1307 [DOCS] Don't use term 'standalone' to refer to a Spark App…
srowen Oct 15, 2014
293a0b5
[SPARK-2098] All Spark processes should support spark-defaults.conf, …
witgo Oct 15, 2014
044583a
[Core] Upgrading ScalaStyle version to 0.5 and removing SparkSpaceAft…
prudhvi953 Oct 16, 2014
4c589ca
[SPARK-3944][Core] Code re-factored as suggested
Shiti Oct 16, 2014
091d32c
[SPARK-3971] [MLLib] [PySpark] hotfix: Customized pickler should work…
davies Oct 16, 2014
99e416b
[SQL] Fixes the race condition that may cause test failure
liancheng Oct 16, 2014
2fe0ba9
SPARK-3874: Provide stable TaskContext API
ScrapCodes Oct 17, 2014
7f7b50e
[SPARK-3923] Increase Akka heartbeat pause above heartbeat interval
aarondav Oct 17, 2014
be2ec4a
[SQL]typo in HiveFromSpark
Oct 17, 2014
642b246
[SPARK-3941][CORE] _remainingmem should not increase twice when updat…
liyezhang556520 Oct 17, 2014
e7f4ea8
[SPARK-3890][Docs]remove redundant spark.executor.memory in doc
WangTaoTheTonic Oct 17, 2014
56fd34a
[SPARK-3741] Add afterExecute for handleConnectExecutor
zsxwing Oct 17, 2014
dedace8
[SPARK-3067] JobProgressPage could not show Fair Scheduler Pools sect…
YanTangZhai Oct 17, 2014
e678b9f
[SPARK-3973] Print call site information for broadcasts
shivaram Oct 17, 2014
c351862
[SPARK-3935][Core] log the number of records that has been written
Oct 17, 2014
803e7f0
[SPARK-3979] [yarn] Use fs's default replication.
Oct 17, 2014
adcb7d3
[SPARK-3855][SQL] Preserve the result attribute of python UDFs though…
marmbrus Oct 17, 2014
23f6171
[SPARK-3985] [Examples] fix file path using os.path.join
adrian-wang Oct 17, 2014
477c648
[SPARK-3934] [SPARK-3918] [mllib] Bug fixes for RandomForest, Decisi…
jkbradley Oct 17, 2014
f406a83
SPARK-3926 [CORE] Result of JavaRDD.collectAsMap() is not Serializable
srowen Oct 18, 2014
05db2da
[SPARK-3952] [Streaming] [PySpark] add Python examples in Streaming P…
davies Oct 19, 2014
7e63bb4
[SPARK-2546] Clone JobConf for each task (branch-1.0 / 1.1 backport)
JoshRosen Oct 19, 2014
d1966f3
[SPARK-3902] [SPARK-3590] Stabilize AsynRDDActions and add Java API
JoshRosen Oct 20, 2014
c7aeecd
[SPARK-3948][Shuffle]Fix stream corruption bug in sort-based shuffle
jerryshao Oct 20, 2014
51afde9
[SPARK-4010][Web UI]Spark UI returns 500 in yarn-client mode
witgo Oct 20, 2014
ea054e1
[SPARK-3986][SQL] Fix package names to fit their directory names.
ueshin Oct 20, 2014
4afe9a4
[SPARK-3736] Workers reconnect when disassociated from the master.
mccheah Oct 20, 2014
eadc4c5
[SPARK-3207][MLLIB]Choose splits for continuous features in DecisionT…
chouqin Oct 20, 2014
1b3ce61
[SPARK-3906][SQL] Adds multiple join support for SQLContext
liancheng Oct 20, 2014
e9c1afa
[SPARK-3800][SQL] Clean aliases from grouping expressions
marmbrus Oct 20, 2014
364d52b
[SPARK-3966][SQL] Fix nullabilities of Cast related to DateType.
ueshin Oct 20, 2014
fce1d41
[SPARK-3945]Properties of hive-site.xml is invalid in running the Thr…
luogankun Oct 20, 2014
7586e2e
[SPARK-3969][SQL] Optimizer should have a super class as an interface.
ueshin Oct 21, 2014
0fe1c09
[SPARK-3940][SQL] Avoid console printing error messages three times
wangxiaojing Oct 21, 2014
342b57d
Update Building Spark link.
rxin Oct 21, 2014
5a8f64f
[SPARK-3958] TorrentBroadcast cleanup / debugging improvements.
JoshRosen Oct 21, 2014
8570816
[SPARK-4023] [MLlib] [PySpark] convert rdd into RDD of Vector
Oct 21, 2014
2aeb84b
replace awaitTransformation with awaitTermination in scaladoc/javadoc
holdenk Oct 21, 2014
c262cd5
[SPARK-4035] Fix a wrong format specifier
zsxwing Oct 21, 2014
61ca774
[SPARK-4020] Do not rely on timeouts to remove failed block managers
andrewor14 Oct 21, 2014
1a623b2
SPARK-3770: Make userFeatures accessible from python
Oct 21, 2014
5fdaf52
[SPARK-3994] Use standard Aggregator code path for countByKey and cou…
aarondav Oct 21, 2014
814a9cd
SPARK-3568 [mllib] add ranking metrics
coderxiang Oct 21, 2014
856b081
[SQL]redundant methods for broadcast
scwf Oct 21, 2014
6bb56fa
SPARK-1813. Add a utility to SparkConf that makes using Kryo really easy
sryza Oct 22, 2014
bae4ca3
Update JavaCustomReceiver.java
Oct 22, 2014
f05e09b
use isRunningLocally rather than runningLocally
CrazyJvm Oct 22, 2014
97cf19f
Fix for sampling error in NumPy v1.9 [SPARK-3995][PYSPARK]
freeman-lab Oct 22, 2014
813effc
[SPARK-3426] Fix sort-based shuffle error when spark.shuffle.compress…
JoshRosen Oct 22, 2014
137d942
[SPARK-3877][YARN] Throw an exception when application is not success…
zsxwing Oct 22, 2014
c5882c6
[SPARK-3812] [BUILD] Adapt maven build to publish effective pom.
ScrapCodes Oct 23, 2014
d6a3025
[BUILD] Fixed resolver for scalastyle plugin and upgrade sbt version.
ScrapCodes Oct 23, 2014
f799700
[SPARK-4055][MLlib] Inconsistent spelling 'MLlib' and 'MLLib'
sarutak Oct 23, 2014
6b48522
[SPARK-4006] In long running contexts, we encountered the situation o…
tsliwowicz Oct 23, 2014
293672c
specify unidocGenjavadocVersion of 0.8
holdenk Oct 23, 2014
222fa47
Revert "[SPARK-3812] [BUILD] Adapt maven build to publish effective p…
pwendell Oct 23, 2014
83b7a1c
[SPARK-4019] [SPARK-3740] Fix MapStatus compression bug that could le…
JoshRosen Oct 23, 2014
e595c8d
[SPARK-3993] [PySpark] fix bug while reuse worker after take()
davies Oct 24, 2014
a29c9bd
[SPARK-4000][BUILD] Sends archived unit tests logs to Jenkins master
liancheng Oct 24, 2014
0aea228
SPARK-3812 Build changes to publish effective pom.
ScrapCodes Oct 24, 2014
809c785
[SPARK-2652] [PySpark] donot use KyroSerializer as default serializer
Oct 24, 2014
d2987e8
[SPARK-3900][YARN] ApplicationMaster's shutdown hook fails and Illega…
sarutak Oct 24, 2014
d60a9d4
[SPARK-4051] [SQL] [PySpark] Convert Row into dictionary
Oct 24, 2014
0e88661
[SPARK-4050][SQL] Fix caching of temporary tables with projections.
marmbrus Oct 24, 2014
7c89a8f
[SPARK-2706][SQL] Enable Spark to support Hive 0.13
zhzhan Oct 24, 2014
6a40a76
[SPARK-4026][Streaming] Write ahead log management
harishreedharan Oct 24, 2014
7aacb7b
[SPARK-2713] Executors of same application in same host should only d…
li-zhihui Oct 24, 2014
30ea286
[SPARK-4076] Parameter expansion in spark-config is wrong
sarutak Oct 24, 2014
098f83c
[SPARK-4075] [Deploy] Jar url validation is not enough for Jar file
sarutak Oct 24, 2014
b563987
[SPARK-4013] Do not create multiple actor systems on each executor
andrewor14 Oct 24, 2014
f80dcf2
[SPARK-4067] refactor ExecutorUncaughtExceptionHandler
Oct 24, 2014
07e439b
[GraphX] Modify option name according to example doc in SynthBenchmark
GraceH Oct 24, 2014
3a906c6
[HOTFIX][SQL] Remove sleep on reset() failure.
marmbrus Oct 24, 2014
6c98c29
[SPARK-4080] Only throw IOException from [write|read][Object|External]
JoshRosen Oct 24, 2014
898b22a
[SPARK-4056] Upgrade snappy-java to 1.1.1.5
JoshRosen Oct 25, 2014
3a845d3
[SQL] Update Hive test harness for Hive 12 and 13
marmbrus Oct 25, 2014
9530316
[SPARK-2321] Stable pull-based progress / status API
JoshRosen Oct 25, 2014
e41786c
[SPARK-4088] [PySpark] Python worker should exit after socket is clos…
Oct 25, 2014
2e52e4f
Revert "[SPARK-4056] Upgrade snappy-java to 1.1.1.5"
JoshRosen Oct 26, 2014
c683444
[SPARK-4071] Unroll fails silently if BlockManager is small
Oct 26, 2014
df7974b
SPARK-3359 [DOCS] sbt/sbt unidoc doesn't work with Java 8
srowen Oct 26, 2014
b759540
Update RoaringBitmap to 0.4.3
lemire Oct 26, 2014
bf589fc
[SPARK-3616] Add basic Selenium tests to WebUISuite
JoshRosen Oct 26, 2014
677852c
Just fixing comment that shows usage
AtlasPilotPuppy Oct 26, 2014
0af7e51
[SPARK-3925][SQL] Do not consider the ordering of qualifiers during c…
viirya Oct 26, 2014
879a165
[HOTFIX][SQL] Temporarily turn off hive-server tests.
marmbrus Oct 26, 2014
2838bf8
[SPARK-3537][SPARK-3914][SQL] Refines in-memory columnar table statis…
liancheng Oct 26, 2014
89e8a5d
[SPARK-3997][Build]scalastyle should output the error location
witgo Oct 26, 2014
dc51f4d
[SQL][DOC] Wrong package name "scala.math.sql" in sql-programming-gui…
sarutak Oct 26, 2014
d518bc2
[SPARK-3953][SQL][Minor] Confusable variable name.
sarutak Oct 26, 2014
0530842
[SPARK-4052][SQL] Use scala.collection.Map for pattern matching inste…
yhuai Oct 26, 2014
0481aaa
[SPARK-4068][SQL] NPE in jsonRDD schema inference
yhuai Oct 26, 2014
974d7b2
[SPARK-3483][SQL] Special chars in column names
ravipesala Oct 26, 2014
ace41e8
[SPARK-3959][SPARK-3960][SQL] SqlParser fails to parse literal -92233…
sarutak Oct 26, 2014
3a9d66c
[SPARK-4061][SQL] We cannot use EOL character in the operand of LIKE …
sarutak Oct 26, 2014
f4e8c28
[SPARK-4042][SQL] Append columns ids and names before broadcast
scwf Oct 26, 2014
6377ada
[SPARK-3970] Remove duplicate removal of local dirs
viirya Oct 27, 2014
9aa340a
[SPARK-4030] Make destroy public for broadcast variables
shivaram Oct 27, 2014
c9e05ca
[SPARK-4032] Deprecate YARN alpha support in Spark 1.2
ScrapCodes Oct 27, 2014
dea302d
SPARK-2621. Update task InputMetrics incrementally
sryza Oct 27, 2014
1d7bcc8
[SQL] Fixes caching related JoinSuite failure
liancheng Oct 27, 2014
bfa614b
SPARK-4022 [CORE] [MLLIB] Replace colt dependency (LGPL) with commons…
srowen Oct 27, 2014
7e3a1ad
[MLlib] SPARK-3987: add test case on objective value for NNLS
coderxiang Oct 28, 2014
418ad83
[SPARK-3911] [SQL] HiveSimpleUdf can not be optimized in constant fol…
chenghao-intel Oct 28, 2014
698a7ea
[SPARK-3816][SQL] Add table properties from storage handler to output…
alexoss68 Oct 28, 2014
89af6df
[SPARK-4041][SQL] Attributes names in table scan should converted to …
scwf Oct 28, 2014
27470d3
[SQL] Correct a variable name in JavaApplySchemaSuite.applySchemaToJSON
yhuai Oct 28, 2014
0c34fa5
[SPARK-3907][SQL] Add truncate table support
wangxiaojing Oct 28, 2014
7c0c26c
[SPARK-4064]NioBlockTransferService.fetchBlocks may cause spark to hang.
witgo Oct 28, 2014
4ceb048
fix broken links in README.md
ryan-williams Oct 28, 2014
46c6341
[SPARK-4107] Fix incorrect handling of read() and skip() return values
JoshRosen Oct 28, 2014
fae095b
[SPARK-3961] [MLlib] [PySpark] Python API for mllib.feature
Oct 28, 2014
47346cd
[SPARK-4116][YARN]Delete the abandoned log4j-spark-container.properties
WangTaoTheTonic Oct 28, 2014
e8813be
[SPARK-4095][YARN][Minor]extract val isLaunchingDriver in ClientBase
WangTaoTheTonic Oct 28, 2014
0ac52e3
[SPARK-4098][YARN]use appUIAddress instead of appUIHostPort in yarn-c…
WangTaoTheTonic Oct 28, 2014
7768a80
[SPARK-4031] Make torrent broadcast read blocks on use.
shivaram Oct 28, 2014
44d8b45
[SPARK-4110] Wrong comments about default settings in spark-daemon.sh
sarutak Oct 28, 2014
1ea3e3d
[SPARK-4096][YARN]let ApplicationMaster accept executor memory argume…
WangTaoTheTonic Oct 28, 2014
247c529
[SPARK-3657] yarn alpha YarnRMClientImpl throws NPE appMasterRequest.…
sarutak Oct 28, 2014
4d52cec
[SPARK-4089][Doc][Minor] The version number of Spark in _config.yaml …
sarutak Oct 28, 2014
2f254da
[SPARK-4065] Add check for IPython on Windows
msjgriffiths Oct 28, 2014
6c1b981
[SPARK-4058] [PySpark] Log file name is hard coded even though there …
sarutak Oct 28, 2014
5807cb4
[SPARK-3814][SQL] Support for Bitwise AND(&), OR(|) ,XOR(^), NOT(~) i…
ravipesala Oct 28, 2014
47a40f6
[SPARK-3988][SQL] add public API for date type
adrian-wang Oct 28, 2014
abcafcf
[Spark 3922] Refactor spark-core to use Utils.UTF_8
zsxwing Oct 28, 2014
4b55482
[SPARK-3343] [SQL] Add serde support for CTAS
chenghao-intel Oct 28, 2014
84e5da8
[SPARK-4084] Reuse sort key in Sorter
mengxr Oct 28, 2014
1536d70
[SPARK-4008] Fix "kryo with fold" in KryoSerializerSuite
zsxwing Oct 29, 2014
b5e79bf
[SPARK-3904] [SQL] add constant objectinspector support for udfs
chenghao-intel Oct 29, 2014
8c0bfd0
[SPARK-4133] [SQL] [PySpark] type conversionfor python udf
Oct 29, 2014
1559495
[FIX] disable benchmark code
mengxr Oct 29, 2014
51ce997
[SPARK-4129][MLlib] Performance tuning in MultivariateOnlineSummarizer
Oct 29, 2014
dff0155
[SPARK-3453] Netty-based BlockTransferService, extracted from Spark core
rxin Oct 29, 2014
3535467
[SPARK-4003] [SQL] add 3 types for java SQL context
adrian-wang Oct 29, 2014
1df05a4
[SPARK-3822] Executor scaling mechanism for Yarn
Oct 29, 2014
e7fd804
[SPARK-4097] Fix the race condition of 'thread'
zsxwing Oct 29, 2014
8d59b37
[SPARK-3795] Heuristics for dynamically scaling executors
andrewor14 Oct 30, 2014
1234258
[SPARK-4053][Streaming] Made the ReceiverSuite test more reliable, by…
tdas Oct 30, 2014
cd739bd
[SPARK-1720][SPARK-1719] use LD_LIBRARY_PATH instead of -Djava.librar…
witgo Oct 30, 2014
6db3157
[SPARK-4102] Remove unused ShuffleReader.stop() method.
kayousterhout Oct 30, 2014
c7ad085
[SPARK-4130][MLlib] Fixing libSVM parser bug with extra whitespace
jegonzal Oct 30, 2014
d932719
SPARK-4111 [MLlib] add regression metrics
Oct 30, 2014
234de92
[SPARK-4028][Streaming] ReceivedBlockHandler interface to abstract th…
tdas Oct 30, 2014
fb1fbca
[SPARK-4027][Streaming] WriteAheadLogBackedBlockRDD to read received …
tdas Oct 30, 2014
9142c9b
[SPARK-4078] New FsPermission instance w/o FsPermission.createImmutab…
GraceH Oct 30, 2014
24c5129
[SPARK-3319] [SPARK-3338] Resolve Spark submit config paths
andrewor14 Oct 30, 2014
26f092d
[SPARK-4138][SPARK-4139] Improve dynamic allocation settings
Oct 30, 2014
5231a3f
[Minor] A few typos in comments and log messages
andrewor14 Oct 30, 2014
9334d69
[SPARK-4155] Consolidate usages of <driver>
Oct 30, 2014
849b43e
Minor style hot fix after #2711
Oct 30, 2014
d345057
[SPARK-4153][WebUI] Update the sort keys for HistoryPage
zsxwing Oct 30, 2014
2f54543
[SPARK-3661] Respect spark.*.memory in cluster mode
Oct 30, 2014
68cb69d
SPARK-1209 [CORE] SparkHadoop{MapRed,MapReduce}Util should not use pa…
srowen Oct 30, 2014
9b6ebe3
[SPARK-4120][SQL] Join of multiple tables with syntax like SELECT .. …
ravipesala Oct 31, 2014
2e35e24
[SPARK-3968][SQL] Use parquet-mr filter2 api
Oct 31, 2014
26d31d1
Revert "SPARK-1209 [CORE] SparkHadoop{MapRed,MapReduce}Util should no…
Oct 31, 2014
0734d09
HOTFIX: Clean up build in network module.
pwendell Oct 31, 2014
872fc66
[SPARK-4124] [MLlib] [PySpark] simplify serialization in MLlib Python…
Oct 31, 2014
ad3bd0d
[SPARK-3250] Implement Gap Sampling optimization for random sampling
erikerlandson Oct 31, 2014
d31517a
[SPARK-4108][SQL] Fixed usage of deprecated in sql/catalyst/types/dat…
AtlasPilotPuppy Oct 31, 2014
58a6077
[SPARK-4143] [SQL] Move inner class DeferredObjectAdapter to top level
chenghao-intel Oct 31, 2014
acd4ac7
SPARK-3837. Warn when YARN kills containers for exceeding memory limits
sryza Oct 31, 2014
adb6415
[SPARK-4016] Allow user to show/hide UI metrics.
kayousterhout Oct 31, 2014
7c41d13
[SPARK-3826][SQL]enable hive-thriftserver to support hive-0.13.1
scwf Oct 31, 2014
fa712b3
[SPARK-4077][SQL] Spark SQL return wrong values for valid string time…
gvramana Oct 31, 2014
ea465af
[SPARK-4154][SQL] Query does not work if it has "not between " in Spa…
ravipesala Oct 31, 2014
23468e7
[SPARK-2220][SQL] Fixes remaining Hive commands
liancheng Oct 31, 2014
a68ecf3
[SPARK-4141] Hide Accumulators column on stage page when no accumulat…
mmm Oct 31, 2014
f1e7361
[SPARK-4150][PySpark] return self in rdd.setName
mengxr Oct 31, 2014
55ab777
[SPARK-3870] EOL character enforcement
sarutak Oct 31, 2014
087e31a
[HOT FIX] Yarn stable tests don't compile
Oct 31, 2014
23f73f5
SPARK-4175. Exception on stage page
sryza Nov 1, 2014
62d01d2
[MLLIB] SPARK-2329 Add multi-label evaluation metrics
avulanov Nov 1, 2014
e07fb6a
[SPARK-3838][examples][mllib][python] Word2Vec example in python
AtlasPilotPuppy Nov 1, 2014
8602195
[MLLIB] SPARK-1547: Add Gradient Boosting to MLlib
manishamde Nov 1, 2014
98c556e
Streaming KMeans [MLLIB][SPARK-3254]
freeman-lab Nov 1, 2014
680fd87
Upgrading to roaring 0.4.5 (bug fix release)
lemire Nov 1, 2014
f4e0b28
[SPARK-4142][GraphX] Default numEdgePartitions
jegonzal Nov 1, 2014
ee29ef3
[SPARK-4115][GraphX] Add overrided count for edge counting of EdgeRDD.
luluorta Nov 1, 2014
7136719
[SPARK-2759][CORE] Generic Binary File Support in Spark
kmader Nov 1, 2014
59e626c
[SPARK-4183] Enable NettyBlockTransferService by default
aarondav Nov 1, 2014
1d4f355
[SPARK-3569][SQL] Add metadata field to StructField
mengxr Nov 1, 2014
f55218a
[SPARK-3796] Create external service which can serve shuffle files
aarondav Nov 1, 2014
ad0fde1
[SPARK-4037][SQL] Removes the SessionState instance created in HiveTh…
liancheng Nov 1, 2014
7894de2
Revert "[SPARK-4183] Enable NettyBlockTransferService by default"
pwendell Nov 1, 2014
d8176b1
[SPARK-4121] Set commons-math3 version based on hadoop profiles, inst…
mengxr Nov 1, 2014
56f2c61
[SPARK-3161][MLLIB] Adding a node Id caching mechanism for training d…
Nov 1, 2014
23f966f
[SPARK-3930] [SPARK-3933] Support fixed-precision decimal in SQL, and…
mateiz Nov 2, 2014
105c5a3
Adding UserDefinedType to SQL, not done yet.
jkbradley Oct 3, 2014
0eaeb81
Still working on UDTs
jkbradley Oct 6, 2014
19b2f60
still working on UDTs
jkbradley Oct 6, 2014
982c035
still working on UDTs
jkbradley Oct 7, 2014
53de70f
more udts...
jkbradley Oct 7, 2014
8bebf24
commented out convertRowToScala for debugging
jkbradley Oct 7, 2014
273ac96
basic UDT is working, but deserialization has yet to be done
jkbradley Oct 8, 2014
39f8707
removed old udt suite
jkbradley Oct 8, 2014
04303c9
udts
jkbradley Oct 9, 2014
50f9726
udts
jkbradley Oct 9, 2014
893ee4c
udt finallly working
jkbradley Oct 9, 2014
964b32e
some cleanups
jkbradley Oct 9, 2014
fea04af
more cleanups
jkbradley Oct 9, 2014
b226b9e
Changing UDT to annotation
jkbradley Oct 10, 2014
3579035
udt annotation now working
jkbradley Oct 10, 2014
2f40c02
renamed UDT types
jkbradley Oct 10, 2014
e1f7b9c
blah
jkbradley Oct 10, 2014
34a5831
Added MLlib dependency on SQL.
jkbradley Oct 10, 2014
cd60cb4
Trying to get other SQL tests to run
jkbradley Oct 21, 2014
dff99d6
Added UDTs for Vectors in MLlib, plus DatasetExample using the UDTs
jkbradley Oct 22, 2014
85872f6
Allow schema calculation to be lazy, but ensure its available on exec…
marmbrus Oct 23, 2014
f025035
Cleanups before PR. Added new tests
jkbradley Oct 24, 2014
51e5282
fixed 1 test
jkbradley Oct 24, 2014
63626a4
Updated ScalaReflectionsSuite per @marmbrus suggestions
jkbradley Oct 24, 2014
759af7a
Added more doc to UserDefineType
jkbradley Oct 27, 2014
db16139
Added more doc for UserDefinedType. Removed unused code in Suite
jkbradley Oct 28, 2014
cfbc321
support UDT in parquet
mengxr Oct 28, 2014
3143ac3
remove unnecessary changes
mengxr Oct 28, 2014
87264a5
remove debug code
mengxr Oct 28, 2014
4500d8a
update example code
mengxr Oct 28, 2014
b028675
allow any type in UDT
mengxr Oct 28, 2014
7f29656
Moved udt case to top of all matches. Small cleanups
jkbradley Oct 28, 2014
8b242ea
Fixed merge error after last merge. Note: Last merge commit also rem…
jkbradley Oct 29, 2014
8de957c
Modified UserDefinedType to store Java class of user type so that reg…
jkbradley Oct 30, 2014
fa86b20
Removed Java UserDefinedType, and made UDTs private[spark] for now
jkbradley Oct 31, 2014
20630bc
fixed scalastyle
jkbradley Oct 31, 2014
6fddc1c
Made MyLabeledPoint into a Java Bean
jkbradley Oct 31, 2014
a571bb6
Removed old UDT code (registry and Java UDTs). Cleaned up other code…
jkbradley Oct 31, 2014
d063380
Cleaned up Java UDT Suite, and added warning about element ordering w…
jkbradley Oct 31, 2014
30ce5b2
updates based on code review
jkbradley Nov 2, 2014
5817b2b
style edits
jkbradley Nov 2, 2014
e13cd8a
Removed Vector UDTs
jkbradley Nov 2, 2014
[SPARK-2706][SQL] Enable Spark to support Hive 0.13
Given that many users are trying to use Hive 0.13 in Spark, and given the API-level incompatibility between Hive 0.12 and Hive 0.13, I want to propose the following approach, which has little or no impact on existing Hive 0.12 support but jumpstarts development of Hive 0.13 and future version support.

Approach: introduce a hive.version property and adjust the pom.xml files to support different Hive versions at compile time through a shim layer, e.g., hive-0.12.0 and hive-0.13.1. More specifically:

1. For each supported Hive version, there is a very light layer of shim code that handles the API differences, sitting in sql/hive/<hive-version>, e.g., sql/hive/v0.12.0 or sql/hive/v0.13.1 (see the shim sketch after the build examples below).

2. Add a new profile, hive-default, active by default, which picks up all existing configuration and the hive-0.12.0 shim (v0.12.0) if no hive.version is specified.

3. If the user specifies a different version (currently only 0.13.1, via -Dhive.version=0.13.1), the hive-versions profile is activated, picking up the version-specific shim layer and configuration, mainly the Hive jars and the version-specific shim, e.g., v0.13.1.

4. With this approach, nothing changes for the current hive-0.12 support.

No change by default: sbt/sbt -Phive
For example: sbt/sbt -Phive -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 assembly

To enable hive-0.13: sbt/sbt -Dhive.version=0.13.1
For example: sbt/sbt -Dhive.version=0.13.1 -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 assembly

Note that with hive-0.13, hive-thriftserver is not enabled; that should be fixed in a separate JIRA. Also, -Phive is not needed together with -Dhive.version when building (we should probably switch to -Phive -Dhive.version=xxx once the Thrift server is also supported with hive-0.13.1).
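To make the shim idea concrete, here is a minimal, hypothetical sketch of a version-specific shim pair in Scala. The method shown matches one used in the diff below, but the file paths, object layout, and bodies are illustrative assumptions, not the exact code in this PR.

// sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim.scala (assumed path)
// Hive 0.12 keeps the stats constants in hive-exec's ql.stats package.
import org.apache.hadoop.hive.ql.stats.StatsSetupConst

private[hive] object HiveShim {
  val version = "0.12.0"
  def getStatsSetupConstTotalSize: String = StatsSetupConst.TOTAL_SIZE
}

// sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala (assumed path)
// In Hive 0.13 the same constant moved to hive-common, so only the import
// differs; callers compile against whichever tree the build adds.
import org.apache.hadoop.hive.common.StatsSetupConst

private[hive] object HiveShim {
  val version = "0.13.1"
  def getStatsSetupConstTotalSize: String = StatsSetupConst.TOTAL_SIZE
}

Because both trees expose identical signatures, code such as HiveContext can call HiveShim.getStatsSetupConstTotalSize without knowing which Hive version it was built against.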

Author: Zhan Zhang <[email protected]>
Author: zhzhan <[email protected]>
Author: Patrick Wendell <[email protected]>

Closes apache#2241 from zhzhan/spark-2706 and squashes the following commits:

3ece905 [Zhan Zhang] minor fix
410b668 [Zhan Zhang] solve review comments
cbb4691 [Zhan Zhang] change run-test for new options
0d4d2ed [Zhan Zhang] rebase
497b0f4 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
8fad1cf [Zhan Zhang] change the pom file and make hive-0.13.1 as the default
ab028d1 [Zhan Zhang] rebase
4a2e36d [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
4cb1b93 [zhzhan] Merge pull request #1 from pwendell/pr-2241
b0478c0 [Patrick Wendell] Changes to simplify the build of SPARK-2706
2b50502 [Zhan Zhang] rebase
a72c0d4 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
cb22863 [Zhan Zhang] correct the typo
20f6cf7 [Zhan Zhang] solve compatability issue
f7912a9 [Zhan Zhang] rebase and solve review feedback
301eb4a [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
10c3565 [Zhan Zhang] address review comments
6bc9204 [Zhan Zhang] rebase and remove temparory repo
d3aa3f2 [Zhan Zhang] Merge branch 'master' into spark-2706
cedcc6f [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
3ced0d7 [Zhan Zhang] rebase
d9b981d [Zhan Zhang] rebase and fix error due to rollback
adf4924 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
3dd50e8 [Zhan Zhang] solve conflicts and remove unnecessary implicts
d10bf00 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
dc7bdb3 [Zhan Zhang] solve conflicts
7e0cc36 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
d7c3e1e [Zhan Zhang] Merge branch 'master' into spark-2706
68deb11 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
d48bd18 [Zhan Zhang] address review comments
3ee3b2b [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
57ea52e [Zhan Zhang] Merge branch 'master' into spark-2706
2b0d513 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
9412d24 [Zhan Zhang] address review comments
f4af934 [Zhan Zhang] rebase
1ccd7cc [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
128b60b [Zhan Zhang] ignore 0.12.0 test cases for the time being
af9feb9 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
5f5619f [Zhan Zhang] restructure the directory and different hive version support
05d3683 [Zhan Zhang] solve conflicts
e4c1982 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
94b4fdc [Zhan Zhang] Spark-2706: hive-0.13.1 support on spark
87ebf3b [Zhan Zhang] Merge branch 'master' into spark-2706
921e914 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
f896b2a [Zhan Zhang] Merge branch 'master' into spark-2706
789ea21 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
cb53a2c [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
f6a8a40 [Zhan Zhang] revert
ba14f28 [Zhan Zhang] test
dbedff3 [Zhan Zhang] Merge remote-tracking branch 'upstream/master'
70964fe [Zhan Zhang] revert
fe0f379 [Zhan Zhang] Merge branch 'master' of https://github.com/zhzhan/spark
70ffd93 [Zhan Zhang] revert
42585ec [Zhan Zhang] test
7d5fce2 [Zhan Zhang] test
zhzhan authored and marmbrus committed Oct 24, 2014
commit 7c89a8f0c81ecf91dba34c1f44393f45845d438c
6 changes: 6 additions & 0 deletions assembly/pom.xml
@@ -197,6 +197,12 @@
<artifactId>spark-hive_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
+</dependencies>
+</profile>
+<profile>
+<!-- TODO: Move this to "hive" profile once 0.13 JDBC is supported -->
+<id>hive-0.12.0</id>
+<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive-thriftserver_${scala.binary.version}</artifactId>
4 changes: 2 additions & 2 deletions dev/run-tests
@@ -140,7 +140,7 @@ CURRENT_BLOCK=$BLOCK_BUILD

{
# We always build with Hive because the PySpark Spark SQL tests need it.
-BUILD_MVN_PROFILE_ARGS="$SBT_MAVEN_PROFILES_ARGS -Phive"
+BUILD_MVN_PROFILE_ARGS="$SBT_MAVEN_PROFILES_ARGS -Phive -Phive-0.12.0"

echo "[info] Building Spark with these arguments: $BUILD_MVN_PROFILE_ARGS"

@@ -167,7 +167,7 @@ CURRENT_BLOCK=$BLOCK_SPARK_UNIT_TESTS
# If the Spark SQL tests are enabled, run the tests with the Hive profiles enabled.
# This must be a single argument, as it is.
if [ -n "$_RUN_SQL_TESTS" ]; then
-SBT_MAVEN_PROFILES_ARGS="$SBT_MAVEN_PROFILES_ARGS -Phive"
+SBT_MAVEN_PROFILES_ARGS="$SBT_MAVEN_PROFILES_ARGS -Phive -Phive-0.12.0"
fi

if [ -n "$_SQL_TESTS_ONLY" ]; then
26 changes: 17 additions & 9 deletions docs/building-spark.md
@@ -97,12 +97,20 @@ mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
mvn -Pyarn-alpha -Phadoop-2.3 -Dhadoop.version=2.3.0 -Dyarn.version=0.23.7 -DskipTests clean package
{% endhighlight %}

+<!--- TODO: Update this when Hive 0.13 JDBC is added -->
+
# Building With Hive and JDBC Support
To enable Hive integration for Spark SQL along with its JDBC server and CLI,
-add the `-Phive` profile to your existing build options.
+add the `-Phive` profile to your existing build options. By default Spark
+will build with Hive 0.13.1 bindings. You can also build for Hive 0.12.0 using
+the `-Phive-0.12.0` profile. NOTE: currently the JDBC server is only
+supported for Hive 0.12.0.
{% highlight bash %}
-# Apache Hadoop 2.4.X with Hive support
+# Apache Hadoop 2.4.X with Hive 13 support
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean package
+
+# Apache Hadoop 2.4.X with Hive 12 support
+mvn -Pyarn -Phive-0.12.0 -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean package
{% endhighlight %}

# Spark Tests in Maven
@@ -111,8 +119,8 @@ Tests are run by default via the [ScalaTest Maven plugin](http://www.scalatest.o

Some of the tests require Spark to be packaged first, so always run `mvn package` with `-DskipTests` the first time. The following is an example of a correct (build, test) sequence:

-mvn -Pyarn -Phadoop-2.3 -DskipTests -Phive clean package
-mvn -Pyarn -Phadoop-2.3 -Phive test
+mvn -Pyarn -Phadoop-2.3 -DskipTests -Phive -Phive-0.12.0 clean package
+mvn -Pyarn -Phadoop-2.3 -Phive -Phive-0.12.0 test

The ScalaTest plugin also supports running only a specific test suite as follows:

@@ -175,21 +183,21 @@ can be set to control the SBT build. For example:

Some of the tests require Spark to be packaged first, so always run `sbt/sbt assembly` the first time. The following is an example of a correct (build, test) sequence:

-sbt/sbt -Pyarn -Phadoop-2.3 -Phive assembly
-sbt/sbt -Pyarn -Phadoop-2.3 -Phive test
+sbt/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-0.12.0 assembly
+sbt/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-0.12.0 test

To run only a specific test suite as follows:

-sbt/sbt -Pyarn -Phadoop-2.3 -Phive "test-only org.apache.spark.repl.ReplSuite"
+sbt/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-0.12.0 "test-only org.apache.spark.repl.ReplSuite"

To run test suites of a specific sub project as follows:

-sbt/sbt -Pyarn -Phadoop-2.3 -Phive core/test
+sbt/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-0.12.0 core/test

# Speeding up Compilation with Zinc

[Zinc](https://github.com/typesafehub/zinc) is a long-running server version of SBT's incremental
compiler. When run locally as a background process, it speeds up builds of Scala-based projects
like Spark. Developers who regularly recompile Spark with Maven will be the most interested in
Zinc. The project site gives instructions for building and running `zinc`; OS X users can
-install it using `brew install zinc`.
+install it using `brew install zinc`.
29 changes: 24 additions & 5 deletions pom.xml
@@ -127,7 +127,11 @@
<hbase.version>0.94.6</hbase.version>
<flume.version>1.4.0</flume.version>
<zookeeper.version>3.4.5</zookeeper.version>
-<hive.version>0.12.0-protobuf-2.5</hive.version>
+<!-- Version used in Maven Hive dependency -->
+<hive.version>0.13.1</hive.version>
+<!-- Version used for internal directory structure -->
+<hive.version.short>0.13.1</hive.version.short>
+<derby.version>10.10.1.1</derby.version>
<parquet.version>1.4.3</parquet.version>
<jblas.version>1.2.3</jblas.version>
<jetty.version>8.1.14.v20131031</jetty.version>
@@ -456,7 +460,7 @@
<dependency>
<groupId>org.apache.derby</groupId>
<artifactId>derby</artifactId>
-<version>10.4.2.0</version>
+<version>${derby.version}</version>
</dependency>
<dependency>
<groupId>com.codahale.metrics</groupId>
@@ -1308,16 +1312,31 @@
</dependency>
</dependencies>
</profile>

<profile>
-<id>hive</id>
+<id>hive-0.12.0</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<!-- TODO: Move this to "hive" profile once 0.13 JDBC is supported -->
<modules>
<module>sql/hive-thriftserver</module>
</modules>
<properties>
<hive.version>0.12.0-protobuf-2.5</hive.version>
<hive.version.short>0.12.0</hive.version.short>
<derby.version>10.4.2.0</derby.version>
</properties>
</profile>
<profile>
<id>hive-0.13.1</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<properties>
<hive.version>0.13.1</hive.version>
<hive.version.short>0.13.1</hive.version.short>
<derby.version>10.10.1.1</derby.version>
</properties>
</profile>

</profiles>
</project>
37 changes: 31 additions & 6 deletions sql/hive/pom.xml
@@ -36,11 +36,6 @@
</properties>

<dependencies>
-<dependency>
-<groupId>com.twitter</groupId>
-<artifactId>parquet-hive-bundle</artifactId>
-<version>1.5.0</version>
-</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.binary.version}</artifactId>
@@ -116,7 +111,6 @@
<scope>test</scope>
</dependency>
</dependencies>

<profiles>
<profile>
<id>hive</id>
@@ -144,6 +138,19 @@
</plugins>
</build>
</profile>
+<profile>
+<id>hive-0.12.0</id>
+<activation>
+<activeByDefault>false</activeByDefault>
+</activation>
+<dependencies>
+<dependency>
+<groupId>com.twitter</groupId>
+<artifactId>parquet-hive-bundle</artifactId>
+<version>1.5.0</version>
+</dependency>
+</dependencies>
+</profile>
</profiles>

<build>
@@ -154,6 +161,24 @@
<groupId>org.scalatest</groupId>
<artifactId>scalatest-maven-plugin</artifactId>
</plugin>
+<plugin>
+<groupId>org.codehaus.mojo</groupId>
+<artifactId>build-helper-maven-plugin</artifactId>
+<executions>
+<execution>
+<id>add-default-sources</id>
+<phase>generate-sources</phase>
+<goals>
+<goal>add-source</goal>
+</goals>
+<configuration>
+<sources>
+<source>v${hive.version.short}/src/main/scala</source>
+</sources>
+</configuration>
+</execution>
+</executions>
+</plugin>

<!-- Deploy datanucleus jars to the spark/lib_managed/jars directory -->
<plugin>
23 changes: 10 additions & 13 deletions sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
@@ -32,7 +32,6 @@ import org.apache.hadoop.hive.ql.Driver
import org.apache.hadoop.hive.ql.metadata.Table
import org.apache.hadoop.hive.ql.processors._
import org.apache.hadoop.hive.ql.session.SessionState
-import org.apache.hadoop.hive.ql.stats.StatsSetupConst
import org.apache.hadoop.hive.serde2.io.TimestampWritable
import org.apache.hadoop.hive.serde2.io.DateWritable

@@ -47,6 +46,7 @@ import org.apache.spark.sql.execution.ExtractPythonUdfs
import org.apache.spark.sql.execution.QueryExecutionException
import org.apache.spark.sql.execution.{Command => PhysicalCommand}
import org.apache.spark.sql.hive.execution.DescribeHiveTableCommand
+import org.apache.spark.sql.hive.HiveShim

/**
* DEPRECATED: Use HiveContext instead.
@@ -171,13 +171,15 @@

val tableParameters = relation.hiveQlTable.getParameters
val oldTotalSize =
-Option(tableParameters.get(StatsSetupConst.TOTAL_SIZE)).map(_.toLong).getOrElse(0L)
+Option(tableParameters.get(HiveShim.getStatsSetupConstTotalSize))
+  .map(_.toLong)
+  .getOrElse(0L)
val newTotalSize = getFileSizeForTable(hiveconf, relation.hiveQlTable)
// Update the Hive metastore if the total size of the table is different than the size
// recorded in the Hive metastore.
// This logic is based on org.apache.hadoop.hive.ql.exec.StatsTask.aggregateStats().
if (newTotalSize > 0 && newTotalSize != oldTotalSize) {
-tableParameters.put(StatsSetupConst.TOTAL_SIZE, newTotalSize.toString)
+tableParameters.put(HiveShim.getStatsSetupConstTotalSize, newTotalSize.toString)
val hiveTTable = relation.hiveQlTable.getTTable
hiveTTable.setParameters(tableParameters)
val tableFullName =
@@ -282,29 +284,24 @@
*/
protected def runHive(cmd: String, maxRows: Int = 1000): Seq[String] = {
try {
-// Session state must be initilized before the CommandProcessor is created .
-SessionState.start(sessionState)
-
val cmd_trimmed: String = cmd.trim()
val tokens: Array[String] = cmd_trimmed.split("\\s+")
val cmd_1: String = cmd_trimmed.substring(tokens(0).length()).trim()
-val proc: CommandProcessor = CommandProcessorFactory.get(tokens(0), hiveconf)
+val proc: CommandProcessor = HiveShim.getCommandProcessor(Array(tokens(0)), hiveconf)

proc match {
case driver: Driver =>
driver.init()

-val results = new JArrayList[String]
+val results = HiveShim.createDriverResultsArray
val response: CommandProcessorResponse = driver.run(cmd)
// Throw an exception if there is an error in query processing.
if (response.getResponseCode != 0) {
-driver.destroy()
+driver.close()
throw new QueryExecutionException(response.getErrorMessage)
}
driver.setMaxRows(maxRows)
driver.getResults(results)
-driver.destroy()
-results
+driver.close()
+HiveShim.processResults(results)
case _ =>
sessionState.out.println(tokens(0) + " " + cmd_1)
Seq(proc.run(cmd_1).getResponseCode.toString)
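The HiveShim.createDriverResultsArray / HiveShim.processResults indirection above exists because the Driver API changed element types between versions: Hive 0.12's Driver.getResults fills a list of strings, while Hive 0.13's fills a list of objects. A hedged sketch of the two shim implementations follows, with names and signatures assumed for illustration; in the real layout both objects are named HiveShim, one per version tree.

import java.util.{ArrayList => JArrayList}
import scala.collection.JavaConversions._

// v0.12.0 tree (assumed): results already arrive as strings.
private[hive] object HiveShim {
  def createDriverResultsArray = new JArrayList[String]
  def processResults(results: JArrayList[String]): Seq[String] = results.toSeq
}

// v0.13.1 tree (assumed): results arrive as objects and must be stringified.
private[hive] object HiveShim {
  def createDriverResultsArray = new JArrayList[Object]
  def processResults(results: JArrayList[Object]): Seq[String] =
    results.map(_.toString).toSeq
}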
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala
@@ -26,6 +26,7 @@ import org.apache.hadoop.{io => hadoopIo}
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.types
import org.apache.spark.sql.catalyst.types._
+import org.apache.spark.sql.hive.HiveShim

/* Implicit conversions */
import scala.collection.JavaConversions._
@@ -149,7 +150,7 @@
case l: Long => l: java.lang.Long
case l: Short => l: java.lang.Short
case l: Byte => l: java.lang.Byte
-case b: BigDecimal => new HiveDecimal(b.underlying())
+case b: BigDecimal => HiveShim.createDecimal(b.underlying())
case b: Array[Byte] => b
case d: java.sql.Date => d
case t: java.sql.Timestamp => t
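The HiveShim.createDecimal call above covers a concrete API break: Hive 0.13 made the public HiveDecimal constructor unavailable and introduced a factory method instead. A sketch of the two shim bodies under that assumption, each def living in its own version tree:

import org.apache.hadoop.hive.common.`type`.HiveDecimal

// v0.12.0 tree (assumed): the public constructor still exists.
def createDecimal(value: java.math.BigDecimal): HiveDecimal =
  new HiveDecimal(value)

// v0.13.1 tree (assumed): construction goes through the factory method.
def createDecimal(value: java.math.BigDecimal): HiveDecimal =
  HiveDecimal.create(value)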
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
@@ -22,7 +22,6 @@ import scala.util.parsing.combinator.RegexParsers
import org.apache.hadoop.hive.metastore.api.{FieldSchema, SerDeInfo, StorageDescriptor, Partition => TPartition, Table => TTable}
import org.apache.hadoop.hive.ql.metadata.{Hive, Partition, Table}
import org.apache.hadoop.hive.ql.plan.TableDesc
-import org.apache.hadoop.hive.ql.stats.StatsSetupConst
import org.apache.hadoop.hive.serde2.Deserializer

import org.apache.spark.Logging
@@ -34,6 +33,7 @@ import org.apache.spark.sql.catalyst.plans.logical
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.catalyst.rules._
import org.apache.spark.sql.catalyst.types._
+import org.apache.spark.sql.hive.HiveShim
import org.apache.spark.util.Utils

/* Implicit conversions */
@@ -56,7 +56,7 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with
val table = client.getTable(databaseName, tblName)
val partitions: Seq[Partition] =
if (table.isPartitioned) {
-client.getAllPartitionsForPruner(table).toSeq
+HiveShim.getAllPartitionsOf(client, table).toSeq
} else {
Nil
}
@@ -185,7 +185,7 @@ object HiveMetastoreTypes extends RegexParsers {
"bigint" ^^^ LongType |
"binary" ^^^ BinaryType |
"boolean" ^^^ BooleanType |
-"decimal" ^^^ DecimalType |
+HiveShim.metastoreDecimal ^^^ DecimalType |
"date" ^^^ DateType |
"timestamp" ^^^ TimestampType |
"varchar\\((\\d+)\\)".r ^^^ StringType
@@ -272,13 +272,13 @@ private[hive] case class MetastoreRelation
// of RPCs are involved. Besides `totalSize`, there are also `numFiles`, `numRows`,
// `rawDataSize` keys (see StatsSetupConst in Hive) that we can look at in the future.
BigInt(
-Option(hiveQlTable.getParameters.get(StatsSetupConst.TOTAL_SIZE))
+Option(hiveQlTable.getParameters.get(HiveShim.getStatsSetupConstTotalSize))
.map(_.toLong)
.getOrElse(sqlContext.defaultSizeInBytes))
}
)

-val tableDesc = new TableDesc(
+val tableDesc = HiveShim.getTableDesc(
Class.forName(
hiveQlTable.getSerializationLib,
true,
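The HiveShim.metastoreDecimal hook in the type parser above handles another difference: Hive 0.12's metastore reports decimal columns as the bare token decimal, while Hive 0.13 can report an explicit precision and scale such as decimal(10,0). A sketch with the exact patterns assumed for illustration:

// v0.12.0 tree (assumed): a plain literal token is enough.
val metastoreDecimal = "decimal"

// v0.13.1 tree (assumed): also match the parameterized form; the parser
// can lift this string into a regex (e.g. with .r) where needed.
val metastoreDecimal = "decimal\\((\\d+),(\\d+)\\)"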
16 changes: 14 additions & 2 deletions sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala
@@ -18,7 +18,8 @@
package org.apache.spark.sql.hive

import java.sql.Date

+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.ql.Context
import org.apache.hadoop.hive.ql.lib.Node
import org.apache.hadoop.hive.ql.parse._
import org.apache.hadoop.hive.ql.plan.PlanUtils
@@ -216,7 +217,18 @@ private[hive] object HiveQl {
/**
* Returns the AST for the given SQL string.
*/
-def getAst(sql: String): ASTNode = ParseUtils.findRootNonNullToken((new ParseDriver).parse(sql))
+def getAst(sql: String): ASTNode = {
+  /*
+   * A Context has to be passed in for Hive 0.13.1; otherwise a
+   * NullPointerException is thrown when retrieving properties from HiveConf.
+   */
+  val hContext = new Context(new HiveConf())
+  val node = ParseUtils.findRootNonNullToken((new ParseDriver).parse(sql, hContext))
+  hContext.clear()
+  node
+}
+

/** Returns a LogicalPlan for a given HiveQL string. */
def parseSql(sql: String): LogicalPlan = hqlParser(sql)
sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
@@ -34,6 +34,7 @@ import org.apache.spark.SerializableWritable
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.{EmptyRDD, HadoopRDD, RDD, UnionRDD}
import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.hive.HiveShim

/**
* A trait for subclasses that handle table scans.
@@ -138,7 +139,7 @@ class HadoopTableReader(
filterOpt: Option[PathFilter]): RDD[Row] = {
val hivePartitionRDDs = partitionToDeserializer.map { case (partition, partDeserializer) =>
val partDesc = Utilities.getPartitionDesc(partition)
-val partPath = partition.getPartitionPath
+val partPath = HiveShim.getDataLocationPath(partition)
val inputPathStr = applyFilterIfNeeded(partPath, filterOpt)
val ifc = partDesc.getInputFileFormatClass
.asInstanceOf[java.lang.Class[InputFormat[Writable, Writable]]]
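The HiveShim.getDataLocationPath call above wraps a partition-metadata change: Hive 0.12's Partition exposes getPartitionPath, which Hive 0.13 removed, leaving getDataLocation to supply the location. A sketch under those assumptions:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.hive.ql.metadata.Partition

// v0.12.0 tree (assumed): the dedicated Path accessor still exists.
def getDataLocationPath(p: Partition): Path = p.getPartitionPath

// v0.13.1 tree (assumed): getPartitionPath is gone and getDataLocation
// returns the partition's Path directly.
def getDataLocationPath(p: Partition): Path = p.getDataLocation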