Commit b52d47a, authored by luhenry on May 12, 2021
[SPARK-35295][ML] Fully replace com.github.fommil.netlib with dev.ludovic.netlib:2.0

### What changes were proposed in this pull request?

Bump to `dev.ludovic.netlib:2.0`, which provides JNI-based wrappers for BLAS, ARPACK, and LAPACK. These wrappers do not depend on GPL or LGPL libraries, which lets Spark offer out-of-the-box hardware acceleration whenever a native library is present. Installing such a library (for example OpenBLAS, Intel MKL, or libarpack2) on the system is still up to the end user.
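To illustrate the general pattern described above (a JNI wrapper that is used only when its native library can be loaded, with a pure-Java implementation as the fallback), here is a minimal sketch. It is not the `dev.ludovic.netlib` API: the class `FallbackDot`, its methods, and the native library name `hypothetical_blas_jni` are all hypothetical, and only `ddot` (the dot product of two vectors) is covered.

```java
/**
 * Hypothetical sketch of a JNI-backed BLAS routine with a pure-Java fallback.
 * This is NOT the dev.ludovic.netlib API; every name here is illustrative.
 */
final class FallbackDot {

    // True only if the (hypothetical) JNI library was found on java.library.path.
    private static final boolean NATIVE_AVAILABLE = tryLoadNative();

    private FallbackDot() {}

    private static boolean tryLoadNative() {
        try {
            // Hypothetical library name: loading throws UnsatisfiedLinkError
            // when no matching shared library is installed on the system.
            System.loadLibrary("hypothetical_blas_jni");
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false;
        }
    }

    // JNI entry point; its body would live in C inside the native library.
    // Declaring it is harmless: it is only linked if it is actually called.
    private static native double ddotNative(int n, double[] x, double[] y);

    // Pure-Java ddot: sum of x[i] * y[i] for i in [0, n).
    static double ddotJava(int n, double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    /** Dispatches to the native routine when it loaded, otherwise to pure Java. */
    static double ddot(int n, double[] x, double[] y) {
        return NATIVE_AVAILABLE ? ddotNative(n, x, y) : ddotJava(n, x, y);
    }

    /** Reports which implementation was selected when the class was initialized. */
    static String implementation() {
        return NATIVE_AVAILABLE ? "native" : "java";
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] y = {4.0, 5.0, 6.0};
        System.out.println(implementation() + " ddot = " + ddot(3, x, y));
    }
}
```

On a system where no such library is installed, `main` prints `java ddot = 32.0`. The choice is made once, at class initialization, so the per-call cost of the dispatch is a single branch.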

### Why are the changes needed?

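Large performance improvements for ML workloads on vanilla distributions of Spark. In the JDK 8 benchmarks below, the native implementation is, for example, 5.9X faster than f2j on `sgemv[T]` and 5.8X faster on `snrm2`, while routines such as `dcopy` and `dscal` see essentially no change.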

### Does this PR introduce _any_ user-facing change?

Users now benefit from hardware acceleration as long as a native library (such as OpenBLAS, Intel MKL, or libarpack2) is installed.

### How was this patch tested?

The Spark test suite and the dev.ludovic.netlib test suite.
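In the benchmark tables that follow, the `Per Row(ns)` and `Relative` columns appear to be derived from `Rate(M/s)`: judging from the numbers shown, per-row time is 1000 / rate, and `Relative` is the row's rate divided by the first (f2j) row's rate, which is the same as the f2j best time divided by the row's best time. Working from the rate rather than the rounded `Best Time(ms)` explains why `snrm2` native shows 5.8X even though 154 / 26 ≈ 5.9. The small Java sketch below (the class `BenchmarkColumns` is hypothetical) recomputes a few of the printed values.

```java
import java.util.Locale;

/**
 * Hypothetical helper that recomputes the derived benchmark columns from the
 * Rate(M/s) column, rounded to one decimal place as printed in the tables.
 */
final class BenchmarkColumns {

    private BenchmarkColumns() {}

    /** Per Row(ns): nanoseconds per row, i.e. 1000 / (millions of rows per second). */
    static String perRowNs(double rateMPerSec) {
        // Locale.ROOT keeps '.' as the decimal separator on every platform.
        return String.format(Locale.ROOT, "%.1f", 1000.0 / rateMPerSec);
    }

    /** Relative: speed-up over the baseline (first, f2j) row of the same table. */
    static String relative(double rateMPerSec, double baselineRateMPerSec) {
        return String.format(Locale.ROOT, "%.1fX", rateMPerSec / baselineRateMPerSec);
    }

    public static void main(String[] args) {
        // snrm2: f2j is the baseline at 651.0 M/s; native runs at 3787.6 M/s.
        System.out.println("snrm2 native per row:  " + perRowNs(3787.6) + " ns");
        System.out.println("snrm2 native relative: " + relative(3787.6, 651.0));
        // dger: the pure-Java row is slightly slower than f2j.
        System.out.println("dger java relative:    " + relative(491.9, 520.5));
    }
}
```

Values below 1.0X, such as `dger` java at 0.9X, mean that implementation was slower than the f2j baseline on that run.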

#### JDK8:
```
[info] OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Linux 5.8.0-50-generic
[info] Intel(R) Xeon(R) E-2276G CPU @ 3.80GHz
[info]
[info] f2jBLAS    = dev.ludovic.netlib.blas.F2jBLAS
[info] javaBLAS   = dev.ludovic.netlib.blas.Java8BLAS
[info] nativeBLAS = dev.ludovic.netlib.blas.JNIBLAS
[info]
[info] daxpy:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        220            226           6        454.9           2.2       1.0X
[info] java                       221            228           5        451.9           2.2       1.0X
[info] native                     209            215           5        478.7           2.1       1.1X
[info]
[info] saxpy:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        121            125           3        823.3           1.2       1.0X
[info] java                       121            125           3        824.3           1.2       1.0X
[info] native                     101            105           3        988.4           1.0       1.2X
[info]
[info] dcopy:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        212            219           6        470.9           2.1       1.0X
[info] java                       208            212           4        481.0           2.1       1.0X
[info] native                     209            215           5        478.5           2.1       1.0X
[info]
[info] scopy:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        114            119           3        878.9           1.1       1.0X
[info] java                        99            105           3       1011.4           1.0       1.2X
[info] native                      97            103           3       1026.7           1.0       1.2X
[info]
[info] ddot:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        108            111           2        925.9           1.1       1.0X
[info] java                        71             73           2       1414.9           0.7       1.5X
[info] native                      54             56           2       1847.0           0.5       2.0X
[info]
[info] sdot:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         96             97           2       1046.8           1.0       1.0X
[info] java                        47             48           1       2129.8           0.5       2.0X
[info] native                      29             30           1       3404.7           0.3       3.3X
[info]
[info] dnrm2:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        139            143           2        718.2           1.4       1.0X
[info] java                        46             47           1       2171.2           0.5       3.0X
[info] native                      44             46           2       2261.8           0.4       3.1X
[info]
[info] snrm2:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        154            157           4        651.0           1.5       1.0X
[info] java                        40             42           1       2469.3           0.4       3.8X
[info] native                      26             27           1       3787.6           0.3       5.8X
[info]
[info] dscal:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        185            195           8        541.0           1.8       1.0X
[info] java                       186            196           7        538.5           1.9       1.0X
[info] native                     177            187           7        564.1           1.8       1.0X
[info]
[info] sscal:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         98            102           3       1016.2           1.0       1.0X
[info] java                        98            102           3       1017.8           1.0       1.0X
[info] native                      87             91           3       1143.2           0.9       1.1X
[info]
[info] dgemv[N]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         68             70           1       1474.7           0.7       1.0X
[info] java                        51             52           1       1973.0           0.5       1.3X
[info] native                      30             32           1       3298.8           0.3       2.2X
[info]
[info] dgemv[T]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         96             99           2       1037.9           1.0       1.0X
[info] java                        50             51           1       1999.6           0.5       1.9X
[info] native                      30             31           1       3368.1           0.3       3.2X
[info]
[info] sgemv[N]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         59             61           1       1688.7           0.6       1.0X
[info] java                        41             42           1       2461.9           0.4       1.5X
[info] native                      15             16           1       6593.0           0.2       3.9X
[info]
[info] sgemv[T]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         90             92           1       1116.2           0.9       1.0X
[info] java                        39             40           1       2565.8           0.4       2.3X
[info] native                      15             16           1       6594.2           0.2       5.9X
[info]
[info] dger:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        192            202           7        520.5           1.9       1.0X
[info] java                       203            214           7        491.9           2.0       0.9X
[info] native                     176            187           7        568.8           1.8       1.1X
[info]
[info] dspmv[U]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         59             61           1        846.1           1.2       1.0X
[info] java                        38             39           1       1313.5           0.8       1.6X
[info] native                      24             27           1       2047.8           0.5       2.4X
[info]
[info] dspr[U]:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         97            101           3        515.4           1.9       1.0X
[info] java                        97            101           2        515.1           1.9       1.0X
[info] native                      88             91           3        569.1           1.8       1.1X
[info]
[info] dsyr[U]:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        169            174           3        295.4           3.4       1.0X
[info] java                       169            174           3        295.4           3.4       1.0X
[info] native                     160            165           4        312.2           3.2       1.1X
[info]
[info] dgemm[N,N]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        561            577          13       1782.3           0.6       1.0X
[info] java                       225            231           4       4446.2           0.2       2.5X
[info] native                      31             32           3      32473.1           0.0      18.2X
[info]
[info] dgemm[N,T]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        570            584           9       1754.8           0.6       1.0X
[info] java                       224            230           4       4457.3           0.2       2.5X
[info] native                      31             32           1      32493.4           0.0      18.5X
[info]
[info] dgemm[T,N]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        855            866           6       1169.2           0.9       1.0X
[info] java                       224            228           3       4466.9           0.2       3.8X
[info] native                      31             32           1      32395.5           0.0      27.7X
[info]
[info] dgemm[T,T]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                       1328           1344           8        752.8           1.3       1.0X
[info] java                       224            230           4       4458.9           0.2       5.9X
[info] native                      31             32           1      32201.8           0.0      42.8X
[info]
[info] sgemm[N,N]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        534            541           5       1873.0           0.5       1.0X
[info] java                       220            224           3       4542.8           0.2       2.4X
[info] native                      15             16           1      66803.1           0.0      35.7X
[info]
[info] sgemm[N,T]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        544            551           6       1839.6           0.5       1.0X
[info] java                       220            224           4       4538.2           0.2       2.5X
[info] native                      15             16           1      65589.9           0.0      35.7X
[info]
[info] sgemm[T,N]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        833            845          21       1201.0           0.8       1.0X
[info] java                       220            224           3       4548.7           0.2       3.8X
[info] native                      15             16           1      66603.2           0.0      55.5X
[info]
[info] sgemm[T,T]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        899            907           5       1112.9           0.9       1.0X
[info] java                       221            224           2       4531.6           0.2       4.1X
[info] native                      15             16           1      65944.9           0.0      59.3X
```
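The derived columns can be cross-checked against each other. As we understand Spark's benchmark harness, `Relative` is the first case's best time divided by each case's best time. Since `Rate(M/s)` is rows divided by best time, the same ratio can be read directly off the `Rate` column, and `Per Row(ns)` is simply `1000 / Rate(M/s)`. The minimal Java sketch below applies this to the `dgemm[N,N]` and `dgemm[T,T]` rows in the table above. `BenchmarkColumns` and its methods are hypothetical names used only for illustration.

```java
import java.util.Locale;

// Sketch (assumption): Relative = baselineBest / caseBest = caseRate / baselineRate,
// and Per Row(ns) = 1000 / Rate(M/s). Not Spark's actual Benchmark code.
final class BenchmarkColumns {
    private BenchmarkColumns() {}

    // 1 / (rate * 1e6 rows/s) seconds per row, expressed in nanoseconds.
    static double perRowNs(double rateMPerSec) {
        return 1000.0 / rateMPerSec;
    }

    // Speed-up of a case relative to the first (f2j) row, derived from the Rate column.
    static double relative(double caseRate, double baselineRate) {
        return caseRate / baselineRate;
    }

    // Same one-decimal "X" formatting as the tables.
    static String formatRelative(double r) {
        return String.format(Locale.ROOT, "%.1fX", r);
    }

    public static void main(String[] args) {
        // Rate(M/s) values copied from the dgemm[N,N] table above.
        double f2j = 1782.3, java = 4446.2, nativeRate = 32473.1;
        System.out.println("java   " + formatRelative(relative(java, f2j)));       // 2.5X
        System.out.println("native " + formatRelative(relative(nativeRate, f2j))); // 18.2X
        System.out.printf(Locale.ROOT, "f2j per row: %.1f ns%n", perRowNs(f2j));   // 0.6 ns
    }
}
```

Both recomputed ratios match the printed `Relative` values (2.5X and 18.2X), which is a quick way to sanity-check a table that was copied or edited by hand.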

#### JDK11:
```
[info] OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.8.0-50-generic
[info] Intel(R) Xeon(R) E-2276G CPU @ 3.80GHz
[info]
[info] f2jBLAS    = dev.ludovic.netlib.blas.F2jBLAS
[info] javaBLAS   = dev.ludovic.netlib.blas.Java11BLAS
[info] nativeBLAS = dev.ludovic.netlib.blas.JNIBLAS
[info]
[info] daxpy:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        195            200           3        512.2           2.0       1.0X
[info] java                       197            202           3        507.0           2.0       1.0X
[info] native                     184            189           4        543.0           1.8       1.1X
[info]
[info] saxpy:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        108            112           3        921.8           1.1       1.0X
[info] java                       101            105           3        989.4           1.0       1.1X
[info] native                      87             91           3       1147.1           0.9       1.2X
[info]
[info] dcopy:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        187            191           3        535.1           1.9       1.0X
[info] java                       182            188           3        548.8           1.8       1.0X
[info] native                     178            182           3        562.2           1.8       1.1X
[info]
[info] scopy:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        110            114           3        909.3           1.1       1.0X
[info] java                        86             93           4       1159.3           0.9       1.3X
[info] native                      86             90           3       1162.4           0.9       1.3X
[info]
[info] ddot:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        106            108           2        943.6           1.1       1.0X
[info] java                        70             71           2       1426.8           0.7       1.5X
[info] native                      54             56           2       1835.4           0.5       1.9X
[info]
[info] sdot:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         96             97           1       1047.1           1.0       1.0X
[info] java                        43             44           1       2331.9           0.4       2.2X
[info] native                      29             30           1       3392.1           0.3       3.2X
[info]
[info] dnrm2:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        114            115           2        880.7           1.1       1.0X
[info] java                        42             43           1       2398.1           0.4       2.7X
[info] native                      45             46           1       2233.3           0.4       2.5X
[info]
[info] snrm2:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        140            143           2        714.6           1.4       1.0X
[info] java                        28             29           1       3531.0           0.3       4.9X
[info] native                      26             27           1       3820.0           0.3       5.3X
[info]
[info] dscal:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        156            166           7        641.3           1.6       1.0X
[info] java                       158            167           6        633.2           1.6       1.0X
[info] native                     150            160           7        664.8           1.5       1.0X
[info]
[info] sscal:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         85             88           2       1181.7           0.8       1.0X
[info] java                        85             88           2       1176.0           0.9       1.0X
[info] native                      75             78           2       1333.2           0.8       1.1X
[info]
[info] dgemv[N]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         58             59           1       1731.1           0.6       1.0X
[info] java                        41             43           1       2415.5           0.4       1.4X
[info] native                      30             31           1       3293.9           0.3       1.9X
[info]
[info] dgemv[T]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         94             96           1       1063.4           0.9       1.0X
[info] java                        41             42           1       2435.8           0.4       2.3X
[info] native                      30             30           1       3379.8           0.3       3.2X
[info]
[info] sgemv[N]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         44             45           1       2278.9           0.4       1.0X
[info] java                        37             38           0       2686.8           0.4       1.2X
[info] native                      15             16           1       6555.4           0.2       2.9X
[info]
[info] sgemv[T]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         88             89           1       1142.1           0.9       1.0X
[info] java                        33             34           1       3010.7           0.3       2.6X
[info] native                      15             16           1       6553.9           0.2       5.7X
[info]
[info] dger:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        164            172           4        609.4           1.6       1.0X
[info] java                       163            172           5        612.6           1.6       1.0X
[info] native                     150            159           4        667.0           1.5       1.1X
[info]
[info] dspmv[U]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         49             50           1       1029.4           1.0       1.0X
[info] java                        41             42           1       1209.4           0.8       1.2X
[info] native                      25             27           1       2029.2           0.5       2.0X
[info]
[info] dspr[U]:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         80             85           3        622.2           1.6       1.0X
[info] java                        80             85           3        622.4           1.6       1.0X
[info] native                      75             79           3        668.7           1.5       1.1X
[info]
[info] dsyr[U]:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        137            142           3        364.1           2.7       1.0X
[info] java                       139            142           2        360.4           2.8       1.0X
[info] native                     131            135           3        380.4           2.6       1.0X
[info]
[info] dgemm[N,N]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        517            525           5       1935.5           0.5       1.0X
[info] java                       213            216           3       4704.8           0.2       2.4X
[info] native                      31             31           1      32705.6           0.0      16.9X
[info]
[info] dgemm[N,T]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        589            601           6       1698.6           0.6       1.0X
[info] java                       213            217           3       4693.3           0.2       2.8X
[info] native                      31             32           1      32498.9           0.0      19.1X
[info]
[info] dgemm[T,N]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        851            865           6       1175.3           0.9       1.0X
[info] java                       212            216           3       4717.0           0.2       4.0X
[info] native                      30             32           1      32903.0           0.0      28.0X
[info]
[info] dgemm[T,T]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                       1301           1316           6        768.4           1.3       1.0X
[info] java                       212            216           2       4717.4           0.2       6.1X
[info] native                      31             32           1      32606.0           0.0      42.4X
[info]
[info] sgemm[N,N]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        454            460           2       2203.0           0.5       1.0X
[info] java                       208            212           3       4803.8           0.2       2.2X
[info] native                      15             16           0      66586.0           0.0      30.2X
[info]
[info] sgemm[N,T]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        529            536           4       1889.7           0.5       1.0X
[info] java                       208            212           3       4798.6           0.2       2.5X
[info] native                      15             16           1      66751.4           0.0      35.3X
[info]
[info] sgemm[T,N]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        830            840           5       1205.1           0.8       1.0X
[info] java                       208            211           2       4814.1           0.2       4.0X
[info] native                      15             15           1      67676.4           0.0      56.2X
[info]
[info] sgemm[T,T]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        894            907           7       1118.7           0.9       1.0X
[info] java                       208            211           3       4809.6           0.2       4.3X
[info] native                      15             16           1      66675.2           0.0      59.6X
```
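The `[N]`/`[T]` suffixes in the level-2 and level-3 labels appear to be the standard BLAS transpose flags (`N` = no transpose, `T` = transpose), and `[U]` the upper-triangle flag for the symmetric and packed routines. The sketch below is a naive, unoptimized Java reference for what `dgemm` computes, `C := alpha*op(A)*op(B) + beta*C`, with Fortran-style column-major storage and leading dimensions. It is not the code of `F2jBLAS`, `Java11BLAS`, `VectorBLAS` or any other library, and `NaiveDgemm` is a hypothetical name. It only shows what the `[N,T]`, `[T,N]` and `[T,T]` variants change: the strides used to read `op(A)` and `op(B)`.

```java
import java.util.Arrays;

// Naive reference sketch of BLAS dgemm semantics (column-major), for illustration only.
final class NaiveDgemm {
    private NaiveDgemm() {}

    static void dgemm(String transA, String transB, int m, int n, int k,
                      double alpha, double[] a, int lda, double[] b, int ldb,
                      double beta, double[] c, int ldc) {
        boolean ta = transA.equalsIgnoreCase("T");
        boolean tb = transB.equalsIgnoreCase("T");
        for (int j = 0; j < n; j++) {
            for (int i = 0; i < m; i++) {
                double sum = 0.0;
                for (int p = 0; p < k; p++) {
                    // op(A)(i,p): A is m x k when "N"; when "T", A is stored k x m.
                    double aip = ta ? a[p + i * lda] : a[i + p * lda];
                    // op(B)(p,j): B is k x n when "N"; when "T", B is stored n x k.
                    double bpj = tb ? b[j + p * ldb] : b[p + j * ldb];
                    sum += aip * bpj;
                }
                // As in reference BLAS, C is not read when beta == 0 (so NaNs in C are ignored).
                double prev = beta == 0.0 ? 0.0 : beta * c[i + j * ldc];
                c[i + j * ldc] = alpha * sum + prev;
            }
        }
    }

    public static void main(String[] args) {
        double[] a = {1, 3, 2, 4}; // [[1, 2], [3, 4]] in column-major order
        double[] b = {5, 7, 6, 8}; // [[5, 6], [7, 8]] in column-major order
        double[] c = new double[4];
        dgemm("N", "N", 2, 2, 2, 1.0, a, 2, b, 2, 0.0, c, 2);
        System.out.println(Arrays.toString(c)); // [19.0, 43.0, 22.0, 50.0]
    }
}
```

The arithmetic is identical for all four flag combinations; only the memory access pattern differs, so any timing gap between the `[N,N]` and `[T,T]` rows of one implementation reflects how that implementation handles strided access rather than extra work.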

#### JDK16:
```
[info] OpenJDK 64-Bit Server VM 16+36 on Linux 5.8.0-50-generic
[info] Intel(R) Xeon(R) E-2276G CPU @ 3.80GHz
[info]
[info] f2jBLAS    = dev.ludovic.netlib.blas.F2jBLAS
[info] javaBLAS   = dev.ludovic.netlib.blas.VectorBLAS
[info] nativeBLAS = dev.ludovic.netlib.blas.JNIBLAS
[info]
[info] daxpy:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        193            199           3        517.5           1.9       1.0X
[info] java                       181            186           4        553.2           1.8       1.1X
[info] native                     181            185           5        553.6           1.8       1.1X
[info]
[info] saxpy:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        108            112           2        925.1           1.1       1.0X
[info] java                        88             91           3       1138.6           0.9       1.2X
[info] native                      87             91           3       1144.2           0.9       1.2X
[info]
[info] dcopy:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        184            189           3        542.5           1.8       1.0X
[info] java                       181            185           3        552.8           1.8       1.0X
[info] native                     179            183           2        558.0           1.8       1.0X
[info]
[info] scopy:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         97            101           3       1031.6           1.0       1.0X
[info] java                        86             90           2       1163.7           0.9       1.1X
[info] native                      85             88           2       1182.9           0.8       1.1X
[info]
[info] ddot:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        107            109           2        932.4           1.1       1.0X
[info] java                        54             56           2       1846.7           0.5       2.0X
[info] native                      54             56           2       1846.7           0.5       2.0X
[info]
[info] sdot:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         96             97           1       1043.6           1.0       1.0X
[info] java                        29             30           1       3439.3           0.3       3.3X
[info] native                      29             30           1       3423.9           0.3       3.3X
[info]
[info] dnrm2:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        121            123           2        829.8           1.2       1.0X
[info] java                        32             32           1       3171.3           0.3       3.8X
[info] native                      45             46           1       2246.2           0.4       2.7X
[info]
[info] snrm2:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        142            144           2        705.9           1.4       1.0X
[info] java                        15             16           1       6585.8           0.2       9.3X
[info] native                      26             27           1       3839.5           0.3       5.4X
[info]
[info] dscal:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        157            165           5        635.6           1.6       1.0X
[info] java                       151            159           5        664.0           1.5       1.0X
[info] native                     151            160           5        663.6           1.5       1.0X
[info]
[info] sscal:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         85             89           2       1172.3           0.9       1.0X
[info] java                        75             79           3       1337.3           0.7       1.1X
[info] native                      75             79           2       1335.5           0.7       1.1X
[info]
[info] dgemv[N]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         58             59           1       1731.5           0.6       1.0X
[info] java                        28             29           1       3544.2           0.3       2.0X
[info] native                      30             31           1       3306.2           0.3       1.9X
[info]
[info] dgemv[T]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         90             92           1       1108.3           0.9       1.0X
[info] java                        28             28           1       3622.5           0.3       3.3X
[info] native                      30             31           1       3381.3           0.3       3.1X
[info]
[info] sgemv[N]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         44             45           1       2284.7           0.4       1.0X
[info] java                        14             15           1       7034.0           0.1       3.1X
[info] native                      15             16           1       6643.7           0.2       2.9X
[info]
[info] sgemv[T]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         85             86           1       1177.4           0.8       1.0X
[info] java                        15             15           1       6886.1           0.1       5.8X
[info] native                      15             16           1       6560.1           0.2       5.6X
[info]
[info] dger:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        164            173           6        608.1           1.6       1.0X
[info] java                       148            157           5        675.2           1.5       1.1X
[info] native                     152            160           5        659.9           1.5       1.1X
[info]
[info] dspmv[U]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         61             63           1        815.4           1.2       1.0X
[info] java                        16             17           1       3104.3           0.3       3.8X
[info] native                      24             27           1       2071.9           0.5       2.5X
[info]
[info] dspr[U]:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         81             85           2        616.4           1.6       1.0X
[info] java                        81             85           2        614.7           1.6       1.0X
[info] native                      75             78           2        669.5           1.5       1.1X
[info]
[info] dsyr[U]:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        138            141           3        362.7           2.8       1.0X
[info] java                       137            140           2        365.3           2.7       1.0X
[info] native                     131            134           2        382.9           2.6       1.1X
[info]
[info] dgemm[N,N]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        525            544           8       1906.2           0.5       1.0X
[info] java                        61             68           3      16358.1           0.1       8.6X
[info] native                      31             32           1      32623.7           0.0      17.1X
[info]
[info] dgemm[N,T]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        580            598          12       1724.5           0.6       1.0X
[info] java                        61             68           4      16302.5           0.1       9.5X
[info] native                      30             32           1      32962.8           0.0      19.1X
[info]
[info] dgemm[T,N]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        829            838           4       1206.2           0.8       1.0X
[info] java                        61             69           3      16339.7           0.1      13.5X
[info] native                      30             31           1      33231.9           0.0      27.6X
[info]
[info] dgemm[T,T]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                       1352           1363           5        739.6           1.4       1.0X
[info] java                        61             69           3      16347.0           0.1      22.1X
[info] native                      31             32           1      32740.3           0.0      44.3X
[info]
[info] sgemm[N,N]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        482            493           7       2073.1           0.5       1.0X
[info] java                        35             38           2      28315.3           0.0      13.7X
[info] native                      15             15           1      67579.7           0.0      32.6X
[info]
[info] sgemm[N,T]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        472            482           4       2119.0           0.5       1.0X
[info] java                        36             38           2      28138.1           0.0      13.3X
[info] native                      15             16           1      66616.5           0.0      31.4X
[info]
[info] sgemm[T,N]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        823            830           5       1215.2           0.8       1.0X
[info] java                        35             38           2      28681.4           0.0      23.6X
[info] native                      15             15           1      67908.4           0.0      55.9X
[info]
[info] sgemm[T,T]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        896            908           7       1115.8           0.9       1.0X
[info] java                        35             38           2      28402.0           0.0      25.5X
[info] native                      15             16           0      66691.2           0.0      59.8X
```
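The `dgemm[X,Y]` and `sgemm[X,Y]` blocks benchmark the BLAS general matrix multiply C := alpha * op(A) * op(B) + beta * C in double and single precision. The bracketed flags select op(A) and op(B): `N` uses the matrix as stored and `T` uses its transpose. The minimal Java sketch below spells out those semantics with naive loops over Fortran-style column-major arrays. It is an illustration under that layout assumption, not the f2j, `dev.ludovic.netlib` or native code being measured, and the class name `DgemmSketch` is hypothetical.

```java
import java.util.Arrays;

// Naive reference sketch of BLAS dgemm semantics with column-major storage:
//   C := alpha * op(A) * op(B) + beta * C
// where op(X) = X for "N" and X^T for "T"; op(A) is m x k and op(B) is k x n.
// Illustrative only: real BLAS kernels block, vectorise and validate arguments.
public final class DgemmSketch {

    // Element (i, j) of a column-major matrix with leading dimension ld.
    private static double at(double[] x, int ld, int i, int j) {
        return x[i + j * ld];
    }

    public static void dgemm(String transa, String transb, int m, int n, int k,
                             double alpha, double[] a, int lda,
                             double[] b, int ldb,
                             double beta, double[] c, int ldc) {
        boolean ta = transa.equalsIgnoreCase("T");
        boolean tb = transb.equalsIgnoreCase("T");
        for (int j = 0; j < n; j++) {
            for (int i = 0; i < m; i++) {
                double sum = 0.0;
                for (int p = 0; p < k; p++) {
                    // op(A)[i][p] and op(B)[p][j]: read the transposed element when requested.
                    double aip = ta ? at(a, lda, p, i) : at(a, lda, i, p);
                    double bpj = tb ? at(b, ldb, j, p) : at(b, ldb, p, j);
                    sum += aip * bpj;
                }
                // As in BLAS, beta == 0 means the previous contents of C are ignored.
                double prev = (beta == 0.0) ? 0.0 : beta * c[i + j * ldc];
                c[i + j * ldc] = alpha * sum + prev;
            }
        }
    }

    public static void main(String[] args) {
        double[] a = {1, 3, 2, 4}; // A = [[1, 2], [3, 4]] stored column-major
        double[] b = {1, 0, 0, 1}; // B = identity
        double[] cN = new double[4];
        double[] cT = new double[4];
        dgemm("N", "N", 2, 2, 2, 1.0, a, 2, b, 2, 0.0, cN, 2);
        dgemm("T", "N", 2, 2, 2, 1.0, a, 2, b, 2, 0.0, cT, 2);
        System.out.println(Arrays.toString(cN)); // [1.0, 3.0, 2.0, 4.0], i.e. A
        System.out.println(Arrays.toString(cT)); // [1.0, 2.0, 3.0, 4.0], i.e. A^T
    }
}
```

In this naive form the arithmetic is the same for every flag combination and only the memory access order differs. That is one plausible reason (an assumption, not something measured here) for the f2j timings growing from `[N,N]` to `[T,T]` while the `java` and `native` rows stay flat.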

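The `Per Row(ns)` and `Relative` columns can be cross-checked against `Rate(M/s)`. The following sketch assumes they are derived as 1000 / Rate and as each row's rate divided by the first (f2j) row's rate, which is equivalent to the f2j best time divided by this row's best time. This matches every block above to the printed precision. The Java sketch recomputes the `dgemm[N,N]` block. It prints 0.5, 0.1 and 0.0 ns per row and 1.0X, 8.6X and 17.1X, as in the table. `BenchmarkColumns` is a hypothetical helper, not part of Spark's benchmark code.

```java
import java.util.Locale;

// Recomputes the derived columns of one benchmark block from its Rate(M/s) column.
// Assumptions (not stated in the table itself): Per Row(ns) = 1000 / Rate, and
// Relative = Rate / Rate of the first (f2j) row.
public final class BenchmarkColumns {

    public static double perRowNs(double rateMillionsPerSec) {
        // One row takes 1 / (rate * 1e6) seconds = 1000 / rate nanoseconds.
        return 1000.0 / rateMillionsPerSec;
    }

    public static double relative(double baselineRate, double rate) {
        return rate / baselineRate;
    }

    public static void main(String[] args) {
        // dgemm[N,N] block from the output above.
        String[] names = {"f2j", "java", "native"};
        double[] rates = {1906.2, 16358.1, 32623.7};
        for (int i = 0; i < rates.length; i++) {
            System.out.println(String.format(Locale.ROOT, "%-7s %4.1f ns/row %5.1fX",
                    names[i], perRowNs(rates[i]), relative(rates[0], rates[i])));
        }
    }
}
```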
TODO:
- [x] update documentation in `docs/` and `docs/ml-linalg-guide.md` referring to `com.github.fommil.netlib`
- [ ] merge luhenry/netlib#1 with all feedback from this PR + remove references to snapshot repositories in `pom.xml` and `project/SparkBuild.scala`.

Closes apache#32415 from luhenry/master.

Authored-by: Ludovic Henry <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
luhenry authored and srowen committed May 12, 2021
commit b52d47a920c24826b45d9d8c0fb4e59fc119479b
4 changes: 3 additions & 1 deletion LICENSE-binary
@@ -444,7 +444,6 @@ org.antlr:ST4
org.antlr:stringtemplate
org.antlr:antlr4-runtime
antlr:antlr
-com.github.fommil.netlib:core
com.thoughtworks.paranamer:paranamer
org.scala-lang:scala-compiler
org.scala-lang:scala-library
@@ -485,6 +484,9 @@ org.slf4j:jul-to-slf4j
org.slf4j:slf4j-api
org.slf4j:slf4j-log4j12
com.github.scopt:scopt_2.12
+dev.ludovic.netlib:blas
+dev.ludovic.netlib:arpack
+dev.ludovic.netlib:lapack

core/src/main/resources/org/apache/spark/ui/static/dagre-d3.min.js
core/src/main/resources/org/apache/spark/ui/static/*dataTables*
6 changes: 3 additions & 3 deletions dev/deps/spark-deps-hadoop-2.7-hive-2.3
@@ -15,7 +15,7 @@ apacheds-i18n/2.0.0-M15//apacheds-i18n-2.0.0-M15.jar
apacheds-kerberos-codec/2.0.0-M15//apacheds-kerberos-codec-2.0.0-M15.jar
api-asn1-api/1.0.0-M20//api-asn1-api-1.0.0-M20.jar
api-util/1.0.0-M20//api-util-1.0.0-M20.jar
-arpack/1.3.2//arpack-1.3.2.jar
+arpack/2.2.0//arpack-2.2.0.jar
arpack_combined_all/0.1//arpack_combined_all-0.1.jar
arrow-format/2.0.0//arrow-format-2.0.0.jar
arrow-memory-core/2.0.0//arrow-memory-core-2.0.0.jar
@@ -26,7 +26,7 @@ automaton/1.11-8//automaton-1.11-8.jar
avro-ipc/1.10.2//avro-ipc-1.10.2.jar
avro-mapred/1.10.2//avro-mapred-1.10.2.jar
avro/1.10.2//avro-1.10.2.jar
-blas/1.3.2//blas-1.3.2.jar
+blas/2.2.0//blas-2.2.0.jar
bonecp/0.8.0.RELEASE//bonecp-0.8.0.RELEASE.jar
breeze-macros_2.12/1.0//breeze-macros_2.12-1.0.jar
breeze_2.12/1.0//breeze_2.12-1.0.jar
@@ -174,7 +174,7 @@ kubernetes-model-policy/5.3.1//kubernetes-model-policy-5.3.1.jar
kubernetes-model-rbac/5.3.1//kubernetes-model-rbac-5.3.1.jar
kubernetes-model-scheduling/5.3.1//kubernetes-model-scheduling-5.3.1.jar
kubernetes-model-storageclass/5.3.1//kubernetes-model-storageclass-5.3.1.jar
-lapack/1.3.2//lapack-1.3.2.jar
+lapack/2.2.0//lapack-2.2.0.jar
leveldbjni-all/1.8//leveldbjni-all-1.8.jar
libfb303/0.9.3//libfb303-0.9.3.jar
libthrift/0.12.0//libthrift-0.12.0.jar
6 changes: 3 additions & 3 deletions dev/deps/spark-deps-hadoop-3.2-hive-2.3
@@ -10,7 +10,7 @@ annotations/17.0.0//annotations-17.0.0.jar
antlr-runtime/3.5.2//antlr-runtime-3.5.2.jar
antlr4-runtime/4.8-1//antlr4-runtime-4.8-1.jar
aopalliance-repackaged/2.6.1//aopalliance-repackaged-2.6.1.jar
-arpack/1.3.2//arpack-1.3.2.jar
+arpack/2.2.0//arpack-2.2.0.jar
arpack_combined_all/0.1//arpack_combined_all-0.1.jar
arrow-format/2.0.0//arrow-format-2.0.0.jar
arrow-memory-core/2.0.0//arrow-memory-core-2.0.0.jar
@@ -21,7 +21,7 @@ automaton/1.11-8//automaton-1.11-8.jar
avro-ipc/1.10.2//avro-ipc-1.10.2.jar
avro-mapred/1.10.2//avro-mapred-1.10.2.jar
avro/1.10.2//avro-1.10.2.jar
-blas/1.3.2//blas-1.3.2.jar
+blas/2.2.0//blas-2.2.0.jar
bonecp/0.8.0.RELEASE//bonecp-0.8.0.RELEASE.jar
breeze-macros_2.12/1.0//breeze-macros_2.12-1.0.jar
breeze_2.12/1.0//breeze_2.12-1.0.jar
@@ -145,7 +145,7 @@ kubernetes-model-policy/5.3.1//kubernetes-model-policy-5.3.1.jar
kubernetes-model-rbac/5.3.1//kubernetes-model-rbac-5.3.1.jar
kubernetes-model-scheduling/5.3.1//kubernetes-model-scheduling-5.3.1.jar
kubernetes-model-storageclass/5.3.1//kubernetes-model-storageclass-5.3.1.jar
-lapack/1.3.2//lapack-1.3.2.jar
+lapack/2.2.0//lapack-2.2.0.jar
leveldbjni-all/1.8//leveldbjni-all-1.8.jar
libfb303/0.9.3//libfb303-0.9.3.jar
libthrift/0.12.0//libthrift-0.12.0.jar
7 changes: 3 additions & 4 deletions docs/ml-guide.md
@@ -62,12 +62,11 @@ The primary Machine Learning API for Spark is now the [DataFrame](sql-programmin

# Dependencies

-MLlib uses linear algebra packages [Breeze](http://www.scalanlp.org/) and [netlib-java](https://github.com/fommil/netlib-java) for optimised numerical processing[^1]. Those packages may call native acceleration libraries such as [Intel MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html) or [OpenBLAS](http://www.openblas.net) if they are available as system libraries or in runtime library paths.
+MLlib uses linear algebra packages [Breeze](http://www.scalanlp.org/), [dev.ludovic.netlib](https://github.com/luhenry/netlib), and [netlib-java](https://github.com/fommil/netlib-java) for optimised numerical processing[^1]. Those packages may call native acceleration libraries such as [Intel MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html) or [OpenBLAS](http://www.openblas.net) if they are available as system libraries or in runtime library paths.

-Due to differing OSS licenses, `netlib-java`'s native proxies can't be distributed with Spark. See [MLlib Linear Algebra Acceleration Guide](ml-linalg-guide.html) for how to enable accelerated linear algebra processing. If accelerated native libraries are not enabled, you will see a warning message like below and a pure JVM implementation will be used instead:
+However, native acceleration libraries can't be distributed with Spark. See [MLlib Linear Algebra Acceleration Guide](ml-linalg-guide.html) for how to enable accelerated linear algebra processing. If accelerated native libraries are not enabled, you will see a warning message like below and a pure JVM implementation will be used instead:
```
-WARN BLAS: Failed to load implementation from:com.github.fommil.netlib.NativeSystemBLAS
-WARN BLAS: Failed to load implementation from:com.github.fommil.netlib.NativeRefBLAS
+WARN BLAS: Failed to load implementation from:dev.ludovic.netlib.blas.JNIBLAS
```

To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.4 or newer.
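The `docs/ml-guide.md` text above describes a try-native-then-fall-back behaviour: attempt a native implementation first, log a warning when it cannot be loaded, and use a pure JVM implementation instead. The following is a hypothetical Java sketch of that pattern. `FallbackLoader`, `Ddot`, `JavaDdot` and the library name `hypothetical_blas` are illustrative names, not `dev.ludovic.netlib` or `netlib-java` APIs. The real libraries' loading logic may differ.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.logging.Logger;

// Hypothetical sketch of "try native, fall back to pure JVM" with a warning per failure.
// None of these names come from netlib; they only illustrate the pattern.
public final class FallbackLoader {
    private static final Logger LOG = Logger.getLogger("BLAS");

    /** Minimal stand-in for a BLAS-like interface (hypothetical). */
    public interface Ddot {
        double ddot(int n, double[] x, double[] y);
    }

    /** Pure JVM implementation used as the last resort. */
    public static final class JavaDdot implements Ddot {
        @Override
        public double ddot(int n, double[] x, double[] y) {
            double s = 0.0;
            for (int i = 0; i < n; i++) {
                s += x[i] * y[i];
            }
            return s;
        }
    }

    /** Hypothetical native candidate: fails when the shared library cannot be loaded. */
    static Ddot nativeCandidate(String libName) {
        System.loadLibrary(libName); // throws UnsatisfiedLinkError if not on java.library.path
        throw new UnsupportedOperationException("JNI bindings omitted in this sketch");
    }

    /** Returns the first candidate that initialises, logging a warning for each failure. */
    public static Ddot firstAvailable(List<Supplier<Ddot>> candidates, List<String> names) {
        for (int i = 0; i < candidates.size(); i++) {
            try {
                return candidates.get(i).get();
            } catch (RuntimeException | LinkageError e) {
                LOG.warning("Failed to load implementation from:" + names.get(i));
            }
        }
        throw new IllegalStateException("no implementation available");
    }

    public static void main(String[] args) {
        Ddot blas = firstAvailable(
                List.of(() -> nativeCandidate("hypothetical_blas"), JavaDdot::new),
                List.of("native (hypothetical)", "JavaDdot"));
        System.out.println(blas.getClass().getSimpleName());
        System.out.println(blas.ddot(3, new double[] {1, 2, 3}, new double[] {4, 5, 6}));
    }
}
```

Run as-is, `System.loadLibrary` fails because no such library exists, so a warning is logged through `java.util.logging` and the program prints `JavaDdot` followed by `32.0`. Ordering the candidates in a list keeps the fallback policy in one place.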
36 changes: 13 additions & 23 deletions docs/ml-linalg-guide.md
@@ -21,29 +21,15 @@ license: |

This guide provides necessary information to enable accelerated linear algebra processing for Spark MLlib.

-Spark MLlib defines Vector and Matrix as basic data types for machine learning algorithms. On top of them, [BLAS](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) and [LAPACK](https://en.wikipedia.org/wiki/LAPACK) operations are implemented and supported by [netlib-java](https://github.com/fommil/netlib-Java) (the algorithms may call [Breeze](https://github.com/scalanlp/breeze) and it will in turn call `netlib-java`). `netlib-java` can use optimized native linear algebra libraries (refered to as "native libraries" or "BLAS libraries" hereafter) for faster numerical processing. [Intel MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html) and [OpenBLAS](http://www.openblas.net) are two popular ones.
+Spark MLlib defines Vector and Matrix as basic data types for machine learning algorithms. On top of them, [BLAS](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) and [LAPACK](https://en.wikipedia.org/wiki/LAPACK) operations are implemented and supported by [dev.ludovic.netlib](https://github.com/luhenry/netlib) (the algorithms may also call [Breeze](https://github.com/scalanlp/breeze)). `dev.ludovic.netlib` can use optimized native linear algebra libraries (refered to as "native libraries" or "BLAS libraries" hereafter) for faster numerical processing. [Intel MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html) and [OpenBLAS](http://www.openblas.net) are two popular ones.

-However due to license differences, the official released Spark binaries by default don't contain native libraries support for `netlib-java`.
+The official released Spark binaries don't contain these native libraries.

-The following sections describe how to enable `netlib-java` with native libraries support for Spark MLlib and how to install native libraries and configure them properly.
-
-## Enable `netlib-java` with native library proxies
-
-`netlib-java` depends on `libgfortran`. It requires GFORTRAN 1.4 or above. This can be obtained by installing `libgfortran` package. After installation, the following command can be used to verify if it is installed properly.
-```
-strings /path/to/libgfortran.so.3.0.0 | grep GFORTRAN_1.4
-```
-
-To build Spark with `netlib-java` native library proxies, you need to add `-Pnetlib-lgpl` to Maven build command line. For example:
-```
-$SPARK_SOURCE_HOME/build/mvn -Pnetlib-lgpl -DskipTests -Pyarn -Phadoop-2.7 clean package
-```
-
-If you only want to enable it in your project, include `com.github.fommil.netlib:all:1.1.2` as a dependency of your project.
+The following sections describe how to install native libraries, configure them properly, and how to point `dev.ludovic.netlib` to these native libraries.

## Install native linear algebra libraries

-Intel MKL and OpenBLAS are two popular native linear algebra libraries. You can choose one of them based on your preference. We provide basic instructions as below. You can refer to [netlib-java documentation](https://github.com/fommil/netlib-java) for more advanced installation instructions.
+Intel MKL and OpenBLAS are two popular native linear algebra libraries. You can choose one of them based on your preference. We provide basic instructions as below.

### Intel MKL

@@ -72,16 +58,20 @@ sudo yum install openblas

To verify native libraries are properly loaded, start `spark-shell` and run the following code:
```
-scala> import com.github.fommil.netlib.BLAS;
-scala> System.out.println(BLAS.getInstance().getClass().getName());
+scala> import dev.ludovic.netlib.NativeBLAS
+scala> NativeBLAS.getInstance()
```

-If they are correctly loaded, it should print `com.github.fommil.netlib.NativeSystemBLAS`. Otherwise the warnings should be printed:
+If they are correctly loaded, it should print `dev.ludovic.netlib.NativeBLAS = dev.ludovic.netlib.blas.JNIBLAS@...`. Otherwise the warnings should be printed:
```
-WARN BLAS: Failed to load implementation from:com.github.fommil.netlib.NativeSystemBLAS
-WARN BLAS: Failed to load implementation from:com.github.fommil.netlib.NativeRefBLAS
+WARN NativeBLAS: Failed to load implementation from:dev.ludovic.netlib.blas.JNIBLAS
+java.lang.RuntimeException: Unable to load native implementation
+at dev.ludovic.netlib.NativeBLAS.getInstance(NativeBLAS.java:44)
+...
```

+You can also point `dev.ludovic.netlib` to specific libraries names and paths. For example, `-Ddev.ludovic.netlib.blas.nativeLib=libmkl_rt.so` or `-Ddev.ludovic.netlib.blas.nativeLibPath=$MKLROOT/lib/intel64/libmkl_rt.so` for Intel MKL. You have similar parameters for LAPACK and ARPACK: `-Ddev.ludovic.netlib.lapack.nativeLib=...`, `-Ddev.ludovic.netlib.lapack.nativeLibPath=...`, `-Ddev.ludovic.netlib.arpack.nativeLib=...`, and `-Ddev.ludovic.netlib.arpack.nativeLibPath=...`.
+
If native libraries are not properly configured in the system, the Java implementation (javaBLAS) will be used as fallback option.

## Spark Configuration
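The updated `docs/ml-linalg-guide.md` names six system properties: `nativeLib` and `nativeLibPath` for each of BLAS, LAPACK and ARPACK. The hypothetical `NetlibPropertyCheck` class below is a small diagnostic sketch. It only reports which of those properties are set in the current JVM, which can help confirm that `-D` flags actually reached the process. It does not reproduce how `dev.ludovic.netlib` interprets or prioritises them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Reports which dev.ludovic.netlib native-library system properties (as named in
// docs/ml-linalg-guide.md) are set in this JVM. Diagnostic sketch only: it reads
// the properties but does not model how the library resolves them.
public final class NetlibPropertyCheck {
    static final String[] LIBS = {"blas", "lapack", "arpack"};
    static final String[] KEYS = {"nativeLib", "nativeLibPath"};

    public static Map<String, String> configured() {
        Map<String, String> found = new LinkedHashMap<>();
        for (String lib : LIBS) {
            for (String key : KEYS) {
                String name = "dev.ludovic.netlib." + lib + "." + key;
                String value = System.getProperty(name);
                if (value != null) {
                    found.put(name, value);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, String> found = configured();
        if (found.isEmpty()) {
            System.out.println("No dev.ludovic.netlib.* native library properties set");
        }
        found.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

For example, `java -Ddev.ludovic.netlib.blas.nativeLib=libmkl_rt.so NetlibPropertyCheck.java` (single-file launch, Java 11 or later) should print `dev.ludovic.netlib.blas.nativeLib = libmkl_rt.so`. In a Spark deployment the same flags presumably need to reach every JVM that runs linear algebra code, for example via `--driver-java-options` for the driver and `spark.executor.extraJavaOptions` for executors.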
13 changes: 0 additions & 13 deletions mllib-local/pom.xml
@@ -81,19 +81,6 @@
<artifactId>blas</artifactId>
</dependency>
</dependencies>
-<profiles>
-<profile>
-<id>netlib-lgpl</id>
-<dependencies>
-<dependency>
-<groupId>com.github.fommil.netlib</groupId>
-<artifactId>all</artifactId>
-<version>${netlib.java.version}</version>
-<type>pom</type>
-</dependency>
-</dependencies>
-</profile>
-</profiles>
<build>
<outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
<testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory>