Commit 2a24c48
[SPARK-23975][ML] Allow Clustering to take Arrays of Double as input features
## What changes were proposed in this pull request?
- Multiple possible input types is added in validateAndTransformSchema() and computeCost() while checking column type
- Add if statement in transform() to support array type as featuresCol
- Add the case statement in fit() while selecting columns from dataset
These changes will be applied to KMeans first, then to other clustering method
## How was this patch tested?
unit test is added
Please review http://spark.apache.org/contributing.html before opening a pull request.
Author: Lu WANG <[email protected]>
Closes #21081 from ludatabricks/SPARK-23975.1 parent 55c4ca8 commit 2a24c48
File tree
3 files changed
+126
-7
lines changed- mllib/src
- main/scala/org/apache/spark/ml
- clustering
- util
- test/scala/org/apache/spark/ml/clustering
3 files changed
+126
-7
lines changedLines changed: 25 additions & 7 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
33 | 33 | | |
34 | 34 | | |
35 | 35 | | |
36 | | - | |
37 | | - | |
| 36 | + | |
| 37 | + | |
38 | 38 | | |
39 | 39 | | |
40 | 40 | | |
| |||
86 | 86 | | |
87 | 87 | | |
88 | 88 | | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
89 | 100 | | |
90 | 101 | | |
91 | 102 | | |
92 | 103 | | |
93 | 104 | | |
94 | 105 | | |
95 | | - | |
| 106 | + | |
96 | 107 | | |
97 | 108 | | |
98 | 109 | | |
| |||
125 | 136 | | |
126 | 137 | | |
127 | 138 | | |
| 139 | + | |
128 | 140 | | |
129 | | - | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
130 | 144 | | |
131 | 145 | | |
132 | 146 | | |
| |||
146 | 160 | | |
147 | 161 | | |
148 | 162 | | |
149 | | - | |
150 | | - | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
| 166 | + | |
151 | 167 | | |
152 | 168 | | |
153 | 169 | | |
| |||
335 | 351 | | |
336 | 352 | | |
337 | 353 | | |
338 | | - | |
| 354 | + | |
| 355 | + | |
| 356 | + | |
339 | 357 | | |
340 | 358 | | |
341 | 359 | | |
| |||
Lines changed: 63 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
Lines changed: 38 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
30 | 30 | | |
31 | 31 | | |
32 | 32 | | |
| 33 | + | |
| 34 | + | |
33 | 35 | | |
34 | 36 | | |
35 | 37 | | |
| |||
199 | 201 | | |
200 | 202 | | |
201 | 203 | | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
| 211 | + | |
| 212 | + | |
| 213 | + | |
| 214 | + | |
| 215 | + | |
| 216 | + | |
| 217 | + | |
| 218 | + | |
| 219 | + | |
| 220 | + | |
| 221 | + | |
| 222 | + | |
| 223 | + | |
| 224 | + | |
| 225 | + | |
| 226 | + | |
| 227 | + | |
| 228 | + | |
| 229 | + | |
| 230 | + | |
| 231 | + | |
| 232 | + | |
| 233 | + | |
| 234 | + | |
| 235 | + | |
| 236 | + | |
| 237 | + | |
| 238 | + | |
| 239 | + | |
202 | 240 | | |
203 | 241 | | |
204 | 242 | | |
| |||
0 commit comments