Update #11
…rmation performance

The following optimizations are done to improve the StandardScaler model transformation performance.

1) Convert the Breeze dense vector to a primitive vector to reduce the overhead.
2) Since the mean can potentially be a sparse vector, explicitly convert it to a dense primitive vector.
3) Keep a local reference to the `shift` and `factor` arrays so the JVM can locate each value with a single operation.
4) In the pattern matching part, use the mllib SparseVector/DenseVector instead of Breeze's vector to make the codebase cleaner.

Benchmark with the mnist8m dataset:

Before,
DenseVector withMean and withStd: 50.97 secs
DenseVector withMean and withoutStd: 42.11 secs
DenseVector withoutMean and withStd: 8.75 secs
SparseVector withoutMean and withStd: 5.437 secs

With this PR,
DenseVector withMean and withStd: 5.76 secs
DenseVector withMean and withoutStd: 5.28 secs
DenseVector withoutMean and withStd: 5.30 secs
SparseVector withoutMean and withStd: 1.27 secs

Note that without the local reference copies of the `factor` and `shift` arrays, the runtime is almost three times slower:

DenseVector withMean and withStd: 18.15 secs
DenseVector withMean and withoutStd: 18.05 secs
DenseVector withoutMean and withStd: 18.54 secs
SparseVector withoutMean and withStd: 2.01 secs

The following code,

```scala
while (i < size) {
  values(i) = (values(i) - shift(i)) * factor(i)
  i += 1
}
```

will generate the bytecode

```
L13
 LINENUMBER 106 L13
 FRAME FULL [org/apache/spark/mllib/feature/StandardScalerModel org/apache/spark/mllib/linalg/Vector org/apache/spark/mllib/linalg/Vector org/apache/spark/mllib/linalg/DenseVector T [D I I] []
 ILOAD 7
 ILOAD 6
 IF_ICMPGE L14
L15
 LINENUMBER 107 L15
 ALOAD 5
 ILOAD 7
 ALOAD 5
 ILOAD 7
 DALOAD
 ALOAD 0
 INVOKESPECIAL org/apache/spark/mllib/feature/StandardScalerModel.shift ()[D
 ILOAD 7
 DALOAD
 DSUB
 ALOAD 0
 INVOKESPECIAL org/apache/spark/mllib/feature/StandardScalerModel.factor ()[D
 ILOAD 7
 DALOAD
 DMUL
 DASTORE
L16
 LINENUMBER 108 L16
 ILOAD 7
 ICONST_1
 IADD
 ISTORE 7
 GOTO L13
```

, while with the local references to the `shift` and `factor` arrays, the bytecode will be

```
L14
 LINENUMBER 107 L14
 ALOAD 0
 INVOKESPECIAL org/apache/spark/mllib/feature/StandardScalerModel.factor ()[D
 ASTORE 9
L15
 LINENUMBER 108 L15
 FRAME FULL [org/apache/spark/mllib/feature/StandardScalerModel org/apache/spark/mllib/linalg/Vector [D org/apache/spark/mllib/linalg/Vector org/apache/spark/mllib/linalg/DenseVector T [D I I [D] []
 ILOAD 8
 ILOAD 7
 IF_ICMPGE L16
L17
 LINENUMBER 109 L17
 ALOAD 6
 ILOAD 8
 ALOAD 6
 ILOAD 8
 DALOAD
 ALOAD 2
 ILOAD 8
 DALOAD
 DSUB
 ALOAD 9
 ILOAD 8
 DALOAD
 DMUL
 DASTORE
L18
 LINENUMBER 110 L18
 ILOAD 8
 ICONST_1
 IADD
 ISTORE 8
 GOTO L15
```

You can see that with the local references, both arrays are loaded from local variable slots (`ALOAD 2` and `ALOAD 9`), so the JVM can access each value without calling `INVOKESPECIAL` inside the loop.

Author: DB Tsai <[email protected]>

Closes apache#3435 from dbtsai/standardscaler and squashes the following commits:

85885a9 [DB Tsai] revert to have lazy in shift array.
daf2b06 [DB Tsai] Address the feedback
cdb5cef [DB Tsai] small change
9c51eef [DB Tsai] style
fc795e4 [DB Tsai] update
5bffd3d [DB Tsai] first commit
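
The local-reference technique described in the commit message can be sketched in isolation as follows. This is a minimal illustration, not the actual Spark implementation: the class name `ScalerSketch`, its constructor parameters, and the choice to map a zero standard deviation to a factor of 0.0 are all assumptions made for the example.

```scala
// Hypothetical stand-in for the scaler model; not the actual Spark class.
final class ScalerSketch(mean: Array[Double], std: Array[Double]) {
  // Per-feature offset, computed lazily on first use.
  private lazy val shift: Array[Double] = mean.clone()
  // Per-feature multiplier; a zero std maps to 0.0 (an assumption for this sketch).
  private val factor: Array[Double] =
    std.map(s => if (s != 0.0) 1.0 / s else 0.0)

  def transform(input: Array[Double]): Array[Double] = {
    require(input.length == factor.length, "dimension mismatch")
    val values = input.clone()
    val size = values.length
    // Read the member arrays once, before the loop, so the loop body
    // only touches local variables instead of calling an accessor
    // on every iteration.
    val localShift = shift
    val localFactor = factor
    var i = 0
    while (i < size) {
      values(i) = (values(i) - localShift(i)) * localFactor(i)
      i += 1
    }
    values
  }
}
```

Hoisting the reads out of the loop does not change the result; it only changes where the array references come from on each iteration, which is the difference the two bytecode listings show.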