Fix installing twine and add a step to syncing KEYS #18

HyukjinKwon · 2021-03-03T03:58:58Z

No description provided.

### What changes were proposed in this pull request?

Currently, Spark DS V2 aggregate push-down doesn't support a project with aliases. Refer to https://github.com/apache/spark/blob/c91c2e9afec0d5d5bbbd2e155057fe409c5bb928/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala#L96

This PR makes it work with aliases.

**The first example:**
The original plan is shown below:
```
Aggregate [DEPT#0], [DEPT#0, sum(mySalary#8) AS total#14]
+- Project [DEPT#0, SALARY#2 AS mySalary#8]
   +- ScanBuilderHolder [DEPT#0, NAME#1, SALARY#2, BONUS#3], RelationV2[DEPT#0, NAME#1, SALARY#2, BONUS#3] test.employee, JDBCScanBuilder(org.apache.spark.sql.test.TestSparkSession77978658,StructType(StructField(DEPT,IntegerType,true),StructField(NAME,StringType,true),StructField(SALARY,DecimalType(20,2),true),StructField(BONUS,DoubleType,true)),org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions5f8da82)
```
If the aggregate can be pushed down completely, the plan will be:
```
Project [DEPT#0, SUM(SALARY)#18 AS sum(SALARY#2)#13 AS total#14]
+- RelationV2[DEPT#0, SUM(SALARY)#18] test.employee
```
If the aggregate can only be pushed down partially, the plan will be:
```
Aggregate [DEPT#0], [DEPT#0, sum(cast(SUM(SALARY)#18 as decimal(20,2))) AS total#14]
+- RelationV2[DEPT#0, SUM(SALARY)#18] test.employee
```

**The second example:**
The original plan is shown below:
```
Aggregate [myDept#33], [myDept#33, sum(mySalary#34) AS total#40]
+- Project [DEPT#25 AS myDept#33, SALARY#27 AS mySalary#34]
   +- ScanBuilderHolder [DEPT#25, NAME#26, SALARY#27, BONUS#28], RelationV2[DEPT#25, NAME#26, SALARY#27, BONUS#28] test.employee, JDBCScanBuilder(org.apache.spark.sql.test.TestSparkSession25c4f621,StructType(StructField(DEPT,IntegerType,true),StructField(NAME,StringType,true),StructField(SALARY,DecimalType(20,2),true),StructField(BONUS,DoubleType,true)),org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions345d641e)
```
If the aggregate can be pushed down completely, the plan will be:
```
Project [DEPT#25 AS myDept#33, SUM(SALARY)#44 AS sum(SALARY#27)#39 AS total#40]
+- RelationV2[DEPT#25, SUM(SALARY)#44] test.employee
```
If the aggregate can only be pushed down partially, the plan will be:
```
Aggregate [myDept#33], [DEPT#25 AS myDept#33, sum(cast(SUM(SALARY)#56 as decimal(20,2))) AS total#52]
+- RelationV2[DEPT#25, SUM(SALARY)#56] test.employee
```

### Why are the changes needed?

Alias is more useful.

### Does this PR introduce _any_ user-facing change?

'Yes'. Users can see that DS V2 aggregate push-down supports a project with aliases.

### How was this patch tested?

New tests.

Closes apache#35932 from beliefer/SPARK-38533_new.

Authored-by: Jiaan Geng <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
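The alias handling described in this commit message can be illustrated with a toy model. The sketch below is not Spark's implementation: `Agg` and `resolve_aliases` are hypothetical names, and the plan is reduced to plain column names. It only shows the idea that group-by columns and aggregate inputs are mapped back to the scan's column names before push-down, while a Project on top restores the user-facing names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agg:
    func: str    # aggregate function name, e.g. "sum"
    column: str  # input column as seen above the Project (may be an alias)
    alias: str   # output name of the aggregate


def resolve_aliases(project, group_by, aggs):
    """Map group-by columns and aggregate inputs back to scan column names.

    `project` maps each output name of the Project node to the underlying
    scan column, e.g. {"mySalary": "SALARY"}.
    """
    pushed_group_by = [project[c] for c in group_by]
    pushed_aggs = [f"{a.func.upper()}({project[a.column]})" for a in aggs]
    # The Project kept on top of the scan restores the user-facing names.
    output = list(zip(pushed_group_by, group_by))
    output += [(expr, a.alias) for expr, a in zip(pushed_aggs, aggs)]
    return pushed_group_by, pushed_aggs, output


# Second example from the commit message: both columns are aliased.
project = {"myDept": "DEPT", "mySalary": "SALARY"}
pushed_group_by, pushed_aggs, output = resolve_aliases(
    project, ["myDept"], [Agg("sum", "mySalary", "total")]
)
print(pushed_group_by)  # ['DEPT']
print(pushed_aggs)      # ['SUM(SALARY)']
print(output)           # [('DEPT', 'myDept'), ('SUM(SALARY)', 'total')]
```

The `output` pairs correspond loosely to the `DEPT#25 AS myDept#33` and `SUM(SALARY) ... AS total#40` entries in the fully pushed-down plan.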

Same commit message as the preceding commit, followed by:

(cherry picked from commit f327dad)
Signed-off-by: Wenchen Fan <[email protected]>

### What changes were proposed in this pull request?

Restore `scipy` installation in the dockerfile.

### Why are the changes needed?

https://docs.scipy.org/doc/scipy-1.13.1/building/index.html#system-level-dependencies

> If you want to use the system Python and pip, you will need: C, C++, and Fortran compilers (typically gcc, g++, and gfortran). ...

`scipy` actually depends on `gfortran`, but `apt-get remove --purge -y 'gfortran-11'` broke this dependency.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

Manually checked with the first commit apache@5be0dfa: move `apt-get remove --purge -y 'gfortran-11'` ahead of the `scipy` installation, then the installation fails with
```
#18 394.3 Collecting scipy
#18 394.4 Downloading scipy-1.13.1.tar.gz (57.2 MB)
#18 395.2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57.2/57.2 MB 76.7 MB/s eta 0:00:00
#18 401.3 Installing build dependencies: started
#18 410.5 Installing build dependencies: finished with status 'done'
#18 410.5 Getting requirements to build wheel: started
#18 410.7 Getting requirements to build wheel: finished with status 'done'
#18 410.7 Installing backend dependencies: started
#18 411.8 Installing backend dependencies: finished with status 'done'
#18 411.8 Preparing metadata (pyproject.toml): started
#18 414.9 Preparing metadata (pyproject.toml): finished with status 'error'
#18 414.9 error: subprocess-exited-with-error
#18 414.9
#18 414.9 × Preparing metadata (pyproject.toml) did not run successfully.
#18 414.9 │ exit code: 1
#18 414.9 ╰─> [42 lines of output]
#18 414.9 + meson setup /tmp/pip-install-y77ar9d0/scipy_1e543e0816ed4b26984415533ae9079d /tmp/pip-install-y77ar9d0/scipy_1e543e0816ed4b26984415533ae9079d/.mesonpy-xqfvs4ek -Dbuildtype=release -Db_ndebug=if-release -Db_vscrt=md --native-file=/tmp/pip-install-y77ar9d0/scipy_1e543e0816ed4b26984415533ae9079d/.mesonpy-xqfvs4ek/meson-python-native-file.ini
#18 414.9 The Meson build system
#18 414.9 Version: 1.5.2
#18 414.9 Source dir: /tmp/pip-install-y77ar9d0/scipy_1e543e0816ed4b26984415533ae9079d
#18 414.9 Build dir: /tmp/pip-install-y77ar9d0/scipy_1e543e0816ed4b26984415533ae9079d/.mesonpy-xqfvs4ek
#18 414.9 Build type: native build
#18 414.9 Project name: scipy
#18 414.9 Project version: 1.13.1
#18 414.9 C compiler for the host machine: cc (gcc 11.4.0 "cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0")
#18 414.9 C linker for the host machine: cc ld.bfd 2.38
#18 414.9 C++ compiler for the host machine: c++ (gcc 11.4.0 "c++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0")
#18 414.9 C++ linker for the host machine: c++ ld.bfd 2.38
#18 414.9 Cython compiler for the host machine: cython (cython 3.0.11)
#18 414.9 Host machine cpu family: x86_64
#18 414.9 Host machine cpu: x86_64
#18 414.9 Program python found: YES (/usr/local/bin/pypy3)
#18 414.9 Run-time dependency python found: YES 3.9
#18 414.9 Program cython found: YES (/tmp/pip-build-env-v_vnvt3h/overlay/bin/cython)
#18 414.9 Compiler for C supports arguments -Wno-unused-but-set-variable: YES
#18 414.9 Compiler for C supports arguments -Wno-unused-function: YES
#18 414.9 Compiler for C supports arguments -Wno-conversion: YES
#18 414.9 Compiler for C supports arguments -Wno-misleading-indentation: YES
#18 414.9 Library m found: YES
#18 414.9
#18 414.9 ../meson.build:78:0: ERROR: Unknown compiler(s): [['gfortran'], ['flang'], ['nvfortran'], ['pgfortran'], ['ifort'], ['ifx'], ['g95']]
#18 414.9 The following exception(s) were encountered:
#18 414.9 Running `gfortran --version` gave "[Errno 2] No such file or directory: 'gfortran'"
#18 414.9 Running `gfortran -V` gave "[Errno 2] No such file or directory: 'gfortran'"
#18 414.9 Running `flang --version` gave "[Errno 2] No such file or directory: 'flang'"
#18 414.9 Running `flang -V` gave "[Errno 2] No such file or directory: 'flang'"
#18 414.9 Running `nvfortran --version` gave "[Errno 2] No such file or directory: 'nvfortran'"
#18 414.9 Running `nvfortran -V` gave "[Errno 2] No such file or directory: 'nvfortran'"
#18 414.9 Running `pgfortran --version` gave "[Errno 2] No such file or directory: 'pgfortran'"
#18 414.9 Running `pgfortran -V` gave "[Errno 2] No such file or directory: 'pgfortran'"
#18 414.9 Running `ifort --version` gave "[Errno 2] No such file or directory: 'ifort'"
#18 414.9 Running `ifort -V` gave "[Errno 2] No such file or directory: 'ifort'"
#18 414.9 Running `ifx --version` gave "[Errno 2] No such file or directory: 'ifx'"
#18 414.9 Running `ifx -V` gave "[Errno 2] No such file or directory: 'ifx'"
#18 414.9 Running `g95 --version` gave "[Errno 2] No such file or directory: 'g95'"
#18 414.9 Running `g95 -V` gave "[Errno 2] No such file or directory: 'g95'"
#18 414.9
#18 414.9 A full log can be found at /tmp/pip-install-y77ar9d0/scipy_1e543e0816ed4b26984415533ae9079d/.mesonpy-xqfvs4ek/meson-logs/meson-log.txt
#18 414.9 [end of output]
```
see https://github.com/zhengruifeng/spark/actions/runs/11357130578/job/31589506939

### Was this patch authored or co-authored using generative AI tooling?

no

Closes apache#48489 from zhengruifeng/infra_scipy.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
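The failure in the log comes down to no Fortran compiler being on `PATH` when `scipy` is built from source. As a minimal sketch of a pre-flight check (not part of the commit; `find_fortran_compiler` is a hypothetical helper, and the candidate names are simply the ones Meson probed in the log), one could look for a compiler before attempting the build:

```python
import shutil

# Compiler names that Meson probed in the build log above.
FORTRAN_CANDIDATES = [
    "gfortran", "flang", "nvfortran", "pgfortran", "ifort", "ifx", "g95",
]


def find_fortran_compiler(candidates=FORTRAN_CANDIDATES, which=shutil.which):
    """Return the first candidate executable found on PATH, or None.

    `which` is injectable so the lookup can be replaced in tests.
    """
    for name in candidates:
        if which(name) is not None:
            return name
    return None


# Report what is available on the current machine.
print(find_fortran_compiler() or "no Fortran compiler found on PATH")
```

In an image where `gfortran-11` has already been purged, a check like this would presumably report that no compiler is found, which matches the `Unknown compiler(s)` error in the log.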

Fix installing twine and add a step to syncing KEYS

9fdde84

github-actions bot added INFRA BUILD labels Mar 3, 2021

cloud-fan merged this pull request into cloud-fan:script Mar 3, 2021

HyukjinKwon deleted the script-pr branch January 4, 2022 00:54
