Commit ab00533

[SPARK-47933][PYTHON][TESTS][FOLLOW-UP] Enable doctest pyspark.sql.connect.column
### What changes were proposed in this pull request?

Enable the doctest run for `pyspark.sql.connect.column`.

### Why are the changes needed?

To improve test coverage.

### Does this PR introduce _any_ user-facing change?

No, test-only change.

### How was this patch tested?

Manual check: I manually broke a doctest in `Column` and found that `pyspark.sql.connect.column` did not fail:

```
(spark_dev_312) ➜  spark git:(master) ✗ python/run-tests -k --python-executables python3 --testnames 'pyspark.sql.classic.column'
Running PySpark tests. Output is in /Users/ruifeng.zheng/Dev/spark/python/unit-tests.log
Will test against the following Python executables: ['python3']
Will test the following Python tests: ['pyspark.sql.classic.column']
python3 python_implementation is CPython
python3 version is: Python 3.12.2
Starting test(python3): pyspark.sql.classic.column (temp output: /Users/ruifeng.zheng/Dev/spark/python/target/4bdd14b8-92ba-43ba-a7fb-655e6769aeb9/python3__pyspark.sql.classic.column__i2_c1zct.log)
WARNING: Using incubator modules: jdk.incubator.vector
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
**********************************************************************
File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/column.py", line 385, in pyspark.sql.column.Column.contains
Failed example:
    df.filter(df.name.contains('o')).collect()
Differences (ndiff with -expected +actual):
    - [Row(age=5, name='Bobx')]
    ?                      -
    + [Row(age=5, name='Bob')]
**********************************************************************
   1 of   2 in pyspark.sql.column.Column.contains
***Test Failed*** 1 failures.
Had test failures in pyspark.sql.classic.column with python3; see logs.

(spark_dev_312) ➜  spark git:(master) ✗ python/run-tests -k --python-executables python3 --testnames 'pyspark.sql.connect.column'
Running PySpark tests. Output is in /Users/ruifeng.zheng/Dev/spark/python/unit-tests.log
Will test against the following Python executables: ['python3']
Will test the following Python tests: ['pyspark.sql.connect.column']
python3 python_implementation is CPython
python3 version is: Python 3.12.2
Starting test(python3): pyspark.sql.connect.column (temp output: /Users/ruifeng.zheng/Dev/spark/python/target/2acaff3c-ef1d-41eb-b63e-509f3e0192c0/python3__pyspark.sql.connect.column__66td62h9.log)
Finished test(python3): pyspark.sql.connect.column (3s)
Tests passed in 3 seconds
```

After this PR, it fails as expected:

```
(spark_dev_312) ➜  spark git:(master) ✗ python/run-tests -k --python-executables python3 --testnames 'pyspark.sql.connect.column'
Running PySpark tests. Output is in /Users/ruifeng.zheng/Dev/spark/python/unit-tests.log
Will test against the following Python executables: ['python3']
Will test the following Python tests: ['pyspark.sql.connect.column']
python3 python_implementation is CPython
python3 version is: Python 3.12.2
Starting test(python3): pyspark.sql.connect.column (temp output: /Users/ruifeng.zheng/Dev/spark/python/target/390ff7ae-7683-425c-b0d2-ee336e1ad452/python3__pyspark.sql.connect.column__f69b3smc.log)
WARNING: Using incubator modules: jdk.incubator.vector
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
org.apache.spark.SparkSQLException: [INVALID_CURSOR.DISCONNECTED] The cursor is invalid. The cursor has been disconnected by the server. SQLSTATE: HY109
  at org.apache.spark.sql.connect.execution.ExecuteGrpcResponseSender.execute(ExecuteGrpcResponseSender.scala:281)
  at org.apache.spark.sql.connect.execution.ExecuteGrpcResponseSender$$anon$1.run(ExecuteGrpcResponseSender.scala:101)
**********************************************************************
File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/column.py", line 385, in pyspark.sql.column.Column.contains
Failed example:
    df.filter(df.name.contains('o')).collect()
Expected:
    [Row(age=5, name='Bobx')]
Got:
    [Row(age=5, name='Bob')]
**********************************************************************
   1 of   2 in pyspark.sql.column.Column.contains
***Test Failed*** 1 failures.
Had test failures in pyspark.sql.connect.column with python3; see logs.
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46895 from zhengruifeng/fix_connect_column_doc_test.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
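For context, the test being deliberately broken above is the example embedded in the `Column.contains` docstring in `python/pyspark/sql/column.py`. Below is a minimal sketch of what such a docstring-embedded example looks like; the class scaffold and the DataFrame contents are illustrative assumptions chosen to be consistent with the `[Row(age=5, name='Bob')]` output in the log, not a verbatim copy of the Spark source.

```python
class Column:
    def contains(self, other):
        """
        Sketch of a docstring whose Examples section doubles as a doctest.
        The `spark` session used below is not defined here; it is injected
        into the doctest globals by the module's `_test()` helper.

        Examples
        --------
        >>> df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], ["age", "name"])
        >>> df.filter(df.name.contains('o')).collect()
        [Row(age=5, name='Bob')]
        """
        ...
```

`doctest.testmod` collects these `Examples` sections and executes them, which is why editing the expected output (e.g. to `'Bobx'`) is enough to make the module-level doctest run fail.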
Parent: 8cb78a7

1 file changed: +3 −3 lines changed

python/pyspark/sql/connect/column.py

Lines changed: 3 additions & 3 deletions
```diff
@@ -579,17 +579,17 @@ def _test() -> None:
     import sys
     import doctest
     from pyspark.sql import SparkSession as PySparkSession
-    import pyspark.sql.connect.column
+    import pyspark.sql.column
 
-    globs = pyspark.sql.connect.column.__dict__.copy()
+    globs = pyspark.sql.column.__dict__.copy()
     globs["spark"] = (
         PySparkSession.builder.appName("sql.connect.column tests")
         .remote(os.environ.get("SPARK_CONNECT_TESTING_REMOTE", "local[4]"))
         .getOrCreate()
     )
 
     (failure_count, test_count) = doctest.testmod(
-        pyspark.sql.connect.column,
+        pyspark.sql.column,
         globs=globs,
         optionflags=doctest.ELLIPSIS
         | doctest.NORMALIZE_WHITESPACE
```
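Pieced together from the hunk above, the patched `_test()` helper roughly looks like the sketch below. The change points `doctest.testmod` at `pyspark.sql.column`, where the shared docstring examples actually live (as the failing example in the logs shows), while the `spark` entry in `globs` remains a Spark Connect session created via `.remote(...)`, so the same examples now run against Connect. Everything outside the shown hunk (the `os` import and the trailing teardown) is an assumption based on the common PySpark doctest pattern, not part of this diff.

```python
def _test() -> None:
    import os
    import sys
    import doctest
    from pyspark.sql import SparkSession as PySparkSession
    import pyspark.sql.column

    # Run the doctests defined in pyspark.sql.column, but against a
    # Spark Connect session so the connect code path is exercised.
    globs = pyspark.sql.column.__dict__.copy()
    globs["spark"] = (
        PySparkSession.builder.appName("sql.connect.column tests")
        .remote(os.environ.get("SPARK_CONNECT_TESTING_REMOTE", "local[4]"))
        .getOrCreate()
    )

    (failure_count, test_count) = doctest.testmod(
        pyspark.sql.column,
        globs=globs,
        optionflags=doctest.ELLIPSIS
        | doctest.NORMALIZE_WHITESPACE,
    )

    # Assumed teardown (outside the shown hunk): stop the session and
    # propagate doctest failures as a non-zero exit code.
    globs["spark"].stop()
    if failure_count:
        sys.exit(-1)
```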
