[SPARK-44877][CONNECT][PYTHON] Support python protobuf functions for Spark Connect #42563
Conversation
```
    <version>${project.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
```
This is needed to avoid `java.lang.NoClassDefFoundError: org/apache/spark/sql/protobuf/CatalystDataToProtobuf`. A similar issue was also mentioned in the Avro support PR.
```
    ],
)

connect = Module(
```
Did you swap the order of `protobuf` and `connect`? The diff looks confusing :).
Yep, I swapped the order so that `protobuf` can be used in `dependencies=[hive, avro, protobuf]`.
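For context, a minimal sketch of that reordering in `dev/sparktestsupport/modules.py` (the field values here are illustrative, not the exact diff): `protobuf` must now be defined before `connect`, since a top-level name has to exist before it can appear in the `dependencies` list.

```
protobuf = Module(
    name="protobuf",
    dependencies=[sql],
    source_file_regexes=["connector/protobuf/"],
)

connect = Module(
    name="connect",
    # protobuf is already defined at this point, so it can be referenced here
    dependencies=[hive, avro, protobuf],
    source_file_regexes=["connector/connect/"],
)
```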
```
protobuf_jar = search_jar("connector/protobuf", "spark-protobuf-assembly-", "spark-protobuf")
if protobuf_jar is None:
    print(
        "Skipping all Protobuf Python tests as the optional Protobuf project was "
        "not compiled into a JAR. To run these tests, "
        "you need to build Spark with 'build/sbt package' or "
        "'build/mvn package' before running this test."
    )
    sys.exit(0)
```
This should not fail like this, right?
This part is just copied from the original protobuf functions. Should we do it differently here?
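If it were done differently, one alternative (a sketch only, assuming these checks run under `unittest`; this is not what the PR does) would be to skip through the test framework instead of exiting the interpreter:

```
import unittest

# `protobuf_jar` comes from the `search_jar(...)` lookup shown above.
if protobuf_jar is None:
    raise unittest.SkipTest(
        "spark-protobuf JAR not found; build it with "
        "'build/sbt package' or 'build/mvn package' first."
    )
```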
| "from_protobuf", _to_col(data), lit(messageName), lit(binary_proto) | ||
| ) | ||
| else: | ||
| return _invoke_function( |
So many calls to `_invoke_function()`. It would be much simpler if it allowed `None` options and ignored them. @bogao007 could you include a comment about it?
cc: @LuciferYang, @HyukjinKwon
`_invoke_function()` doesn't seem to support `None` input. I will include a comment for it.
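To illustrate the suggestion (a hypothetical helper, not part of the PR or of the `_invoke_function()` API): a thin wrapper that drops `None` arguments would collapse the branches into a single call.

```
def _invoke_function_ignore_none(name, *args):
    # Hypothetical wrapper: forward only the non-None arguments.
    return _invoke_function(name, *[arg for arg in args if arg is not None])

# Both from_protobuf branches could then reduce to one call along the lines of:
# return _invoke_function_ignore_none(
#     "from_protobuf", _to_col(data), lit(messageName), desc_file_col, binary_proto_col
# )
# (desc_file_col / binary_proto_col are illustrative names, not real variables.)
```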
@HyukjinKwon could you help take a look at this PR? This needs to go to the 3.5 branch as well, thanks!
rangadi left a comment
LGTM.
…Spark Connect
### What changes were proposed in this pull request?
Support python protobuf functions for Spark Connect
### Why are the changes needed?
Support python protobuf functions for Spark Connect
### Does this PR introduce _any_ user-facing change?
Yes
### How was this patch tested?
Added doctests and did a manual test:
```
bo.gaoPF2WXGJ3KT spark % bin/pyspark --remote "local[*]" --jars connector/protobuf/target/scala-2.12/spark-protobuf_2.12-4.0.0-SNAPSHOT.jar
Python 3.9.6 (default, May 7 2023, 23:32:44)
[Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
23/08/18 10:47:19 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
/Users/bo.gao/workplace/spark/python/pyspark/pandas/__init__.py:50: UserWarning: 'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to set this environment variable to '1' in both driver and executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you but it does not work if there is a Spark context already launched.
warnings.warn(
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 4.0.0.dev0
/_/
Using Python version 3.9.6 (default, May 7 2023 23:32:44)
Client connected to the Spark Connect server at localhost
SparkSession available as 'spark'.
>>> from pyspark.sql.protobuf.functions import from_protobuf, to_protobuf
>>> import tempfile
>>> data = [([(2, "Alice", 13093020)])]
>>> ddl_schema = "value struct<age: INTEGER, name: STRING, score: LONG>"
>>> df = spark.createDataFrame(data, ddl_schema)
>>> desc_hex = str('0ACE010A41636F6E6E6563746F722F70726F746F6275662F7372632F746573742F726'
... '5736F75726365732F70726F746F6275662F7079737061726B5F746573742E70726F746F121D6F72672E61'
... '70616368652E737061726B2E73716C2E70726F746F627566224B0A0D53696D706C654D657373616765121'
... '00A03616765180120012805520361676512120A046E616D6518022001280952046E616D6512140A057363'
... '6F7265180320012803520573636F72654215421353696D706C654D65737361676550726F746F736206707'
... '26F746F33')
>>> with tempfile.TemporaryDirectory() as tmp_dir:
... desc_file_path = "%s/pyspark_test.desc" % tmp_dir
... with open(desc_file_path, "wb") as f:
... _ = f.write(bytearray.fromhex(desc_hex))
... f.flush()
... message_name = 'SimpleMessage'
... proto_df = df.select(
... to_protobuf(df.value, message_name, desc_file_path).alias("value"))
... proto_df.show(truncate=False)
... proto_df_1 = proto_df.select( # With file name for descriptor
... from_protobuf(proto_df.value, message_name, desc_file_path).alias("value"))
... proto_df_1.show(truncate=False)
... proto_df_2 = proto_df.select( # With binary for descriptor
... from_protobuf(proto_df.value, message_name,
... binaryDescriptorSet = bytearray.fromhex(desc_hex))
... .alias("value"))
... proto_df_2.show(truncate=False)
...
+-------------------------------------------+
|value |
+-------------------------------------------+
|[08 02 12 05 41 6C 69 63 65 18 9C 91 9F 06]|
+-------------------------------------------+
+--------------------+
|value |
+--------------------+
|{2, Alice, 13093020}|
+--------------------+
+--------------------+
|value |
+--------------------+
|{2, Alice, 13093020}|
+--------------------+
```
```
>>> data = [([(1668035962, 2020)])]
>>> ddl_schema = "value struct<seconds: LONG, nanos: INT>"
>>> df = spark.createDataFrame(data, ddl_schema)
>>> message_class_name = "org.sparkproject.spark_protobuf.protobuf.Timestamp"
>>> to_proto_df = df.select(to_protobuf(df.value, message_class_name).alias("value"))
>>> from_proto_df = to_proto_df.select(
... from_protobuf(to_proto_df.value, message_class_name).alias("value"))
>>> from_proto_df.show(truncate=False)
+------------------+
|value |
+------------------+
|{1668035962, 2020}|
+------------------+
```
```
>>> import tempfile
>>> data = [([(2, "Alice", 13093020)])]
>>> ddl_schema = "value struct<age: INTEGER, name: STRING, score: LONG>"
>>> df = spark.createDataFrame(data, ddl_schema)
>>> desc_hex = str('0ACE010A41636F6E6E6563746F722F70726F746F6275662F7372632F746573742F726'
... '5736F75726365732F70726F746F6275662F7079737061726B5F746573742E70726F746F121D6F72672E61'
... '70616368652E737061726B2E73716C2E70726F746F627566224B0A0D53696D706C654D657373616765121'
... '00A03616765180120012805520361676512120A046E616D6518022001280952046E616D6512140A057363'
... '6F7265180320012803520573636F72654215421353696D706C654D65737361676550726F746F736206707'
... '26F746F33')
>>> with tempfile.TemporaryDirectory() as tmp_dir:
... desc_file_path = "%s/pyspark_test.desc" % tmp_dir
... with open(desc_file_path, "wb") as f:
... _ = f.write(bytearray.fromhex(desc_hex))
... f.flush()
... message_name = 'SimpleMessage'
... proto_df = df.select( # With file name for descriptor
... to_protobuf(df.value, message_name, desc_file_path).alias("suite"))
... proto_df.show(truncate=False)
... proto_df_2 = df.select( # With binary for descriptor
... to_protobuf(df.value, message_name,
... binaryDescriptorSet=bytearray.fromhex(desc_hex))
... .alias("suite"))
... proto_df_2.show(truncate=False)
...
+-------------------------------------------+
|suite |
+-------------------------------------------+
|[08 02 12 05 41 6C 69 63 65 18 9C 91 9F 06]|
+-------------------------------------------+
+-------------------------------------------+
|suite |
+-------------------------------------------+
|[08 02 12 05 41 6C 69 63 65 18 9C 91 9F 06]|
+-------------------------------------------+
```
```
>>> data = [([(1668035962, 2020)])]
>>> ddl_schema = "value struct<seconds: LONG, nanos: INT>"
>>> df = spark.createDataFrame(data, ddl_schema)
>>> message_class_name = "org.sparkproject.spark_protobuf.protobuf.Timestamp"
>>> proto_df = df.select(to_protobuf(df.value, message_class_name).alias("suite"))
>>> proto_df.show(truncate=False)
+----------------------------+
|suite |
+----------------------------+
|[08 FA EA B0 9B 06 10 E4 0F]|
+----------------------------+
```
Closes #42563 from bogao007/python-connect-protobuf.
Authored-by: bogao007 <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
(cherry picked from commit 5151b5b)
Signed-off-by: Ruifeng Zheng <[email protected]>
merged to master and branch-3.5
…setup.py

### What changes were proposed in this pull request?
This PR is a followup of #42563 (but using a new JIRA as it's already released), which adds `pyspark.sql.connect.protobuf` into `setup.py`.

### Why are the changes needed?
So PyPI-packaged PySpark can support protobuf functions with Spark Connect on.

### Does this PR introduce _any_ user-facing change?
Yes. The new feature is now available with Spark Connect on if users install Spark Connect by `pip`.

### How was this patch tested?
Being tested in #45870

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45924 from HyukjinKwon/SPARK-47762.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
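For reference, a minimal sketch of the shape of that change (the real `setup.py` lists many more packages; the names below are only the ones relevant to this followup):

```
from setuptools import setup

setup(
    name="pyspark-sketch",  # hypothetical name; illustrative only
    packages=[
        "pyspark.sql.connect",
        "pyspark.sql.connect.protobuf",  # the package added by this followup
    ],
)
```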