Skip to content

Conversation

@WweiL
Copy link
Contributor

@WweiL WweiL commented Aug 16, 2023

What changes were proposed in this pull request?

Add several new test cases for streaming foreachBatch and streaming query listener events to test various scenarios.

Why are the changes needed?

More tests is better

Does this PR introduce any user-facing change?

No

How was this patch tested?

Test only change

@WweiL WweiL changed the title [SPARK-44435][SS][CONNECT][DRAFT] Tests for foreachBatch and Listener [SPARK-44435][SS][CONNECT] Tests for foreachBatch and Listener Aug 21, 2023
return {k[1:]: conv(v) for k, v in e.__dict__.items()}


def streaming_query_progress_as_dict(e: StreamingQueryProgress) -> Dict[str, Any]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simpler way might be pyspark.cloupickle.dumps(event), save that as a table, and load it back, and unpickle it via pyspark.cloudpickle.loads(binary) and compare them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks! Never thought of that

@HyukjinKwon
Copy link
Member

Merged to master and branch-3.5.

HyukjinKwon pushed a commit that referenced this pull request Aug 24, 2023
### What changes were proposed in this pull request?

Add several new test cases for streaming foreachBatch and streaming query listener events to test various scenarios.

### Why are the changes needed?

More tests is better

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Test only change

Closes #42521 from WweiL/SPARK-44435-tests-foreachBatch-listener.

Authored-by: Wei Liu <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 2d44848)
Signed-off-by: Hyukjin Kwon <[email protected]>
@LuciferYang
Copy link
Contributor

https://github.com/apache/spark/actions/runs/5962873768/job/16174987432

Running tests...
----------------------------------------------------------------------
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
/__w/spark/spark/python/pyspark/sql/connect/session.py:185: UserWarning: [CANNOT_MODIFY_CONFIG] Cannot modify the value of the Spark config: "spark.connect.execute.reattachable.senderMaxStreamDuration".
See also 'https://spark.apache.org/docs/latest/sql-migration-guide.html#ddl-statements'.
  warnings.warn(str(e))
/__w/spark/spark/python/pyspark/sql/connect/session.py:185: UserWarning: [CANNOT_MODIFY_CONFIG] Cannot modify the value of the Spark config: "spark.connect.execute.reattachable.senderMaxStreamSize".
See also 'https://spark.apache.org/docs/latest/sql-migration-guide.html#ddl-statements'.
  warnings.warn(str(e))
/__w/spark/spark/python/pyspark/sql/connect/session.py:185: UserWarning: [CANNOT_MODIFY_CONFIG] Cannot modify the value of the Spark config: "spark.connect.grpc.binding.port".
See also 'https://spark.apache.org/docs/latest/sql-migration-guide.html#ddl-statements'.
  warnings.warn(str(e))
  test_listener_events (pyspark.sql.tests.connect.streaming.test_parity_listener.StreamingListenerParityTests) ... Streaming query listener worker is starting with url sc://localhost:43833/;user_id= and sessionId a5a5becc-8da7-4d4b-9a7c-484cd957e3be.

[Stage 0:>                                                          (0 + 1) / 1]

[Stage 0:>                  (0 + 1) / 1][Stage 2:>                  (0 + 1) / 1]

                                                                                

[Stage 0:>                                                          (0 + 1) / 1]

                                                                                
Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/connect/streaming/worker/listener_worker.py", line 99, in <module>
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/connect/streaming/worker/listener_worker.py", line 86, in main
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/connect/streaming/worker/listener_worker.py", line 77, in process
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/streaming/listener.py", line 251, in fromJson
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/streaming/listener.py", line 480, in fromJson
KeyError: 'batchDuration'

[Stage 17:>                                                         (0 + 1) / 1]
ERROR (46.372s)

======================================================================
ERROR [46.372s]: test_listener_events (pyspark.sql.tests.connect.streaming.test_parity_listener.StreamingListenerParityTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/sql/tests/connect/streaming/test_parity_listener.py", line 80, in test_listener_events
    self.spark.read.table("listener_progress_events").collect()[0][0]
  File "/__w/spark/spark/python/pyspark/sql/connect/dataframe.py", line 1645, in collect
    table, schema = self._session.client.to_table(query)
  File "/__w/spark/spark/python/pyspark/sql/connect/client/core.py", line 833, in to_table
    table, schema, _, _, _ = self._execute_and_fetch(req)
  File "/__w/spark/spark/python/pyspark/sql/connect/client/core.py", line 1257, in _execute_and_fetch
    for response in self._execute_and_fetch_as_iterator(req):
  File "/__w/spark/spark/python/pyspark/sql/connect/client/core.py", line 1238, in _execute_and_fetch_as_iterator
    self._handle_error(error)
  File "/__w/spark/spark/python/pyspark/sql/connect/client/core.py", line 1477, in _handle_error
    self._handle_rpc_error(error)
  File "/__w/spark/spark/python/pyspark/sql/connect/client/core.py", line 1513, in _handle_rpc_error
    raise convert_exception(info, status.message) from None
pyspark.errors.exceptions.connect.AnalysisException: [TABLE_OR_VIEW_NOT_FOUND] The table or view `listener_progress_events` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.;
'UnresolvedRelation [listener_progress_events], [], false


----------------------------------------------------------------------
Ran 1 test in 55.000s

FAILED (errors=1)

Generating XML reports...
Generated XML report: target/test-reports/TEST-pyspark.sql.tests.connect.streaming.test_parity_listener.StreamingListenerParityTests-20230824115154.xml

Had test failures in pyspark.sql.tests.connect.streaming.test_parity_listener with python3.9; see logs.
Error:  running /__w/spark/spark/python/run-tests --modules=pyspark-connect --parallelism=1 ; received return code 255
Error: Process completed with exit code 19.

@WweiL Are there any related PRs that have not been merged into branch-3.5? The branch-3.5 daily test failed today.

@WweiL
Copy link
Contributor Author

WweiL commented Aug 24, 2023

@LuciferYang Thanks for the ping! Let me checkout 3.5 and see

@dongjoon-hyun
Copy link
Member

Hi, all. I also saw the consecutive failures at three commits after this. Let me revert this from branch-3.5 first.

@dongjoon-hyun
Copy link
Member

This is reverted from branch-3.5 via 6c2da61 .
I'm going to monitor the CIs.

WweiL added a commit to WweiL/oss-spark that referenced this pull request Aug 24, 2023
### What changes were proposed in this pull request?

Add several new test cases for streaming foreachBatch and streaming query listener events to test various scenarios.

### Why are the changes needed?

More tests is better

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Test only change

Closes apache#42521 from WweiL/SPARK-44435-tests-foreachBatch-listener.

Authored-by: Wei Liu <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 2d44848)
Signed-off-by: Hyukjin Kwon <[email protected]>
@WweiL
Copy link
Contributor Author

WweiL commented Aug 24, 2023

Created a separate PR to 3.5
#42664

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants