@zhengruifeng
Contributor

@zhengruifeng zhengruifeng commented Nov 20, 2024

What changes were proposed in this pull request?

Skip Torch/DeepSpeed tests in MacOS PySpark Daily test

https://github.com/apache/spark/actions/runs/11921746968/job/33226552068

Why are the changes needed?

We don't need to test them on macOS.

Does this PR introduce any user-facing change?

no, test only

How was this patch tested?

These tests should be skipped because the packages are no longer installed:

@unittest.skipIf(not have_torch, "torch is required")
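
For readers unfamiliar with this pattern, here is a minimal, self-contained sketch of how such a guard can work. It is not the PySpark source: the way `have_torch` is computed here (probing with `importlib.util.find_spec`) and the names `DataLoaderSketchTests` and `run_sketch` are assumptions made for illustration only.

```python
import importlib.util
import unittest

# Assumption for this sketch: probe for the package without importing it.
# The real flag in PySpark may be computed differently.
have_torch = importlib.util.find_spec("torch") is not None


@unittest.skipIf(not have_torch, "torch is required")
class DataLoaderSketchTests(unittest.TestCase):  # hypothetical test class
    def test_tensor_roundtrip(self):
        # Import inside the test so that merely loading this module
        # never fails when torch is absent.
        import torch

        self.assertEqual(torch.tensor([1, 2, 3]).tolist(), [1, 2, 3])


def run_sketch():
    """Run the sketch suite and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DataLoaderSketchTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

With a class-level `skipIf`, every test in the class is reported as skipped rather than failed when the condition holds, which matches the "1 tests were skipped" outcome in the local run.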

Manually tested on my local macOS machine:

(spark_312) ➜  spark git:(master) python/run-tests -k --python-executables python3 --testnames 'pyspark.ml.torch.tests.test_data_loader'
Running PySpark tests. Output is in /Users/ruifeng.zheng/Dev/spark/python/unit-tests.log
Will test against the following Python executables: ['python3']
Will test the following Python tests: ['pyspark.ml.torch.tests.test_data_loader']
python3 python_implementation is CPython
python3 version is: Python 3.12.7
Starting test(python3): pyspark.ml.torch.tests.test_data_loader (temp output: /Users/ruifeng.zheng/Dev/spark/python/target/c7bf947a-a746-4519-b475-aed24a1f8cec/python3__pyspark.ml.torch.tests.test_data_loader__1yq21_l5.log)
Finished test(python3): pyspark.ml.torch.tests.test_data_loader (0s) ... 1 tests were skipped
Tests passed in 0 seconds

Skipped tests in pyspark.ml.torch.tests.test_data_loader with python3:
      test_data_loader (pyspark.ml.torch.tests.test_data_loader.TorchDistributorDataLoaderUnitTests.test_data_loader) ... skip (0.001s)

Was this patch authored or co-authored using generative AI tooling?

No

Contributor

@LuciferYang LuciferYang left a comment

LGTM

@zhengruifeng
Contributor Author

thanks, merged to master

@zhengruifeng zhengruifeng deleted the macos_py_lib branch November 20, 2024 04:40
zhengruifeng pushed a commit that referenced this pull request Nov 20, 2024
…val` not installed

### What changes were proposed in this pull request?
#48900 skipped the relevant test cases on macOS by no longer installing the dependencies related to `torch` and `deepspeed`.

Subsequently, some test cases in `ml.tests.connect.test_legacy_mode_tuning.CrossValidatorTests` failed due to the absence of `torch/torcheval`:

```
ERROR (3.456s)
  test_crossvalidator_with_fold_col (pyspark.ml.tests.connect.test_legacy_mode_tuning.CrossValidatorTests.test_crossvalidator_with_fold_col) ... ERROR (2.550s)
  test_fit_maximize_metric (pyspark.ml.tests.connect.test_legacy_mode_tuning.CrossValidatorTests.test_fit_maximize_metric) ... ERROR (0.550s)
  test_fit_minimize_metric (pyspark.ml.tests.connect.test_legacy_mode_tuning.CrossValidatorTests.test_fit_minimize_metric) ... ERROR (0.544s)
  test_gen_avg_and_std_metrics (pyspark.ml.tests.connect.test_legacy_mode_tuning.CrossValidatorTests.test_gen_avg_and_std_metrics) ... ok (0.539s)

======================================================================
ERROR [9.991s]: test_copy (pyspark.ml.tests.connect.test_legacy_mode_tuning.CrossValidatorTests.test_copy)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/runner/work/spark/spark/python/pyspark/ml/tests/connect/test_legacy_mode_tuning.py", line 119, in test_copy
    cvModel = cv.fit(dataset)
              ^^^^^^^^^^^^^^^
  File "/Users/runner/work/spark/spark/python/pyspark/ml/connect/base.py", line 105, in fit
    return self._fit(dataset)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/runner/work/spark/spark/python/pyspark/ml/connect/tuning.py", line 435, in _fit
    for j, metric in pool.imap_unordered(lambda f: f(), tasks):
  File "/opt/homebrew/Cellar/python3.11/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/pool.py", line 873, in next
    raise value
  File "/opt/homebrew/Cellar/python3.11/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/Users/runner/work/spark/spark/python/pyspark/ml/connect/tuning.py", line 435, in <lambda>
    for j, metric in pool.imap_unordered(lambda f: f(), tasks):
                                                   ^^^
  File "/Users/runner/work/spark/spark/python/pyspark/util.py", line 423, in wrapped
    return f(*args, **kwargs)  # type: ignore[misc, operator]
           ^^^^^^^^^^^^^^^^^^
  File "/Users/runner/work/spark/spark/python/pyspark/ml/connect/tuning.py", line 186, in single_task
    metric = evaluator.evaluate(
             ^^^^^^^^^^^^^^^^^^^
  File "/Users/runner/work/spark/spark/python/pyspark/ml/connect/base.py", line 254, in evaluate
    return self._evaluate(dataset)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/runner/work/spark/spark/python/pyspark/ml/connect/evaluation.py", line 57, in _evaluate
    torch_metric = self._get_torch_metric()
                   ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/runner/work/spark/spark/python/pyspark/ml/connect/evaluation.py", line 125, in _get_torch_metric
    import torcheval.metrics as torchmetrics
ModuleNotFoundError: No module named 'torcheval'

======================================================================
ERROR [3.456s]: test_crossvalidator_on_pipeline (pyspark.ml.tests.connect.test_legacy_mode_tuning.CrossValidatorTests.test_crossvalidator_on_pipeline)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/runner/work/spark/spark/python/pyspark/ml/tests/connect/test_legacy_mode_tuning.py", line 210, in test_crossvalidator_on_pipeline
    cv_model = cv.fit(train_dataset)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/runner/work/spark/spark/python/pyspark/ml/connect/base.py", line 105, in fit
    return self._fit(dataset)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/runner/work/spark/spark/python/pyspark/ml/connect/tuning.py", line 435, in _fit
    for j, metric in pool.imap_unordered(lambda f: f(), tasks):
  File "/opt/homebrew/Cellar/python3.11/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/pool.py", line 873, in next
    raise value
  File "/opt/homebrew/Cellar/python3.11/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/Users/runner/work/spark/spark/python/pyspark/ml/connect/tuning.py", line 435, in <lambda>
    for j, metric in pool.imap_unordered(lambda f: f(), tasks):
                                                   ^^^
  File "/Users/runner/work/spark/spark/python/pyspark/util.py", line 423, in wrapped
    return f(*args, **kwargs)  # type: ignore[misc, operator]
           ^^^^^^^^^^^^^^^^^^
  File "/Users/runner/work/spark/spark/python/pyspark/ml/connect/tuning.py", line 185, in single_task
    model = estimator.fit(train, param_map)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/runner/work/spark/spark/python/pyspark/ml/connect/base.py", line 103, in fit
    return self.copy(params)._fit(dataset)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/runner/work/spark/spark/python/pyspark/ml/connect/pipeline.py", line 201, in _fit
    model = stage.fit(dataset)  # type: ignore[attr-defined]
            ^^^^^^^^^^^^^^^^^^
  File "/Users/runner/work/spark/spark/python/pyspark/ml/connect/base.py", line 105, in fit
    return self._fit(dataset)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/runner/work/spark/spark/python/pyspark/ml/connect/classification.py", line 219, in _fit
    import torch
ModuleNotFoundError: No module named 'torch'
....
```

Therefore, this pull request adds corresponding conditions to `CrossValidatorTests` to skip tests when `torch/torcheval` is not installed.

### Why are the changes needed?

Skip `CrossValidatorTests` if `torch/torcheval` is not installed.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #48901 from LuciferYang/pyspark-on-macos-split-mltest.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
@dongjoon-hyun
Member

Thank you, @zhengruifeng .
