-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-8583] [SPARK-5482] [BUILD] Refactor python/run-tests to integrate with dev/run-tests module system #6967
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahirreddy, I think that you originally added these lines. Do we still need them? AFAIK the pull request builder's git clean -fdx should take care of this for us, but was there another reason why we need this?
|
Test build #35624 has finished for PR 6967 at commit
|
|
Haha, this is now failing the tests for the test runner script: **********************************************************************
File "./dev/run-tests.py", line 41, in __main__.determine_modules_for_files
Failed example:
sorted(x.name for x in determine_modules_for_files(["python/pyspark/a.py", "sql/test/foo"]))
Expected:
['pyspark', 'sql']
Got:
['pyspark-core', 'sql']
**********************************************************************
File "./dev/run-tests.py", line 93, in __main__.determine_modules_to_test
Failed example:
sorted(x.name for x in determine_modules_to_test([modules.sql]))
Expected:
['examples', 'hive-thriftserver', 'mllib', 'pyspark', 'sparkr', 'sql']
Got:
['examples', 'hive-thriftserver', 'mllib', 'pyspark-core', 'pyspark-mllib', 'pyspark-sql', 'pyspark-sql', 'pyspark-streaming', 'sparkr', 'sql']
**********************************************************************
2 items had failures:
1 of 2 in __main__.determine_modules_for_files
1 of 3 in __main__.determine_modules_to_testI'll fix this shortly. |
|
Test build #35641 has finished for PR 6967 at commit
|
|
Jenkins, retest this please. |
dev/run-tests.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, just realized that pyspark-sql appears twice; I probably forgot a set() call somewhere; will investigate.
|
Test build #35685 has finished for PR 6967 at commit
|
|
Jenkins, retest this please. |
|
Test build #35691 has finished for PR 6967 at commit
|
|
I also realized that this is going to need a bit of extra work for dev/run-tests to be able to trigger individual Python module tests. I might end up incorporating some of the ideas from #4269 into this by having |
|
I've updated this to incorporate passing flags to |
…ling out to run tests
|
Test build #35835 has finished for PR 6967 at commit
|
|
Test build #35842 has finished for PR 6967 at commit
|
|
Test build #35845 has finished for PR 6967 at commit
|
|
Test build #35868 has finished for PR 6967 at commit
|
|
Test build #35870 has finished for PR 6967 at commit
|
|
Test build #35878 has finished for PR 6967 at commit
|
|
This should now be ready for review. I have another open PR, #7031, which builds on this one to parallelize the Python tests. Once we review the non-parallel changes here and get this basic version committed, I'll rebase my other PR and we can look at the changes which were necessary to support parallelism. |
dev/lint-python
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python/run-tests.py
|
LGTM, only two minor issues. |
|
Test build #35892 has finished for PR 6967 at commit
|
|
@davies, thanks for the review. I've addressed your two comments, so let's wait and see if this passes tests. Once it passes, I'll merge and rebase my other PRs. |
|
Test build #35904 has finished for PR 6967 at commit
|
|
Jenkins, retest this please. |
|
Hopefully that test is just flaky, but let's see. If it fails again, I'll investigate. |
|
Test build #35909 has finished for PR 6967 at commit
|
|
Uh oh, looks like the dev/run-tests script has a problem: Traceback (most recent call last):
File "./dev/run-tests.py", line 477, in <module>
main()
File "./dev/run-tests.py", line 464, in main
run_python_tests(modules_with_python_tests)
File "./dev/run-tests.py", line 368, in run_python_tests
command.append("--modules=%s" % ','.join(m.name for m in modules))I'll investigate and hotfix. |
|
Hotfixed in 42db3a1 |
|
Hi, @JoshRosen The cause of this error is using subprocess.check_output function, which exists since Python 2.7. The paragraph https://spark.apache.org/docs/latest/#downloading says "Spark runs on Java 6+, Python 2.6+...", so should we make run-tests.py enable to run on Python 2.6? |
|
Hi @cocoatomo, If it's not hard to do, I think we should continue to support Python 2.6 for our infra/test scripts. If you submit a pull request I'll be happy to help review. Hopefully the fix is something simple, such as using the Popen constructor directly instead of using the convenience method. |
|
I created the pull request for this: #7161. Thanks. |
This patch refactors the
python/run-testsscript:It's now written in Python instead of Bash.
The descriptions of the tests to run are now stored in
dev/run-tests's modules. This allows the pull request builder to skip Python tests suites that were not affected by the pull request's changes. For example, we can now skip the PySpark Streaming test cases when only SQL files are changed.python/run-testsnow supports command-line flags to make it easier to run individual test suites (this addresses SPARK-5482):dev/run-testshas been split into multiple files: the module definitions and test utility functions are now stored inside of adev/sparktestsupportPython module, allowing them to be re-used from the Python test runner script.