@JoshRosen
Contributor

This patch refactors the python/run-tests script:

  • It's now written in Python instead of Bash.

  • The descriptions of the tests to run are now stored in dev/run-tests's modules. This allows the pull request builder to skip Python test suites that were not affected by the pull request's changes. For example, we can now skip the PySpark Streaming test cases when only SQL files are changed.

  • python/run-tests now supports command-line flags to make it easier to run individual test suites (this addresses SPARK-5482):

    Usage: run-tests [options]
    
    Options:
    -h, --help            show this help message and exit
    --python-executables=PYTHON_EXECUTABLES
                          A comma-separated list of Python executables to test
                          against (default: python2.6,python3.4,pypy)
    --modules=MODULES     A comma-separated list of Python modules to test
                          (default: pyspark-core,pyspark-ml,pyspark-mllib
                          ,pyspark-sql,pyspark-streaming)
    
  • dev/run-tests has been split into multiple files: the module definitions and test utility functions are now stored inside of a dev/sparktestsupport Python module, allowing them to be re-used from the Python test runner script.
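
As a rough illustration of the flags listed above, here is a minimal sketch of an optparse-based parser that accepts the same two options. The function name `parse_opts` and the constant names are assumptions made for this example; only the flag names, help strings, and defaults come from the usage text, and the actual script may be structured differently.

```python
from optparse import OptionParser

# Defaults copied from the usage text in the description above.
DEFAULT_EXECUTABLES = "python2.6,python3.4,pypy"
DEFAULT_MODULES = "pyspark-core,pyspark-ml,pyspark-mllib,pyspark-sql,pyspark-streaming"


def parse_opts(argv):
    """Parse run-tests style flags; return (executables, modules) as lists.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    parser = OptionParser(prog="run-tests")
    parser.add_option(
        "--python-executables", type="string", default=DEFAULT_EXECUTABLES,
        help="A comma-separated list of Python executables to test against "
             "(default: %default)")
    parser.add_option(
        "--modules", type="string", default=DEFAULT_MODULES,
        help="A comma-separated list of Python modules to test "
             "(default: %default)")
    opts, args = parser.parse_args(argv)
    if args:
        # Positional arguments are not part of the documented usage.
        parser.error("Unsupported arguments: %s" % " ".join(args))
    return opts.python_executables.split(","), opts.modules.split(",")
```

With this sketch, `parse_opts(["--modules=pyspark-sql"])` would restrict the run to one module while keeping the default list of executables.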

@JoshRosen JoshRosen changed the title [SPARK-8583] Refactor python/run-tests to integrate with dev/run-tests module system [SPARK-8583] [BUILD] Refactor python/run-tests to integrate with dev/run-tests module system Jun 24, 2015
@JoshRosen
Contributor Author

/cc @andrewor14 @davies @brennonyork

Contributor Author

@ahirreddy, I think that you originally added these lines. Do we still need them? AFAIK the pull request builder's git clean -fdx should take care of this for us, but was there another reason why we need this?

@SparkQA

SparkQA commented Jun 24, 2015

Test build #35624 has finished for PR 6967 at commit def2d8a.

  • This patch fails some tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Module(object):

@JoshRosen
Contributor Author

Haha, this is now failing the tests for the test runner script:

**********************************************************************
File "./dev/run-tests.py", line 41, in __main__.determine_modules_for_files
Failed example:
    sorted(x.name for x in determine_modules_for_files(["python/pyspark/a.py", "sql/test/foo"]))
Expected:
    ['pyspark', 'sql']
Got:
    ['pyspark-core', 'sql']
**********************************************************************
File "./dev/run-tests.py", line 93, in __main__.determine_modules_to_test
Failed example:
    sorted(x.name for x in determine_modules_to_test([modules.sql]))
Expected:
    ['examples', 'hive-thriftserver', 'mllib', 'pyspark', 'sparkr', 'sql']
Got:
    ['examples', 'hive-thriftserver', 'mllib', 'pyspark-core', 'pyspark-mllib', 'pyspark-sql', 'pyspark-sql', 'pyspark-streaming', 'sparkr', 'sql']
**********************************************************************
2 items had failures:
   1 of   2 in __main__.determine_modules_for_files
   1 of   3 in __main__.determine_modules_to_test

I'll fix this shortly.
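
For readers unfamiliar with the doctest above, the following is a minimal sketch of how changed files could be mapped to module names by path prefix. The `Module` fields, the `contains_file` method, and the prefixes in `ALL_MODULES` are assumptions for illustration, not the real definitions in dev/run-tests; only the function name and the expected result for this input come from the doctest output.

```python
class Module(object):
    """Hypothetical stand-in for a test module definition."""

    def __init__(self, name, source_file_prefixes):
        self.name = name
        self.source_file_prefixes = source_file_prefixes

    def contains_file(self, filename):
        # A file belongs to the module if it lives under one of its prefixes.
        return any(filename.startswith(p) for p in self.source_file_prefixes)


# Illustrative prefixes only; the real module list is larger and differs.
ALL_MODULES = [
    Module("sql", ["sql/"]),
    Module("pyspark-core", ["python/pyspark/"]),
]


def determine_modules_for_files(filenames):
    """Return the set of modules that own at least one of the given files."""
    changed = set()
    for filename in filenames:
        for module in ALL_MODULES:
            if module.contains_file(filename):
                changed.add(module)
    return changed
```

Under these assumed prefixes, the input from the failing doctest resolves to the module names `pyspark-core` and `sql`, which matches the "Got" output once the Python module has been renamed.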

@SparkQA

SparkQA commented Jun 24, 2015

Test build #35641 has finished for PR 6967 at commit d6a77d3.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Module(object):

@JoshRosen
Contributor Author

Jenkins, retest this please.

dev/run-tests.py Outdated
Contributor Author

Whoops, just realized that pyspark-sql appears twice; I probably forgot a set() call somewhere; will investigate.
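
To illustrate the kind of duplication a missing set() can cause, here is a small sketch that collects a module's transitive dependents into a set, so a module reachable through two paths is reported only once. The dependency graph `DEPENDENTS` and the name-based signature are assumptions for this example and do not reflect Spark's real module graph or the actual fix.

```python
def determine_modules_to_test(changed, dependents):
    """Return the changed module names plus all of their transitive dependents.

    `dependents` maps a module name to the names of modules that depend on it.
    """
    to_test = set()
    stack = list(changed)
    while stack:
        name = stack.pop()
        if name in to_test:
            # Already visited via another path; the set prevents duplicates.
            continue
        to_test.add(name)
        stack.extend(dependents.get(name, []))
    return to_test


# Hypothetical graph: pyspark-mllib is reachable from sql via two paths.
DEPENDENTS = {
    "sql": ["mllib", "pyspark-sql"],
    "mllib": ["pyspark-mllib"],
    "pyspark-sql": ["pyspark-mllib"],
}
```

Collecting the results in a list instead of a set would report `pyspark-mllib` twice for this graph, which is the same symptom as the repeated `pyspark-sql` in the doctest output.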

@SparkQA

SparkQA commented Jun 24, 2015

Test build #35685 has finished for PR 6967 at commit d6a77d3.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Module(object):

@JoshRosen
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Jun 24, 2015

Test build #35691 has finished for PR 6967 at commit d6a77d3.

  • This patch fails some tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Module(object):

@JoshRosen
Contributor Author

I also realized that this is going to need a bit of extra work for dev/run-tests to be able to trigger individual Python module tests. I might end up incorporating some of the ideas from #4269 into this by having dev/run-tests invoke python/run-tests with a list of module names to test.

@JoshRosen JoshRosen changed the title [SPARK-8583] [BUILD] Refactor python/run-tests to integrate with dev/run-tests module system [SPARK-8583] [SPARK-5482] [BUILD] Refactor python/run-tests to integrate with dev/run-tests module system Jun 24, 2015
@JoshRosen
Contributor Author

I've updated this to incorporate passing flags to python/run-tests in order to select which suites should be run.
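
A minimal sketch of what passing the selection to python/run-tests might look like: the caller builds an argument list with a single `--modules=` flag. The helper name `build_python_tests_command` and the `spark_home` parameter are assumptions for this example; only the `--modules` flag and its comma-separated format come from this pull request.

```python
import os


def build_python_tests_command(spark_home, module_names):
    """Build the argv list for python/run-tests, limited to the given modules.

    Hypothetical helper; an empty list leaves the script's defaults in place.
    """
    command = [os.path.join(spark_home, "python", "run-tests")]
    if module_names:
        command.append("--modules=%s" % ",".join(module_names))
    return command
```

The resulting list could then be handed to something like `subprocess.call(command)`; that step is omitted here because it depends on a Spark checkout being present.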

@SparkQA

SparkQA commented Jun 26, 2015

Test build #35835 has finished for PR 6967 at commit f53db55.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Module(object):

@SparkQA

SparkQA commented Jun 26, 2015

Test build #35842 has finished for PR 6967 at commit 568a3fd.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Module(object):

@SparkQA

SparkQA commented Jun 26, 2015

Test build #35845 has finished for PR 6967 at commit c364ccf.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Module(object):

@SparkQA

SparkQA commented Jun 26, 2015

Test build #35868 has finished for PR 6967 at commit 27a389f.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Module(object):

@SparkQA

SparkQA commented Jun 26, 2015

Test build #35870 has finished for PR 6967 at commit 37aff00.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Module(object):

@SparkQA

SparkQA commented Jun 26, 2015

Test build #35878 has finished for PR 6967 at commit 8f65ed0.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Module(object):

@JoshRosen
Contributor Author

This should now be ready for review. I have another open PR, #7031, which builds on this one to parallelize the Python tests. Once we review the non-parallel changes here and get this basic version committed, I'll rebase my other PR and we can look at the changes which were necessary to support parallelism.

dev/lint-python Outdated
Contributor

python/run-tests.py

@davies
Contributor

davies commented Jun 27, 2015

LGTM, only two minor issues.

@SparkQA

SparkQA commented Jun 27, 2015

Test build #35892 has finished for PR 6967 at commit 34c98d2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Module(object):

@JoshRosen
Contributor Author

@davies, thanks for the review. I've addressed your two comments, so let's wait and see if this passes tests. Once it passes, I'll merge and rebase my other PRs.

@SparkQA

SparkQA commented Jun 28, 2015

Test build #35904 has finished for PR 6967 at commit f578d6d.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Module(object):

@JoshRosen
Contributor Author

Jenkins, retest this please.

@JoshRosen
Contributor Author

Hopefully that test is just flaky, but let's see. If it fails again, I'll investigate.

@SparkQA

SparkQA commented Jun 28, 2015

Test build #35909 has finished for PR 6967 at commit f578d6d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Module(object):

@asfgit asfgit closed this in 40648c5 Jun 28, 2015
@JoshRosen
Contributor Author

Uh oh, looks like the dev/run-tests script has a problem:

Traceback (most recent call last):
  File "./dev/run-tests.py", line 477, in <module>
    main()
  File "./dev/run-tests.py", line 464, in main
    run_python_tests(modules_with_python_tests)
  File "./dev/run-tests.py", line 368, in run_python_tests
    command.append("--modules=%s" % ','.join(m.name for m in modules))

I'll investigate and hotfix.

@JoshRosen
Contributor Author

Hotfixed in 42db3a1

asfgit pushed a commit that referenced this pull request Jun 28, 2015
@cocoatomo
Contributor

Hi, @JoshRosen
When running run-tests.py with Python 2.6, I got the following error:

Running PySpark tests. Output is in python//Users/tomohiko/.jenkins/jobs/pyspark_test/workspace/python/unit-tests.log
Will test against the following Python executables: ['python2.6', 'python3.4', 'pypy']
Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
Traceback (most recent call last):
  File "./python/run-tests.py", line 196, in <module>
    main()
  File "./python/run-tests.py", line 159, in main
    python_implementation = subprocess.check_output(
AttributeError: 'module' object has no attribute 'check_output'

The cause of this error is the use of the subprocess.check_output function, which has only existed since Python 2.7.
(ref. https://docs.python.org/2.7/library/subprocess.html#subprocess.check_output)

The paragraph at https://spark.apache.org/docs/latest/#downloading says "Spark runs on Java 6+, Python 2.6+...", so should we make run-tests.py able to run on Python 2.6?

@JoshRosen
Contributor Author

Hi @cocoatomo,

If it's not hard to do, I think we should continue to support Python 2.6 for our infra/test scripts. If you submit a pull request I'll be happy to help review. Hopefully the fix is something simple, such as using the Popen constructor directly instead of using the convenience method.
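
For illustration, here is a minimal sketch of the Popen-based approach suggested above: a small helper that captures a command's stdout without relying on subprocess.check_output. The name `check_output_compat` is an assumption for this example, and this is not necessarily the fix that was eventually submitted.

```python
import subprocess
import sys


def check_output_compat(cmd):
    """Run cmd and return its stdout as bytes; raise on a non-zero exit status.

    Hypothetical replacement for subprocess.check_output, which is missing
    in Python 2.6. Uses only Popen and CalledProcessError.
    """
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    output, _ = process.communicate()
    if process.returncode != 0:
        raise subprocess.CalledProcessError(process.returncode, cmd)
    return output


if __name__ == "__main__":
    # Example: ask the current interpreter for its version string.
    out = check_output_compat(
        [sys.executable, "-c", "import sys; print(sys.version_info[0])"])
    print(out.strip())
```

One design note: CalledProcessError is constructed with only the return code and command here, because the extra `output` argument was added to that exception in Python 2.7.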

@cocoatomo
Contributor

I created the pull request for this: #7161.

Thanks.
