[SPARK-24044][PYTHON] Explicitly print out skipped tests from unittest module #21107

HyukjinKwon · 2018-04-19T15:01:34Z

What changes were proposed in this pull request?

This PR proposes to remove duplicated dependency-checking logic and also to print out skipped tests from the unittest module.

For example:

Skipped tests in pyspark.sql.tests with pypy:
    test_createDataFrame_column_name_encoding (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
...

Skipped tests in pyspark.sql.tests with python3:
    test_createDataFrame_column_name_encoding (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 0.8.0 must be installed; however, it was not found.'
    test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 0.8.0 must be installed; however, it was not found.'
...

Currently, skipped tests are not printed out in the console. I think we should print them out.

How was this patch tested?

Manually tested. Also, fortunately, Jenkins has a good environment for testing the skipped-test output.

HyukjinKwon · 2018-04-19T15:04:49Z

python/run-tests.py

Just in case anyone is worried:

    Got an exception while trying to store skipped test output:
    Traceback (most recent call last):
      File "./python/run-tests.py", line 116, in run_individual_python_test
        per_test_output.seek()
    TypeError: seek() takes at least 1 argument (0 given)
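The traceback comes from calling `seek()` with no argument; Python file objects require an offset. A minimal sketch of the likely fix, assuming the per-test output is captured in a binary temporary file (the helper `read_captured_output` is hypothetical; only the name `per_test_output` comes from the traceback):

```python
import tempfile


def read_captured_output(write_output):
    """Hypothetical helper: capture output in a temp file, rewind it, return decoded lines."""
    with tempfile.TemporaryFile(mode="w+b") as per_test_output:
        write_output(per_test_output)
        # seek() requires an offset; seek(0) rewinds to the start so the
        # captured output can be read back line by line.
        per_test_output.seek(0)
        return [line.decode("utf-8") for line in per_test_output]


lines = read_captured_output(lambda f: f.write(b"test_a ... ok\ntest_b ... skipped 'reason'\n"))
print(lines)
```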

SparkQA · 2018-04-19T15:41:44Z

Test build #89580 has finished for PR 21107 at commit ccba1c1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2018-04-19T16:06:49Z

@viirya, @ueshin, @BryanCutler, @holdenk, @felixcheung, @cloud-fan, @JoshRosen, @yhuai and @bersprockets. It's actually about SPARK-23300. WDYT?

There was an actual issue, SPARK-23300, which we fixed by manually checking whether each package is installed. That approach required duplicated code and could only check dependencies, while there are many other conditions, for example Python-version-specific ones or other packages like NumPy. I think this is something we should fix.

The unittest module can print out skip messages, but so far our own testing script has swallowed them. This PR prints these messages out, sorted.
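For reference, this is roughly how unittest reports a skipped test when run verbosely, which is the output the test script was swallowing. A minimal sketch; `ExampleArrowTests` and `run_verbose` are hypothetical names, not part of PySpark:

```python
import io
import unittest


class ExampleArrowTests(unittest.TestCase):
    # Hypothetical stand-in for pyspark.sql.tests.ArrowTests.
    @unittest.skipIf(True, "PyArrow >= 0.8.0 must be installed; however, it was not found.")
    def test_createDataFrame_column_name_encoding(self):
        pass

    def test_always_runs(self):
        self.assertEqual(1 + 1, 2)


def run_verbose(test_case_cls):
    """Run a TestCase class with verbosity=2 and return (console text, result)."""
    stream = io.StringIO()
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case_cls)
    result = unittest.TextTestRunner(stream=stream, verbosity=2).run(suite)
    return stream.getvalue(), result


output, result = run_verbose(ExampleArrowTests)
# Each skipped test produces one "<name> (<class>) ... skipped '<reason>'" line.
skip_lines = [line for line in output.splitlines() if "... skipped" in line]
print("\n".join(skip_lines))
```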

I will double-check and clean this up with a JIRA, but first I wanted to see whether you like this approach. See the Jenkins logs for the actual format.

SparkQA · 2018-04-19T16:43:35Z

Test build #89581 has finished for PR 21107 at commit d2dc4e3.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

bersprockets · 2018-04-19T23:41:56Z

Also, with a very small modification, you can get this branch to print messages for missing components as well:

    test_collect_functions (pyspark.sql.tests.HiveContextSQLTests) ... skipped 'Hive not available'
    test_datetime_functions (pyspark.sql.tests.HiveContextSQLTests) ... skipped 'Hive not available'
    test_limit_and_take (pyspark.sql.tests.HiveContextSQLTests) ... skipped 'Hive not available'
    test_save_and_load_table (pyspark.sql.tests.HiveContextSQLTests) ... skipped 'Hive not available'
    test_unbounded_frames (pyspark.sql.tests.HiveContextSQLTests) ... skipped 'Hive not available'
    test_window_functions (pyspark.sql.tests.HiveContextSQLTests) ... skipped 'Hive not available'

But maybe you are just trying to capture environmental problems here.

By the way, this branch would not print any 'skipped' messages in environments where xmlrunner is installed. In environments without xmlrunner, I can see the messages.

HyukjinKwon · 2018-04-20T00:40:21Z

Sure. Those sound worth checking. Will do. BTW, what do you think about the console format and the regex-based approach?

viirya · 2018-04-20T01:56:10Z

python/run-tests.py

Should we log something other than "Finished test" if tests are skipped?

Sounds good.

HyukjinKwon · 2018-04-20T02:01:34Z

I will leave this open for a few more days before working further on this PR.

bersprockets · 2018-04-20T02:43:07Z

The messages look good to me.

Re: regex: If we figure out how to dynamically skip doctests, we may need a regex for those messages (since those tests don't have "test_" names). But we can cross that bridge when we get there.

HyukjinKwon · 2018-04-20T02:56:57Z

+@icexelloss too

BryanCutler · 2018-04-20T16:29:58Z

Thanks for doing this @HyukjinKwon, looks good so far! I was wondering if it is possible to skip an entire class, like ArrowTests, instead of each individual test, to avoid so many similar log messages. Do you think calling skipTest from setUp like here would work?
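On the question of skipping a whole class: one option in plain unittest (not necessarily what this PR adopted) is to raise `unittest.SkipTest` from `setUpClass`, which records a single skip for the class, whereas a class-level `@unittest.skipIf` still reports each test method separately. A minimal sketch with hypothetical class names:

```python
import io
import unittest

REASON = "PyArrow >= 0.8.0 must be installed; however, it was not found."


@unittest.skipIf(True, REASON)
class DecoratedTests(unittest.TestCase):
    # A class-level skipIf still records one skip per test method.
    def test_one(self):
        pass

    def test_two(self):
        pass


class SetUpClassTests(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Raising SkipTest here records a single skip for the whole class.
        raise unittest.SkipTest(REASON)

    def test_one(self):
        pass

    def test_two(self):
        pass


def count_skips(test_case_cls):
    """Run a TestCase class quietly and return the number of recorded skips."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case_cls)
    result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    return len(result.skipped)


print(count_skips(DecoratedTests), count_skips(SetUpClassTests))
```

One trade-off: when `setUpClass` raises, `tearDownClass` is not called, so anything started before the raise needs its own cleanup.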

icexelloss · 2018-04-20T20:36:17Z

python/run-tests.py

How are we checking dependencies now?

We now rely on the existing checks in the tests. For example:

spark/python/pyspark/sql/tests.py, lines 63 to 69 in ab7b961:

    _pyarrow_requirement_message = None
    try:
        from pyspark.sql.utils import require_minimum_pyarrow_version
        require_minimum_pyarrow_version()
    except ImportError as e:
        # If Arrow version requirement is not satisfied, skip related tests.
        _pyarrow_requirement_message = _exception_message(e)

spark/python/pyspark/sql/tests.py, lines 3121 to 3123 in ab7b961:

    @unittest.skipIf(
        not _have_pandas or not _have_pyarrow,
        _pandas_requirement_message or _pyarrow_requirement_message)

which prints out a skip message like:

    test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 0.8.0 must be installed; however, it was not found.'

which I am capturing here with a regex pattern.

Gotcha. Thanks for the explanation!
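A minimal sketch of capturing such lines with a regex. The pattern and the `find_skipped_lines` helper are hypothetical illustrations, not necessarily the exact pattern used in `run-tests.py`:

```python
import re

# Hypothetical pattern for a unittest verbose line reporting a skipped PySpark test,
# e.g. "test_x (pyspark.sql.tests.ArrowTests) ... skipped 'reason'".
SKIPPED_TEST_PATTERN = re.compile(r"test_\w+ \(pyspark\.[\w.]+\) \.\.\. skipped ")


def find_skipped_lines(output_lines):
    """Return the verbose-output lines that report a skipped PySpark test."""
    return [line.rstrip("\n") for line in output_lines if SKIPPED_TEST_PATTERN.search(line)]


sample = [
    "test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... "
    "skipped 'PyArrow >= 0.8.0 must be installed; however, it was not found.'\n",
    "test_collect_functions (pyspark.sql.tests.HiveContextSQLTests) ... ok\n",
    "Ran 2 tests in 0.010s\n",
]
skipped = find_skipped_lines(sample)
print(skipped)
```

Because the pattern requires a `test_` prefix, it would not match doctest skips, which is the limitation raised earlier in the thread. The `[\w.]+` part also accepts the longer `(module.Class.test_name)` form printed by newer Python versions.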

icexelloss · 2018-04-20T20:37:03Z

@HyukjinKwon thanks for the work. This is much cleaner!

HyukjinKwon · 2018-04-22T05:57:12Z

@BryanCutler, I will check and update after testing it out.

HyukjinKwon · 2018-04-22T08:15:39Z

@BryanCutler, I checked this, but it still seems to print out duplicated logs. However, I think this way I could deal with #21107 (comment).

HyukjinKwon · 2018-04-22T09:03:00Z

python/pyspark/ml/tests.py

+
+    def setUp(self):
+        if not self.hive_available:
+            self.skipTest("Hive is not available.")


    Finished test(python3): pyspark.sql.tests (51s) ... 93 tests were skipped
    ...
    Skipped tests in pyspark.sql.tests with python3:
        test_createDataFrame_column_name_encoding (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 0.8.0 must be installed; however, it was not found.'
        ...
        test_collect_functions (pyspark.sql.tests.HiveContextSQLTests) ... skipped 'Hive is not available.'
        test_datetime_functions (pyspark.sql.tests.HiveContextSQLTests) ... skipped 'Hive is not available.'
        ...
        test_query_execution_listener_on_collect (pyspark.sql.tests.QueryExecutionListenerTests) ... skipped "'org.apache.spark.sql.TestQueryExecutionListener' is not available. Will skip the related tests."
        ...

@viirya, @bersprockets and @BryanCutler, this is the output from my partial testing locally.

I'm a little worried that this can be too verbose when many tests are skipped. See #21107 (comment).

I assume there is no way to only print out the skipped test class name?

No clean way as far as I can tell. I would need another regex for that, but I would like to avoid that approach as much as I can.

Gotcha. Yeah the current implementation looks good to me.

HyukjinKwon · 2018-04-22T09:08:01Z

python/run-tests.py

+            os._exit(-1)
+        if skipped_counts != 0:
+            LOGGER.info(
+                "Finished test(%s): %s (%is) ... %s tests were skipped", pyspark_python, test_name,


Not sure if there's a better format. Let me know.
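For comparison, a minimal sketch of the two completion-line formats, assuming modules with no skips keep the plain "Finished test" line (the `finished_message` helper and logger name are hypothetical; only the format string comes from the diff above):

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
LOGGER = logging.getLogger("run_tests_sketch")  # hypothetical logger name


def finished_message(pyspark_python, test_name, duration, skipped_counts):
    """Build the per-module completion line, appending the skip count when non-zero."""
    if skipped_counts != 0:
        return "Finished test(%s): %s (%is) ... %s tests were skipped" % (
            pyspark_python, test_name, duration, skipped_counts)
    # Assumption: modules without skips keep the plain completion line.
    return "Finished test(%s): %s (%is)" % (pyspark_python, test_name, duration)


LOGGER.info(finished_message("python3", "pyspark.sql.tests", 51, 93))
LOGGER.info(finished_message("pypy", "pyspark.sql.tests", 40, 0))
```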

HyukjinKwon · 2018-04-22T09:12:31Z

I will remove WIP after a few more checks locally and of Jenkins's output.

SparkQA · 2018-04-22T09:38:58Z

Test build #89687 has finished for PR 21107 at commit 56b9001.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-04-22T09:44:04Z

Test build #89688 has finished for PR 21107 at commit 3dd74a0.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2018-04-22T14:03:47Z

python/run-tests.py


+    for key, lines in sorted(SKIPPED_TESTS.items()):
+        pyspark_python, test_name = key
+        LOGGER.info("\nSkipped tests in %s with %s:" % (test_name, pyspark_python))


Will it be too verbose to print all skipped tests? An option is to record them into LOG_FILE only. No strong preference here.

Yeah, I get it, but from previous discussions I'm fairly sure people felt it should be printed out in the console. I don't feel strongly either.

Anyway, we may eventually have only a few of them, because most are caused by missing Pandas and Arrow, so this is probably fine.

No strong preference here either. Let me know if others have a preference.

I think printing is fine for now. We could also change it later.
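A minimal sketch of the sorted end-of-run summary from the diff above, written to return lines rather than log them directly so a caller could send them to the console or only to the log file (the `skipped_summary` helper and the sample data are hypothetical):

```python
def skipped_summary(skipped_tests):
    """Build the end-of-run report from a {(pyspark_python, test_name): [lines]} dict.

    Keys are sorted, so the report order does not depend on which
    test module happened to finish first.
    """
    report = []
    for key, lines in sorted(skipped_tests.items()):
        pyspark_python, test_name = key
        report.append("\nSkipped tests in %s with %s:" % (test_name, pyspark_python))
        for line in lines:
            report.append("    %s" % line.rstrip())
    return report


# Hypothetical input shaped like the SKIPPED_TESTS dictionary in the diff.
SKIPPED_TESTS = {
    ("python3", "pyspark.sql.tests"): ["test_a (pyspark.sql.tests.ArrowTests) ... skipped 'no pyarrow'"],
    ("pypy", "pyspark.sql.tests"): ["test_a (pyspark.sql.tests.ArrowTests) ... skipped 'no pandas'"],
}
for entry in skipped_summary(SKIPPED_TESTS):
    print(entry)
```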

BryanCutler

I tried this locally and LGTM. Verbosity increases, but not enough to cause an issue, and it's crucial that the skipped tests are printed out even if there are no errors.

It might be good to leave this open for a bit to see if there are any more comments, since it changes the verbosity level of all PySpark tests.

BryanCutler · 2018-04-24T17:51:01Z

python/pyspark/sql/tests.py

+            cls.spark.stop()

    def tearDown(self):
        self.spark._jvm.OnSuccessCall.clear()


This is not called if the test is skipped during setUp, right?

HyukjinKwon · 2018-04-25T02:52:05Z

Thanks, @BryanCutler. I will wait a few more days just in case.

BryanCutler · 2018-04-26T22:13:04Z

merged to master, thanks @HyukjinKwon !

HyukjinKwon · 2018-04-27T00:43:47Z

Thank you for reviewing this @bersprockets, @viirya, @BryanCutler, @icexelloss and @felixcheung.

HyukjinKwon commented Apr 19, 2018

viirya reviewed Apr 20, 2018

icexelloss reviewed Apr 20, 2018

HyukjinKwon added 2 commits April 22, 2018 15:10

Explicitly print out skipped tests from unittest module (b075bb8)
Make them sorted and remove duplicated messages in pyspark tests (8c1f16e)
Address comments and see if it works (56b9001)

HyukjinKwon force-pushed the skipped-tests-print branch from d2dc4e3 to 56b9001 April 22, 2018 09:00

HyukjinKwon changed the title from "[DO-NOT-MERGE][WIP] Explicitly print out skipped tests from unittest module" to "[SPARK-24044][WIP] Explicitly print out skipped tests from unittest module" Apr 22, 2018

HyukjinKwon commented Apr 22, 2018

less diff? (3dd74a0)

HyukjinKwon commented Apr 22, 2018

HyukjinKwon changed the title from "[SPARK-24044][WIP] Explicitly print out skipped tests from unittest module" to "[SPARK-24044][PYTHON] Explicitly print out skipped tests from unittest module" Apr 22, 2018

viirya reviewed Apr 22, 2018

BryanCutler approved these changes Apr 24, 2018

bersprockets mentioned this pull request Apr 26, 2018

[SPARK-23853][PYSPARK][TEST] Run Hive-related PySpark tests only for -Phive #21141

Closed

asfgit closed this in f7435be Apr 26, 2018

HyukjinKwon deleted the skipped-tests-print branch October 16, 2018 12:44
