-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-24044][PYTHON] Explicitly print out skipped tests from unittest module #21107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
python/run-tests.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just in case anyone is worried:
Got an exception while trying to store skipped test output:
Traceback (most recent call last):
File "./python/run-tests.py", line 116, in run_individual_python_test
per_test_output.seek()
TypeError: seek() takes at least 1 argument (0 given)
|
Test build #89580 has finished for PR 21107 at commit
|
|
@viirya, @ueshin, @BryanCutler, @holdenk, @felixcheung, @cloud-fan, @JoshRosen, @yhuai and @bersprockets. It's actually about SPARK-23300. WDYT? There was an actual issue, SPARK-23300, and we fixed this by manually checking if the package is installed. This way needed duplicated codes and could only check dependencies. There are many conditions, for example, Python version specific or other packages like NumPy. I think this is something we should fix.
I will double check and make this clean up with a JIRA but just wonder if you guys like this way. See the Jenkins logs for the actual format. |
|
Test build #89581 has finished for PR 21107 at commit
|
|
Also, with a very small modification, you can get this branch to print messages for missing components as well: test_collect_functions (pyspark.sql.tests.HiveContextSQLTests) ... skipped 'Hive not available'
test_datetime_functions (pyspark.sql.tests.HiveContextSQLTests) ... skipped 'Hive not available'
test_limit_and_take (pyspark.sql.tests.HiveContextSQLTests) ... skipped 'Hive not available'
test_save_and_load_table (pyspark.sql.tests.HiveContextSQLTests) ... skipped 'Hive not available'
test_unbounded_frames (pyspark.sql.tests.HiveContextSQLTests) ... skipped 'Hive not available'
test_window_functions (pyspark.sql.tests.HiveContextSQLTests) ... skipped 'Hive not available'
But maybe you are just trying to capture environmental problems here. By the way, this branch would not print any 'skipped' messages in environments where xmlrunner is installed. In environments without xmlrunner, I can see the messages. |
|
Sure. The things you said sound good to check. Will do. BTW, how about the format in the console and the way with regex? |
python/run-tests.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logging other than Finished test if we skip it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
|
Will leave this open for few more days before starting to work further on this PR. |
|
The messages look good to me. Re: regex: If we figure out how to dynamically skip doctests, we may need a regex for those messages (since those tests don't have "test_" names). But we can cross that bridge when we get there. |
|
+@icexelloss too |
|
Thanks for doing this @HyukjinKwon , looks good so far! I was wondering if it is possible to skip an entire class from running, like |
python/run-tests.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are we checking dependencies now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are now relaying on the existing checks in the tests. For example:
spark/python/pyspark/sql/tests.py
Lines 63 to 69 in ab7b961
| _pyarrow_requirement_message = None | |
| try: | |
| from pyspark.sql.utils import require_minimum_pyarrow_version | |
| require_minimum_pyarrow_version() | |
| except ImportError as e: | |
| # If Arrow version requirement is not satisfied, skip related tests. | |
| _pyarrow_requirement_message = _exception_message(e) |
spark/python/pyspark/sql/tests.py
Lines 3121 to 3123 in ab7b961
| @unittest.skipIf( | |
| not _have_pandas or not _have_pyarrow, | |
| _pandas_requirement_message or _pyarrow_requirement_message) |
which prints out a skip message like:
test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... skipped
'PyArrow >= 0.8.0 must be installed; however, it was not found.'
which I am capturing here with a regex pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. Thanks for the explanation!
|
@HyukjinKwon thanks for the work. This is much cleaner! |
|
@BryanCutler, will check and update after testing out. |
|
@BryanCutler, I checked this but looks still printing out duplicated logs .. however, I think in this way I could deal with #21107 (comment). |
d2dc4e3 to
56b9001
Compare
|
|
||
| def setUp(self): | ||
| if not self.hive_available: | ||
| self.skipTest("Hive is not available.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finished test(python3): pyspark.sql.tests (51s) ... 93 tests were skipped
...
Skipped tests in pyspark.sql.tests with python3:
test_createDataFrame_column_name_encoding (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 0.8.0 must be installed; however, it was not found.'
...
test_collect_functions (pyspark.sql.tests.HiveContextSQLTests) ... skipped 'Hive is not available.'
test_datetime_functions (pyspark.sql.tests.HiveContextSQLTests) ... skipped 'Hive is not available.'
...
test_query_execution_listener_on_collect (pyspark.sql.tests.QueryExecutionListenerTests) ... skipped "'org.apache.spark.sql.TestQueryExecutionListener' is not available. Will skip the related tests."
...
@viirya, @bersprockets and @BryanCutler, these were the output from my partial testing in my local.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little worry that it can be too verbose when skipped tests are too many. See #21107 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume there is no way to only print out the skipped test class name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No clean way as far as I can tell. I should do another regex thing but .. I would like to avoid this way as possible as I can ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. Yeah the current implementation looks good to me.
| os._exit(-1) | ||
| if skipped_counts != 0: | ||
| LOGGER.info( | ||
| "Finished test(%s): %s (%is) ... %s tests were skipped", pyspark_python, test_name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if there's a better format. let me know.
|
Will remove WIP after few more checking in my local and Jenkins's output. |
|
Test build #89687 has finished for PR 21107 at commit
|
|
Test build #89688 has finished for PR 21107 at commit
|
|
|
||
| for key, lines in sorted(SKIPPED_TESTS.items()): | ||
| pyspark_python, test_name = key | ||
| LOGGER.info("\nSkipped tests in %s with %s:" % (test_name, pyspark_python)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it be too verbose to print all skipped tests? An option is to record them into LOG_FILE only. No strong preference here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I get it but I feel sure people felt it should be printed out in the console from previous discussions. I actually don't feel strongly too.
Anyway we could have only few of them eventually because most of them are by missing Pandas and Arrow so probably this could be fine.
No strong preference here too. let me know if other guys have a preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think print is fine for now. we could also change later.
BryanCutler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this locally and LGTM. There is an increase in the verbosity, but not too much that it causes an issue and it's crucial that the skipped tests are printed out even if there are no errors.
It might be good to leave open for a bit to see if any more comments since it does change the verbosity level on all pyspark tests.
| cls.spark.stop() | ||
|
|
||
| def tearDown(self): | ||
| self.spark._jvm.OnSuccessCall.clear() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not called if the test is skipped during setUp right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup.
|
Thanks, @BryanCutler. Will wait for more days in case. |
|
merged to master, thanks @HyukjinKwon ! |
|
Thank you for reviewing this @bersprockets, @viirya, @BryanCutler, @icexelloss and @felixcheung. |
What changes were proposed in this pull request?
This PR proposes to remove duplicated dependency checking logics and also print out skipped tests from unittests.
For example, as below:
Currently, it's not printed out in the console. I think we should better print out skipped tests in the console.
How was this patch tested?
Manually tested. Also, fortunately, Jenkins has good environment to test the skipped output.