Skip to content

Conversation

@HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented Apr 19, 2018

What changes were proposed in this pull request?

This PR proposes to remove duplicated dependency checking logics and also print out skipped tests from unittests.

For example, as below:

Skipped tests in pyspark.sql.tests with pypy:
    test_createDataFrame_column_name_encoding (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
...

Skipped tests in pyspark.sql.tests with python3:
    test_createDataFrame_column_name_encoding (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 0.8.0 must be installed; however, it was not found.'
    test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 0.8.0 must be installed; however, it was not found.'
...

Currently, it's not printed out in the console. I think we should better print out skipped tests in the console.

How was this patch tested?

Manually tested. Also, fortunately, Jenkins has good environment to test the skipped output.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case anyone is worried:

Got an exception while trying to store skipped test output:
Traceback (most recent call last):
  File "./python/run-tests.py", line 116, in run_individual_python_test
    per_test_output.seek()
TypeError: seek() takes at least 1 argument (0 given)

@SparkQA
Copy link

SparkQA commented Apr 19, 2018

Test build #89580 has finished for PR 21107 at commit ccba1c1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented Apr 19, 2018

@viirya, @ueshin, @BryanCutler, @holdenk, @felixcheung, @cloud-fan, @JoshRosen, @yhuai and @bersprockets. It's actually about SPARK-23300. WDYT?

There was an actual issue, SPARK-23300, and we fixed this by manually checking if the package is installed. This way needed duplicated codes and could only check dependencies. There are many conditions, for example, Python version specific or other packages like NumPy. I think this is something we should fix.

unittest module can print out the skipped messages but these were swallowed so far in our own testing script. This PR prints out the messages below after sorted.

I will double check and make this clean up with a JIRA but just wonder if you guys like this way. See the Jenkins logs for the actual format.

@SparkQA
Copy link

SparkQA commented Apr 19, 2018

Test build #89581 has finished for PR 21107 at commit d2dc4e3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@bersprockets
Copy link
Contributor

Also, with a very small modification, you can get this branch to print messages for missing components as well:

    test_collect_functions (pyspark.sql.tests.HiveContextSQLTests) ... skipped 'Hive not available'
    test_datetime_functions (pyspark.sql.tests.HiveContextSQLTests) ... skipped 'Hive not available'
    test_limit_and_take (pyspark.sql.tests.HiveContextSQLTests) ... skipped 'Hive not available'
    test_save_and_load_table (pyspark.sql.tests.HiveContextSQLTests) ... skipped 'Hive not available'
    test_unbounded_frames (pyspark.sql.tests.HiveContextSQLTests) ... skipped 'Hive not available'
    test_window_functions (pyspark.sql.tests.HiveContextSQLTests) ... skipped 'Hive not available'

But maybe you are just trying to capture environmental problems here.

By the way, this branch would not print any 'skipped' messages in environments where xmlrunner is installed. In environments without xmlrunner, I can see the messages.

@HyukjinKwon
Copy link
Member Author

Sure. The things you said sound good to check. Will do. BTW, how about the format in the console and the way with regex?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging other than Finished test if we skip it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

@HyukjinKwon
Copy link
Member Author

Will leave this open for few more days before starting to work further on this PR.

@bersprockets
Copy link
Contributor

The messages look good to me.

Re: regex: If we figure out how to dynamically skip doctests, we may need a regex for those messages (since those tests don't have "test_" names). But we can cross that bridge when we get there.

@HyukjinKwon
Copy link
Member Author

+@icexelloss too

@BryanCutler
Copy link
Member

Thanks for doing this @HyukjinKwon , looks good so far! I was wondering if it is possible to skip an entire class from running, like ArrowTests, instead of each individual test to avoid so many similar log messages. Do you think calling skipTest from setUp like here would work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are we checking dependencies now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are now relaying on the existing checks in the tests. For example:

_pyarrow_requirement_message = None
try:
from pyspark.sql.utils import require_minimum_pyarrow_version
require_minimum_pyarrow_version()
except ImportError as e:
# If Arrow version requirement is not satisfied, skip related tests.
_pyarrow_requirement_message = _exception_message(e)

@unittest.skipIf(
not _have_pandas or not _have_pyarrow,
_pandas_requirement_message or _pyarrow_requirement_message)

which prints out a skip message like:

    test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... skipped 
'PyArrow >= 0.8.0 must be installed; however, it was not found.'

which I am capturing here with a regex pattern.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Thanks for the explanation!

@icexelloss
Copy link
Contributor

@HyukjinKwon thanks for the work. This is much cleaner!

@HyukjinKwon
Copy link
Member Author

@BryanCutler, will check and update after testing out.

@HyukjinKwon
Copy link
Member Author

@BryanCutler, I checked this but looks still printing out duplicated logs .. however, I think in this way I could deal with #21107 (comment).

@HyukjinKwon HyukjinKwon force-pushed the skipped-tests-print branch from d2dc4e3 to 56b9001 Compare April 22, 2018 09:00
@HyukjinKwon HyukjinKwon changed the title [DO-NOT-MERGE][WIP] Explicitly print out skipped tests from unittest module [SPARK-24044][WIP] Explicitly print out skipped tests from unittest module Apr 22, 2018

def setUp(self):
if not self.hive_available:
self.skipTest("Hive is not available.")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished test(python3): pyspark.sql.tests (51s) ... 93 tests were skipped

...

Skipped tests in pyspark.sql.tests with python3:
    test_createDataFrame_column_name_encoding (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 0.8.0 must be installed; however, it was not found.'
    ...
    test_collect_functions (pyspark.sql.tests.HiveContextSQLTests) ... skipped 'Hive is not available.'
    test_datetime_functions (pyspark.sql.tests.HiveContextSQLTests) ... skipped 'Hive is not available.'
    ...
    test_query_execution_listener_on_collect (pyspark.sql.tests.QueryExecutionListenerTests) ... skipped "'org.apache.spark.sql.TestQueryExecutionListener' is not available. Will skip the related tests."
    ...

@viirya, @bersprockets and @BryanCutler, these were the output from my partial testing in my local.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little worry that it can be too verbose when skipped tests are too many. See #21107 (comment).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume there is no way to only print out the skipped test class name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No clean way as far as I can tell. I should do another regex thing but .. I would like to avoid this way as possible as I can ..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Yeah the current implementation looks good to me.

os._exit(-1)
if skipped_counts != 0:
LOGGER.info(
"Finished test(%s): %s (%is) ... %s tests were skipped", pyspark_python, test_name,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if there's a better format. let me know.

@HyukjinKwon
Copy link
Member Author

Will remove WIP after few more checking in my local and Jenkins's output.

@SparkQA
Copy link

SparkQA commented Apr 22, 2018

Test build #89687 has finished for PR 21107 at commit 56b9001.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon HyukjinKwon changed the title [SPARK-24044][WIP] Explicitly print out skipped tests from unittest module [SPARK-24044][PYTHON] Explicitly print out skipped tests from unittest module Apr 22, 2018
@SparkQA
Copy link

SparkQA commented Apr 22, 2018

Test build #89688 has finished for PR 21107 at commit 3dd74a0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


for key, lines in sorted(SKIPPED_TESTS.items()):
pyspark_python, test_name = key
LOGGER.info("\nSkipped tests in %s with %s:" % (test_name, pyspark_python))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it be too verbose to print all skipped tests? An option is to record them into LOG_FILE only. No strong preference here.

Copy link
Member Author

@HyukjinKwon HyukjinKwon Apr 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I get it but I feel sure people felt it should be printed out in the console from previous discussions. I actually don't feel strongly too.

Anyway we could have only few of them eventually because most of them are by missing Pandas and Arrow so probably this could be fine.

No strong preference here too. let me know if other guys have a preference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think print is fine for now. we could also change later.

Copy link
Member

@BryanCutler BryanCutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this locally and LGTM. There is an increase in the verbosity, but not too much that it causes an issue and it's crucial that the skipped tests are printed out even if there are no errors.

It might be good to leave open for a bit to see if any more comments since it does change the verbosity level on all pyspark tests.

cls.spark.stop()

def tearDown(self):
self.spark._jvm.OnSuccessCall.clear()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not called if the test is skipped during setUp right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup.

@HyukjinKwon
Copy link
Member Author

Thanks, @BryanCutler. Will wait for more days in case.

@BryanCutler
Copy link
Member

merged to master, thanks @HyukjinKwon !

@asfgit asfgit closed this in f7435be Apr 26, 2018
@HyukjinKwon
Copy link
Member Author

Thank you for reviewing this @bersprockets, @viirya, @BryanCutler, @icexelloss and @felixcheung.

@HyukjinKwon HyukjinKwon deleted the skipped-tests-print branch October 16, 2018 12:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants