
Conversation

Member

@MaxGekk MaxGekk commented Sep 26, 2019

What changes were proposed in this pull request?

The new test compares the outputs of expression examples in comments with the results of hiveResultString(). I also fixed existing examples where the actual and expected outputs differ.
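
For context, a minimal sketch of the comparison (simplified and hedged: it assumes a SparkSession named `spark`, and `HiveResult.hiveResultString`'s exact signature has changed across Spark versions; the real test in SQLQuerySuite iterates over all registered functions):

import org.apache.spark.sql.execution.HiveResult

// Execute one example query and compare its Hive-style string output with
// the expected output embedded in the expression's documentation.
val query = "SELECT from_unixtime(0, 'yyyy-MM-dd HH:mm:ss')"
val expected = "1969-12-31 16:00:00" // tests pin the time zone to America/Los_Angeles
val plan = spark.sql(query).queryExecution.executedPlan
val actual = HiveResult.hiveResultString(plan).mkString("\n")
assert(actual == expected, s"Example output mismatch for: $query")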

Why are the changes needed?

This prevents mistakes in expression examples, and fixes existing mistakes in comments.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a new test to SQLQuerySuite.

Member

@srowen srowen left a comment


Looks good pending tests

}
}

test("check outputs of expression examples") {
Member


The fixes are good. How long does this take to run, BTW? Just want to make sure it's not too costly to rerun every time, though I agree testing the examples is useful.

Member Author


~ 15 seconds on my laptop

Member


Could we do it in parallel?

Member Author


I will do that.

Member Author


Running the test in parallel takes ~5-6 seconds on my laptop now.
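
For illustration, a sketch of the parallel run using Scala parallel collections (the function names and the check body are placeholders, not the PR's exact code):

import scala.collection.parallel.immutable.ParVector

// Distribute the per-function example checks over a parallel collection.
// Note: the checks must not mutate shared session state, or examples can
// interfere with each other (see the LIKE/RLIKE discussion below).
val funcNames = Vector("from_unixtime", "date_add", "concat")
new ParVector(funcNames).foreach { name =>
  println(s"checking examples of $name") // placeholder for the real check
}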

@MaxGekk MaxGekk changed the title [WIP][SPARK-29242][SQL][TEST] Check results of expression examples [SPARK-29242][SQL][TEST] Check results of expression examples Sep 26, 2019
@SparkQA

SparkQA commented Sep 26, 2019

Test build #111420 has finished for PR 25942 at commit 4740c8d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Examples:
  > SELECT _FUNC_(0, 'yyyy-MM-dd HH:mm:ss');
- 1970-01-01 00:00:00
+ 1969-12-31 16:00:00
Member


Oh, surprising.

Member Author

@MaxGekk MaxGekk Sep 26, 2019


The yyyy-MM-dd HH:mm:ss pattern does not contain a time zone sub-pattern. If you specify one, you will see something like:

spark-sql> SELECT from_unixtime(0, 'yyyy-MM-dd HH:mm:ssXXX');
1970-01-01 03:00:00+03:00

Member Author


And you can change your current time zone to UTC to see 1970-01-01 00:00:00:

spark-sql> set spark.sql.session.timeZone=UTC;
spark.sql.session.timeZone	UTC
spark-sql> SELECT from_unixtime(0, 'yyyy-MM-dd HH:mm:ssXXX');
1970-01-01 00:00:00Z

Member


Yeah. The time zone issue will cause failures on machines in different time zones.

Member Author


but the time zone is forcibly set to "America/Los_Angeles" in tests:

TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))

while the session time zone config defaults to the JVM time zone:

.createWithDefaultFunction(() => TimeZone.getDefault.getID)

Member

@dongjoon-hyun dongjoon-hyun left a comment


This is super useful, @MaxGekk ! Thank you always.

BTW, could you add more checks? For example, the current validation doesn't work if the example has multiple statements without a semicolon, like ParseUrl. This PR manually adds ';' to detect it, but it would be great if the test suite could detect the missing ';'.

Specifically, this PR doesn't detect the error if a ';' is missing in the example.

@dongjoon-hyun
Member

Sorry for asking for the additional stuff, @MaxGekk. 😉

@MaxGekk
Member Author

MaxGekk commented Sep 26, 2019

@dongjoon-hyun Thank you for your review. Actually, the test detects such cases indirectly: it should fail while parsing the actual SQL statement together with its output, and that failure is caught as well ;-) I can also check that the number of '>' markers equals the number of ';' terminators.
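
A minimal sketch of that counting check (illustrative, not necessarily the committed implementation):

// Each example query line starts with '>' and must be terminated by ';',
// so the counts of the two markers should match.
def checkExampleSyntax(example: String): Unit = {
  val queries = example.split("\n").count(_.trim.startsWith(">"))
  val terminators = example.count(_ == ';')
  assert(queries == terminators,
    s"Mismatch between $queries '>' markers and $terminators ';' in:\n$example")
}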

@SparkQA

SparkQA commented Sep 26, 2019

Test build #111428 has finished for PR 25942 at commit 3f1e42c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

"org.apache.spark.sql.catalyst.expressions.Uuid",
// The example calls methods that return unstable results.
"org.apache.spark.sql.catalyst.expressions.CallMethodViaReflection",
// Fails on parsing `SELECT 2 mod 1.8`:
Member Author


What should I do with all the exceptions, @dongjoon-hyun @srowen? Open a separate JIRA ticket for each case?

Member


I think it's fine to just fix what you have so far, and it's fine to fix additional ones here, too. I don't think you need to fix each of them individually unless you feel they're logically distinct.

Member Author


I have fixed 3 out of 4. I will open a ticket for the last one.

Member Author

@MaxGekk MaxGekk Sep 26, 2019


Fixed the last one as well.

@SparkQA

SparkQA commented Sep 27, 2019

Test build #111449 has finished for PR 25942 at commit dcd9816.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 27, 2019

Test build #111470 has finished for PR 25942 at commit cee6709.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Member Author

MaxGekk commented Sep 27, 2019

jenkins, retest this, please

@SparkQA

SparkQA commented Sep 27, 2019

Test build #111473 has finished for PR 25942 at commit cee6709.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

This is actually a duplicate of JIRA SPARK-21914, I guess.

@HyukjinKwon HyukjinKwon changed the title [SPARK-29242][SQL][TEST] Check results of expression examples [SPARK-21914][SQL][TESTS] Check results of expression examples Sep 27, 2019
withClue(s"Function '${info.getName}', Expression class '$className'") {
  val example = info.getExamples
  checkExampleSyntax(example)
  example.split(" > ").toList.foreach(_ match {
Member


nit: `(_ match {` can be removed.
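
For reference, a pattern-matching anonymous function can be passed directly as a partial-function literal, so the two forms below are equivalent (toy data, not the PR's code):

val lines = List("> SELECT 1;", "1")
// Verbose form with the redundant `_ match`:
lines.foreach(_ match {
  case s if s.trim.startsWith(">") => println(s"query:  $s")
  case s                           => println(s"output: $s")
})
// Idiomatic form suggested by the review:
lines.foreach {
  case s if s.trim.startsWith(">") => println(s"query:  $s")
  case s                           => println(s"output: $s")
}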

@HyukjinKwon
Member

Merged to master.

@wangyum
Member

wangyum commented Sep 28, 2019

Sorry, @MaxGekk. It seems this commit has a flaky test on JDK 11:

Error Message
the pattern '\%SystemDrive\%\Users%' is invalid, the escape character is not allowed to precede 'U';
Stacktrace
      org.apache.spark.sql.AnalysisException: the pattern '\%SystemDrive\%\Users%' is invalid, the escape character is not allowed to precede 'U';
      at org.apache.spark.sql.catalyst.util.StringUtils$.fail$1(StringUtils.scala:48)
      at org.apache.spark.sql.catalyst.util.StringUtils$.escapeLikeRegex(StringUtils.scala:57)
      at org.apache.spark.sql.catalyst.expressions.Like.escape(regexpExpressions.scala:108)
      at org.apache.spark.sql.catalyst.expressions.StringRegexExpression.compile(regexpExpressions.scala:51)
      at org.apache.spark.sql.catalyst.expressions.StringRegexExpression.pattern(regexpExpressions.scala:54)
      at org.apache.spark.sql.catalyst.expressions.StringRegexExpression.nullSafeEval(regexpExpressions.scala:57)
      at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:551)

https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-3.2-jdk-11/478/testReport/junit/org.apache.spark.sql/SQLQuerySuite/check_outputs_of_expression_examples

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111486/console

@MaxGekk
Member Author

MaxGekk commented Sep 28, 2019

I guess this is because we run checks for RLIKE and LIKE in parallel, but the example for RLIKE changes settings in the same SparkSession: https://github.com/apache/spark/pull/25942/files#diff-39298b470865a4cbc67398a4ea11e767R174
There are at least 3 ways to fix it:

  • Disable the parallel run
  • Don't modify default settings in the RLIKE examples
  • Clone the SparkSession for each example (see the sketch below)

WDYT?
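
A sketch of the third option (hedged: `cloneSession()` is `private[sql]`, so this only compiles from code under the `org.apache.spark.sql` package, which is where the test lives; the helper name is illustrative):

import org.apache.spark.sql.SparkSession

// Give each example its own session clone: SET commands executed by one
// example (e.g. RLIKE's) then cannot leak into examples checked in parallel.
def withClonedSession(spark: SparkSession)(check: SparkSession => Unit): Unit = {
  // cloneSession() copies the session state (conf, temp views, UDFs) but
  // shares the SparkContext, so cloning per example is relatively cheap.
  check(spark.cloneSession())
}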

@MaxGekk
Member Author

MaxGekk commented Sep 28, 2019

Here is the PR which clones SparkSession: #25956

@HeartSaVioR
Contributor

HeartSaVioR commented Sep 28, 2019

I've come across the same observation and found a different issue. Please take a look at the example for LIKE:

examples = """
Examples:
> SELECT '%SystemDrive%\Users\John' _FUNC_ '\%SystemDrive\%\Users%';
true
""",
note = """
Use RLIKE to match with standard regular expressions.
""",
since = "1.0.0")
case class Like(left: Expression, right: Expression) extends StringRegexExpression {

If spark.sql.parser.escapedStringLiterals=false (which is the default), then it should fail because there's \U in the pattern, but it doesn't fail.

The escape character is '\'. If an escape character precedes a special symbol or another
escape character, the following character is matched literally. It is invalid to escape
any other character.

For the query

SET spark.sql.parser.escapedStringLiterals=false;
SELECT '%SystemDrive%\Users\John' like '\%SystemDrive\%\Users%';

The SQL parser removes a single \ (not sure that is intended), so the expressions of Like are constructed as follows:

LIKE - left %SystemDrive%UsersJohn / right \%SystemDrive\%Users%

which no longer carries the original intention.

The query below tests the original intention:

SET spark.sql.parser.escapedStringLiterals=false;
SELECT '%SystemDrive%\\Users\\John' like '\%SystemDrive\%\\\\Users%';

LIKE - left %SystemDrive%\Users\John / right \%SystemDrive\%\\Users%

Note that \\\\ is needed in the pattern, as StringUtils.escapeLikeRegex requires \\ to represent the normal character \.

Same for RLIKE:

SET spark.sql.parser.escapedStringLiterals=true;
SELECT '%SystemDrive%\Users\John' rlike '%SystemDrive%\\Users.*';

RLIKE - left %SystemDrive%\Users\John / right %SystemDrive%\\Users.*

which is OK, but

SET spark.sql.parser.escapedStringLiterals=false;
SELECT '%SystemDrive%\Users\John' rlike '%SystemDrive%\Users.*';

RLIKE - left %SystemDrive%UsersJohn / right %SystemDrive%Users.*

which no longer carries the original intention.

The query below tests the original intention:

SET spark.sql.parser.escapedStringLiterals=true;
SELECT '%SystemDrive%\\Users\\John' rlike '%SystemDrive%\\\\Users.*';

RLIKE - left %SystemDrive%\Users\John / right %SystemDrive%\\Users.*
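
To make the quoted escaping rule concrete, here is a toy re-implementation (illustrative only; Spark's actual StringUtils.escapeLikeRegex differs in details). It rejects exactly the kind of pattern from the stack trace above, where \ precedes U:

import java.util.regex.Pattern

// '\' may only precede the special symbols '_' and '%' or another '\';
// escaping any other character is invalid, per the rule quoted above.
def escapeLikePattern(pattern: String): Either[String, String] = {
  val sb = new StringBuilder
  val chars = pattern.iterator
  while (chars.hasNext) {
    chars.next() match {
      case '\\' =>
        if (!chars.hasNext) return Left("pattern ends with the escape character")
        chars.next() match {
          case c @ ('_' | '%' | '\\') => sb.append(Pattern.quote(c.toString))
          case c => return Left(s"the escape character is not allowed to precede '$c'")
        }
      case '_' => sb.append('.')   // matches any single character
      case '%' => sb.append(".*")  // matches any sequence of characters
      case c   => sb.append(Pattern.quote(c.toString))
    }
  }
  Right(sb.toString)
}

// escapeLikePattern("""\%SystemDrive\%\Users%""")
//   => Left("the escape character is not allowed to precede 'U'")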

I'll raise a new patch to correct the examples.

@HeartSaVioR
Contributor

#25957

HyukjinKwon pushed a commit that referenced this pull request Sep 28, 2019
…ples while checking the _FUNC_ pattern

### What changes were proposed in this pull request?

The `SET` commands do not contain the `_FUNC_` pattern a priori. In the PR, I propose to filter out such commands in the `using _FUNC_ instead of function names in examples` test.

### Why are the changes needed?
After the merge of #25942, examples will require particular settings. Currently, the whole expression example has to be ignored, which is too much. It makes sense to ignore only `SET` commands in expression examples.
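
A minimal sketch of the filtering idea (illustrative statement list and names, not the PR's exact code):

// SET commands only adjust session settings and never contain the _FUNC_
// placeholder, so skip them before checking that examples use _FUNC_.
val statements = List(
  "SET spark.sql.parser.escapedStringLiterals=true;",
  "SELECT _FUNC_('Spark', 'SQL');"
)
statements
  .filterNot(_.trim.startsWith("SET "))
  .foreach { stmt =>
    assert(stmt.contains("_FUNC_"), s"Example must use _FUNC_: $stmt")
  }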

### Does this PR introduce any user-facing change?
No

### How was this patch tested?

By running the `using _FUNC_ instead of function names in examples` test.

Closes #25958 from MaxGekk/dont-check-_FUNC_-in-set.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
@MaxGekk MaxGekk deleted the run-expr-examples branch October 5, 2019 19:18
