
Conversation

Member

@MaxGekk MaxGekk commented Sep 26, 2019

What changes were proposed in this pull request?

The new test compares the outputs of expression examples in comments with the results of hiveResultString(). I also fixed existing examples where the actual and expected outputs differ.
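
For context, a minimal sketch of the comparison (simplified and hedged: it assumes a SparkSession named `spark`, and `HiveResult.hiveResultString`'s exact signature has changed across Spark versions; the real test in SQLQuerySuite iterates over all registered functions):

import org.apache.spark.sql.execution.HiveResult

// Execute one example query and compare its Hive-style string output with
// the expected output embedded in the expression's documentation.
val query = "SELECT from_unixtime(0, 'yyyy-MM-dd HH:mm:ss')"
val expected = "1969-12-31 16:00:00" // tests pin the time zone to America/Los_Angeles
val plan = spark.sql(query).queryExecution.executedPlan
val actual = HiveResult.hiveResultString(plan).mkString("\n")
assert(actual == expected, s"Example output mismatch for: $query")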

Why are the changes needed?

This prevents mistakes in expression examples, and fixes existing mistakes in comments.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a new test to SQLQuerySuite.

Member

@srowen srowen left a comment


Looks good pending tests

}
}

test("check outputs of expression examples") {
Member


The fixes are good. How long does this take to run, BTW? Just want to make sure it's not too costly to rerun every time, though I agree testing the examples is useful.

Member Author


~ 15 seconds on my laptop

Member


Could we do it in parallel?

Member Author


I will do that.

Member Author


Running the test in parallel takes ~5-6 seconds on my laptop now.
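
For illustration, a sketch of the parallel run using Scala parallel collections (the function names and the check body are placeholders, not the PR's exact code):

import scala.collection.parallel.immutable.ParVector

// Distribute the per-function example checks over a parallel collection.
// Note: the checks must not mutate shared session state, or examples can
// interfere with each other (see the LIKE/RLIKE discussion below).
val funcNames = Vector("from_unixtime", "date_add", "concat")
new ParVector(funcNames).foreach { name =>
  println(s"checking examples of $name") // placeholder for the real check
}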

@MaxGekk MaxGekk changed the title [WIP][SPARK-29242][SQL][TEST] Check results of expression examples [SPARK-29242][SQL][TEST] Check results of expression examples Sep 26, 2019
@SparkQA

SparkQA commented Sep 26, 2019

Test build #111420 has finished for PR 25942 at commit 4740c8d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Examples:
  > SELECT _FUNC_(0, 'yyyy-MM-dd HH:mm:ss');
- 1970-01-01 00:00:00
+ 1969-12-31 16:00:00
Member


Oh, surprising.

Member Author

@MaxGekk MaxGekk Sep 26, 2019


The yyyy-MM-dd HH:mm:ss pattern does not contain a time zone sub-pattern. If you specify one, you will see something like:

spark-sql> SELECT from_unixtime(0, 'yyyy-MM-dd HH:mm:ssXXX');
1970-01-01 03:00:00+03:00

Member Author


And you can change your current time zone to UTC to see 1970-01-01 00:00:00:

spark-sql> set spark.sql.session.timeZone=UTC;
spark.sql.session.timeZone	UTC
spark-sql> SELECT from_unixtime(0, 'yyyy-MM-dd HH:mm:ssXXX');
1970-01-01 00:00:00Z

Member


Yeah. The time zone issue will cause failures on machines in different time zones.

Member Author


but the time zone is forcibly set to "America/Los_Angeles" in tests:

TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))

while the session time zone config defaults to the JVM time zone:

.createWithDefaultFunction(() => TimeZone.getDefault.getID)

Member

@dongjoon-hyun dongjoon-hyun left a comment


This is super useful, @MaxGekk ! Thank you always.

BTW, could you add more checks? For example, the current validation doesn't work if the example has multiple statements without a semicolon, like ParseUrl. This PR manually adds ';' to detect it, but it would be great if the test suite could detect the missing ';'.

Specifically, this PR doesn't detect the error if a ';' is missing in the example.

@dongjoon-hyun
Member

Sorry for asking for the additional stuff, @MaxGekk. 😉

@MaxGekk
Member Author

MaxGekk commented Sep 26, 2019

@dongjoon-hyun Thank you for your review. Actually, the test detects such cases indirectly: it should fail while parsing the actual SQL statement together with its output, and that failure is caught as well ;-) I can also check that the number of '>' markers equals the number of ';' terminators.
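
A minimal sketch of that counting check (illustrative, not necessarily the committed implementation):

// Each example query line starts with '>' and must be terminated by ';',
// so the counts of the two markers should match.
def checkExampleSyntax(example: String): Unit = {
  val queries = example.split("\n").count(_.trim.startsWith(">"))
  val terminators = example.count(_ == ';')
  assert(queries == terminators,
    s"Mismatch between $queries '>' markers and $terminators ';' in:\n$example")
}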

@SparkQA

SparkQA commented Sep 26, 2019

Test build #111428 has finished for PR 25942 at commit 3f1e42c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

"org.apache.spark.sql.catalyst.expressions.Uuid",
// The example calls methods that return unstable results.
"org.apache.spark.sql.catalyst.expressions.CallMethodViaReflection",
// Fails on parsing `SELECT 2 mod 1.8`:
Member Author


What should I do with all the exceptions, @dongjoon-hyun @srowen? Open a separate JIRA ticket for each case?

Member


I think it's fine to just fix what you have so far, and it's fine to fix additional ones here, too. I don't think you need to fix each of them individually unless you feel they're logically distinct.

Member Author


I have fixed 3 out of 4. I will open a ticket for the last one.

Member Author

@MaxGekk MaxGekk Sep 26, 2019


Fixed the last one as well.

@SparkQA

SparkQA commented Sep 27, 2019

Test build #111449 has finished for PR 25942 at commit dcd9816.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 27, 2019

Test build #111470 has finished for PR 25942 at commit cee6709.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Member Author

MaxGekk commented Sep 27, 2019

jenkins, retest this, please

@SparkQA

SparkQA commented Sep 27, 2019

Test build #111473 has finished for PR 25942 at commit cee6709.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

This is actually a duplicate of JIRA SPARK-21914, I guess.

@HyukjinKwon HyukjinKwon changed the title [SPARK-29242][SQL][TEST] Check results of expression examples [SPARK-21914][SQL][TESTS] Check results of expression examples Sep 27, 2019
withClue(s"Function '${info.getName}', Expression class '$className'") {
  val example = info.getExamples
  checkExampleSyntax(example)
  example.split(" > ").toList.foreach(_ match {
Member


nit: `(_ match {` can be removed.
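
For reference, a pattern-matching anonymous function can be passed directly as a partial-function literal, so the two forms below are equivalent (toy data, not the PR's code):

val lines = List("> SELECT 1;", "1")
// Verbose form with the redundant `_ match`:
lines.foreach(_ match {
  case s if s.trim.startsWith(">") => println(s"query:  $s")
  case s                           => println(s"output: $s")
})
// Idiomatic form suggested by the review:
lines.foreach {
  case s if s.trim.startsWith(">") => println(s"query:  $s")
  case s                           => println(s"output: $s")
}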

@HyukjinKwon
Member

Merged to master.

@wangyum
Member

wangyum commented Sep 28, 2019

Sorry, @MaxGekk. It seems this commit has a flaky test on JDK 11:

Error Message
the pattern '\%SystemDrive\%\Users%' is invalid, the escape character is not allowed to precede 'U';
Stacktrace
      org.apache.spark.sql.AnalysisException: the pattern '\%SystemDrive\%\Users%' is invalid, the escape character is not allowed to precede 'U';
      at org.apache.spark.sql.catalyst.util.StringUtils$.fail$1(StringUtils.scala:48)
      at org.apache.spark.sql.catalyst.util.StringUtils$.escapeLikeRegex(StringUtils.scala:57)
      at org.apache.spark.sql.catalyst.expressions.Like.escape(regexpExpressions.scala:108)
      at org.apache.spark.sql.catalyst.expressions.StringRegexExpression.compile(regexpExpressions.scala:51)
      at org.apache.spark.sql.catalyst.expressions.StringRegexExpression.pattern(regexpExpressions.scala:54)
      at org.apache.spark.sql.catalyst.expressions.StringRegexExpression.nullSafeEval(regexpExpressions.scala:57)
      at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:551)

https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-3.2-jdk-11/478/testReport/junit/org.apache.spark.sql/SQLQuerySuite/check_outputs_of_expression_examples

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111486/console

@MaxGekk
Member Author

MaxGekk commented Sep 28, 2019

I guess this is because we run checks for RLIKE and LIKE in parallel, but the example for RLIKE changes settings in the same SparkSession: https://github.com/apache/spark/pull/25942/files#diff-39298b470865a4cbc67398a4ea11e767R174
There are at least 3 ways to fix it:

  • Disable the parallel run
  • Don't modify default settings in the RLIKE examples
  • Clone the SparkSession for each example (see the sketch below)

WDYT?
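
A sketch of the third option (hedged: `cloneSession()` is `private[sql]`, so this only compiles from code under the `org.apache.spark.sql` package, which is where the test lives; the helper name is illustrative):

import org.apache.spark.sql.SparkSession

// Give each example its own session clone: SET commands executed by one
// example (e.g. RLIKE's) then cannot leak into examples checked in parallel.
def withClonedSession(spark: SparkSession)(check: SparkSession => Unit): Unit = {
  // cloneSession() copies the session state (conf, temp views, UDFs) but
  // shares the SparkContext, so cloning per example is relatively cheap.
  check(spark.cloneSession())
}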

@MaxGekk
Member Author

MaxGekk commented Sep 28, 2019

Here is the PR which clones SparkSession: #25956

@HeartSaVioR
Contributor

HeartSaVioR commented Sep 28, 2019

I've come across the same observation and found a different issue. Please take a look at the example for LIKE:

examples = """
Examples:
> SELECT '%SystemDrive%\Users\John' _FUNC_ '\%SystemDrive\%\Users%';
true
""",
note = """
Use RLIKE to match with standard regular expressions.
""",
since = "1.0.0")
case class Like(left: Expression, right: Expression) extends StringRegexExpression {

If spark.sql.parser.escapedStringLiterals=false (which is the default), then it should fail because there's \U in the pattern, but it doesn't fail.

The escape character is '\'. If an escape character precedes a special symbol or another
escape character, the following character is matched literally. It is invalid to escape
any other character.

For the query

SET spark.sql.parser.escapedStringLiterals=false;
SELECT '%SystemDrive%\Users\John' like '\%SystemDrive\%\Users%';

The SQL parser removes a single \ (not sure that is intended), so the expressions of Like are constructed as follows:

LIKE - left %SystemDrive%UsersJohn / right \%SystemDrive\%Users%

which no longer carries the original intention.

The query below tests the original intention:

SET spark.sql.parser.escapedStringLiterals=false;
SELECT '%SystemDrive%\\Users\\John' like '\%SystemDrive\%\\\\Users%';

LIKE - left %SystemDrive%\Users\John / right \%SystemDrive\%\\Users%

Note that \\\\ is needed in the pattern, as StringUtils.escapeLikeRegex requires \\ to represent the normal character \.

Same for RLIKE:

SET spark.sql.parser.escapedStringLiterals=true;
SELECT '%SystemDrive%\Users\John' rlike '%SystemDrive%\\Users.*';

RLIKE - left %SystemDrive%\Users\John / right %SystemDrive%\\Users.*

which is OK, but

SET spark.sql.parser.escapedStringLiterals=false;
SELECT '%SystemDrive%\Users\John' rlike '%SystemDrive%\Users.*';

RLIKE - left %SystemDrive%UsersJohn / right %SystemDrive%Users.*

which no longer carries the original intention.

The query below tests the original intention:

SET spark.sql.parser.escapedStringLiterals=true;
SELECT '%SystemDrive%\\Users\\John' rlike '%SystemDrive%\\\\Users.*';

RLIKE - left %SystemDrive%\Users\John / right %SystemDrive%\\Users.*
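
To make the quoted escaping rule concrete, here is a toy re-implementation (illustrative only; Spark's actual StringUtils.escapeLikeRegex differs in details). It rejects exactly the kind of pattern from the stack trace above, where \ precedes U:

import java.util.regex.Pattern

// '\' may only precede the special symbols '_' and '%' or another '\';
// escaping any other character is invalid, per the rule quoted above.
def escapeLikePattern(pattern: String): Either[String, String] = {
  val sb = new StringBuilder
  val chars = pattern.iterator
  while (chars.hasNext) {
    chars.next() match {
      case '\\' =>
        if (!chars.hasNext) return Left("pattern ends with the escape character")
        chars.next() match {
          case c @ ('_' | '%' | '\\') => sb.append(Pattern.quote(c.toString))
          case c => return Left(s"the escape character is not allowed to precede '$c'")
        }
      case '_' => sb.append('.')   // matches any single character
      case '%' => sb.append(".*")  // matches any sequence of characters
      case c   => sb.append(Pattern.quote(c.toString))
    }
  }
  Right(sb.toString)
}

// escapeLikePattern("""\%SystemDrive\%\Users%""")
//   => Left("the escape character is not allowed to precede 'U'")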

I'll raise a new patch to correct the examples.

@HeartSaVioR
Contributor

#25957

HyukjinKwon pushed a commit that referenced this pull request Sep 28, 2019
…ples while checking the _FUNC_ pattern

### What changes were proposed in this pull request?

The `SET` commands do not contain the `_FUNC_` pattern a priori. In the PR, I propose to filter out such commands in the `using _FUNC_ instead of function names in examples` test.

### Why are the changes needed?
After the merge of #25942, examples will require particular settings. Currently, the whole expression example has to be ignored, which is too much. It makes sense to ignore only `SET` commands in expression examples.
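
A minimal sketch of the filtering idea (illustrative statement list and names, not the PR's exact code):

// SET commands only adjust session settings and never contain the _FUNC_
// placeholder, so skip them before checking that examples use _FUNC_.
val statements = List(
  "SET spark.sql.parser.escapedStringLiterals=true;",
  "SELECT _FUNC_('Spark', 'SQL');"
)
statements
  .filterNot(_.trim.startsWith("SET "))
  .foreach { stmt =>
    assert(stmt.contains("_FUNC_"), s"Example must use _FUNC_: $stmt")
  }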

### Does this PR introduce any user-facing change?
No

### How was this patch tested?

By running the `using _FUNC_ instead of function names in examples` test.

Closes #25958 from MaxGekk/dont-check-_FUNC_-in-set.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
@MaxGekk MaxGekk deleted the run-expr-examples branch October 5, 2019 19:18
