[SPARK-21914][SQL][TESTS] Check results of expression examples #25942
srowen
left a comment
Looks good pending tests
| } | ||
| } | ||
|
|
||
| test("check outputs of expression examples") { |
The fixes are good. How long does this take to run, BTW? I just want to make sure it's not too expensive to rerun every time, though I agree testing examples is useful.
~ 15 seconds on my laptop
Could we do it in parallel?
I will do that.
Running the test in parallel takes ~5-6 seconds on my laptop now.
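For illustration only, a minimal sketch of how independent per-example checks could be run concurrently with standard-library futures. This is not the code from the PR: `ParallelChecksSketch` and `checkAll` are hypothetical names, and the real test may parallelize differently.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical helper, not part of Spark: runs `check` on every item
// concurrently and returns the items for which the check failed.
object ParallelChecksSketch {
  def checkAll[A](items: Seq[A])(check: A => Boolean): Seq[A] = {
    // Future.traverse starts one future per item and keeps the input order.
    val results = Future.traverse(items)(item => Future(item -> check(item)))
    Await.result(results, 1.minute).collect { case (item, false) => item }
  }
}
```

Collecting the failing items instead of asserting inside each future keeps the failure report in one place, at the cost of not stopping at the first failure.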
|
Test build #111420 has finished for PR 25942 at commit
|
| Examples: | ||
| > SELECT _FUNC_(0, 'yyyy-MM-dd HH:mm:ss'); | ||
| 1970-01-01 00:00:00 | ||
| 1969-12-31 16:00:00 |
Oh, surprising.
The yyyy-MM-dd HH:mm:ss pattern does not contain the time zone sub-pattern. If you add it, you will see something like:
spark-sql> SELECT from_unixtime(0, 'yyyy-MM-dd HH:mm:ssXXX');
1970-01-01 03:00:00+03:00
And you can change your current time zone to UTC to see 1970-01-01 00:00:00:
spark-sql> set spark.sql.session.timeZone=UTC;
spark.sql.session.timeZone UTC
spark-sql> SELECT from_unixtime(0, 'yyyy-MM-dd HH:mm:ssXXX');
1970-01-01 00:00:00Z
Yes. The time zone issue will cause failures on machines in different time zones.
But the time zone is forcibly set to "America/Los_Angeles" in tests:
| TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles")) |
| .createWithDefaultFunction(() => TimeZone.getDefault.getID) |
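The effect discussed in this thread can be reproduced outside Spark with plain `java.time`. The sketch below is only an illustration: `EpochFormatSketch` and `formatEpoch` are hypothetical names, and Spark's `from_unixtime` may use a different formatter internally. It shows why the same epoch second renders differently depending on the zone, and how the `XXX` sub-pattern makes the offset visible.

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

// Hypothetical helper, not Spark code: formats an epoch second in a given zone.
object EpochFormatSketch {
  def formatEpoch(seconds: Long, pattern: String, zone: String): String =
    DateTimeFormatter
      .ofPattern(pattern)
      .withZone(ZoneId.of(zone))
      .format(Instant.ofEpochSecond(seconds))

  def main(args: Array[String]): Unit = {
    // Same instant, different wall-clock text depending on the zone.
    println(formatEpoch(0L, "yyyy-MM-dd HH:mm:ss", "America/Los_Angeles")) // 1969-12-31 16:00:00
    println(formatEpoch(0L, "yyyy-MM-dd HH:mm:ss", "UTC"))                 // 1970-01-01 00:00:00
    // Adding the XXX sub-pattern prints the offset, so the output is unambiguous.
    println(formatEpoch(0L, "yyyy-MM-dd HH:mm:ssXXX", "UTC"))              // 1970-01-01 00:00:00Z
  }
}
```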
dongjoon-hyun
left a comment
This is super useful, @MaxGekk ! Thank you always.
BTW, could you add more checks? For example, the current validation doesn't work if the example has multiple statements without semicolons, like ParseUrl. This PR manually adds ';' to detect it, but it would be great if the test suite could detect the missing ';'.
Specifically, this PR doesn't detect the error if we miss `;` in the example.
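One possible way to detect a missing `;` is to compare the number of statement starts with the number of statement terminators. The sketch below is an assumption about how such a check could look, not the check that ended up in the PR: `ExampleSyntaxSketch` and `statementsTerminated` are hypothetical names, and it assumes each statement starts with `> ` at the beginning of a line and ends with `;` at the end of a line.

```scala
// Hypothetical helper, not Spark code.
object ExampleSyntaxSketch {
  // A statement starts with "> " after optional indentation.
  private val stmtStart = """(?m)^[ \t]*> """.r
  // A statement ends with ';' as the last non-blank character on a line.
  private val stmtEnd = """(?m);[ \t]*$""".r

  /** True when every "> " statement start is matched by a line-ending ';'. */
  def statementsTerminated(example: String): Boolean =
    stmtStart.findAllIn(example).size == stmtEnd.findAllIn(example).size
}
```

A counting check like this is deliberately crude: an output line that itself ends with `;` would be miscounted, so it is best treated as a first-pass sanity check.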
|
Sorry for asking the additional stuff, @MaxGekk . 😉 |
|
@dongjoon-hyun Thank you for your review. The test detects such case indirectly, actually (it should fail on parsing actual SQL stmt + its output that is caught as well) ;-) I can check the number of |
|
Test build #111428 has finished for PR 25942 at commit
|
| "org.apache.spark.sql.catalyst.expressions.Uuid", | ||
| // The example call methods that return unstable results. | ||
| "org.apache.spark.sql.catalyst.expressions.CallMethodViaReflection", | ||
| // Fails on parsing `SELECT 2 mod 1.8`: |
What should I do about all the exceptions? @dongjoon-hyun @srowen Open a separate JIRA ticket for each case?
I think it's fine to just fix what you have so far. I think it's fine to fix additional ones here, too. I don't think you need to fix each of them individually unless you feel they're logically distinct.
I have fixed 3 out of 4. I will open a ticket for the last one.
Fix the last one as well
|
Test build #111449 has finished for PR 25942 at commit
|
|
Test build #111470 has finished for PR 25942 at commit
|
|
jenkins, retest this, please |
|
Test build #111473 has finished for PR 25942 at commit
|
|
This is actually a duplicate JIRA of SPARK-21914 I guess. |
| withClue(s"Function '${info.getName}', Expression class '$className'") { | ||
| val example = info.getExamples | ||
| checkExampleSyntax(example) | ||
| example.split(" > ").toList.foreach(_ match { |
nit: `(_ match {` can be removed.
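To illustrate the nit, a block of `case` clauses can be passed directly to a collection method, so the explicit `_ match` wrapper is unnecessary. The sketch below is hypothetical (`ExampleParsingSketch`, `parse`, and the regular expression are assumptions, not the PR's code); it splits an example on `" > "` as in the diff above and extracts (statement, expected output) pairs.

```scala
// Hypothetical helper, not Spark code.
object ExampleParsingSketch {
  // Statement text up to the first ";\n", then everything after it as output.
  private val stmtAndOutput = """(?s)(.+?);\n(.*)""".r

  def parse(example: String): List[(String, String)] =
    // A pattern-matching block is passed directly; no `_ match { ... }` needed.
    example.split(" > ").toList.collect {
      case stmtAndOutput(sql, output) => (sql.trim, output.trim)
    }
}
```

Pieces that do not match the pattern, such as the leading `Examples:` header, are simply skipped by `collect`.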
|
Merged to master. |
|
Sorry @MaxGekk, it seems this commit has a flaky test on JDK 11: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111486/console |
|
I guess this is because we run checks for
|
|
Here is the PR which clones SparkSession: #25956 |
|
I came across the same observation and found a different issue. Please take a look at the example in spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala Lines 97 to 106 in d72f398
If spark.sql.parser.escapedStringLiterals=false, then it should fail as there's For the query SQL parser removes single
which no longer has the original intention. The query below tests the original intention:
Note that Same for RLIKE:
which is OK, but
which no longer has the original intention. The query below tests the original intention:
I'll raise a new patch to correct the examples. |
…ples while checking the _FUNC_ pattern

### What changes were proposed in this pull request?
The `SET` commands do not contain the `_FUNC_` pattern a priori. In the PR, I propose to filter out such commands in the `using _FUNC_ instead of function names in examples` test.

### Why are the changes needed?
After the merge of #25942, examples will require particular settings. Currently, the whole expression example has to be ignored, which is too much. It makes sense to ignore only `SET` commands in expression examples.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
By running the `using _FUNC_ instead of function names in examples` test.

Closes #25958 from MaxGekk/dont-check-_FUNC_-in-set.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
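A rough sketch of the filtering idea described in that commit message, under stated assumptions: `FuncPatternSketch` and `statementsMissingFunc` are hypothetical names, and only the first line of each statement (the one starting with `> `) is inspected. The actual Spark test may be written differently.

```scala
import java.util.Locale

// Hypothetical helper, not Spark code.
object FuncPatternSketch {
  /** Returns example statements that neither use _FUNC_ nor are SET commands. */
  def statementsMissingFunc(example: String): Seq[String] =
    example.split("\n").toSeq
      .map(_.trim)
      .filter(_.startsWith("> "))   // keep only lines that start a statement
      .map(_.stripPrefix("> "))
      // SET commands configure the session and never contain _FUNC_.
      .filterNot(_.toUpperCase(Locale.ROOT).startsWith("SET "))
      .filterNot(_.contains("_FUNC_"))
}
```

Skipping only the `SET` lines means the rest of the example is still checked, rather than ignoring the whole expression.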
What changes were proposed in this pull request?
The new test compares the outputs of expression examples in comments with the results of hiveResultString(). I also fixed existing examples where the actual and expected outputs differ.
Why are the changes needed?
This prevents mistakes in expression examples, and fixes existing mistakes in comments.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added a new test to SQLQuerySuite.