-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-23599][SQL] Use RandomUUIDGenerator in Uuid expression #20861
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #88399 has finished for PR 20861 at commit
|
| this(catalog, conf, conf.optimizerMaxIterations) | ||
| } | ||
|
|
||
| private lazy val random = new Random() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we put random in the ResolvedUuidExpressions? That makes it a little bit easier to follow.
| val uuids = new ArrayBuffer[Uuid]() | ||
| plan.transformUp { | ||
| case p => | ||
| p.transformExpressionsUp { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: use collect?
|
|
||
| private def getUuidExpressions(plan: LogicalPlan): Seq[Uuid] = { | ||
| val uuids = new ArrayBuffer[Uuid]() | ||
| plan.transformUp { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit use flatMap?
|
Test build #88414 has finished for PR 20861 at commit
|
|
@hvanhovell Thanks! Your comments are addressed. |
|
Test build #88422 has finished for PR 20861 at commit
|
hvanhovell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - merging to master. Thanks!
|
@viirya can you create a backport for 2.3? |
|
@hvanhovell Ok. But this needs #20817. Since #20817 just adds new class and doesn't change existing code, I think it can be directly merged into 2.3. Should I create a backport PR of it too? Or you can direct backport it? |
| override def apply(plan: LogicalPlan): LogicalPlan = plan.transformUp { | ||
| case p if p.resolved => p | ||
| case p => p transformExpressionsUp { | ||
| case Uuid(None) => Uuid(Some(random.nextLong())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we do the same thing for Rand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, if we want to make it deterministic between re-tries of same query. I think we should do it. I can make a PR for it, WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM. We can even create a base trait for these random functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have checked, and actually Rand and Randn already have this behavior that they are deterministic between re-tries, though their random seeds are not determined at analysis but at constructing.
So I'm thinking should we do the same thing (random seed initialized at analysis) to Rand and Randn? Besides just to be consistent with Uuid, is any good reason to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I think is, is any use case that we need to re-initialize random seed for Rand? Maybe streaming? For streaming query, I think Rand should use different random seed in each execution. For now, the random seed is initialized when constructing, even we re-analyze the query, it still uses same seed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the current behavior for rand in streaming?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created a PR #21854 which shows behavior of Uuid in streaming. I think rand should be the same.
What changes were proposed in this pull request?
As stated in Jira, there are problems with current
Uuidexpression which usesjava.util.UUID.randomUUIDfor UUID generation.This patch uses the newly added
RandomUUIDGeneratorfor UUID generation. So we can makeUuiddeterministic between retries.How was this patch tested?
Added unit tests.