@viirya
Member

@viirya viirya commented Jul 24, 2018

What changes were proposed in this pull request?

The result of Uuid depends on the random seed assigned during analysis. In a streaming query that seed is reused, so every execution produces the same UUIDs. This seems incorrect for streaming query execution.
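
To illustrate the problem, here is a minimal Scala sketch that does not use Spark. `SeededUuid` and `evalBatch` are hypothetical names, and building a `java.util.UUID` from two random longs is only a stand-in for the real generator. The assumption is that each execution restarts its generator from the seed stored in the expression.

```scala
import java.util.UUID
import scala.util.Random

// Toy stand-in for a seeded UUID expression; this is NOT Spark's Uuid class.
final case class SeededUuid(seed: Long) {
  // Each "execution" re-creates the generator from the stored seed,
  // so the sequence of values restarts from the same point.
  def evalBatch(rows: Int): Seq[UUID] = {
    val rng = new Random(seed)
    Seq.fill(rows)(new UUID(rng.nextLong(), rng.nextLong()))
  }
}

object SameSeedDemo {
  def main(args: Array[String]): Unit = {
    // The seed is fixed once, as if it had been assigned during analysis.
    val expr = SeededUuid(seed = 42L)
    val batch0 = expr.evalBatch(3)
    val batch1 = expr.evalBatch(3)
    println(batch0 == batch1) // true: both executions yield identical values
  }
}
```

Under that assumption, the only way to get different values per execution is to give the expression a different seed before each execution.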

How was this patch tested?

Added test.


override def apply(plan: LogicalPlan): LogicalPlan = plan.transformUp {
  case p => p transformExpressionsUp {
    case Uuid(_) if p.isStreaming => Uuid(Some(random.nextLong()))
Member

Not a big deal at all, but can we drop the (_) and match on the type instead, like _: Uuid?

Member Author

Yeah, sure.
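
For readers unfamiliar with the two pattern styles discussed in this thread, here is a small self-contained sketch. The one-field `Uuid` case class below is a hypothetical stand-in for the expression in the diff, not Spark's definition.

```scala
// Hypothetical one-field case class standing in for the expression in the diff.
final case class Uuid(randomSeed: Option[Long] = None)

object PatternDemo {
  // Constructor pattern: destructures the value and ignores its single field.
  def byConstructor(e: Any): Boolean = e match {
    case Uuid(_) => true
    case _       => false
  }

  // Type pattern: only checks the runtime class, with no destructuring.
  def byType(e: Any): Boolean = e match {
    case _: Uuid => true
    case _       => false
  }

  def main(args: Array[String]): Unit = {
    val u = Uuid(Some(1L))
    println(byConstructor(u) == byType(u)) // true: both patterns match any Uuid
  }
}
```

Both forms match every `Uuid` instance here. The type pattern has the small advantage that it keeps compiling unchanged if the case class later gains another field.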

@viirya
Member Author

viirya commented Jul 24, 2018

Actually, I think Rand and Randn have the same issue, but I want to hear opinions before dealing with them.

@SparkQA

SparkQA commented Jul 24, 2018

Test build #93476 has finished for PR 21854 at commit 8ef299f.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 24, 2018

Test build #93479 has finished for PR 21854 at commit c1ce69c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Jul 24, 2018

Test build #93491 has finished for PR 21854 at commit c1ce69c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Jul 24, 2018

cc @cloud-fan @hvanhovell

@SparkQA

SparkQA commented Jul 24, 2018

Test build #93517 has finished for PR 21854 at commit 1d629dc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

Regardless of the implementation, is it expected to produce different UUIDs for different micro-batches? Personally I think it's reasonable; micro-batch and continuous execution should produce the same result.

cc @tdas @zsxwing @jose-torres

@viirya
Member Author

viirya commented Jul 31, 2018

ping @tdas @zsxwing @jose-torres

Member

@zsxwing zsxwing left a comment

Good catch! Left some minor comments.


override def apply(plan: LogicalPlan): LogicalPlan = plan.transformUp {
  case p => p transformExpressionsUp {
    case _: Uuid if p.isStreaming => Uuid(Some(random.nextLong()))
Member

Could you add this into IncrementalExecution? You can put this close to the rule for CurrentBatchTimestamp.

Contributor

+1. This rule is only needed for streaming queries.

Member Author

OK. That should be clearer.
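
As a rough illustration of the per-execution rewrite discussed in this thread, here is a toy sketch that does not depend on Spark. The `Expr` hierarchy and the `reseed` function are hypothetical; they only mimic the shape of the `case _: Uuid => Uuid(Some(random.nextLong()))` rule from the diff and say nothing about how IncrementalExecution is actually structured.

```scala
import scala.util.Random

// Hypothetical miniature expression tree; not Spark's Catalyst classes.
sealed trait Expr
final case class Uuid(randomSeed: Option[Long]) extends Expr
final case class Literal(value: String) extends Expr
final case class Concat(children: Seq[Expr]) extends Expr

object ReseedPerBatch {
  // Bottom-up rewrite: every Uuid is replaced by a copy carrying a fresh seed.
  def reseed(e: Expr, random: Random): Expr = e match {
    case _: Uuid          => Uuid(Some(random.nextLong()))
    case Concat(children) => Concat(children.map(reseed(_, random)))
    case other            => other
  }

  def main(args: Array[String]): Unit = {
    // The analyzed tree carries one fixed seed, as described in the PR summary.
    val analyzed: Expr = Concat(Seq(Literal("id-"), Uuid(Some(0L))))
    val random = new Random()

    // Applying the rewrite once per execution gives each batch its own seed.
    val batch0 = reseed(analyzed, random)
    val batch1 = reseed(analyzed, random)
    println(batch0)
    println(batch1)
  }
}
```

Running the rewrite at the start of each execution, rather than once at analysis time, is what lets successive batches see different seeds in this sketch.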

case p => p transformExpressionsUp {
  // Produces a placeholder random seed for streaming query, the real random seed
  // is given at the beginning of Optimizer.
  case Uuid(None) if p.isStreaming => Uuid(Some(-1))
Member

This change is not necessary. Right?

Member Author

Uuid needs a random seed to be set before it is considered resolved. This change gives it a fake seed. Since we assign the real random seeds in the optimizer, we can get rid of it. The intent was only to show a placeholder seed in the analyzed plan. Not a big deal, so I'm going to remove it.

@viirya viirya force-pushed the uuid_in_streaming branch from 67a9387 to c127053 Compare July 31, 2018 23:40
@zsxwing
Member

zsxwing commented Jul 31, 2018

LGTM pending tests

@SparkQA

SparkQA commented Aug 1, 2018

Test build #93850 has finished for PR 21854 at commit c127053.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Aug 2, 2018

ping @cloud-fan @zsxwing Is this ready to merge? Thanks.

@zsxwing
Member

zsxwing commented Aug 2, 2018

Thanks! Merging to master.

@asfgit asfgit closed this in d0bc3ed Aug 2, 2018
@viirya viirya deleted the uuid_in_streaming branch December 27, 2023 18:21