Skip to content

Conversation

@aviv-ebates
Copy link

Each of these 2 items has cost me a few hours of debugging. Hopefully, this will stop others from having to debug the same thing.

  1. Describe the expected type of fromOffsets param.
  2. Add missing method to StreamingListener, which is invoked by proxy from the Java/Scala side, and throws a strange exception when not found.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say sth like a dictionary containing `TopicAndPartition` to integers..

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that works too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we fix as so?

@HyukjinKwon
Copy link
Member

ok to test

@HyukjinKwon
Copy link
Member

BTW, mind fixing the PR title to [MINOR][PYTHON] ... and make the title more descriptive? not a big deal but good to match it with other PRs.

@SparkQA
Copy link

SparkQA commented Apr 14, 2018

Test build #89368 has finished for PR 21057 at commit e040a68.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test to pyspark.streaming.tests.StreamingListenerTests?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isnt doc only change then, I think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I first thought it's doc only change but I realised it's actually not after taking a close look.

Implementing onStreamingStarted looks actually required:

Add missing method to StreamingListener, which is invoked by proxy from the Java/Scala side, and throws a strange exception when not found.

This wasn't there at the first - https://github.com/apache/spark/blob/ace0db47141ffd457c2091751038fc291f6d5a8b/python/pyspark/streaming/listener.py / https://github.com/apache/spark/blob/ace0db47141ffd457c2091751038fc291f6d5a8b/streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingListener.scala ; however, this method was added from ce99f51 but seems Python's implementation is missed.

Strictly it sounds better to have an explicit test since @aviv-ebates has a reproducer (assuming from the description) and should be easy to add.

@viirya
Copy link
Member

viirya commented Apr 15, 2018

I think you may create a minor JIRA ticket for this.

@HyukjinKwon
Copy link
Member

+1 for ^.

@aviv-ebates aviv-ebates changed the title 2 Improvements to Pyspark docs [MINOR][PYTHON] 2 Improvements to Pyspark docs Apr 16, 2018
Since this method is missing here, trying to use a Listener throws an expection. Since it's not documented, it's hard to handle.
@SparkQA
Copy link

SparkQA commented Apr 16, 2018

Test build #89399 has finished for PR 21057 at commit 010de10.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@aviv-ebates
Copy link
Author

  1. Updated title
  2. Updated text to one suggested by @HyukjinKwon
  3. I don't have a reasonable way to make a test. Test scenario is "Configure pythong listener, run actual spark streaming". If you have a way to test that, that would be great; I don't know how to set up this test.
  4. Feel free to create a ticket and update commit messages when you merge this.
  5. I consider this a documentation change, because the method is already available and required; All I did is make sure it shows up in documentation (and incidentally, make it not-required). Feel free to change commit messages and pr text to reflect your understanding of this change.

@HyukjinKwon
Copy link
Member

It would be helpful if you open a JIRA and describe the issue. It could help other guys think a better way to test or would give clearer ideas to see if it's really difficult to add a test. Usually, JIRA is made first. See also https://spark.apache.org/contributing.html.

@aviv-ebates
Copy link
Author

If you're not clear about what I've done here, ask away. I don't wish to create a jira account in addition; Feel free to create whatever tickets you think are required.
I've already paid my price for this particular deficiency in Spark documentation, so I don't need this docfix for myself. I was hoping you'd like to improve the docs, so that the next guy would have an easier experience using Spark/Pyspark, but that's totally up to you.
I think I've done all that is reasonable to do for this documents improvements for your project. If you want this change, merge it in; If you don't, don't.

@HyukjinKwon
Copy link
Member

Right, let me try to cherry-pick and see if I can write a test. Will try to have some time and open a PR after cherry-picking your commit. I think you can close this PR.

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Apr 18, 2018

Actually @viirya, would you be interested in this if you are available? I will do this by myself but I am currently not quite available. If you are busy too, let me try it later anyway.

@viirya
Copy link
Member

viirya commented Apr 18, 2018 via email

@HyukjinKwon
Copy link
Member

It's not urgent at all. I would appreciate it.

@HyukjinKwon
Copy link
Member

( I regret I happened to come over form Korea to Singapore too fast before your flight :-) )

@viirya
Copy link
Member

viirya commented Apr 18, 2018

:-)

asfgit pushed a commit that referenced this pull request Apr 19, 2018
…ener

## What changes were proposed in this pull request?

The `StreamingListener` in PySpark side seems to be lack of `onStreamingStarted` method. This patch adds it and a test for it.

This patch also includes a trivial doc improvement for `createDirectStream`.

Original PR is #21057.

## How was this patch tested?

Added test.

Author: Liang-Chi Hsieh <[email protected]>

Closes #21098 from viirya/SPARK-24014.

(cherry picked from commit 8bb0df2)
Signed-off-by: jerryshao <[email protected]>
ghost pushed a commit to dbtsai/spark that referenced this pull request Apr 19, 2018
…ener

## What changes were proposed in this pull request?

The `StreamingListener` in PySpark side seems to be lack of `onStreamingStarted` method. This patch adds it and a test for it.

This patch also includes a trivial doc improvement for `createDirectStream`.

Original PR is apache#21057.

## How was this patch tested?

Added test.

Author: Liang-Chi Hsieh <[email protected]>

Closes apache#21098 from viirya/SPARK-24014.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants