-
Notifications
You must be signed in to change notification settings - Fork 29k
[MINOR][PYTHON] 2 Improvements to Pyspark docs #21057
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
python/pyspark/streaming/kafka.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say sth like a dictionary containing `TopicAndPartition` to integers..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that works too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we fix as so?
|
ok to test |
|
BTW, mind fixing the PR title to |
|
Test build #89368 has finished for PR 21057 at commit
|
python/pyspark/streaming/listener.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test to pyspark.streaming.tests.StreamingListenerTests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isnt doc only change then, I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I first thought it's doc only change but I realised it's actually not after taking a close look.
Implementing onStreamingStarted looks actually required:
Add missing method to StreamingListener, which is invoked by proxy from the Java/Scala side, and throws a strange exception when not found.
This wasn't there at the first - https://github.com/apache/spark/blob/ace0db47141ffd457c2091751038fc291f6d5a8b/python/pyspark/streaming/listener.py / https://github.com/apache/spark/blob/ace0db47141ffd457c2091751038fc291f6d5a8b/streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingListener.scala ; however, this method was added from ce99f51 but seems Python's implementation is missed.
Strictly it sounds better to have an explicit test since @aviv-ebates has a reproducer (assuming from the description) and should be easy to add.
|
I think you may create a minor JIRA ticket for this. |
|
+1 for ^. |
Since this method is missing here, trying to use a Listener throws an expection. Since it's not documented, it's hard to handle.
|
Test build #89399 has finished for PR 21057 at commit
|
|
|
It would be helpful if you open a JIRA and describe the issue. It could help other guys think a better way to test or would give clearer ideas to see if it's really difficult to add a test. Usually, JIRA is made first. See also https://spark.apache.org/contributing.html. |
|
If you're not clear about what I've done here, ask away. I don't wish to create a jira account in addition; Feel free to create whatever tickets you think are required. |
|
Right, let me try to cherry-pick and see if I can write a test. Will try to have some time and open a PR after cherry-picking your commit. I think you can close this PR. |
|
Actually @viirya, would you be interested in this if you are available? I will do this by myself but I am currently not quite available. If you are busy too, let me try it later anyway. |
|
I will flight to Korea for a company workshop today. I can do this maybe
only in at tonight. If this isn't urgent, then it is okay.
…On Wed, Apr 18, 2018, 10:03 AM Hyukjin Kwon ***@***.***> wrote:
Actually @viirya <https://github.com/viirya>, would you be interested in
this if you are available? I will do this by myself but I am currently not
quite available. If you are busy too, let me try it anyway.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#21057 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAEM914hLAgFHKhwMS9OU_PVN7G-WvQ9ks5tppDmgaJpZM4TSO_J>
.
|
|
It's not urgent at all. I would appreciate it. |
|
( I regret I happened to come over form Korea to Singapore too fast before your flight :-) ) |
|
:-) |
…ener ## What changes were proposed in this pull request? The `StreamingListener` in PySpark side seems to be lack of `onStreamingStarted` method. This patch adds it and a test for it. This patch also includes a trivial doc improvement for `createDirectStream`. Original PR is #21057. ## How was this patch tested? Added test. Author: Liang-Chi Hsieh <[email protected]> Closes #21098 from viirya/SPARK-24014. (cherry picked from commit 8bb0df2) Signed-off-by: jerryshao <[email protected]>
…ener ## What changes were proposed in this pull request? The `StreamingListener` in PySpark side seems to be lack of `onStreamingStarted` method. This patch adds it and a test for it. This patch also includes a trivial doc improvement for `createDirectStream`. Original PR is apache#21057. ## How was this patch tested? Added test. Author: Liang-Chi Hsieh <[email protected]> Closes apache#21098 from viirya/SPARK-24014.
Each of these 2 items has cost me a few hours of debugging. Hopefully, this will stop others from having to debug the same thing.
fromOffsetsparam.StreamingListener, which is invoked by proxy from the Java/Scala side, and throws a strange exception when not found.