Skip to content

Conversation

@yinxusen
Copy link
Contributor

What changes were proposed in this pull request?

The patch adds python API for mllib.stat.test.StreamingTest under JIRA https://issues.apache.org/jira/browse/SPARK-12042.

Note that for StreamingTestResult, unlike other test results in Python, I define it as a normal Python class which doesn't extend from TestResult with a _java_obj in it.

How was this patch tested?

The patch is tested with Python unit test.

@mengxr
Copy link
Contributor

mengxr commented Feb 25, 2016

cc: @feynmanliang

@SparkQA
Copy link

SparkQA commented Feb 25, 2016

Test build #51989 has finished for PR 11374 at commit 770703b.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yinxusen
Copy link
Contributor Author

yinxusen commented Mar 5, 2016

test it please

@SparkQA
Copy link

SparkQA commented Mar 6, 2016

Test build #52523 has finished for PR 11374 at commit e4e8d5e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yinxusen
Copy link
Contributor Author

yinxusen commented Mar 6, 2016

cc @feynmanliang

@SparkQA
Copy link

SparkQA commented May 14, 2016

Test build #58598 has finished for PR 11374 at commit e4e8d5e.

  • This patch fails R style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@zsxwing
Copy link
Member

zsxwing commented Oct 24, 2016

Any updates to this PR?

@SparkQA
Copy link

SparkQA commented Oct 28, 2016

Test build #67696 has finished for PR 11374 at commit 615fbbb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yinxusen
Copy link
Contributor Author

Ping @mengxr @feynmanliang @yanboliang

@feynmanliang
Copy link
Contributor

I'll review this tonight

@feynmanliang
Copy link
Contributor

Apologies for the delay, I am traveling but I'll get this done this weekend.

Copy link
Contributor

@feynmanliang feynmanliang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a first pass.

It's been awhile since I've looked at PySpark so I may be a bit rusty on some things.


"""
Create a DStream that contains several RDDs to show the StreamingTest of PySpark.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like other examples are including a from __future__ import print_function here

sc = SparkContext(appName="PythonStreamingTestExample")
ssc = StreamingContext(sc, 1)

checkpoint_path = tempfile.mkdtemp()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary?

from pyspark.mllib.stat.test import BinarySample, StreamingTest

if __name__ == "__main__":

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: don't include newline here


from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.mllib.stat.test import BinarySample, StreamingTest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$example on$ and $example off appear to be used in other examples, though I'm not sure why myself

ssc.checkpoint(checkpoint_path)

# Create the queue through which RDDs can be pushed to a QueueInputDStream.
rdd_queue = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use camelCase to be consistent with other examples

"""

checkpoint_path = tempfile.mkdtemp()
self.ssc.checkpoint(checkpoint_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary?

input_stream = self.ssc.queueStream(rdd_queue)

model = StreamingTest()
model.setPeacePeriod(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we break this into another test just for model params like

def test_model_params(self):
?

}
}

private[python] class StreamingTestResultPickler extends BasePickler[StreamingTestResult] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to test these in PythonMLLibAPISuite?

streamingTest.setTestMethod(self._testMethod)

javaDStream = sc._jvm.SerDe.pythonToJava(data._jdstream, True)
testResult = streamingTest.registerStream(javaDStream)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need pythonToJava and javaToPython; its not used for streaming K means

updatedModel = callMLlibFunc(

*/
@Since("1.6.0")
private[stat] class StreamingTestResult @Since("1.6.0") (
class StreamingTestResult @Since("1.6.0") (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be public? Java API doesn't seem to require it

@HyukjinKwon
Copy link
Member

Hi @yinxusen, are you able to proceed this further? If not, it seems it might be better closed for now.

@asfgit asfgit closed this in ed338f7 Feb 17, 2017
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
## What changes were proposed in this pull request?

This PR proposes to close stale PRs.

What I mean by "stale" here includes that there are some review comments by reviewers but the author looks inactive without any answer to them more than a month.

I left some comments roughly a week ago to ping and the author looks still inactive in these PR below

These below includes some PR suggested to be closed and a PR against another branch which seems obviously inappropriate.

Given the comments in the last three PRs below, they are probably worth being taken over by anyone who is interested in it.

Closes apache#7963
Closes apache#8374
Closes apache#11192
Closes apache#11374
Closes apache#11692
Closes apache#12243
Closes apache#12583
Closes apache#12620
Closes apache#12675
Closes apache#12697
Closes apache#12800
Closes apache#13715
Closes apache#14266
Closes apache#15053
Closes apache#15159
Closes apache#15209
Closes apache#15264
Closes apache#15267
Closes apache#15871
Closes apache#15861
Closes apache#16319
Closes apache#16324
Closes apache#16890

Closes apache#12398
Closes apache#12933
Closes apache#14517

## How was this patch tested?

N/A

Author: hyukjinkwon <[email protected]>

Closes apache#16937 from HyukjinKwon/stale-prs-close.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants