forked from apache/spark
-
Notifications
You must be signed in to change notification settings - Fork 3
Python API for Spark Streaming #10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from 1 commit
Commits
Show all changes
636 commits
Select commit
Hold shift + click to select a range
ac3440f
[SPARK-2859] Update url of Kryo project in related docs
gchen 74f82c7
SPARK-2380: Support displaying accumulator values in the web UI
pwendell 41e0a21
SPARK-1680: use configs for specifying environment variables on YARN
tgravescs cc491f6
[SPARK-2864][MLLIB] fix random seed in word2vec; move model to local
mengxr acff9a7
[SPARK-2503] Lower shuffle output buffer (spark.shuffle.file.buffer.k…
rxin 1aad911
[SPARK-2550][MLLIB][APACHE SPARK] Support regularization and intercep…
miccagiann 2643e66
SPARK-2869 - Fix tiny bug in JdbcRdd for closing jdbc connection
d94f599
[sql] rename project name in pom.xml of hive-thriftserver module
scwf d0ae3f3
[SPARK-2650][SQL] Try to partially fix SPARK-2650 by adjusting initia…
liancheng 69ec678
[SPARK-2854][SQL] Finalize _acceptable_types in pyspark.sql
yhuai 1d70c4f
[SPARK-2866][SQL] Support attributes in ORDER BY that aren't in SELECT
marmbrus 82624e2
[SPARK-2806] core - upgrade to json4s-jackson 3.2.10
avati b70bae4
[SQL] Tighten the visibility of various SQLConf methods and renamed s…
rxin 5a826c0
[SQL] Fix logging warn -> debug
marmbrus 63bdb1f
SPARK-2294: fix locality inversion bug in TaskManager
CodingCat c7b5201
[MLlib] Use this.type as return type in k-means' builder pattern
ee7f308
[SPARK-1022][Streaming][HOTFIX] Fixed zookeeper dependency of Kafka
tdas 09f7e45
[SPARK-2157] Enable tight firewall rules for Spark
andrewor14 4878911
[SPARK-2875] [PySpark] [SQL] handle null in schemaRDD()
davies a6cd311
[SPARK-2678][Core][SQL] A workaround for SPARK-2678
liancheng d614967
[SPARK-2627] [PySpark] have the build enforce PEP 8 automatically
nchammas 4e98236
SPARK-2566. Update ShuffleWriteMetrics incrementally
sryza 25cff10
[SPARK-2852][MLLIB] API consistency for `mllib.feature`
mengxr e537b33
[PySpark] Add blanklines to Python docstrings so example code renders…
rnowling c6889d2
[HOTFIX][Streaming] Handle port collisions in flume polling test
andrewor14 4e00833
SPARK-2882: Spark build now checks local maven cache for dependencies
GregOwen 17caae4
[SPARK-2583] ConnectionManager error reporting
sarutak 4201d27
SPARK-2879 [BUILD] Use HTTPS to access Maven Central and other repos
srowen a263a7e
HOTFIX: Support custom Java 7 location
pwendell a120d07
WIP
giwa ffd1f59
[SPARK-2887] fix bug of countApproxDistinct() when have more than one…
davies 47ccd5e
[SPARK-2851] [mllib] DecisionTree Python consistency update
jkbradley 75993a6
SPARK-2879 part 2 [BUILD] Use HTTPS to access Maven Central and other…
srowen 8d1dec4
[mllib] DecisionTree Strategy parameter checks
jkbradley b9e9e53
[SPARK-2852][MLLIB] Separate model from IDF/StandardScaler algorithms
mengxr 80ec5ba
SPARK-2905 Fixed path sbin => bin
dosoft 32096c2
SPARK-2899 Doc generation is back to working in new SBT Build.
ScrapCodes 6906b69
SPARK-2787: Make sort-based shuffle write files directly when there's…
mateiz 4c51098
SPARK-2565. Update ShuffleReadMetrics as blocks are fetched
sryza 9de6a42
[SPARK-2904] Remove non-used local variable in SparkSubmitArguments
sarutak 9a54de1
[SPARK-2911]: provide rdd.parent[T](j) to obtain jth parent RDD
erikerlandson 9016af3
[SPARK-2888] [SQL] Fix addColumnMetadataToConf in HiveTableScan
yhuai 0489cee
[SPARK-2908] [SQL] JsonRDD.nullTypeToStringType does not convert all …
yhuai c874723
[SPARK-2877] [SQL] MetastoreRelation should use SparkClassLoader when…
yhuai 45d8f4d
[SPARK-2919] [SQL] Basic support for analyze command in HiveQl
yhuai b7c89a7
[SPARK-2700] [SQL] Hidden files (such as .impala_insert_staging) shou…
chutium 74d6f62
[SPARK-1997][MLLIB] update breeze to 0.9
mengxr ec79063
[SPARK-2897][SPARK-2920]TorrentBroadcast does use the serializer clas…
witgo 1c84dba
[Web UI]Make decision order of Worker's WebUI port consistent with Ma…
WangTaoTheTonic 43af281
[SPARK-2911] apply parent[T](j) to clarify UnionRDD code
erikerlandson 28dbae8
[SPARK-2635] Fix race condition at SchedulerBackend.isReady in standa…
li-zhihui b431e67
[SPARK-2861] Fix Doc comment of histogram method
e45daf2
[SPARK-1766] sorted functions to meet pedantic requirements
4f4a988
[SPARK-2894] spark-shell doesn't accept flags
sarutak 5b6585d
Updated Spark SQL README to include the hive-thriftserver module
rxin 482c5af
Turn UpdateBlockInfo into case class.
rxin 3570119
Remove extra semicolon in Task.scala
witgo 1d03a26
[SPARK-2950] Add gc time and shuffle write time to JobLogger
shivaram 28dcbb5
[SPARK-2898] [PySpark] fix bugs in deamon.py
davies b715aa0
[SPARK-2937] Separate out samplyByKeyExact as its own API in PairRDDF…
dorx 90ae568
WIP added test case
giwa ba28a8f
[SPARK-2936] Migrate Netty network module from Java to Scala
rxin 2cfd3a0
added basic operation test cases
giwa db0a303
delete waste file
giwa 3334169
fixed PEP-008 violation
giwa e8c7bfc
remove export PYSPARK_PYTHON in spark submit
giwa bdde697
removed unnesessary changes
giwa a65f302
edited the comment to add more precise description
giwa db06a81
[PySpark] [SPARK-2954] [SPARK-2948] [SPARK-2910] [SPARK-2101] Python …
JoshRosen 3733866
[SPARK-2952] Enable logging actor messages at DEBUG level
rxin 90a6484
added mapValues and flatMapVaules WIP for glom and mapPartitions test
giwa 7712e72
[SPARK-2931] In TaskSetManager, reset currentLocalityIndex after reco…
JoshRosen 32638b5
[SPARK-2515][mllib] Chi Squared test
dorx 6fab941
[SPARK-2934][MLlib] Adding LogisticRegressionWithLBFGS Interface
490ecfa
[SPARK-2844][SQL] Correctly set JVM HiveContext if it is passed into …
ahirreddy 21a95ef
[SPARK-2590][SQL] Added option to handle incremental collection, disa…
liancheng e83fdcd
[sql]use SparkSQLEnv.stop() in ShutdownHook
scwf 647aeba
[SQL] A tiny refactoring in HiveContext#analyze
yhuai c9c89c3
[SPARK-2965][SQL] Fix HashOuterJoin output nullabilities.
ueshin c686b7d
[SPARK-2968][SQL] Fix nullabilities of Explode.
ueshin bad21ed
[SPARK-2650][SQL] Build column buffers in smaller batches
marmbrus 5d54d71
[SQL] [SPARK-2826] Reduce the memory copy while building the hashmap …
chenghao-intel 9038d94
[SPARK-2923][MLLIB] Implement some basic BLAS routines
mengxr f0060b7
[MLlib] Correctly set vectorSize and alpha
Ishiihara 882da57
fix flaky tests
davies c235b83
SPARK-2830 [MLlib]: re-organize mllib documentation
atalwalkar 676f982
[SPARK-2953] Allow using short names for io compression codecs
rxin 246cb3f
Use transferTo when copy merge files in ExternalSorter
colorant 2bd8126
[SPARK-1777 (partial)] bugfix: make size of requested memory correctly
liyezhang556520 fe47359
[SPARK-2993] [MLLib] colStats (wrapper around MultivariateStatistical…
dorx 869f06c
[SPARK-2963] [SQL] There no documentation about building to use HiveS…
sarutak c974a71
[SPARK-3013] [SQL] [PySpark] convert array into list
davies 434bea1
[SPARK-2983] [PySpark] improve performance of sortByKey()
davies 7ecb867
[MLLIB] use Iterator.fill instead of Array.fill
mengxr bdc7a1a
[SPARK-3004][SQL] Added null checking when retrieving row set
liancheng 13f54e2
[SPARK-2817] [SQL] add "show create table" support
tianyi 9256d4a
[SPARK-2994][SQL] Support for udfs that take complex types
marmbrus 376a82e
[SPARK-2650][SQL] More precise initial buffer size estimation for in-…
liancheng 9fde1ff
[SPARK-2935][SQL]Fix parquet predicate push down bug
marmbrus 905dc4b
[SPARK-2970] [SQL] spark-sql script ends with IOException when EventL…
sarutak 63d6777
[SPARK-2986] [SQL] fixed: setting properties does not effect
0c7b452
SPARK-3020: Print completed indices rather than tasks in web UI
pwendell 0704b86
WIP: solved partitioned and None is not recognized
giwa 9497b12
[SPARK-3006] Failed to execute spark-shell in Windows OS
tsudukim e424565
[Docs] Add missing <code> tags (minor)
andrewor14 69a57a1
[SPARK-2995][MLLIB] add ALS.setIntermediateRDDStorageLevel
mengxr d069c5d
[SPARK-3029] Disable local execution of Spark jobs by default
aarondav 080541a
broke something
giwa 6b8de0e
SPARK-2893: Do not swallow Exceptions when running a custom kryo regi…
GrahamDennis 078f3fb
[SPARK-3011][SQL] _temporary directory should be filtered out by sqlC…
josephsu add75d4
[SPARK-2927][SQL] Add a conf to configure if we always read Binary co…
yhuai fde692b
[SQL] Python JsonRDD UTF8 Encoding Fix
ahirreddy 267fdff
[SPARK-2925] [sql]fix spark-sql and start-thriftserver shell bugs whe…
scwf eaeb0f7
Minor cleanup of metrics.Source
rxin 9622106
[SPARK-2979][MLlib] Improve the convergence rate by minimizing the co…
a7f8a4f
Revert [SPARK-3011][SQL] _temporary directory should be filtered out…
marmbrus a75bc7a
SPARK-3009: Reverted readObject method in ApplicationInfo so that App…
jacek-lewandowski fa5a08e
Make dev/mima runnable on Mac OS X.
rxin 2112638
all tests are passed if numSlice is 2 and the numver of each input is…
giwa 655699f
[SPARK-3027] TaskContext: tighten visibility and provide Java friendl…
rxin 3a8b68b
[SPARK-2468] Netty based block server / client module
rxin 9422a9b
[SPARK-2736] PySpark converter and example script for reading Avro files
kanzhang 500f84e
[SPARK-2912] [Spark QA] Include commit hash in Spark QA messages
nchammas e1b85f3
SPARK-2955 [BUILD] Test code fails to compile with "mvn compile" with…
srowen fba8ec3
Add caching information to rdd.toDebugString
536def4
basic function test cases are passed
giwa a14c7e1
modified streaming test case to add coment
giwa 7589c39
[SPARK-2924] remove default args to overloaded methods
avati fd9fcd2
Revert "[SPARK-2468] Netty based block server / client module"
pwendell e3033fc
remove waste duplicated code
giwa 0afe5cb
SPARK-3028. sparkEventToJson should support SparkListenerExecutorMetr…
sryza c703229
[SPARK-3022] [SPARK-3041] [mllib] Call findBins once per level + unor…
jkbradley cc36487
[SPARK-3046] use executor's class loader as the default serializer cl…
rxin 89ae38a
added saveAsTextFiles and saveAsPickledFiles
giwa 5d25c0b
[SPARK-3078][MLLIB] Make LRWithLBFGS API consistent with others
mengxr 2e069ca
[SPARK-3001][MLLIB] Improve Spearman's correlation
mengxr ea9c873
added TODO coments
giwa c9da466
[SPARK-3015] Block on cleaning tasks to prevent Akka timeouts
andrewor14 a83c772
[SPARK-3045] Make Serializer interface Java friendly
rxin 20fcf3d
[SPARK-2977] Ensure ShuffleManager is created before ShuffleBlockManager
JoshRosen b4a0592
[SQL] Using safe floating-point numbers in doctest
liancheng 4bdfaa1
[SPARK-3076] [Jenkins] catch & report test timeouts
nchammas 76fa0ea
[SPARK-2677] BasicBlockFetchIterator#next can wait forever
sarutak 7e70708
[SPARK-3048][MLLIB] add LabeledPoint.parse and remove loadStreamingLa…
mengxr ac6411c
[SPARK-3081][MLLIB] rename RandomRDDGenerators to RandomRDDs
mengxr 379e758
[SPARK-3035] Wrong example with SparkContext.addFile
iAmGhost 2fc8aca
[SPARK-1065] [PySpark] improve supporting for large broadcast
davies bc95fe0
In the stop method of ConnectionManager to cancel the ackTimeoutMonitor
witgo fbad722
[SPARK-3077][MLLIB] fix some chisq-test
mengxr 73ab7f1
[SPARK-3042] [mllib] DecisionTree Filter top-down instead of bottom-up
jkbradley 318e28b
SPARK-2881. Upgrade snappy-java to 1.1.1.3.
pwendell 5ecb08e
Revert "[SPARK-2970] [SQL] spark-sql script ends with IOException whe…
marmbrus bfa09b0
[SQL] Improve debug logging and toStrings.
marmbrus 9924328
[SPARK-1981] updated streaming-kinesis.md
cfregly 95470a0
[HOTFIX][STREAMING] Allow the JVM/Netty to decide which port to bind …
harishreedharan c77f406
[SPARK-3087][MLLIB] fix col indexing bug in chi-square and add a chec…
mengxr 5173f3c
SPARK-2884: Create binary builds in parallel with release script.
pwendell df652ea
SPARK-2900. aggregate inputBytes per stage
sryza 3c8fa50
[SPARK-3097][MLlib] Word2Vec performance improvement
Ishiihara eef779b
[SPARK-2842][MLlib]Word2Vec documentation
Ishiihara d8b593b
add comments
giwa e7ebb08
removed wasted print in DStream
giwa 636090a
added sparkContext as input parameter in StreamingContext
giwa a3d2379
added gorupByKey testcase
giwa 665bfdb
added testcase for combineByKey
giwa 5c3a683
initial commit for pySparkStreaming
giwa e497b9b
comment PythonDStream.PairwiseDStream
6e0d9c7
modify dstream.py to fix indent error
9af03f4
added reducedByKey not working yet
dcf243f
implementing transform function in Python
c5518b4
modified the code base on comment in https://github.com/tdas/spark/pu…
3758175
add coment for hack why PYSPARK_PYTHON is needed in spark-submit
e551e13
add coment for hack why PYSPARK_PYTHON is needed in spark-submit
2adca84
remove not implemented DStream functions in python
5594bd4
revert pom.xml
490e338
sorted the import following Spark coding convention
856d98e
add empty line
4ce4058
remove unused import in python
02f618a
initial commit for socketTextStream
4b69fb1
fied input of socketTextDStream
57fb740
added doctest for pyspark.streaming.duration
967dc26
fixed typo of network_workdcount.py
7f7c5d1
delete old file
d25d5cf
added reducedByKey not working yet
0b8b7d0
reduceByKey is working
d1ee6ca
edit python sparkstreaming example
a9f4ecb
added count operation but this implementation need double check
05459c6
fix map function
9fa249b
clean up code
aeaf8a5
clean up codes
5e822d4
remove waste file
4eff053
Implemented DStream.foreachRDD in the Python API using Py4J callback …
tdas 4caae3f
Added missing file
tdas c9fc124
Added extra line.
tdas 19ddcdd
tried to restart callback server
b47b5fd
Kill py4j callback server properly
b6468e6
Removed the waste line
giwa b8d7d24
implemented reduce and count function in Dstream
giwa 189dcea
clean up examples
giwa 79c5809
added stop in StreamingContext
giwa 5a9b525
clean up dstream.py
giwa ea4b06b
initial commit for testcase
giwa 5d22c92
WIP
giwa c880a33
update comment
giwa 1fd12ae
WIP
giwa c05922c
WIP: added PythonTestInputStream
giwa 1f68b78
WIP
giwa 3dda31a
WIP added test case
giwa 7f96294
added basic operation test cases
giwa fa75d71
delete waste file
giwa 8efa266
fixed PEP-008 violation
giwa 3a671cc
remove export PYSPARK_PYTHON in spark submit
giwa 774f18d
removed unnesessary changes
giwa 33c0f94
edited the comment to add more precise description
giwa 4f2d7e6
added mapValues and flatMapVaules WIP for glom and mapPartitions test
giwa 9767712
WIP: solved partitioned and None is not recognized
giwa 35933e1
broke something
giwa 7051a84
all tests are passed if numSlice is 2 and the numver of each input is…
giwa 99e4bb3
basic function test cases are passed
giwa 580fbc2
modified streaming test case to add coment
giwa 94f2b65
remove waste duplicated code
giwa e9fab72
added saveAsTextFiles and saveAsPickledFiles
giwa 4aa99e4
added TODO coments
giwa 6d8190a
add comments
giwa 14d4c0e
removed wasted print in DStream
giwa 97742fe
added sparkContext as input parameter in StreamingContext
giwa e162822
added gorupByKey testcase
giwa e70f706
added testcase for combineByKey
giwa f1798c4
merge with master
giwa 185fdbf
merge with master
giwa 199e37f
adopted the latest compression way of python command
giwa 58150f5
Changed the test case to focus the test operation
giwa 09a28bf
improve testcases
giwa 268a6a5
Changed awaitTermination not to call awaitTermincation in Scala. Just…
giwa 4dedd2d
change test case not to use awaitTermination
giwa 171edeb
clean up
giwa f0ea311
clean up code
giwa 1d84142
remove unimplement test
giwa 583e66d
move tests for streaming inside streaming directory
giwa b7dab85
improve test case
giwa 0d30109
fixed pep8 violation
giwa 24f95db
clen up examples
giwa 9c85e48
clean up exmples
giwa 7339df2
fixed typo
giwa 9d1de23
revert pom.xml
giwa 4f82c89
remove duplicated import
giwa 50fd6f9
revert pom.xml
giwa 93f7637
fixed explanaiton
giwa acfcaeb
revert pom.xml
giwa 3b27bd4
remove the last brank line
giwa File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
comment PythonDStream.PairwiseDStream
- Loading branch information
commit e497b9bfe6ba96db46122aa369b5dba528524c2e
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you intend to keep this here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. I moved this code to reduceByKey branch to experiment for reduce operation