Skip to content
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
41 commits
Select commit Hold shift + click to select a range
2e0b308
initial commit of cogroup
d80tb7 Jun 20, 2019
64ff5ac
minor tidy up
d80tb7 Jun 20, 2019
6d039e3
removed incorrect test
d80tb7 Jun 21, 2019
d8a5c5d
tidies up test, fixed output cols
d80tb7 Jun 25, 2019
73188f6
removed incorrect file
d80tb7 Jun 25, 2019
690fa14
Revert: removed incorrect test
d80tb7 Jun 25, 2019
c86b2bf
Merge branch 'master' of https://github.com/d80tb7/spark into SPARK-2…
d80tb7 Jun 25, 2019
e3b66ac
fix for resolving key cols
d80tb7 Jun 25, 2019
8007fa6
common trait for grouped mandas udfs
d80tb7 Jun 27, 2019
d4cf6d0
poc using arrow streams
d80tb7 Jun 27, 2019
87aeb92
more unit tests fro cogroup
d80tb7 Jun 27, 2019
e7528d0
argspec includes grouping key
d80tb7 Jul 2, 2019
b85ec75
fixed tests und
d80tb7 Jul 2, 2019
6a8ecff
keys now handled properly. Validation of udf. More tests
d80tb7 Jul 2, 2019
d2da787
formatting
d80tb7 Jul 2, 2019
7321141
fixed scalastyle errors
d80tb7 Jul 2, 2019
6bbe31c
updated grouped map to new args format
d80tb7 Jul 2, 2019
b444ff7
Merge branch 'master' of https://github.com/apache/spark into SPARK-2…
d80tb7 Jul 2, 2019
94be574
some code review fixes
d80tb7 Jul 11, 2019
9241639
Merge branch 'master' of https://github.com/apache/spark into SPARK-2…
d80tb7 Jul 11, 2019
3de551f
more code review fixes
d80tb7 Jul 11, 2019
300b53a
more code review fixes
d80tb7 Jul 11, 2019
7d161ba
fix comment on PandasCogroupSerializer
d80tb7 Jul 11, 2019
d1a6366
formatting
d80tb7 Jul 11, 2019
a201161
Merge branch 'master' of https://github.com/apache/spark into SPARK-2…
d80tb7 Jul 19, 2019
3e4bc95
python style fixes
d80tb7 Jul 19, 2019
307e664
added doc
d80tb7 Jul 19, 2019
7558b8d
Merge branch 'master' of https://github.com/apache/spark into SPARK-2…
d80tb7 Jul 23, 2019
19360c4
minor formatting
d80tb7 Jul 23, 2019
28493b4
a couple more usnit tests
d80tb7 Jul 23, 2019
d6d11e4
minor formatting
d80tb7 Jul 23, 2019
a62a1e3
more doc
d80tb7 Jul 25, 2019
ec78284
added comment to cogroup func
d80tb7 Jul 25, 2019
1a9ff58
fixed python style
d80tb7 Jul 25, 2019
c0d2919
review comments
d80tb7 Aug 20, 2019
4cd5c70
review comments scala
d80tb7 Aug 20, 2019
e025375
Merge branch 'master' of https://github.com/apache/spark into SPARK-2…
d80tb7 Aug 20, 2019
dd1ffaf
python formatting
d80tb7 Aug 20, 2019
733b592
review comments (mainly formatting)
d80tb7 Sep 8, 2019
51dcbdc
Merge branch 'master' of https://github.com/apache/spark into SPARK-2…
d80tb7 Sep 8, 2019
1b966fd
couple more format changes
d80tb7 Sep 15, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
fixed tests und
  • Loading branch information
d80tb7 committed Jul 2, 2019
commit b85ec750e288eec48660e03c0f6db26f4efef67a
3 changes: 3 additions & 0 deletions python/pyspark/serializers.py
Original file line number Diff line number Diff line change
Expand Up @@ -373,6 +373,9 @@ def __next__(self):
else:
raise ValueError('Received invalid stream status {0}'.format(stream_status))

def next(self):
return self.__next__()

def _read_df(self):
import pyarrow as pa
reader = pa.ipc.open_stream(self._stream)
Expand Down
17 changes: 17 additions & 0 deletions python/pyspark/sql/cogroup.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,20 @@
#
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we don't generate documentation for this:

Screen Shot 2019-09-22 at 9 41 48 PM

cannot click.

It should be either documented at python/docs/pyspark.sql.rst or imported at pyspark/sql/__init__.py with including it at __all__.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 adding it to pyspark/sql/__init__.py with including it at __all__ since this is what group.py does

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

from pyspark.sql.dataframe import DataFrame


Expand Down
10 changes: 10 additions & 0 deletions python/pyspark/sql/tests/test_pandas_udf_cogrouped_map.py
Original file line number Diff line number Diff line change
Expand Up @@ -144,3 +144,13 @@ def merge_pandas(l, r):

assert_frame_equal(expected, result, check_column_type=_check_column_type)


if __name__ == "__main__":
from pyspark.sql.tests.test_pandas_udf_cogrouped_map import *

try:
import xmlrunner
testRunner = xmlrunner.XMLTestRunner(output='target/test-reports', verbosity=2)
except ImportError:
testRunner = None
unittest.main(testRunner=testRunner, verbosity=2)