Skip to content
Closed
Changes from 1 commit
Commits
Show all changes
41 commits
Select commit Hold shift + click to select a range
2e0b308
initial commit of cogroup
d80tb7 Jun 20, 2019
64ff5ac
minor tidy up
d80tb7 Jun 20, 2019
6d039e3
removed incorrect test
d80tb7 Jun 21, 2019
d8a5c5d
tidies up test, fixed output cols
d80tb7 Jun 25, 2019
73188f6
removed incorrect file
d80tb7 Jun 25, 2019
690fa14
Revert: removed incorrect test
d80tb7 Jun 25, 2019
c86b2bf
Merge branch 'master' of https://github.com/d80tb7/spark into SPARK-2…
d80tb7 Jun 25, 2019
e3b66ac
fix for resolving key cols
d80tb7 Jun 25, 2019
8007fa6
common trait for grouped mandas udfs
d80tb7 Jun 27, 2019
d4cf6d0
poc using arrow streams
d80tb7 Jun 27, 2019
87aeb92
more unit tests fro cogroup
d80tb7 Jun 27, 2019
e7528d0
argspec includes grouping key
d80tb7 Jul 2, 2019
b85ec75
fixed tests und
d80tb7 Jul 2, 2019
6a8ecff
keys now handled properly. Validation of udf. More tests
d80tb7 Jul 2, 2019
d2da787
formatting
d80tb7 Jul 2, 2019
7321141
fixed scalastyle errors
d80tb7 Jul 2, 2019
6bbe31c
updated grouped map to new args format
d80tb7 Jul 2, 2019
b444ff7
Merge branch 'master' of https://github.com/apache/spark into SPARK-2…
d80tb7 Jul 2, 2019
94be574
some code review fixes
d80tb7 Jul 11, 2019
9241639
Merge branch 'master' of https://github.com/apache/spark into SPARK-2…
d80tb7 Jul 11, 2019
3de551f
more code review fixes
d80tb7 Jul 11, 2019
300b53a
more code review fixes
d80tb7 Jul 11, 2019
7d161ba
fix comment on PandasCogroupSerializer
d80tb7 Jul 11, 2019
d1a6366
formatting
d80tb7 Jul 11, 2019
a201161
Merge branch 'master' of https://github.com/apache/spark into SPARK-2…
d80tb7 Jul 19, 2019
3e4bc95
python style fixes
d80tb7 Jul 19, 2019
307e664
added doc
d80tb7 Jul 19, 2019
7558b8d
Merge branch 'master' of https://github.com/apache/spark into SPARK-2…
d80tb7 Jul 23, 2019
19360c4
minor formatting
d80tb7 Jul 23, 2019
28493b4
a couple more usnit tests
d80tb7 Jul 23, 2019
d6d11e4
minor formatting
d80tb7 Jul 23, 2019
a62a1e3
more doc
d80tb7 Jul 25, 2019
ec78284
added comment to cogroup func
d80tb7 Jul 25, 2019
1a9ff58
fixed python style
d80tb7 Jul 25, 2019
c0d2919
review comments
d80tb7 Aug 20, 2019
4cd5c70
review comments scala
d80tb7 Aug 20, 2019
e025375
Merge branch 'master' of https://github.com/apache/spark into SPARK-2…
d80tb7 Aug 20, 2019
dd1ffaf
python formatting
d80tb7 Aug 20, 2019
733b592
review comments (mainly formatting)
d80tb7 Sep 8, 2019
51dcbdc
Merge branch 'master' of https://github.com/apache/spark into SPARK-2…
d80tb7 Sep 8, 2019
1b966fd
couple more format changes
d80tb7 Sep 15, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
updated grouped map to new args format
  • Loading branch information
d80tb7 committed Jul 2, 2019
commit 6bbe31c004b755eb0b86730c5b6002c14c351ec0
6 changes: 3 additions & 3 deletions python/pyspark/worker.py
Original file line number Diff line number Diff line change
Expand Up @@ -372,9 +372,9 @@ def map_batch(batch):
arg_offsets, udf = read_single_udf(
pickleSer, infile, eval_type, runner_conf, udf_index=0)
udfs['f'] = udf
split_offset = arg_offsets[0] + 1
arg0 = ["a[%d]" % o for o in arg_offsets[1: split_offset]]
arg1 = ["a[%d]" % o for o in arg_offsets[split_offset:]]
parsed_offsets = parse_grouped_arg_offsets(arg_offsets)
arg0 = ["a[%d]" % o for o in parsed_offsets[0][0]]
arg1 = ["a[%d]" % o for o in parsed_offsets[0][1]]
mapper_str = "lambda a: f([%s], [%s])" % (", ".join(arg0), ", ".join(arg1))
elif eval_type == PythonEvalType.SQL_COGROUPED_MAP_PANDAS_UDF:
# We assume there is only one UDF here because cogrouped map doesn't
Expand Down