Closed
Changes from 1 commit
44 commits
512958b Basework (HeartSaVioR, Jul 12, 2022)
d36373b Add Python implementation (HyukjinKwon, Jul 14, 2022)
f754fd9 Reorder key attributes from deduplicated data attributes (HyukjinKwon, Jul 27, 2022)
5194e0c Apply suggestions from code review (HyukjinKwon, Jul 27, 2022)
1301ee5 Refactoring a bit to respect the column order (HyukjinKwon, Aug 11, 2022)
135a826 WIP Changes to execute in pipelined manner (HeartSaVioR, Aug 15, 2022)
9282e5c WIP further optimization (HeartSaVioR, Aug 18, 2022)
a792c98 WIP comments for more tunes (HeartSaVioR, Aug 18, 2022)
27e7af9 WIP further tune... (HeartSaVioR, Aug 18, 2022)
04a6b98 WIP done more tune! didn't do any of pandas/arrow side tunes (HeartSaVioR, Aug 18, 2022)
765f4d3 WIP avoid adding additional empty row for state, empty row will be ad… (HeartSaVioR, Aug 19, 2022)
9e11225 WIP remove debug log (HeartSaVioR, Aug 19, 2022)
f33d978 WIP hack around to see the possibility of perf gain on binpacking (HeartSaVioR, Aug 27, 2022)
8604fdf WIP proper work to apply binpacking on python worker -> executor (HeartSaVioR, Aug 27, 2022)
0d024e0 WIP fix silly bug (HeartSaVioR, Aug 27, 2022)
43c623b WIP another silly bugfix on migration (HeartSaVioR, Aug 27, 2022)
af1725a WIP apply binpacking for executor -> python worker as well (HeartSaVioR, Aug 27, 2022)
31e9687 WIP fix silly bug (HeartSaVioR, Aug 27, 2022)
cad77a2 WIP fix another silly bug (HeartSaVioR, Aug 27, 2022)
c3da996 WIP batching per specified size, with sampling (HeartSaVioR, Aug 29, 2022)
cfb2780 WIP introduce DBR-only change (HeartSaVioR, Aug 29, 2022)
228b140 WIP debugging now... (HeartSaVioR, Aug 29, 2022)
ee4ed57 WIP still debugging... weirdness happened (HeartSaVioR, Aug 30, 2022)
4045ab3 WIP small fix (HeartSaVioR, Aug 30, 2022)
2d115ab WIP fix a serious bug... make sure all columns in Arrow RecordBatch h… (HeartSaVioR, Aug 30, 2022)
3e7d785 WIP strengthen test (HeartSaVioR, Aug 30, 2022)
029dae7 WIP documenting the changes for pipelining and bin-packing... not yet… (HeartSaVioR, Sep 2, 2022)
d7ecaf9 WIP sync (HeartSaVioR, Sep 2, 2022)
6a6dd20 WIP start with is_last_chunk since it's easier to implement... severa… (HeartSaVioR, Sep 2, 2022)
5cfd59c WIP adjust the test code to make test pass with multiple calls (HeartSaVioR, Sep 2, 2022)
63f8f87 WIP refactor a bit... just extract the abstract classes to explicit ones (HeartSaVioR, Sep 5, 2022)
6e772cd WIP iterator of DatFrame done! updated tests and they all passed (HeartSaVioR, Sep 5, 2022)
00836b5 WIP FIX pyspark side test failure (HeartSaVioR, Sep 6, 2022)
5fdde94 WIP sort out codebase a bit (HeartSaVioR, Sep 14, 2022)
e7ad043 WIP no batch query support in applyInPandasWithState (HeartSaVioR, Sep 6, 2022)
5070b81 WIP address some missed things (HeartSaVioR, Sep 6, 2022)
1b919b8 WIP remove comments which are obsolete or won't be addressed (HeartSaVioR, Sep 7, 2022)
198fc17 WIP change the return type of user function to Iterator[DataFrame] (HeartSaVioR, Sep 7, 2022)
f2a75f1 WIP remove unnecessary interface/implementation changes on GroupState… (HeartSaVioR, Sep 13, 2022)
3e5f5d4 WIP refine out some code (HeartSaVioR, Sep 13, 2022)
4e34d29 WIP fix scalastyle (HeartSaVioR, Sep 13, 2022)
50e743e WIP remove obsolete class (HeartSaVioR, Sep 13, 2022)
d22d7db WIP remove the temp fix (HeartSaVioR, Sep 13, 2022)
e60408f remove unused code (HeartSaVioR, Sep 14, 2022)

WIP sync
HeartSaVioR committed Sep 14, 2022
commit d7ecaf944774a2d418863e4526a3ee327d4b807a
@@ -77,17 +77,7 @@ class ArrowPythonRunnerWithState(
     "Pandas execution requires more than 4 bytes. Please set higher buffer. " +
       s"Please change '${SQLConf.PANDAS_UDF_BUFFER_SIZE.key}'.")
 
-  private val stateMetadataSchema = StructType(
-    Array(
-      StructField("properties", StringType),
-      StructField("keyRowAsUnsafe", BinaryType),
-      StructField("object", BinaryType),
-      StructField("startOffset", IntegerType),
-      StructField("numRows", IntegerType)
-    )
-  )
-
-  private val schemaWithState = inputSchema.add("!__state__!", stateMetadataSchema)
+  private val schemaWithState = inputSchema.add("!__state__!", STATE_METADATA_SCHEMA)
 
   private val stateRowDeserializer = stateEncoder.createDeserializer()
 