Merged
Changes from 1 commit
58 commits
3d567a3
[MINOR][SQL] Avoid unnecessary invocation on checkAndGlobPathIfNecessary
Ngone51 Oct 22, 2019
484f93e
[SPARK-29530][SQL] Make SQLConf in SQL parse process thread safe
AngersZhuuuu Oct 22, 2019
467c3f6
[SPARK-29529][DOCS] Remove unnecessary orc version and hive version i…
denglingang Oct 22, 2019
811d563
[SPARK-29536][PYTHON] Upgrade cloudpickle to 1.1.1 to support Python 3.8
HyukjinKwon Oct 22, 2019
868d851
[SPARK-29232][ML] Update the parameter maps of the DecisionTreeRegres…
huaxingao Oct 22, 2019
3163b6b
[SPARK-29516][SQL][TEST] Test ThriftServerQueryTestSuite asynchronously
wangyum Oct 22, 2019
bb49c80
[SPARK-21492][SQL] Fix memory leak in SortMergeJoin
xuanyuanking Oct 22, 2019
b4844ee
[SPARK-29517][SQL] TRUNCATE TABLE should look up catalog/table like v…
viirya Oct 22, 2019
8779938
[SPARK-28787][DOC][SQL] Document LOAD DATA statement in SQL Reference
huaxingao Oct 22, 2019
c1c6485
[SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference
dilipbiswal Oct 22, 2019
2036a8c
[SPARK-29488][WEBUI] In Web UI, stage page has js error when sort table
jennyinspur Oct 22, 2019
8009468
[SPARK-29556][CORE] Avoid putting request path in error response in E…
srowen Oct 22, 2019
3bf5355
[SPARK-29539][SQL] SHOW PARTITIONS should look up catalog/table like …
huaxingao Oct 22, 2019
f23c5d7
[SPARK-29560][BUILD] Add typesafe bintray repo for sbt-mima-plugin
dongjoon-hyun Oct 22, 2019
e674909
[SPARK-29107][SQL][TESTS] Port window.sql (Part 1)
DylanGuedes Oct 23, 2019
c128ac5
[SPARK-29511][SQL] DataSourceV2: Support CREATE NAMESPACE
imback82 Oct 23, 2019
8c34690
[SPARK-29546][TESTS] Recover jersey-guava test dependency in docker-i…
dongjoon-hyun Oct 23, 2019
cbe6ead
[SPARK-29352][SQL][SS] Track active streaming queries in the SparkSes…
brkyvz Oct 23, 2019
70dd9c0
[SPARK-29542][SQL][DOC] Make the descriptions of spark.sql.files.* be…
turboFei Oct 23, 2019
0a70951
[SPARK-29499][CORE][PYSPARK] Add mapPartitionsWithIndex for RDDBarrier
ConeyLiu Oct 23, 2019
df00b5c
[SPARK-29569][BUILD][DOCS] Copy and paste minified jquery instead whe…
HyukjinKwon Oct 23, 2019
53a5f17
[SPARK-29513][SQL] REFRESH TABLE should look up catalog/table like v2…
imback82 Oct 23, 2019
bfbf282
[SPARK-29503][SQL] Remove conversion CreateNamedStruct to CreateNamed…
HeartSaVioR Oct 23, 2019
7e8e4c0
[SPARK-29552][SQL] Execute the "OptimizeLocalShuffleReader" rule when…
JkSelf Oct 23, 2019
5867707
[SPARK-29557][BUILD] Update dropwizard/codahale metrics library to 3.2.6
LucaCanali Oct 23, 2019
b91356e
[SPARK-29533][SQL][TESTS][FOLLOWUP] Regenerate the result on EC2
dongjoon-hyun Oct 23, 2019
7ecf968
[SPARK-29567][TESTS] Update JDBC Integration Test Docker Images
dongjoon-hyun Oct 23, 2019
fd899d6
[SPARK-29576][CORE] Use Spark's CompressionCodec for Ser/Deser of Map…
dbtsai Oct 24, 2019
55ced9c
[SPARK-29571][SQL][TESTS][FOLLOWUP] Fix UT in AllExecutionsPageSuite
07ARB Oct 24, 2019
177bf67
[SPARK-29522][SQL] CACHE TABLE should look up catalog/table like v2 c…
viirya Oct 24, 2019
9e77d48
[SPARK-21492][SQL][FOLLOW UP] Reimplement UnsafeExternalRowSorter in …
xuanyuanking Oct 24, 2019
1296bbb
[SPARK-29504][WEBUI] Toggle full job description on click
PavithraRamachandran Oct 24, 2019
67cf043
[SPARK-29145][SQL] Support sub-queries in join conditions
AngersZhuuuu Oct 24, 2019
1ec1b2b
[SPARK-28791][DOC] Documentation for Alter table Command
PavithraRamachandran Oct 24, 2019
76d4beb
[SPARK-29559][WEBUI] Support pagination for JDBC/ODBC Server page
shahidki31 Oct 24, 2019
a35fb4f
[SPARK-29578][TESTS] Add "8634" as another skipped day for Kwajalein …
srowen Oct 24, 2019
cdea520
[SPARK-29532][SQL] Simplify interval string parsing
cloud-fan Oct 24, 2019
dcf5eaf
[SPARK-29444][FOLLOWUP] add doc and python parameter for ignoreNullFi…
Oct 24, 2019
92b2529
[SPARK-21287][SQL] Remove requirement of fetch_size>=0 from JDBCOptions
fuwhu Oct 24, 2019
dec99d8
[SPARK-29526][SQL] UNCACHE TABLE should look up catalog/table like v2…
imback82 Oct 24, 2019
40df9d2
[SPARK-29227][SS] Track rule info in optimization phase
wenxuanguan Oct 25, 2019
7417c3e
[SPARK-29597][DOCS] Deprecate old Java 8 versions prior to 8u92
dongjoon-hyun Oct 25, 2019
1474ed0
[SPARK-29562][SQL] Speed up and slim down metric aggregation in SQL l…
Oct 25, 2019
091cbc3
[SPARK-9612][ML] Add instance weight support for GBTs
zhengruifeng Oct 25, 2019
cfbdd9d
[SPARK-29461][SQL] Measure the number of records being updated for JD…
HeartSaVioR Oct 25, 2019
8bd8f49
[SPARK-29500][SQL][SS] Support partition column when writing to Kafka
redsk Oct 25, 2019
0cf4f07
[SPARK-29545][SQL] Add support for bit_xor aggregate function
yaooqinn Oct 25, 2019
68dca9a
[SPARK-29527][SQL] SHOW CREATE TABLE should look up catalog/table lik…
viirya Oct 25, 2019
ae5b60d
[SPARK-29182][CORE][FOLLOWUP] Cache preferred locations of checkpoint…
viirya Oct 25, 2019
2baf7a1
[SPARK-29608][BUILD] Add `hadoop-3.2` profile to release build
dongjoon-hyun Oct 25, 2019
2549391
[SPARK-29580][TESTS] Add kerberos debug messages for Kafka secure tests
gaborgsomogyi Oct 25, 2019
5bdc58b
[SPARK-27653][SQL][FOLLOWUP] Fix `since` version of `min_by/max_by`
dongjoon-hyun Oct 26, 2019
9a46702
[SPARK-29554][SQL] Add `version` SQL function
yaooqinn Oct 26, 2019
2115bf6
[SPARK-29490][SQL] Reset 'WritableColumnVector' in 'RowToColumnarExec'
marin-ma Oct 26, 2019
077fb99
[SPARK-29589][WEBUI] Support pagination for sqlstats session table in…
shahidki31 Oct 26, 2019
74514b4
[SPARK-29614][SQL][TEST] Fix failures of DateTimeUtilsSuite and Times…
MaxGekk Oct 27, 2019
a43b966
[SPARK-29613][BUILD][SS] Upgrade to Kafka 2.3.1
dongjoon-hyun Oct 27, 2019
b19fd48
[SPARK-29093][PYTHON][ML] Remove automatically generated param setter…
huaxingao Oct 28, 2019
[SPARK-29536][PYTHON] Upgrade cloudpickle to 1.1.1 to support Python 3.8
### What changes were proposed in this pull request?

Upgrade the inlined copy of cloudpickle in PySpark to cloudpickle 1.1.1. See https://github.com/cloudpipe/cloudpickle/blob/v1.1.1/cloudpickle/cloudpickle.py

cloudpipe/cloudpickle#269 added Python 3.8 support (fixed as of 1.1.0). Using 1.2.2 seems to break PyPy 2 due to cloudpipe/cloudpickle#278, so this PR currently uses 1.1.1.

Once we drop Python 2, we can switch to the latest version.

### Why are the changes needed?

Positional-only arguments were newly introduced in Python 3.8 (see https://docs.python.org/3/whatsnew/3.8.html#positional-only-parameters).

In particular, the newly added argument to the `types.CodeType` constructor was the problem (https://docs.python.org/3/whatsnew/3.8.html#changes-in-the-python-api):

> `types.CodeType` has a new parameter in the second position of the constructor (posonlyargcount) to support positional-only arguments defined in **PEP 570**. The first argument (argcount) now represents the total number of positional arguments (including positional-only arguments). The new `replace()` method of `types.CodeType` can be used to make the code future-proof.
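
To see how this breaks naive code-object reconstruction, here is a minimal sketch of the version dispatch involved (the helper name `rebuild_code` and the sample function `f` are illustrative, not part of the patch): on Python 3.8+, `replace()` carries `co_posonlyargcount` along implicitly, while older Python 3 versions fall back to the original constructor arity.

```python
import types

def rebuild_code(co):
    """Illustrative only: recreate a Python 3 code object across versions."""
    if hasattr(co, "replace"):
        # Python 3.8+: replace() copies co_posonlyargcount (and any future
        # fields), so callers need not hard-code the constructor's arity.
        return co.replace()
    # Python 3 before 3.8: the constructor takes no posonlyargcount.
    return types.CodeType(
        co.co_argcount, co.co_kwonlyargcount, co.co_nlocals,
        co.co_stacksize, co.co_flags, co.co_code, co.co_consts,
        co.co_names, co.co_varnames, co.co_filename, co.co_name,
        co.co_firstlineno, co.co_lnotab, co.co_freevars, co.co_cellvars)

def f(x, y=1):
    return x + y

# Code objects compare by value, so the rebuilt object equals the original.
assert rebuild_code(f.__code__) == f.__code__
```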

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manually tested. Note that the optional dependency PyArrow does not yet appear to support Python 3.8; therefore, it was not tested. See "Details" below.

<details>
<p>

```bash
cd python
./run-tests --python-executables=python3.8
```

```
Running PySpark tests. Output is in /Users/hyukjin.kwon/workspace/forked/spark/python/unit-tests.log
Will test against the following Python executables: ['python3.8']
Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
Starting test(python3.8): pyspark.ml.tests.test_algorithms
Starting test(python3.8): pyspark.ml.tests.test_feature
Starting test(python3.8): pyspark.ml.tests.test_base
Starting test(python3.8): pyspark.ml.tests.test_evaluation
Finished test(python3.8): pyspark.ml.tests.test_base (12s)
Starting test(python3.8): pyspark.ml.tests.test_image
Finished test(python3.8): pyspark.ml.tests.test_evaluation (14s)
Starting test(python3.8): pyspark.ml.tests.test_linalg
Finished test(python3.8): pyspark.ml.tests.test_feature (23s)
Starting test(python3.8): pyspark.ml.tests.test_param
Finished test(python3.8): pyspark.ml.tests.test_image (22s)
Starting test(python3.8): pyspark.ml.tests.test_persistence
Finished test(python3.8): pyspark.ml.tests.test_param (25s)
Starting test(python3.8): pyspark.ml.tests.test_pipeline
Finished test(python3.8): pyspark.ml.tests.test_linalg (37s)
Starting test(python3.8): pyspark.ml.tests.test_stat
Finished test(python3.8): pyspark.ml.tests.test_pipeline (7s)
Starting test(python3.8): pyspark.ml.tests.test_training_summary
Finished test(python3.8): pyspark.ml.tests.test_stat (21s)
Starting test(python3.8): pyspark.ml.tests.test_tuning
Finished test(python3.8): pyspark.ml.tests.test_persistence (45s)
Starting test(python3.8): pyspark.ml.tests.test_wrapper
Finished test(python3.8): pyspark.ml.tests.test_algorithms (83s)
Starting test(python3.8): pyspark.mllib.tests.test_algorithms
Finished test(python3.8): pyspark.ml.tests.test_training_summary (32s)
Starting test(python3.8): pyspark.mllib.tests.test_feature
Finished test(python3.8): pyspark.ml.tests.test_wrapper (20s)
Starting test(python3.8): pyspark.mllib.tests.test_linalg
Finished test(python3.8): pyspark.mllib.tests.test_feature (32s)
Starting test(python3.8): pyspark.mllib.tests.test_stat
Finished test(python3.8): pyspark.mllib.tests.test_algorithms (70s)
Starting test(python3.8): pyspark.mllib.tests.test_streaming_algorithms
Finished test(python3.8): pyspark.mllib.tests.test_stat (37s)
Starting test(python3.8): pyspark.mllib.tests.test_util
Finished test(python3.8): pyspark.mllib.tests.test_linalg (70s)
Starting test(python3.8): pyspark.sql.tests.test_arrow
Finished test(python3.8): pyspark.sql.tests.test_arrow (1s) ... 53 tests were skipped
Starting test(python3.8): pyspark.sql.tests.test_catalog
Finished test(python3.8): pyspark.mllib.tests.test_util (15s)
Starting test(python3.8): pyspark.sql.tests.test_column
Finished test(python3.8): pyspark.sql.tests.test_catalog (24s)
Starting test(python3.8): pyspark.sql.tests.test_conf
Finished test(python3.8): pyspark.sql.tests.test_column (21s)
Starting test(python3.8): pyspark.sql.tests.test_context
Finished test(python3.8): pyspark.ml.tests.test_tuning (125s)
Starting test(python3.8): pyspark.sql.tests.test_dataframe
Finished test(python3.8): pyspark.sql.tests.test_conf (9s)
Starting test(python3.8): pyspark.sql.tests.test_datasources
Finished test(python3.8): pyspark.sql.tests.test_context (29s)
Starting test(python3.8): pyspark.sql.tests.test_functions
Finished test(python3.8): pyspark.sql.tests.test_datasources (32s)
Starting test(python3.8): pyspark.sql.tests.test_group
Finished test(python3.8): pyspark.sql.tests.test_dataframe (39s) ... 3 tests were skipped
Starting test(python3.8): pyspark.sql.tests.test_pandas_udf
Finished test(python3.8): pyspark.sql.tests.test_pandas_udf (1s) ... 6 tests were skipped
Starting test(python3.8): pyspark.sql.tests.test_pandas_udf_cogrouped_map
Finished test(python3.8): pyspark.sql.tests.test_pandas_udf_cogrouped_map (0s) ... 14 tests were skipped
Starting test(python3.8): pyspark.sql.tests.test_pandas_udf_grouped_agg
Finished test(python3.8): pyspark.sql.tests.test_pandas_udf_grouped_agg (1s) ... 15 tests were skipped
Starting test(python3.8): pyspark.sql.tests.test_pandas_udf_grouped_map
Finished test(python3.8): pyspark.sql.tests.test_pandas_udf_grouped_map (1s) ... 20 tests were skipped
Starting test(python3.8): pyspark.sql.tests.test_pandas_udf_scalar
Finished test(python3.8): pyspark.sql.tests.test_pandas_udf_scalar (1s) ... 49 tests were skipped
Starting test(python3.8): pyspark.sql.tests.test_pandas_udf_window
Finished test(python3.8): pyspark.sql.tests.test_pandas_udf_window (1s) ... 14 tests were skipped
Starting test(python3.8): pyspark.sql.tests.test_readwriter
Finished test(python3.8): pyspark.sql.tests.test_functions (29s)
Starting test(python3.8): pyspark.sql.tests.test_serde
Finished test(python3.8): pyspark.sql.tests.test_group (20s)
Starting test(python3.8): pyspark.sql.tests.test_session
Finished test(python3.8): pyspark.mllib.tests.test_streaming_algorithms (126s)
Starting test(python3.8): pyspark.sql.tests.test_streaming
Finished test(python3.8): pyspark.sql.tests.test_serde (25s)
Starting test(python3.8): pyspark.sql.tests.test_types
Finished test(python3.8): pyspark.sql.tests.test_readwriter (38s)
Starting test(python3.8): pyspark.sql.tests.test_udf
Finished test(python3.8): pyspark.sql.tests.test_session (32s)
Starting test(python3.8): pyspark.sql.tests.test_utils
Finished test(python3.8): pyspark.sql.tests.test_utils (17s)
Starting test(python3.8): pyspark.streaming.tests.test_context
Finished test(python3.8): pyspark.sql.tests.test_types (45s)
Starting test(python3.8): pyspark.streaming.tests.test_dstream
Finished test(python3.8): pyspark.sql.tests.test_udf (44s)
Starting test(python3.8): pyspark.streaming.tests.test_kinesis
Finished test(python3.8): pyspark.streaming.tests.test_kinesis (0s) ... 2 tests were skipped
Starting test(python3.8): pyspark.streaming.tests.test_listener
Finished test(python3.8): pyspark.streaming.tests.test_context (28s)
Starting test(python3.8): pyspark.tests.test_appsubmit
Finished test(python3.8): pyspark.sql.tests.test_streaming (60s)
Starting test(python3.8): pyspark.tests.test_broadcast
Finished test(python3.8): pyspark.streaming.tests.test_listener (11s)
Starting test(python3.8): pyspark.tests.test_conf
Finished test(python3.8): pyspark.tests.test_conf (17s)
Starting test(python3.8): pyspark.tests.test_context
Finished test(python3.8): pyspark.tests.test_broadcast (39s)
Starting test(python3.8): pyspark.tests.test_daemon
Finished test(python3.8): pyspark.tests.test_daemon (5s)
Starting test(python3.8): pyspark.tests.test_join
Finished test(python3.8): pyspark.tests.test_context (31s)
Starting test(python3.8): pyspark.tests.test_profiler
Finished test(python3.8): pyspark.tests.test_join (9s)
Starting test(python3.8): pyspark.tests.test_rdd
Finished test(python3.8): pyspark.tests.test_profiler (12s)
Starting test(python3.8): pyspark.tests.test_readwrite
Finished test(python3.8): pyspark.tests.test_readwrite (23s) ... 3 tests were skipped
Starting test(python3.8): pyspark.tests.test_serializers
Finished test(python3.8): pyspark.tests.test_appsubmit (94s)
Starting test(python3.8): pyspark.tests.test_shuffle
Finished test(python3.8): pyspark.streaming.tests.test_dstream (110s)
Starting test(python3.8): pyspark.tests.test_taskcontext
Finished test(python3.8): pyspark.tests.test_rdd (42s)
Starting test(python3.8): pyspark.tests.test_util
Finished test(python3.8): pyspark.tests.test_serializers (11s)
Starting test(python3.8): pyspark.tests.test_worker
Finished test(python3.8): pyspark.tests.test_shuffle (12s)
Starting test(python3.8): pyspark.accumulators
Finished test(python3.8): pyspark.tests.test_util (7s)
Starting test(python3.8): pyspark.broadcast
Finished test(python3.8): pyspark.accumulators (8s)
Starting test(python3.8): pyspark.conf
Finished test(python3.8): pyspark.broadcast (8s)
Starting test(python3.8): pyspark.context
Finished test(python3.8): pyspark.tests.test_worker (19s)
Starting test(python3.8): pyspark.ml.classification
Finished test(python3.8): pyspark.conf (4s)
Starting test(python3.8): pyspark.ml.clustering
Finished test(python3.8): pyspark.context (22s)
Starting test(python3.8): pyspark.ml.evaluation
Finished test(python3.8): pyspark.tests.test_taskcontext (49s)
Starting test(python3.8): pyspark.ml.feature
Finished test(python3.8): pyspark.ml.clustering (43s)
Starting test(python3.8): pyspark.ml.fpm
Finished test(python3.8): pyspark.ml.evaluation (27s)
Starting test(python3.8): pyspark.ml.image
Finished test(python3.8): pyspark.ml.image (8s)
Starting test(python3.8): pyspark.ml.linalg.__init__
Finished test(python3.8): pyspark.ml.linalg.__init__ (0s)
Starting test(python3.8): pyspark.ml.recommendation
Finished test(python3.8): pyspark.ml.classification (63s)
Starting test(python3.8): pyspark.ml.regression
Finished test(python3.8): pyspark.ml.fpm (23s)
Starting test(python3.8): pyspark.ml.stat
Finished test(python3.8): pyspark.ml.stat (30s)
Starting test(python3.8): pyspark.ml.tuning
Finished test(python3.8): pyspark.ml.regression (51s)
Starting test(python3.8): pyspark.mllib.classification
Finished test(python3.8): pyspark.ml.feature (93s)
Starting test(python3.8): pyspark.mllib.clustering
Finished test(python3.8): pyspark.ml.tuning (39s)
Starting test(python3.8): pyspark.mllib.evaluation
Finished test(python3.8): pyspark.mllib.classification (38s)
Starting test(python3.8): pyspark.mllib.feature
Finished test(python3.8): pyspark.mllib.evaluation (25s)
Starting test(python3.8): pyspark.mllib.fpm
Finished test(python3.8): pyspark.mllib.clustering (64s)
Starting test(python3.8): pyspark.mllib.linalg.__init__
Finished test(python3.8): pyspark.ml.recommendation (131s)
Starting test(python3.8): pyspark.mllib.linalg.distributed
Finished test(python3.8): pyspark.mllib.linalg.__init__ (0s)
Starting test(python3.8): pyspark.mllib.random
Finished test(python3.8): pyspark.mllib.feature (36s)
Starting test(python3.8): pyspark.mllib.recommendation
Finished test(python3.8): pyspark.mllib.fpm (31s)
Starting test(python3.8): pyspark.mllib.regression
Finished test(python3.8): pyspark.mllib.random (16s)
Starting test(python3.8): pyspark.mllib.stat.KernelDensity
Finished test(python3.8): pyspark.mllib.stat.KernelDensity (1s)
Starting test(python3.8): pyspark.mllib.stat._statistics
Finished test(python3.8): pyspark.mllib.stat._statistics (25s)
Starting test(python3.8): pyspark.mllib.tree
Finished test(python3.8): pyspark.mllib.regression (44s)
Starting test(python3.8): pyspark.mllib.util
Finished test(python3.8): pyspark.mllib.recommendation (49s)
Starting test(python3.8): pyspark.profiler
Finished test(python3.8): pyspark.mllib.linalg.distributed (53s)
Starting test(python3.8): pyspark.rdd
Finished test(python3.8): pyspark.profiler (14s)
Starting test(python3.8): pyspark.serializers
Finished test(python3.8): pyspark.mllib.tree (30s)
Starting test(python3.8): pyspark.shuffle
Finished test(python3.8): pyspark.shuffle (2s)
Starting test(python3.8): pyspark.sql.avro.functions
Finished test(python3.8): pyspark.mllib.util (30s)
Starting test(python3.8): pyspark.sql.catalog
Finished test(python3.8): pyspark.serializers (17s)
Starting test(python3.8): pyspark.sql.column
Finished test(python3.8): pyspark.rdd (31s)
Starting test(python3.8): pyspark.sql.conf
Finished test(python3.8): pyspark.sql.conf (7s)
Starting test(python3.8): pyspark.sql.context
Finished test(python3.8): pyspark.sql.avro.functions (19s)
Starting test(python3.8): pyspark.sql.dataframe
Finished test(python3.8): pyspark.sql.catalog (16s)
Starting test(python3.8): pyspark.sql.functions
Finished test(python3.8): pyspark.sql.column (27s)
Starting test(python3.8): pyspark.sql.group
Finished test(python3.8): pyspark.sql.context (26s)
Starting test(python3.8): pyspark.sql.readwriter
Finished test(python3.8): pyspark.sql.group (52s)
Starting test(python3.8): pyspark.sql.session
Finished test(python3.8): pyspark.sql.dataframe (73s)
Starting test(python3.8): pyspark.sql.streaming
Finished test(python3.8): pyspark.sql.functions (75s)
Starting test(python3.8): pyspark.sql.types
Finished test(python3.8): pyspark.sql.readwriter (57s)
Starting test(python3.8): pyspark.sql.udf
Finished test(python3.8): pyspark.sql.types (13s)
Starting test(python3.8): pyspark.sql.window
Finished test(python3.8): pyspark.sql.session (32s)
Starting test(python3.8): pyspark.streaming.util
Finished test(python3.8): pyspark.streaming.util (1s)
Starting test(python3.8): pyspark.util
Finished test(python3.8): pyspark.util (0s)
Finished test(python3.8): pyspark.sql.streaming (30s)
Finished test(python3.8): pyspark.sql.udf (27s)
Finished test(python3.8): pyspark.sql.window (22s)
Tests passed in 855 seconds
```
</p>
</details>

Closes apache#26194 from HyukjinKwon/SPARK-29536.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
HyukjinKwon committed Oct 22, 2019
commit 811d563fbf60203377e8462e4fad271c1140b4fa
257 changes: 220 additions & 37 deletions python/pyspark/cloudpickle.py
@@ -44,7 +44,6 @@

import dis
from functools import partial
import importlib
import io
import itertools
import logging
@@ -56,12 +55,26 @@
import traceback
import types
import weakref
import uuid
import threading


try:
from enum import Enum
except ImportError:
Enum = None

# cloudpickle is meant for inter process communication: we expect all
# communicating processes to run the same Python version hence we favor
# communication speed over compatibility:
DEFAULT_PROTOCOL = pickle.HIGHEST_PROTOCOL

# Track the provenance of reconstructed dynamic classes to make it possible to
# reconstruct instances from the matching singleton class definition when
# appropriate and preserve the usual "isinstance" semantics of Python objects.
_DYNAMIC_CLASS_TRACKER_BY_CLASS = weakref.WeakKeyDictionary()
_DYNAMIC_CLASS_TRACKER_BY_ID = weakref.WeakValueDictionary()
_DYNAMIC_CLASS_TRACKER_LOCK = threading.Lock()

if sys.version_info[0] < 3: # pragma: no branch
from pickle import Pickler
@@ -71,12 +84,37 @@
from StringIO import StringIO
string_types = (basestring,) # noqa
PY3 = False
PY2 = True
PY2_WRAPPER_DESCRIPTOR_TYPE = type(object.__init__)
PY2_METHOD_WRAPPER_TYPE = type(object.__eq__)
PY2_CLASS_DICT_BLACKLIST = (PY2_METHOD_WRAPPER_TYPE,
PY2_WRAPPER_DESCRIPTOR_TYPE)
else:
types.ClassType = type
from pickle import _Pickler as Pickler
from io import BytesIO as StringIO
string_types = (str,)
PY3 = True
PY2 = False


def _ensure_tracking(class_def):
with _DYNAMIC_CLASS_TRACKER_LOCK:
class_tracker_id = _DYNAMIC_CLASS_TRACKER_BY_CLASS.get(class_def)
if class_tracker_id is None:
class_tracker_id = uuid.uuid4().hex
_DYNAMIC_CLASS_TRACKER_BY_CLASS[class_def] = class_tracker_id
_DYNAMIC_CLASS_TRACKER_BY_ID[class_tracker_id] = class_def
return class_tracker_id


def _lookup_class_or_track(class_tracker_id, class_def):
if class_tracker_id is not None:
with _DYNAMIC_CLASS_TRACKER_LOCK:
class_def = _DYNAMIC_CLASS_TRACKER_BY_ID.setdefault(
class_tracker_id, class_def)
_DYNAMIC_CLASS_TRACKER_BY_CLASS[class_def] = class_tracker_id
return class_def


def _make_cell_set_template_code():
@@ -112,7 +150,7 @@ def inner(value):
# NOTE: we are marking the cell variable as a free variable intentionally
# so that we simulate an inner function instead of the outer function. This
# is what gives us the ``nonlocal`` behavior in a Python 2 compatible way.
if not PY3: # pragma: no branch
if PY2: # pragma: no branch
return types.CodeType(
co.co_argcount,
co.co_nlocals,
@@ -130,24 +168,43 @@ def inner(value):
(),
)
else:
return types.CodeType(
co.co_argcount,
co.co_kwonlyargcount,
co.co_nlocals,
co.co_stacksize,
co.co_flags,
co.co_code,
co.co_consts,
co.co_names,
co.co_varnames,
co.co_filename,
co.co_name,
co.co_firstlineno,
co.co_lnotab,
co.co_cellvars, # this is the trickery
(),
)

if hasattr(types.CodeType, "co_posonlyargcount"): # pragma: no branch
return types.CodeType(
co.co_argcount,
co.co_posonlyargcount, # Python3.8 with PEP570
co.co_kwonlyargcount,
co.co_nlocals,
co.co_stacksize,
co.co_flags,
co.co_code,
co.co_consts,
co.co_names,
co.co_varnames,
co.co_filename,
co.co_name,
co.co_firstlineno,
co.co_lnotab,
co.co_cellvars, # this is the trickery
(),
)
else:
return types.CodeType(
co.co_argcount,
co.co_kwonlyargcount,
co.co_nlocals,
co.co_stacksize,
co.co_flags,
co.co_code,
co.co_consts,
co.co_names,
co.co_varnames,
co.co_filename,
co.co_name,
co.co_firstlineno,
co.co_lnotab,
co.co_cellvars, # this is the trickery
(),
)

_cell_set_template_code = _make_cell_set_template_code()

@@ -220,7 +277,7 @@ def _walk_global_ops(code):
global-referencing instructions in *code*.
"""
code = getattr(code, 'co_code', b'')
if not PY3: # pragma: no branch
if PY2: # pragma: no branch
code = map(ord, code)

n = len(code)
@@ -250,6 +307,39 @@ def _walk_global_ops(code):
yield op, instr.arg


def _extract_class_dict(cls):
"""Retrieve a copy of the dict of a class without the inherited methods"""
clsdict = dict(cls.__dict__) # copy dict proxy to a dict
if len(cls.__bases__) == 1:
inherited_dict = cls.__bases__[0].__dict__
else:
inherited_dict = {}
for base in reversed(cls.__bases__):
inherited_dict.update(base.__dict__)
to_remove = []
for name, value in clsdict.items():
try:
base_value = inherited_dict[name]
if value is base_value:
to_remove.append(name)
elif PY2:
# backward compat for Python 2
if hasattr(value, "im_func"):
if value.im_func is getattr(base_value, "im_func", None):
to_remove.append(name)
elif isinstance(value, PY2_CLASS_DICT_BLACKLIST):
# On Python 2 we have no way to pickle those specific
# methods types nor to check that they are actually
# inherited. So we assume that they are always inherited
# from builtin types.
to_remove.append(name)
except KeyError:
pass
for name in to_remove:
clsdict.pop(name)
return clsdict


class CloudPickler(Pickler):

dispatch = Pickler.dispatch.copy()
@@ -277,7 +367,7 @@ def save_memoryview(self, obj):

dispatch[memoryview] = save_memoryview

if not PY3: # pragma: no branch
if PY2: # pragma: no branch
def save_buffer(self, obj):
self.save(str(obj))

@@ -300,12 +390,23 @@ def save_codeobject(self, obj):
Save a code object
"""
if PY3: # pragma: no branch
args = (
obj.co_argcount, obj.co_kwonlyargcount, obj.co_nlocals, obj.co_stacksize,
obj.co_flags, obj.co_code, obj.co_consts, obj.co_names, obj.co_varnames,
obj.co_filename, obj.co_name, obj.co_firstlineno, obj.co_lnotab, obj.co_freevars,
obj.co_cellvars
)
if hasattr(obj, "co_posonlyargcount"): # pragma: no branch
args = (
obj.co_argcount, obj.co_posonlyargcount,
obj.co_kwonlyargcount, obj.co_nlocals, obj.co_stacksize,
obj.co_flags, obj.co_code, obj.co_consts, obj.co_names,
obj.co_varnames, obj.co_filename, obj.co_name,
obj.co_firstlineno, obj.co_lnotab, obj.co_freevars,
obj.co_cellvars
)
else:
args = (
obj.co_argcount, obj.co_kwonlyargcount, obj.co_nlocals,
obj.co_stacksize, obj.co_flags, obj.co_code, obj.co_consts,
obj.co_names, obj.co_varnames, obj.co_filename,
obj.co_name, obj.co_firstlineno, obj.co_lnotab,
obj.co_freevars, obj.co_cellvars
)
else:
args = (
obj.co_argcount, obj.co_nlocals, obj.co_stacksize, obj.co_flags, obj.co_code,
@@ -460,15 +561,40 @@ def func():
# then discards the reference to it
self.write(pickle.POP)

def save_dynamic_class(self, obj):
def _save_dynamic_enum(self, obj, clsdict):
"""Special handling for dynamic Enum subclasses

Use a dedicated Enum constructor (inspired by EnumMeta.__call__) as the
EnumMeta metaclass has complex initialization that makes the Enum
subclasses hold references to their own instances.
"""
Save a class that can't be stored as module global.
members = dict((e.name, e.value) for e in obj)

# Python 2.7 with enum34 can have no qualname:
qualname = getattr(obj, "__qualname__", None)

self.save_reduce(_make_skeleton_enum,
(obj.__bases__, obj.__name__, qualname, members,
obj.__module__, _ensure_tracking(obj), None),
obj=obj)

# Cleanup the clsdict that will be passed to _rehydrate_skeleton_class:
# Those attributes are already handled by the metaclass.
for attrname in ["_generate_next_value_", "_member_names_",
"_member_map_", "_member_type_",
"_value2member_map_"]:
clsdict.pop(attrname, None)
for member in members:
clsdict.pop(member)

def save_dynamic_class(self, obj):
"""Save a class that can't be stored as module global.

This method is used to serialize classes that are defined inside
functions, or that otherwise can't be serialized as attribute lookups
from global modules.
"""
clsdict = dict(obj.__dict__) # copy dict proxy to a dict
clsdict = _extract_class_dict(obj)
clsdict.pop('__weakref__', None)

# For ABCMeta in python3.7+, remove _abc_impl as it is not picklable.
@@ -496,8 +622,8 @@ def save_dynamic_class(self, obj):
for k in obj.__slots__:
clsdict.pop(k, None)

# If type overrides __dict__ as a property, include it in the type kwargs.
# In Python 2, we can't set this attribute after construction.
# If type overrides __dict__ as a property, include it in the type
# kwargs. In Python 2, we can't set this attribute after construction.
__dict__ = clsdict.pop('__dict__', None)
if isinstance(__dict__, property):
type_kwargs['__dict__'] = __dict__
@@ -524,8 +650,16 @@ def save_dynamic_class(self, obj):
write(pickle.MARK)

# Create and memoize a skeleton class with obj's name and bases.
tp = type(obj)
self.save_reduce(tp, (obj.__name__, obj.__bases__, type_kwargs), obj=obj)
if Enum is not None and issubclass(obj, Enum):
# Special handling of Enum subclasses
self._save_dynamic_enum(obj, clsdict)
else:
# "Regular" class definition:
tp = type(obj)
self.save_reduce(_make_skeleton_class,
(tp, obj.__name__, obj.__bases__, type_kwargs,
_ensure_tracking(obj), None),
obj=obj)

# Now save the rest of obj's __dict__. Any references to obj
# encountered while saving will point to the skeleton class.
@@ -778,7 +912,7 @@ def save_inst(self, obj):
save(stuff)
write(pickle.BUILD)

if not PY3: # pragma: no branch
if PY2: # pragma: no branch
dispatch[types.InstanceType] = save_inst

def save_property(self, obj):
@@ -1119,6 +1253,22 @@ def _make_skel_func(code, cell_count, base_globals=None):
return types.FunctionType(code, base_globals, None, None, closure)


def _make_skeleton_class(type_constructor, name, bases, type_kwargs,
class_tracker_id, extra):
"""Build dynamic class with an empty __dict__ to be filled once memoized

If class_tracker_id is not None, try to lookup an existing class definition
matching that id. If none is found, track a newly reconstructed class
definition under that id so that other instances stemming from the same
class id will also reuse this class definition.

The "extra" variable is meant to be a dict (or None) that can be used for
forward compatibility should the need arise.
"""
skeleton_class = type_constructor(name, bases, type_kwargs)
return _lookup_class_or_track(class_tracker_id, skeleton_class)


def _rehydrate_skeleton_class(skeleton_class, class_dict):
"""Put attributes from `class_dict` back on `skeleton_class`.

@@ -1137,6 +1287,39 @@ def _rehydrate_skeleton_class(skeleton_class, class_dict):
return skeleton_class


def _make_skeleton_enum(bases, name, qualname, members, module,
class_tracker_id, extra):
"""Build dynamic enum with an empty __dict__ to be filled once memoized

The creation of the enum class is inspired by the code of
EnumMeta._create_.

If class_tracker_id is not None, try to lookup an existing enum definition
matching that id. If none is found, track a newly reconstructed enum
definition under that id so that other instances stemming from the same
class id will also reuse this enum definition.

The "extra" variable is meant to be a dict (or None) that can be used for
forward compatibility should the need arise.
"""
# enums always inherit from their base Enum class at the last position in
# the list of base classes:
enum_base = bases[-1]
metacls = enum_base.__class__
classdict = metacls.__prepare__(name, bases)

for member_name, member_value in members.items():
classdict[member_name] = member_value
enum_class = metacls.__new__(metacls, name, bases, classdict)
enum_class.__module__ = module

# Python 2.7 compat
if qualname is not None:
enum_class.__qualname__ = qualname

return _lookup_class_or_track(class_tracker_id, enum_class)


def _is_dynamic(module):
"""
Return True if the module is a special module that cannot be imported by its
Expand Down Expand Up @@ -1176,4 +1359,4 @@ def _reduce_method_descriptor(obj):
import copy_reg as copyreg
except ImportError:
import copyreg
copyreg.pickle(method_descriptor, _reduce_method_descriptor)
copyreg.pickle(method_descriptor, _reduce_method_descriptor)
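
For context on the class-tracker machinery added above (`_ensure_tracking`, `_lookup_class_or_track`, `_make_skeleton_class`), here is a minimal sketch of the `isinstance` semantics it preserves for classes defined inside functions. This is illustrative only and not part of the patch; it assumes the inlined module is imported as `pyspark.cloudpickle`, and `make_point`/`Point` are made-up names:

```python
import pickle
from pyspark import cloudpickle

def make_point():
    class Point(object):  # defined inside a function: not importable by name
        def __init__(self, x):
            self.x = x
    return Point

Point = make_point()

# dumps() registers Point under a UUID; loads() in the same process finds
# that UUID already tracked and resolves back to the original class, so
# isinstance semantics survive the round trip.
restored = pickle.loads(cloudpickle.dumps(Point(42)))
assert isinstance(restored, Point) and restored.x == 42
```

Dynamically defined `Enum` subclasses take the dedicated `_make_skeleton_enum` path instead, which rebuilds members through the metaclass rather than rehydrating `__dict__`, but they reuse the same tracker.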
1 change: 1 addition & 0 deletions python/setup.py
@@ -230,6 +230,7 @@ def _supports_symlinks():
'Programming Language :: Python :: 3.5',
'Programming Language :: Python :: 3.6',
'Programming Language :: Python :: 3.7',
'Programming Language :: Python :: 3.8',
'Programming Language :: Python :: Implementation :: CPython',
'Programming Language :: Python :: Implementation :: PyPy']
)