Commit ad89e04

Merge branch 'master' of https://github.com/pandas-dev/pandas into genrange
2 parents 4a9c542 + 145c227 commit ad89e04

30 files changed: +839 -289 lines changed

ci/code_checks.sh

Lines changed: 4 additions & 4 deletions
@@ -122,22 +122,22 @@ fi
122122
if [[ -z "$CHECK" || "$CHECK" == "doctests" ]]; then
123123

124124
MSG='Doctests frame.py' ; echo $MSG
125-
pytest --doctest-modules -v pandas/core/frame.py \
125+
pytest -q --doctest-modules pandas/core/frame.py \
126126
-k"-axes -combine -itertuples -join -nlargest -nsmallest -nunique -pivot_table -quantile -query -reindex -reindex_axis -replace -round -set_index -stack -to_stata"
127127
RET=$(($RET + $?)) ; echo $MSG "DONE"
128128

129129
MSG='Doctests series.py' ; echo $MSG
130-
pytest --doctest-modules -v pandas/core/series.py \
130+
pytest -q --doctest-modules pandas/core/series.py \
131131
-k"-nonzero -reindex -searchsorted -to_dict"
132132
RET=$(($RET + $?)) ; echo $MSG "DONE"
133133

134134
MSG='Doctests generic.py' ; echo $MSG
135-
pytest --doctest-modules -v pandas/core/generic.py \
135+
pytest -q --doctest-modules pandas/core/generic.py \
136136
-k"-_set_axis_name -_xs -describe -droplevel -groupby -interpolate -pct_change -pipe -reindex -reindex_axis -resample -to_json -transpose -values -xs"
137137
RET=$(($RET + $?)) ; echo $MSG "DONE"
138138

139139
MSG='Doctests top-level reshaping functions' ; echo $MSG
140-
pytest --doctest-modules -v \
140+
pytest -q --doctest-modules \
141141
pandas/core/reshape/concat.py \
142142
pandas/core/reshape/pivot.py \
143143
pandas/core/reshape/reshape.py \

doc/source/extending.rst

Lines changed: 16 additions & 0 deletions
@@ -135,6 +135,12 @@ There are two approaches for providing operator support for your ExtensionArray:
135135
2. Use an operator implementation from pandas that depends on operators that are already defined
136136
on the underlying elements (scalars) of the ExtensionArray.
137137

138+
.. note::
139+
140+
Regardless of the approach, you may want to set ``__array_priority__``
141+
if you want your implementation to be called when involved in binary operations
142+
with NumPy arrays.
143+
138144
For the first approach, you define selected operators, e.g., ``__add__``, ``__le__``, etc. that
139145
you want your ``ExtensionArray`` subclass to support.
140146

@@ -173,6 +179,16 @@ or not that succeeds depends on whether the operation returns a result
173179
that's valid for the ``ExtensionArray``. If an ``ExtensionArray`` cannot
174180
be reconstructed, an ndarray containing the scalars is returned instead.
175181

182+
For ease of implementation and consistency with operations between pandas
183+
and NumPy ndarrays, we recommend *not* handling Series and Indexes in your binary ops.
184+
Instead, you should detect these cases and return ``NotImplemented``.
185+
When pandas encounters an operation like ``op(Series, ExtensionArray)``, pandas
186+
will
187+
188+
1. unbox the array from the ``Series`` (roughly ``Series.values``)
189+
2. call ``result = op(values, ExtensionArray)``
190+
3. re-box the result in a ``Series``
191+
176192
.. _extending.extension.testing:
177193

178194
Testing Extension Arrays
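
The dispatch recommended in the `doc/source/extending.rst` change above (return ``NotImplemented`` for a Series or Index, and let the container unbox, call the op, and re-box) can be sketched in plain Python. This is only an illustration of the protocol, not pandas code: `ToyArray` and `ToyBox` are hypothetical stand-ins for an ``ExtensionArray`` subclass and a ``Series``.

```python
import operator


class ToyArray:
    """Hypothetical stand-in for an ExtensionArray subclass."""

    def __init__(self, data):
        self.data = list(data)

    def _binop(self, other, op):
        if isinstance(other, ToyBox):
            # Do not handle the container here. Returning NotImplemented
            # makes Python try the container's reflected method instead.
            return NotImplemented
        if isinstance(other, ToyArray):
            other = other.data
        else:
            # Treat anything else as a scalar and broadcast it.
            other = [other] * len(self.data)
        return ToyArray(op(a, b) for a, b in zip(self.data, other))

    def __add__(self, other):
        return self._binop(other, operator.add)

    def __radd__(self, other):
        return self._binop(other, lambda a, b: operator.add(b, a))


class ToyBox:
    """Hypothetical stand-in for a Series wrapping an array."""

    def __init__(self, values):
        self.values = values

    def __add__(self, other):
        # 1. unbox, 2. call the op on the array, 3. re-box the result
        return ToyBox(self.values + other)

    def __radd__(self, other):
        return ToyBox(other + self.values)


arr = ToyArray([1, 2, 3])
box = ToyBox(ToyArray([10, 20, 30]))

left = box + arr   # ToyBox.__add__ unboxes, adds, and re-boxes
right = arr + box  # ToyArray.__add__ defers, so ToyBox.__radd__ runs
```

Both expressions end up as a `ToyBox`, so the result type does not depend on operand order, which is the consistency the documentation change is after.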

doc/source/whatsnew/v0.24.0.txt

Lines changed: 94 additions & 0 deletions
@@ -235,6 +235,97 @@ If installed, we now require:
235235
| scipy | 0.18.1 | |
236236
+-----------------+-----------------+----------+
237237

238+
.. _whatsnew_0240.api_breaking.csv_line_terminator:
239+
240+
`os.linesep` is used for ``line_terminator`` of ``DataFrame.to_csv``
241+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
242+
243+
:func:`DataFrame.to_csv` now uses :func:`os.linesep` rather than ``'\n'``
244+
for the default line terminator (:issue:`20353`).
245+
This change only has an effect on Windows, where ``'\r\n'`` was used as the line terminator
246+
even when ``'\n'`` was passed in ``line_terminator``.
247+
248+
Previous Behavior on Windows:
249+
250+
.. code-block:: ipython
251+
252+
In [1]: data = pd.DataFrame({
253+
...: "string_with_lf": ["a\nbc"],
254+
...: "string_with_crlf": ["a\r\nbc"]
255+
...: })
256+
257+
In [2]: # When passing file PATH to to_csv, line_terminator does not work, and csv is saved with '\r\n'.
258+
...: # Also, this converts all '\n's in the data to '\r\n'.
259+
...: data.to_csv("test.csv", index=False, line_terminator='\n')
260+
261+
In [3]: with open("test.csv", mode='rb') as f:
262+
...: print(f.read())
263+
b'string_with_lf,string_with_crlf\r\n"a\r\nbc","a\r\r\nbc"\r\n'
264+
265+
In [4]: # When passing file OBJECT with newline option to to_csv, line_terminator works.
266+
...: with open("test2.csv", mode='w', newline='\n') as f:
267+
...: data.to_csv(f, index=False, line_terminator='\n')
268+
269+
In [5]: with open("test2.csv", mode='rb') as f:
270+
...: print(f.read())
271+
b'string_with_lf,string_with_crlf\n"a\nbc","a\r\nbc"\n'
272+
273+
274+
New Behavior on Windows:
275+
276+
- By passing ``line_terminator`` explicitly, line terminator is set to that character.
277+
- The value of ``line_terminator`` only affects the line terminator of CSV,
278+
so it does not change the value inside the data.
279+
280+
.. code-block:: ipython
281+
282+
In [1]: data = pd.DataFrame({
283+
...: "string_with_lf": ["a\nbc"],
284+
...: "string_with_crlf": ["a\r\nbc"]
285+
...: })
286+
287+
In [2]: data.to_csv("test.csv", index=False, line_terminator='\n')
288+
289+
In [3]: with open("test.csv", mode='rb') as f:
290+
...: print(f.read())
291+
b'string_with_lf,string_with_crlf\n"a\nbc","a\r\nbc"\n'
292+
293+
294+
- On Windows, the value of ``os.linesep`` is ``'\r\n'``,
295+
so if ``line_terminator`` is not set, ``'\r\n'`` is used for line terminator.
296+
- Again, it does not affect the value inside the data.
297+
298+
.. code-block:: ipython
299+
300+
In [1]: data = pd.DataFrame({
301+
...: "string_with_lf": ["a\nbc"],
302+
...: "string_with_crlf": ["a\r\nbc"]
303+
...: })
304+
305+
In [2]: data.to_csv("test.csv", index=False)
306+
307+
In [3]: with open("test.csv", mode='rb') as f:
308+
...: print(f.read())
309+
b'string_with_lf,string_with_crlf\r\n"a\nbc","a\r\nbc"\r\n'
310+
311+
312+
- For file objects, specifying ``newline`` is not sufficient to set the line terminator.
313+
You must pass in the ``line_terminator`` explicitly, even in this case.
314+
315+
.. code-block:: ipython
316+
317+
In [1]: data = pd.DataFrame({
318+
...: "string_with_lf": ["a\nbc"],
319+
...: "string_with_crlf": ["a\r\nbc"]
320+
...: })
321+
322+
In [2]: with open("test2.csv", mode='w', newline='\n') as f:
323+
...: data.to_csv(f, index=False)
324+
325+
In [3]: with open("test2.csv", mode='rb') as f:
326+
...: print(f.read())
327+
b'string_with_lf,string_with_crlf\r\n"a\nbc","a\r\nbc"\r\n'
328+
238329
.. _whatsnew_0240.api_breaking.interval_values:
239330

240331
``IntervalIndex.values`` is now an ``IntervalArray``
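
The rule described in the ``to_csv`` note above (the line terminator is applied only at the end of each record and never rewrites newlines inside the data) can be checked without pandas. The sketch below uses the standard-library `csv` module as a substitute, not `DataFrame.to_csv` itself; the helper name `dump` is hypothetical, and the sample values are the ones from the whatsnew example.

```python
import csv
import io

rows = [
    ["string_with_lf", "string_with_crlf"],
    ["a\nbc", "a\r\nbc"],
]


def dump(lineterminator):
    """Write ``rows`` as CSV text using the given record terminator."""
    # newline="" keeps the text layer from translating line endings.
    buf = io.StringIO(newline="")
    csv.writer(buf, lineterminator=lineterminator).writerows(rows)
    return buf.getvalue()


lf_output = dump("\n")
crlf_output = dump("\r\n")
```

In both outputs the embedded ``a\nbc`` and ``a\r\nbc`` values are quoted and left byte-for-byte intact; only the record endings differ.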
@@ -714,6 +805,8 @@ Other API Changes
714805
- :class:`pandas.io.formats.style.Styler` supports a ``number-format`` property when using :meth:`~pandas.io.formats.style.Styler.to_excel` (:issue:`22015`)
715806
- :meth:`DataFrame.corr` and :meth:`Series.corr` now raise a ``ValueError`` along with a helpful error message instead of a ``KeyError`` when supplied with an invalid method (:issue:`22298`)
716807
- :meth:`shift` will now always return a copy, instead of the previous behaviour of returning self when shifting by 0 (:issue:`22397`)
808+
- :meth:`DataFrame.set_index` now allows all one-dimensional list-likes, raises a ``TypeError`` for incorrect types,
809+
has an improved ``KeyError`` message, and will not fail on duplicate column names with ``drop=True``. (:issue:`22484`)
717810
- Slicing a single row of a DataFrame with multiple ExtensionArrays of the same type now preserves the dtype, rather than coercing to object (:issue:`22784`)
718811
- :class:`DateOffset` attribute `_cacheable` and method `_should_cache` have been removed (:issue:`23118`)
719812

@@ -878,6 +971,7 @@ Numeric
878971
- Bug in :meth:`DataFrame.apply` where, when supplied with a string argument and additional positional or keyword arguments (e.g. ``df.apply('sum', min_count=1)``), a ``TypeError`` was wrongly raised (:issue:`22376`)
879972
- Bug in :meth:`DataFrame.astype` to extension dtype may raise ``AttributeError`` (:issue:`22578`)
880973
- Bug in :class:`DataFrame` with ``timedelta64[ns]`` dtype arithmetic operations with ``ndarray`` with integer dtype incorrectly treating the ndarray as ``timedelta64[ns]`` dtype (:issue:`23114`)
974+
- Bug in :meth:`Series.rpow` with object dtype returning ``NaN`` for ``1 ** NA`` instead of ``1`` (:issue:`22922`).
881975

882976
Strings
883977
^^^^^^^

pandas/core/arrays/base.py

Lines changed: 27 additions & 6 deletions
@@ -9,6 +9,7 @@
99

1010
import operator
1111

12+
from pandas.core.dtypes.generic import ABCSeries, ABCIndexClass
1213
from pandas.errors import AbstractMethodError
1314
from pandas.compat.numpy import function as nv
1415
from pandas.compat import set_function_name, PY3
@@ -109,6 +110,7 @@ def _from_sequence(cls, scalars, dtype=None, copy=False):
109110
compatible with the ExtensionArray.
110111
copy : boolean, default False
111112
If True, copy the underlying data.
113+
112114
Returns
113115
-------
114116
ExtensionArray
@@ -724,7 +726,13 @@ def _reduce(self, name, skipna=True, **kwargs):
724726

725727
class ExtensionOpsMixin(object):
726728
"""
727-
A base class for linking the operators to their dunder names
729+
A base class for linking the operators to their dunder names.
730+
731+
.. note::
732+
733+
You may want to set ``__array_priority__`` if you want your
734+
implementation to be called when involved in binary operations
735+
with NumPy arrays.
728736
"""
729737

730738
@classmethod
@@ -761,12 +769,14 @@ def _add_comparison_ops(cls):
761769

762770

763771
class ExtensionScalarOpsMixin(ExtensionOpsMixin):
764-
"""A mixin for defining the arithmetic and logical operations on
765-
an ExtensionArray class, where it is assumed that the underlying objects
766-
have the operators already defined.
772+
"""
773+
A mixin for defining ops on an ExtensionArray.
774+
775+
It is assumed that the underlying scalar objects have the operators
776+
already defined.
767777
768-
Usage
769-
------
778+
Notes
779+
-----
770780
If you have defined a subclass MyExtensionArray(ExtensionArray), then
771781
use MyExtensionArray(ExtensionArray, ExtensionScalarOpsMixin) to
772782
get the arithmetic operators. After the definition of MyExtensionArray,
@@ -776,6 +786,12 @@ class ExtensionScalarOpsMixin(ExtensionOpsMixin):
776786
MyExtensionArray._add_comparison_ops()
777787
778788
to link the operators to your class.
789+
790+
.. note::
791+
792+
You may want to set ``__array_priority__`` if you want your
793+
implementation to be called when involved in binary operations
794+
with NumPy arrays.
779795
"""
780796

781797
@classmethod
@@ -825,6 +841,11 @@ def convert_values(param):
825841
else: # Assume it's an object
826842
ovalues = [param] * len(self)
827843
return ovalues
844+
845+
if isinstance(other, (ABCSeries, ABCIndexClass)):
846+
# rely on pandas to unbox and dispatch to us
847+
return NotImplemented
848+
828849
lvalues = self
829850
rvalues = convert_values(other)
830851

pandas/core/arrays/integer.py

Lines changed: 31 additions & 12 deletions
@@ -3,7 +3,8 @@
33
import copy
44
import numpy as np
55

6-
from pandas._libs.lib import infer_dtype
6+
7+
from pandas._libs import lib
78
from pandas.util._decorators import cache_readonly
89
from pandas.compat import u, range, string_types
910
from pandas.compat import set_function_name
@@ -171,7 +172,7 @@ def coerce_to_array(values, dtype, mask=None, copy=False):
171172

172173
values = np.array(values, copy=copy)
173174
if is_object_dtype(values):
174-
inferred_type = infer_dtype(values)
175+
inferred_type = lib.infer_dtype(values)
175176
if inferred_type not in ['floating', 'integer',
176177
'mixed-integer', 'mixed-integer-float']:
177178
raise TypeError("{} cannot be converted to an IntegerDtype".format(
@@ -280,6 +281,8 @@ def _coerce_to_ndarray(self):
280281
data[self._mask] = self._na_value
281282
return data
282283

284+
__array_priority__ = 1000 # higher than ndarray so ops dispatch to us
285+
283286
def __array__(self, dtype=None):
284287
"""
285288
the array interface, return my values
@@ -288,12 +291,6 @@ def __array__(self, dtype=None):
288291
return self._coerce_to_ndarray()
289292

290293
def __iter__(self):
291-
"""Iterate over elements of the array.
292-
293-
"""
294-
# This needs to be implemented so that pandas recognizes extension
295-
# arrays as list-like. The default implementation makes successive
296-
# calls to ``__getitem__``, which may be slower than necessary.
297294
for i in range(len(self)):
298295
if self._mask[i]:
299296
yield self.dtype.na_value
@@ -504,13 +501,21 @@ def cmp_method(self, other):
504501

505502
op_name = op.__name__
506503
mask = None
504+
505+
if isinstance(other, (ABCSeries, ABCIndexClass)):
506+
# Rely on pandas to unbox and dispatch to us.
507+
return NotImplemented
508+
507509
if isinstance(other, IntegerArray):
508510
other, mask = other._data, other._mask
511+
509512
elif is_list_like(other):
510513
other = np.asarray(other)
511514
if other.ndim > 0 and len(self) != len(other):
512515
raise ValueError('Lengths must match to compare')
513516

517+
other = lib.item_from_zerodim(other)
518+
514519
# numpy will show a DeprecationWarning on invalid elementwise
515520
# comparisons, this will raise in the future
516521
with warnings.catch_warnings():
@@ -586,14 +591,21 @@ def integer_arithmetic_method(self, other):
586591

587592
op_name = op.__name__
588593
mask = None
594+
589595
if isinstance(other, (ABCSeries, ABCIndexClass)):
590-
other = getattr(other, 'values', other)
596+
# Rely on pandas to unbox and dispatch to us.
597+
return NotImplemented
591598

592-
if isinstance(other, IntegerArray):
593-
other, mask = other._data, other._mask
594-
elif getattr(other, 'ndim', 0) > 1:
599+
if getattr(other, 'ndim', 0) > 1:
595600
raise NotImplementedError(
596601
"can only perform ops with 1-d structures")
602+
603+
if isinstance(other, IntegerArray):
604+
other, mask = other._data, other._mask
605+
606+
elif getattr(other, 'ndim', None) == 0:
607+
other = other.item()
608+
597609
elif is_list_like(other):
598610
other = np.asarray(other)
599611
if not other.ndim:
@@ -612,6 +624,13 @@ def integer_arithmetic_method(self, other):
612624
else:
613625
mask = self._mask | mask
614626

627+
# 1 ** np.nan is 1. So we have to unmask those.
628+
if op_name == 'pow':
629+
mask = np.where(self == 1, False, mask)
630+
631+
elif op_name == 'rpow':
632+
mask = np.where(other == 1, False, mask)
633+
615634
with np.errstate(all='ignore'):
616635
result = op(self._data, other)
617636
