Closed
Changes from 4 commits
86 commits
3632624
Support tuples for `find` & `rfind`
nineteendo May 24, 2024
e39b040
Update docs
nineteendo May 24, 2024
cb905bc
Add tests
nineteendo May 24, 2024
1807fd8
📜🤖 Added by blurb_it.
blurb-it[bot] May 24, 2024
cca08fa
Apply suggestions from code review
nineteendo May 24, 2024
302faa3
Apply suggestions from code review
nineteendo May 24, 2024
cb95578
Fix signature tests
nineteendo May 24, 2024
a35d3ae
Short circuit
nineteendo May 24, 2024
65c0a9e
Fix start for `rfind`
nineteendo May 25, 2024
5cbb1f0
Refactor checks
nineteendo May 25, 2024
00b2b04
Fix end for `rfind`
nineteendo May 25, 2024
e124603
Adjust indices
nineteendo May 25, 2024
41b0cd8
Micro optimisation
nineteendo May 25, 2024
7b83a22
Fix conversion
nineteendo May 25, 2024
c905458
Fix condition
nineteendo May 25, 2024
5c79f24
Add tests
nineteendo May 25, 2024
148b471
Clarify documentation
nineteendo May 25, 2024
351dc83
Add constant
nineteendo May 25, 2024
ddaf4b4
Duplicate constant
nineteendo May 25, 2024
2b044a1
Add tests
nineteendo May 25, 2024
a632f25
Remove newline
nineteendo May 25, 2024
ef28dab
Update Lib/test/string_tests.py
nineteendo May 25, 2024
4207d54
Update Lib/test/string_tests.py
nineteendo May 25, 2024
0dff482
Update Lib/test/string_tests.py
nineteendo May 25, 2024
fc0d9ea
Update Lib/test/string_tests.py
nineteendo May 25, 2024
cd317fd
Don't check twice on boundary
nineteendo May 25, 2024
43e8259
Apply suggestions from code review
nineteendo May 25, 2024
2524dc1
Apply suggestions from code review
nineteendo May 25, 2024
dbc8c94
Test bytes
nineteendo May 25, 2024
49a28a0
Add more bytes tests
nineteendo May 25, 2024
0bd606d
Support tuples for index & rindex
nineteendo May 25, 2024
b337fdc
Update Objects/bytes_methods.c
nineteendo May 25, 2024
e43373f
Update Misc/NEWS.d/next/Core and Builtins/2024-05-24-11-07-16.gh-issu…
nineteendo May 25, 2024
b47b0e0
Update docs
nineteendo May 25, 2024
6f71b39
Refactor code
nineteendo May 26, 2024
64ef311
Fix error message
nineteendo May 26, 2024
a116f33
Add asserts
nineteendo May 26, 2024
e29828d
Remove unnecessary check
nineteendo May 26, 2024
a85f84a
Revert "Remove unnecessary check"
nineteendo May 26, 2024
ac19e87
Optimise length of 0 & 1
nineteendo May 26, 2024
b62e8b4
Avoid testing with tuples of 1 item
nineteendo May 26, 2024
b6492db
Simplify news.
nineteendo May 26, 2024
dd23e04
Fix indentation
nineteendo May 26, 2024
38d2df8
Handle -2
nineteendo May 26, 2024
223cb1b
Update Misc/NEWS.d/next/Core and Builtins/2024-05-24-11-07-16.gh-issu…
nineteendo May 27, 2024
bc29c92
Guard overflow
nineteendo May 27, 2024
f14ee7d
Tweak `FIND_CHUNK_SIZE`
nineteendo May 27, 2024
3606e00
Refer to `re` & `regex`
nineteendo May 28, 2024
9e2006c
Release buffer
nineteendo May 28, 2024
fb48c41
Release other buffer
nineteendo May 29, 2024
308174c
Save lengths
nineteendo May 29, 2024
6a3d651
malloc
nineteendo May 29, 2024
3227e63
Fix malloc
nineteendo May 29, 2024
70d673f
Store needles for bytes
nineteendo Jun 1, 2024
7b205b3
Revert test
nineteendo Jun 1, 2024
0664ced
Restructure code
nineteendo Jun 1, 2024
b132742
Fix smelly symbol
nineteendo Jun 1, 2024
8189c66
Make static
nineteendo Jun 1, 2024
53d3a07
Remove variable
nineteendo Jun 1, 2024
648725d
Reverse comparison
nineteendo Jun 1, 2024
4fe06fb
Add brackets
nineteendo Jun 1, 2024
145f45d
Remove continue
nineteendo Jun 1, 2024
c96775c
2 arguments per line
nineteendo Jun 1, 2024
b4722c4
Exclude long needles
nineteendo Jun 1, 2024
5c8751a
Include needles with a larger kind
nineteendo Jun 1, 2024
c219cf5
fast find for strings
nineteendo Jun 2, 2024
ccbfa0e
Fix argument type
nineteendo Jun 2, 2024
aada7f5
Rename argument
nineteendo Jun 2, 2024
090ddee
Decrease diff
nineteendo Jun 2, 2024
6b85fd7
Decrease diff 2
nineteendo Jun 2, 2024
41a6c20
Decrease diff 3
nineteendo Jun 2, 2024
460effa
Remove continue
nineteendo Jun 2, 2024
d1c4af6
Parentheses
nineteendo Jun 2, 2024
c19ddcf
Store converted needles on the heap
nineteendo Jun 2, 2024
6beae49
cleanup
nineteendo Jun 2, 2024
ff514be
Fix uninitialised variable
nineteendo Jun 2, 2024
ff6eea2
Try to prevent segmentation fault
nineteendo Jun 2, 2024
0cbf03a
Fix cast
nineteendo Jun 2, 2024
d412046
Revert "Fix cast"
nineteendo Jun 2, 2024
168fe84
Revert "Try to prevent segmentation fault"
nineteendo Jun 2, 2024
41b11e5
Uninitialised memory?
nineteendo Jun 3, 2024
ffe1152
More tests
nineteendo Jun 3, 2024
ac91f79
Rename parameter
nineteendo Jun 3, 2024
6751992
Unnest
nineteendo Jun 3, 2024
44aebd1
Keep buffers acquired during search
nineteendo Jun 3, 2024
9a51fd9
Add `buffers_len`
nineteendo Jun 3, 2024
35 changes: 25 additions & 10 deletions Doc/library/stdtypes.rst
@@ -1724,8 +1724,9 @@ expression support in the :mod:`re` module).
 .. method:: str.find(sub[, start[, end]])

    Return the lowest index in the string where substring *sub* is found within
-   the slice ``s[start:end]``. Optional arguments *start* and *end* are
-   interpreted as in slice notation. Return ``-1`` if *sub* is not found.
+   the slice ``s[start:end]``. *sub* can also be a tuple of substrings to look
+   for. Optional arguments *start* and *end* are interpreted as in slice
+   notation. Return ``-1`` if *sub* is not found.

    .. note::

@@ -1736,6 +1737,9 @@ expression support in the :mod:`re` module).
       >>> 'Py' in 'Python'
       True

+   .. versionchanged:: 3.14
+      *sub* can now be a tuple of substrings.
+

 .. method:: str.format(*args, **kwargs)

@@ -2030,8 +2034,12 @@ expression support in the :mod:`re` module).
 .. method:: str.rfind(sub[, start[, end]])

    Return the highest index in the string where substring *sub* is found, such
-   that *sub* is contained within ``s[start:end]``. Optional arguments *start*
-   and *end* are interpreted as in slice notation. Return ``-1`` on failure.
+   that *sub* is contained within ``s[start:end]``. *sub* can also be a tuple
+   of substrings to look for. Optional arguments *start* and *end* are
+   interpreted as in slice notation. Return ``-1`` on failure.

+   .. versionchanged:: 3.14
+      *sub* can now be a tuple of substrings.
+

 .. method:: str.rindex(sub[, start[, end]])
@@ -2859,9 +2867,10 @@ arbitrary binary data.
             bytearray.find(sub[, start[, end]])

    Return the lowest index in the data where the subsequence *sub* is found,
-   such that *sub* is contained in the slice ``s[start:end]``. Optional
-   arguments *start* and *end* are interpreted as in slice notation. Return
-   ``-1`` if *sub* is not found.
+   such that *sub* is contained in the slice ``s[start:end]``. *sub* can
+   also be a tuple of subsequences to look for. Optional arguments *start*
+   and *end* are interpreted as in slice notation. Return ``-1`` if *sub*
+   is not found.

    The subsequence to search for may be any :term:`bytes-like object` or an
    integer in the range 0 to 255.
@@ -2878,6 +2887,9 @@ arbitrary binary data.
    .. versionchanged:: 3.3
       Also accept an integer in the range 0 to 255 as the subsequence.

+   .. versionchanged:: 3.14
+      *sub* can now be a tuple of subsequences.
+

 .. method:: bytes.index(sub[, start[, end]])
             bytearray.index(sub[, start[, end]])
@@ -2947,16 +2959,19 @@ arbitrary binary data.
             bytearray.rfind(sub[, start[, end]])

    Return the highest index in the sequence where the subsequence *sub* is
-   found, such that *sub* is contained within ``s[start:end]``. Optional
-   arguments *start* and *end* are interpreted as in slice notation. Return
-   ``-1`` on failure.
+   found, such that *sub* is contained within ``s[start:end]``. *sub* can
+   also be a tuple of subsequences to look for. Optional arguments *start*
+   and *end* are interpreted as in slice notation. Return ``-1`` on failure.

    The subsequence to search for may be any :term:`bytes-like object` or an
    integer in the range 0 to 255.

    .. versionchanged:: 3.3
       Also accept an integer in the range 0 to 255 as the subsequence.

+   .. versionchanged:: 3.14
+      *sub* can now be a tuple of subsequences.
+

 .. method:: bytes.rindex(sub[, start[, end]])
             bytearray.rindex(sub[, start[, end]])
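The tuple form documented in this file is what the pull request proposes, so it should not be assumed to exist in a released interpreter. As a rough sketch only, the `str.find` half of that behaviour can be approximated with the `re` module by joining the escaped needles into one alternation. `find_any` is a hypothetical helper name, not part of CPython; it handles `str` only, does not cover `rfind`, and assumes non-empty needles.

```python
import re


def find_any(s, subs, start=None, end=None):
    """Return the lowest index in s[start:end] where any item of subs occurs.

    Hypothetical helper; returns -1 when nothing matches or subs is empty.
    """
    if not subs:
        # An empty alternation would match everywhere, so handle it up front.
        return -1
    # Normalise start/end the way slice notation does.
    start, end, _ = slice(start, end).indices(len(s))
    # re.escape keeps characters such as '.' or '*' literal.
    pattern = re.compile("|".join(re.escape(sub) for sub in subs))
    # search() reports the leftmost position at which any alternative matches.
    match = pattern.search(s, start, end)
    return match.start() if match else -1


print(find_any("__aa__bb__", ("bb", "aa")))     # 2
print(find_any("__aa__bb__", ("aa", "bb"), 3))  # 6
print(find_any("__aa__bb__", ("cc", "dd")))     # -1
```

A single compiled alternation scans the string once regardless of how many needles are given, which is the main reason to prefer it over calling `find` per needle when the tuple is long.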
25 changes: 25 additions & 0 deletions Lib/test/string_tests.py
@@ -217,6 +217,19 @@ def test_find(self):
                 if loc != -1:
                     self.assertEqual(i[loc:loc+len(j)], j)

+        # test tuple arguments
+        self.checkequal(2, '__aa__bb__', 'find', ('aa', 'bb'))
+        self.checkequal(2, '__aa__bb__', 'find', ('bb', 'aa'))
+        self.checkequal(-1, '__aa__bb__', 'find', ('cc', 'dd'))
+        self.checkequal(-1, '__aa__bb__', 'find', ())
+        self.checkequal(6, '__aa__bb__', 'find', ('aa', 'bb'), 3)
+        self.checkequal(-1, '__aa__bb__', 'find', ('aa', 'cc'), 3)
+        self.checkequal(2, '__aa__bb__', 'find', ('aa', 'bb'), 0, 10)
+        self.checkequal(-1, '__aa__bb__', 'find', ('aa', 'bb'), 0, 3)
+        self.checkequal(2, '__aa__bb__', 'find', ('aa', 'bb'), 0, 4)
+
+        self.checkraises(TypeError, 'hello', 'find', (42,))
+
     def test_rfind(self):
         self.checkequal(9, 'abcdefghiabc', 'rfind', 'abc')
         self.checkequal(12, 'abcdefghiabc', 'rfind', '')
@@ -270,6 +283,18 @@ def test_rfind(self):
         # issue #15534
         self.checkequal(0, '<......\u043c...', "rfind", "<")

+        # test tuple arguments
+        self.checkequal(6, '__aa__bb__', 'rfind', ('aa', 'bb'))
+        self.checkequal(6, '__aa__bb__', 'rfind', ('bb', 'aa'))
+        self.checkequal(-1, '__aa__bb__', 'rfind', ('cc', 'dd'))
+        self.checkequal(-1, '__aa__bb__', 'rfind', ())
+        self.checkequal(-1, '__aa__bb__', 'rfind', ('aa', 'cc'), 3)
+        self.checkequal(6, '__aa__bb__', 'rfind', ('aa', 'bb'), 0, 10)
+        self.checkequal(-1, '__aa__bb__', 'rfind', ('aa', 'bb'), 7, 10)
+        self.checkequal(6, '__aa__bb__', 'rfind', ('aa', 'bb'), 6, 10)
+
+        self.checkraises(TypeError, 'hello', 'rfind', (42,))
+
     def test_index(self):
         self.checkequal(0, 'abcdefghiabc', 'index', '')
         self.checkequal(3, 'abcdefghiabc', 'index', 'def')
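The expected values in the new tuple tests can be cross-checked without a patched interpreter. The sketch below is an independent reference, not code from the PR: `ref_find`, `ref_rfind` and `_adjust` are hypothetical names, and the scan relies on `str.startswith` already accepting a tuple of prefixes. The index clamping in `_adjust` is an assumption about the intended slice-style semantics.

```python
def _adjust(start, end, length):
    """Clamp start/end in the slice-like way string methods use (assumed)."""
    if end is None or end > length:
        end = length
    elif end < 0:
        end = max(end + length, 0)
    if start is None:
        start = 0
    elif start < 0:
        start = max(start + length, 0)
    return start, end


def ref_find(s, subs, start=None, end=None):
    """Lowest index in s[start:end] at which some item of subs begins."""
    start, end = _adjust(start, end, len(s))
    for i in range(start, end + 1):
        # startswith() checks every needle against s[i:end] in one call.
        if s.startswith(subs, i, end):
            return i
    return -1


def ref_rfind(s, subs, start=None, end=None):
    """Highest index in s[start:end] at which some item of subs begins."""
    start, end = _adjust(start, end, len(s))
    for i in range(end, start - 1, -1):
        if s.startswith(subs, i, end):
            return i
    return -1


# (function, expected, needles, extra positional bounds), copied from the diff.
CASES = [
    (ref_find, 2, ('aa', 'bb'), ()),
    (ref_find, 2, ('bb', 'aa'), ()),
    (ref_find, -1, ('cc', 'dd'), ()),
    (ref_find, -1, (), ()),
    (ref_find, 6, ('aa', 'bb'), (3,)),
    (ref_find, -1, ('aa', 'cc'), (3,)),
    (ref_find, -1, ('aa', 'bb'), (0, 3)),
    (ref_find, 2, ('aa', 'bb'), (0, 4)),
    (ref_rfind, 6, ('aa', 'bb'), ()),
    (ref_rfind, -1, ('aa', 'cc'), (3,)),
    (ref_rfind, -1, ('aa', 'bb'), (7, 10)),
    (ref_rfind, 6, ('aa', 'bb'), (6, 10)),
]

for func, expected, subs, bounds in CASES:
    assert func('__aa__bb__', subs, *bounds) == expected
```

Because the reference walks candidate positions instead of calling `find` once per needle, agreement with the test expectations is a slightly stronger check than re-running the same loop the C code uses.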
@@ -0,0 +1 @@
+Support tuples for :meth:`str.find`, :meth:`bytearray.find`, :meth:`bytes.find`, :meth:`str.rfind`, :meth:`bytearray.rfind` and :meth:`bytes.rfind`.
40 changes: 36 additions & 4 deletions Objects/bytes_methods.c
@@ -557,10 +557,26 @@ find_internal(const char *str, Py_ssize_t len,
 }

 PyObject *
-_Py_bytes_find(const char *str, Py_ssize_t len, PyObject *sub,
+_Py_bytes_find(const char *str, Py_ssize_t len, PyObject *subobj,
                Py_ssize_t start, Py_ssize_t end)
 {
-    Py_ssize_t result = find_internal(str, len, "find", sub, start, end, +1);
+    Py_ssize_t result;
+    if (PyTuple_Check(subobj)) {
+        result = -1;
+        for (Py_ssize_t i = 0; i < PyTuple_GET_SIZE(subobj); i++) {
+            PyObject *subseq = PyTuple_GET_ITEM(subobj, i);
+            Py_ssize_t new_result = find_internal(str, len, "find", subseq,
+                                                  start, end, +1);
+            if (new_result == -2) {
+                return NULL;
+            }
+            if (new_result != -1 && (new_result < result || result == -1)) {
+                result = new_result;
+            }
+        }
+        return PyLong_FromSsize_t(result);
+    }
+    result = find_internal(str, len, "find", subobj, start, end, +1);
     if (result == -2)
         return NULL;
     return PyLong_FromSsize_t(result);
@@ -582,10 +598,26 @@ _Py_bytes_index(const char *str, Py_ssize_t len, PyObject *sub,
 }

 PyObject *
-_Py_bytes_rfind(const char *str, Py_ssize_t len, PyObject *sub,
+_Py_bytes_rfind(const char *str, Py_ssize_t len, PyObject *subobj,
                 Py_ssize_t start, Py_ssize_t end)
 {
-    Py_ssize_t result = find_internal(str, len, "rfind", sub, start, end, -1);
+    Py_ssize_t result;
+    if (PyTuple_Check(subobj)) {
+        result = -1;
+        for (Py_ssize_t i = 0; i < PyTuple_GET_SIZE(subobj); i++) {
+            PyObject *subseq = PyTuple_GET_ITEM(subobj, i);
+            Py_ssize_t new_result = find_internal(str, len, "rfind", subseq,
+                                                  start, end, -1);
+            if (new_result == -2) {
+                return NULL;
+            }
+            if (new_result > result) {
+                result = new_result;
+            }
+        }
+        return PyLong_FromSsize_t(result);
+    }
+    result = find_internal(str, len, "rfind", subobj, start, end, -1);
     if (result == -2)
         return NULL;
     return PyLong_FromSsize_t(result);
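The control flow of the two tuple branches above can be modelled in a few lines of Python. This is only an illustrative sketch of the loop shown in this diff, with `bytes_find_tuple` and `bytes_rfind_tuple` as hypothetical names; the built-in `bytes.find` and `bytes.rfind` stand in for `find_internal`, and a raised `TypeError` plays the role of the `-2` error sentinel.

```python
def bytes_find_tuple(data, subs, start=None, end=None):
    """Model of the tuple branch in _Py_bytes_find (hypothetical helper)."""
    result = -1  # -1 means "no needle found so far"
    for sub in subs:
        # An unsupported item raises TypeError here; the C helper reports
        # the same situation by returning -2 and the caller returns NULL.
        new_result = data.find(sub, start, end)
        # Keep the smallest real hit; the first hit replaces the -1 sentinel.
        if new_result != -1 and (new_result < result or result == -1):
            result = new_result
    return result


def bytes_rfind_tuple(data, subs, start=None, end=None):
    """Model of the tuple branch in _Py_bytes_rfind (hypothetical helper)."""
    result = -1
    for sub in subs:
        new_result = data.rfind(sub, start, end)
        # A miss (-1) can never exceed a real index, so one comparison suffices.
        if new_result > result:
            result = new_result
    return result


print(bytes_find_tuple(b'__aa__bb__', (b'bb', b'aa')))   # 2
print(bytes_rfind_tuple(b'__aa__bb__', (b'aa', b'bb')))  # 6
print(bytes_find_tuple(bytearray(b'__aa__bb__'), ()))    # -1
```

Each needle triggers its own scan of the haystack in this version of the loop, so the cost grows with the length of the tuple even when an early needle already matches at the first position.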
30 changes: 11 additions & 19 deletions Objects/clinic/unicodeobject.c.h

Some generated files are not rendered by default.

72 changes: 64 additions & 8 deletions Objects/unicodeobject.c
@@ -11334,7 +11334,13 @@ unicode_expandtabs_impl(PyObject *self, int tabsize)
 }

 /*[clinic input]
-str.find as unicode_find = str.count
+str.find as unicode_find -> Py_ssize_t
+
+    self as str: self
+    sub as subobj: object
+    start: slice_index(accept={int, NoneType}, c_default='0') = None
+    end: slice_index(accept={int, NoneType}, c_default='PY_SSIZE_T_MAX') = None
+    /

 Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end].

@@ -11343,11 +11349,36 @@ Return -1 on failure.
 [clinic start generated code]*/

 static Py_ssize_t
-unicode_find_impl(PyObject *str, PyObject *substr, Py_ssize_t start,
+unicode_find_impl(PyObject *str, PyObject *subobj, Py_ssize_t start,
                   Py_ssize_t end)
-/*[clinic end generated code: output=51dbe6255712e278 input=4a89d2d68ef57256]*/
+/*[clinic end generated code: output=80175735a6d549d0 input=51e7b530950ab304]*/
 {
-    Py_ssize_t result = any_find_slice(str, substr, start, end, 1);
+    Py_ssize_t result;
+    if (PyTuple_Check(subobj)) {
+        result = -1;
+        for (Py_ssize_t i = 0; i < PyTuple_GET_SIZE(subobj); i++) {
+            PyObject *substr = PyTuple_GET_ITEM(subobj, i);
+            if (!PyUnicode_Check(substr)) {
+                PyErr_Format(PyExc_TypeError,
+                             "tuple for find must only contain str, "
+                             "not %.100s",
+                             Py_TYPE(substr)->tp_name);
+                return -1;
+            }
+            Py_ssize_t new_result = any_find_slice(str, substr, start, end, 1);
+            if (new_result != -1 && (new_result < result || result == -1)) {
+                result = new_result;
+            }
+        }
+        return result;
+    }
+    if (!PyUnicode_Check(subobj)) {
+        PyErr_Format(PyExc_TypeError,
+                     "find first arg must be str or "
+                     "a tuple of str, not %.100s", Py_TYPE(subobj)->tp_name);
+        return -1;
+    }
+    result = any_find_slice(str, subobj, start, end, 1);
     if (result < 0) {
         return -1;
     }
@@ -12496,7 +12527,7 @@ unicode_repr(PyObject *unicode)
 }

 /*[clinic input]
-str.rfind as unicode_rfind = str.count
+str.rfind as unicode_rfind = str.find

 Return the highest index in S where substring sub is found, such that sub is contained within S[start:end].

@@ -12505,11 +12536,36 @@ Return -1 on failure.
 [clinic start generated code]*/

 static Py_ssize_t
-unicode_rfind_impl(PyObject *str, PyObject *substr, Py_ssize_t start,
+unicode_rfind_impl(PyObject *str, PyObject *subobj, Py_ssize_t start,
                    Py_ssize_t end)
-/*[clinic end generated code: output=880b29f01dd014c8 input=898361fb71f59294]*/
+/*[clinic end generated code: output=9d316eee7b9f9bf0 input=23ae7964e8f70b35]*/
 {
-    Py_ssize_t result = any_find_slice(str, substr, start, end, -1);
+    Py_ssize_t result;
+    if (PyTuple_Check(subobj)) {
+        result = -1;
+        for (Py_ssize_t i = 0; i < PyTuple_GET_SIZE(subobj); i++) {
+            PyObject *substr = PyTuple_GET_ITEM(subobj, i);
+            if (!PyUnicode_Check(substr)) {
+                PyErr_Format(PyExc_TypeError,
+                             "tuple for rfind must only contain str, "
+                             "not %.100s",
+                             Py_TYPE(substr)->tp_name);
+                return -1;
+            }
+            Py_ssize_t new_result = any_find_slice(str, substr, start, end, -1);
+            if (new_result > result) {
+                result = new_result;
+            }
+        }
+        return result;
+    }
+    if (!PyUnicode_Check(subobj)) {
+        PyErr_Format(PyExc_TypeError,
+                     "rfind first arg must be str or "
+                     "a tuple of str, not %.100s", Py_TYPE(subobj)->tp_name);
+        return -1;
+    }
+    result = any_find_slice(str, subobj, start, end, -1);
     if (result < 0) {
         return -1;
     }
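The argument checking added to `unicode_find_impl` can be illustrated with a small Python model. This is a sketch under stated assumptions rather than the PR's implementation: `unicode_find_model` is a hypothetical name, the built-in `str.find` stands in for `any_find_slice`, and the error strings are copied from the `PyErr_Format` calls in the diff.

```python
def unicode_find_model(s, subobj, start=None, end=None):
    """Python model of the type checks in unicode_find_impl (hypothetical)."""
    if isinstance(subobj, tuple):
        result = -1
        for substr in subobj:
            # Items are validated one at a time, so a bad item is only
            # reported once the loop reaches it.
            if not isinstance(substr, str):
                raise TypeError(
                    "tuple for find must only contain str, "
                    f"not {type(substr).__name__:.100}")
            new_result = s.find(substr, start, end)
            if new_result != -1 and (new_result < result or result == -1):
                result = new_result
        return result
    if not isinstance(subobj, str):
        raise TypeError(
            "find first arg must be str or "
            f"a tuple of str, not {type(subobj).__name__:.100}")
    return s.find(subobj, start, end)


print(unicode_find_model('__aa__bb__', ('bb', 'aa')))  # 2
try:
    unicode_find_model('hello', (42,))
except TypeError as exc:
    print(exc)  # tuple for find must only contain str, not int
```

The `rfind` variant would differ only in calling `s.rfind` and keeping the largest index instead of the smallest.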