|
| 1 | +# Move UnsafeMutablePointer.initialize(from:) to UnsafeMutableBufferPointer |
| 2 | + |
| 3 | +* Proposal: [SE-0147](0147-move-unsafe-initialize-from.md) |
| 4 | +* Authors: [Ben Cohen](https://github.com/airspeedswift) |
| 5 | +* Review Manager: TBD |
| 6 | +* Status: **Awaiting review** |
| 7 | + |
| 8 | +## Introduction |
| 9 | + |
| 10 | +The version of `UnsafeMutablePointer.initialize(from:)` that takes a `Collection` should be deprecated in favor of a new method on `UnsafeMutableBufferPointer` that takes a `Sequence`, with a goal of improving memory safety and enabling faster initialization of memory from sequences. Similarly, `UnsafeMutableRawPointer.initializeMemory(as:from:)` should be deprecated in favour of a new `UnsafeMutableRawBufferPointer.initialize(as:from:)`. |
| 11 | + |
| 12 | +## Motivation |
| 13 | + |
| 14 | +`UnsafeMutablePointer.initialize(from:)` underpins implementations of collections, such as `Array`, which are backed by a buffer of contiguous memory. When operations like `Array.append` are implemented, they first ensure their backing store can accommodate the number of elements in a source collection, then pass that collection into the `initialize` method of their backing store. |
| 15 | + |
| 16 | +Unfortunately there is a major flaw in this design: a collection's `count` might not accurately reflect the number of elements returned by its iterator. For example, some collections can be misused to return different results on each pass. Or a collection could just be implemented incorrectly. |
| 17 | + |
| 18 | +If the collection's `count` ends up being lower than the actual number of elements yielded by its iterator, the caller may not allocate enough memory for them. Since `UnsafeMutablePointer.initialize(from:)` does not receive a limiting capacity, this method would then scribble past the end of the buffer, resulting in undefined behaviour. |
| 19 | + |
| 20 | +Normally when using `Unsafe...` constructs in Swift the burden of ensuring safety falls on the caller. When using this method with something known to have correct behavior, like an `Array`, you can do that. But when used in a generic context like `Array.append(contentsOf:)`, where the caller of `initialize` does not know exactly what kind of collection they are passing in, it is impossible to use this method safely. You can see the impact of this by running the following code. which exhibits memory-unsafe behavior despite only using “safe” constructs from the standard library, something that shouldn’t be possible: |
| 21 | + |
| 22 | +```swift |
| 23 | +var i = 0 |
| 24 | +let c = repeatElement(42, count: 10_000).lazy.filter { _ in |
| 25 | + // capture i and use it to exhibit inconsistent |
| 26 | + // behavior across iteration of c |
| 27 | + i += 1; return i > 10_000 |
| 28 | +} |
| 29 | +var a: [Int] = [] |
| 30 | +// a will allocate insufficient memory before |
| 31 | +// calling self._buffer.initialize(from: c) |
| 32 | +a.append(contentsOf: c) // memory access violation |
| 33 | +``` |
| 34 | + |
| 35 | +While a collection returning an inconsistent count is a programming error (in this case, use of the lazy filter in combination with an logically impure function, breaking value semantics), and it would be reasonable for the standard library to trap under these circumstances, undefined behavior like this is not OK. |
| 36 | + |
| 37 | +In addition, the requirement to pre-allocate enough memory to accommodate `from.count` elements rules out using this method to initialize memory from a sequence, since sequences don't have a `count` property (they have an `underestimatedCount` but this isn't enough since underestimated counts are exactly the problem described above). The proposed solution would allow for this, enabling some internal performance optimizations for generic code. |
| 38 | + |
| 39 | +## Proposed solution |
| 40 | + |
| 41 | +The existing `initialize` method should be altered to receive a capacity, to avoid running beyond what the caller has allocated. Since `UnsafeMutableBufferPointer` already exists to encapsulate "pointer plus a count", the method should be moved to that type and the old method deprecated. |
| 42 | + |
| 43 | +This new method should take a `Sequence` as its `from` argument, and handle possible element overflow, returning an `Iterator` of any elements not written due to a lack of space. It should also return an index into the buffer to indicate where the elements were written up to in cases of underflow. |
| 44 | + |
| 45 | +Once this has been done, the version of `Array.append(contentsOf:)` that takes a collection can be eliminated, since the performance benefits of the collection version could be incorporated into the implementation of the one that takes a sequence. |
| 46 | + |
| 47 | +The intention of this change is to add memory safety, not to allow the flexibility of collections giving inconsistent counts. Therefore the precondition should remain that the caller should ensure enough memory is allocated to accommodate `source.underestedCount` elements. The only difference is if they don’t, the behaviour should be well-defined (ideally by trapping, if this can be done efficiently). |
| 48 | + |
| 49 | +Therefore: |
| 50 | + |
| 51 | +- Under-allocating the destination buffer relative to `underestimatedCount` may trap at runtime. _May_ rather than _will_ because this is an `O(n)` operation on some collections, so may only be enforced in debug builds. |
| 52 | +- Over-allocating the destination buffer relative to `underestimatedCount` is valid and simply results in sequence underflow with potentially uninitialized buffer memory (a likely case with arrays that reserve more than they need). |
| 53 | +- The source sequence's actual count may exceed both `underestimatedCount` and the destination buffer size, resulting in sequence overflow. This is also valid and handled by returning an iterator to the uncopied elements as an overflow sequence. |
| 54 | + |
| 55 | +A matching change should also be made to `UnsafeRawBufferPointer.initializeMemory(from:)`. The one difference is that for convenience this should return an `UnsafeMutableBufferPointer` of the (typed) intialized elements instead of an index into the raw buffer. |
| 56 | + |
| 57 | +## Detailed design |
| 58 | + |
| 59 | +The following API changes would be made: |
| 60 | + |
| 61 | +```swift |
| 62 | +extension UnsafeMutablePointer { |
| 63 | + @available(*, deprecated, message: "it will be removed in Swift 4.0. Please use 'UnsafeMutableBufferPointer.initialize(from:)' instead") |
| 64 | + public func initialize<C : Collection>(from source: C) |
| 65 | + where C.Iterator.Element == Pointee |
| 66 | +} |
| 67 | + |
| 68 | +extension UnsafeMutableBufferPointer { |
| 69 | + /// Initializes memory in the buffer with the elements of `source`. |
| 70 | + /// Returns an iterator to any elements of `source` that didn't fit in the |
| 71 | + /// buffer, and an index to the point in the buffer one past the last element |
| 72 | + /// written (so `startIndex` if no elements written, `endIndex` if the buffer |
| 73 | + /// was completely filled). |
| 74 | + /// |
| 75 | + /// - Precondition: The memory in `self` is uninitialized. The buffer must contain |
| 76 | + /// sufficient uninitialized memory to accommodate `source.underestimatedCount`. |
| 77 | + /// |
| 78 | + /// - Postcondition: The returned iterator |
| 79 | + /// - Postcondition: The `Pointee`s at `self[startIndex..<initializedUpTo]` |
| 80 | + /// are initialized. |
| 81 | + @discardableResult |
| 82 | + public func initialize<S: Sequence>( |
| 83 | + from source: S |
| 84 | + ) -> (unwritten: S.Iterator, initializedUpTo: Index) |
| 85 | + where S.Iterator.Element == Iterator.Element |
| 86 | +} |
| 87 | + |
| 88 | +extension UnsafeMutableRawPointer { |
| 89 | + @available(*, deprecated, message: "it will be removed in Swift 4.0. Please use 'UnsafeMutableRawBufferPointer.initialize(from:)' instead") |
| 90 | + @discardableResult |
| 91 | + public func initializeMemory<C : Collection>( |
| 92 | + as: C.Iterator.Element.Type, from source: C |
| 93 | + ) -> UnsafeMutablePointer<C.Iterator.Element> |
| 94 | +} |
| 95 | + |
| 96 | +extension UnsafeMutableRawBufferPointer { |
| 97 | + /// Initializes memory in the buffer with the elements of |
| 98 | + /// `source` and binds the initialized memory to type `T`. |
| 99 | + /// |
| 100 | + /// Returns an iterator to any elements of `source` that didn't fit in the |
| 101 | + /// buffer, and an index into the buffer one past the last byte written. |
| 102 | + /// |
| 103 | + /// - Precondition: The memory in `self` is uninitialized or initialized to a |
| 104 | + /// trivial type. |
| 105 | + /// |
| 106 | + /// - Precondition: The buffer must contain sufficient memory to |
| 107 | + /// accommodate at least `source.underestimatedCount` elements. |
| 108 | + /// |
| 109 | + /// - Postcondition: The memory at `self[startIndex..<initialized.count * |
| 110 | + /// MemoryLayout<S.Iterator.Element>.stride] is bound to type `S.Iterator`. |
| 111 | + /// |
| 112 | + /// - Postcondition: The memory at `self[startIndex..<initialized.count * |
| 113 | + /// MemoryLayout<S.Iterator.Element>.stride] are initialized.. |
| 114 | + @discardableResult |
| 115 | + public func initializeMemory<S: Sequence>( |
| 116 | + as: S.Iterator.Element.Type, from source: S |
| 117 | + ) -> (unwritten: S.Iterator, initialized: UnsafeMutableBufferPointer<S.Iterator.Element>) |
| 118 | +} |
| 119 | + |
| 120 | +``` |
| 121 | + |
| 122 | +The `+=` operators and `append<C : Collection>(contentsOf newElements: C)` methods on `Array`, `ArraySlice` and `ContiguousArray` will be removed as no-longer needed, since the implementation that takes a sequence can be made to be as efficient. The `+=` can be replaced by a generic one that calls `RangeReplaceableCollection.append(contenstsOf:)`: |
| 123 | + |
| 124 | +(note, because it forwards on to a protocol requirement, it itself does not need to be a static operator protocol requirement) |
| 125 | + |
| 126 | +```swift |
| 127 | +/// Appends the elements of a sequence to a range-replaceable collection. |
| 128 | +/// |
| 129 | +/// Use this operator to append the elements of a sequence to the end of |
| 130 | +/// range-replaceable collection with same `Element` type. This example |
| 131 | +/// appends the elements of a `Range<Int>` instance to an array of |
| 132 | +/// integers. |
| 133 | +/// |
| 134 | +/// var numbers = [1, 2, 3, 4, 5] |
| 135 | +/// numbers += 10...15 |
| 136 | +/// print(numbers) |
| 137 | +/// // Prints "[1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 15]" |
| 138 | +/// |
| 139 | +/// - Parameters: |
| 140 | +/// - lhs: The array to append to. |
| 141 | +/// - rhs: A collection or finite sequence. |
| 142 | +/// |
| 143 | +/// - Complexity: O(*n*), where *n* is the length of the resulting array. |
| 144 | +public func += < |
| 145 | + R : RangeReplaceableCollection, S : Sequence |
| 146 | +>(lhs: inout R, rhs: S) |
| 147 | + where R.Iterator.Element == S.Iterator.Element |
| 148 | +``` |
| 149 | + |
| 150 | +## Source compatibility |
| 151 | + |
| 152 | +The addition of the new method does not affect source compatibility. The deprecation of the old method does, but since this is a fundamentally unsound operation that cannot be fixed except via a source-breaking change, it should be aggressively deprecated and then removed. |
| 153 | + |
| 154 | +The knock-on ability to remove the version of `Array.append(contentsOf:)` that takes a collection does not affect source compatability since the version for sequences will be called for collections instead. |
| 155 | + |
| 156 | +## Effect on ABI stability |
| 157 | + |
| 158 | +This change must be made prior to declaring ABI stability, since it is currently called from the `Array.append` method, which is inlineable. |
| 159 | + |
| 160 | +## Alternatives considered |
| 161 | + |
| 162 | +Overflow (elements remain on the returned iterator) and underflow (`initializedUpTo != endIndex`) are almost but not quite mutually exclusive – if the buffer is exactly used, the caller must call `.next()` to check for any unwritten elements, which means the returned value must be declared `var`, and the check can't be chained. This is a little ugly, but is the unavoidable consequence of how iterators work: since iterating is consuming, the `initialize` method cannot easily test for this and indicate it back to the caller in some other way (such as returning an optional iterator). |
| 163 | + |
0 commit comments