README.md: 21 additions & 19 deletions
@@ -7,8 +7,8 @@ This tool helps you to figure out which commands use a lot of RAM and take a lon
 
 As a simple example - make 10,000,000 random numbers and report that this costs 76MB of RAM and takes 0.3 seconds to execute:
 
-In [3]: arr=np.random.uniform(size=1E7)
-'arr=np.random.uniform(size=1E7)' used 76.2578 MiB RAM in 0.33s, peaked 0.00 MiB above current, total RAM usage 107.37 MiB
+In [3]: arr=np.random.uniform(size=int(1e7))
+'arr=np.random.uniform(size=int(1e7))' used 76.2578 MiB RAM in 0.33s, peaked 0.00 MiB above current, total RAM usage 107.37 MiB
 
 
 Francesc Alted has a fork with more memory delta details, see it here: https://github.com/FrancescAlted/ipython_memwatcher
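The point of this diff's `1E7` to `int(1e7)` change is that recent numpy releases reject float sizes and shapes outright, where older releases only warned. A minimal sketch of the failure mode, assuming a modern numpy:

    import numpy as np

    try:
        arr = np.random.uniform(size=1e7)    # float size: TypeError on modern numpy
    except TypeError as err:
        print(err)                           # "'float' object cannot be interpreted as an integer"

    arr = np.random.uniform(size=int(1e7))   # explicit int size: works on all versions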
@@ -41,32 +41,34 @@ We can measure on every line how large array operations allocate and deallocate
 IPython 3.2.0 -- An enhanced Interactive Python.
 
 In [1]: import ipython_memory_usage.ipython_memory_usage as imu
+In [2]: import numpy as np
 
-In [2]: imu.start_watching_memory()
-In [2] used 0.0469 MiB RAM in 7.32s, peaked 0.00 MiB above current, total RAM usage 56.88 MiB
+In [3]: imu.start_watching_memory()
+In [3] used 0.0469 MiB RAM in 7.32s, peaked 0.00 MiB above current, total RAM usage 56.88 MiB
 
-In [3]: a = np.ones(1e7)
-In [3] used 76.3750 MiB RAM in 0.14s, peaked 0.00 MiB above current, total RAM usage 133.25 MiB
+In [4]: a = np.ones(int(1e7))
+In [4] used 76.3750 MiB RAM in 0.14s, peaked 0.00 MiB above current, total RAM usage 133.25 MiB
 
-In [4]: del a
-In [4] used -76.2031 MiB RAM in 0.10s, total RAM usage 57.05 MiB
+In [5]: del a
+In [5] used -76.2031 MiB RAM in 0.10s, total RAM usage 57.05 MiB
 
 
 You can use `stop_watching_memory` to stop watching and printing memory usage after each statement:
 
-In [5]: imu.stop_watching_memory()
+In [6]: imu.stop_watching_memory()
 
-In [6]: b = np.ones(1e7)
+In [7]: b = np.ones(int(1e7))
 
-In [7]: b[0] * 5.0
-Out[7]: 5.0
+In [8]: b[0] * 5.0
+Out[8]: 5.0
 
 
 For the beginner with numpy it can be easy to work on copies of matrices which use a large amount of RAM. The following example sets the scene and then shows an in-place low-RAM variant.
 
 First we make a random square array and modify it twice using copies taking 2.3GB RAM:
-
-In [2]: a = np.random.random((1e4, 1e4))
+
+In [1]: imu.start_watching_memory()
+In [2]: a = np.random.random((int(1e4),int(1e4)))
 In [2] used 762.9531 MiB RAM in 2.21s, peaked 0.00 MiB above current, total RAM usage 812.30 MiB
 
 In [3]: b = a*2
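The `762.9531 MiB` figure is easy to verify: a 1e4 x 1e4 array of float64 values occupies 8 x 10^8 bytes. A quick check:

    # 10,000 * 10,000 float64 values at 8 bytes each, expressed in MiB
    print(10**4 * 10**4 * 8 / 2**20)  # -> 762.939453125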
@@ -78,7 +80,7 @@ First we make a random square array and modify it twice using copies taking 2.3G
 
 Now we do the same operations but in-place on `a`, using 813MB RAM in total:
 
-In [2]: a = np.random.random((1e4, 1e4))
+In [2]: a = np.random.random((int(1e4),int(1e4)))
 In [2] used 762.9531 MiB RAM in 2.21s, peaked 0.00 MiB above current, total RAM usage 812.30 MiB
 In [3]: a *= 2
 In [3] used 0.0078 MiB RAM in 0.21s, peaked 0.00 MiB above current, total RAM usage 812.30 MiB
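Augmented assignment is one route to in-place updates; numpy ufuncs offer the same saving via their `out=` argument (a general numpy sketch, not a line from this README):

    import numpy as np

    a = np.random.random((int(1e4), int(1e4)))
    a *= 2               # in-place: no second ~763 MiB copy is allocated
    np.sqrt(a, out=a)    # ufuncs can also write into an existing array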
@@ -104,15 +106,15 @@ If we make a large 1.5GB array of random integers we can `sqrt` in-place using t
 
 We can also see the hidden temporary objects that are created _during_ the execution of a command. Below you can see that whilst `d = a * b + c` takes 3.1GB overall, it peaks at approximately 3.7GB due to a fifth, temporary matrix which holds the intermediate result of `a * b`.
 
-In [2]: a = np.ones(1e8); b = np.ones(1e8); c = np.ones(1e8)
+In [2]: a = np.ones(int(1e8)); b = np.ones(int(1e8)); c = np.ones(int(1e8))
 In [2] used 2288.8750 MiB RAM in 1.02s, peaked 0.00 MiB above current, total RAM usage 2338.06 MiB
 
 In [3]: d = a * b + c
 In [3] used 762.9453 MiB RAM in 0.91s, peaked 667.91 MiB above current, total RAM usage 3101.01 MiB
 
 Knowing that a temporary is created, we can do an in-place operation instead for the same result but a lower overall RAM footprint:
 
-In [2]: a = np.ones(1e8); b = np.ones(1e8); c = np.ones(1e8)
+In [2]: a = np.ones(int(1e8)); b = np.ones(int(1e8)); c = np.ones(int(1e8))
 In [2] used 2288.8750 MiB RAM in 1.02s, peaked 0.00 MiB above current, total RAM usage 2338.06 MiB
 
 In [3]: d = a * b
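The hunk ends at `d = a * b`, before the in-place step; presumably the example continues by adding `c` in place, along these lines (an assumption, since the continuation falls outside this diff):

    d = a * b   # allocates one new ~763 MiB array for the result
    d += c      # in-place add: no hidden temporary, so no peak above current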
@@ -141,8 +143,8 @@ I've added experimental support for the `perf stat` tool on Linux. To use it mak
 
 Here's an example that builds on the previous ones. We build a square matrix with C ordering; we also need a 1D vector of the same size:
 
-In [3]: ones_c = np.ones((1e4, 1e4))
-In [4]: v = np.ones(1e4)
+In [3]: ones_c = np.ones((int(1e4),int(1e4)))
+In [4]: v = np.ones(int(1e4))
 
 Next we run `%timeit` using all the data in row 0. The data will reasonably fit into a cache as `v.nbytes == 80000` (80 kilobytes) and my L3 cache is 6MB. The report `perf value for cache-misses averages to 8,823/second` shows an average of 8k cache misses per second during this operation (followed by all the raw sampled events for reference). `%timeit` shows that this operation costs 14 microseconds per loop:
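The `%timeit` command itself falls outside this hunk; a plausible shape for it, given the description of using all the data in row 0, might be (hypothetical reconstruction, not the README's exact line):

    %timeit ones_c[0, :] * v   # hypothetical: row 0 is contiguous in C order, and its 80 kB fit in L3 cache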