|
msg368175 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-05-05 15:48 |
tuple, dict and frame use free lists to optimize the creation of objects.
Unicode uses "interned" strings to reduce the Python memory footprint and speed up dictionary lookups.
Unicode also uses singletons for single letter Latin1 characters ([U+0000; U+00FF] range).
All these optimizations are incompatible with isolated subinterpreters, since the caches are currently shared by all interpreters. These caches should be made per-interpreter. See bpo-40512 "Meta issue: per-interpreter GIL" for the rationale.
I already made small integer singletons per interpreter in bpo-38858:
* commit 5dcc06f6e0d7b5d6589085692b86c63e35e2325e
* commit 630c8df5cf126594f8c1c4579c1888ca80a29d59.
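
To illustrate what "per-interpreter" means for a free list, here is a toy Python model. It is only a sketch: the real free lists are C arrays inside CPython, and the names FreeList and Interpreter are made up for this example.

```python
# Toy model only: real free lists are implemented in C inside CPython.
class FreeList:
    """Bounded stack of released objects kept for reuse."""

    def __init__(self, maxfree=4):
        self.maxfree = maxfree
        self.items = []

    def alloc(self):
        # Reuse a released object when one is available, else create one.
        return self.items.pop() if self.items else []

    def release(self, obj):
        # Keep the object for reuse unless the free list is already full.
        if len(self.items) < self.maxfree:
            obj.clear()
            self.items.append(obj)


class Interpreter:
    """Hypothetical stand-in for an interpreter state owning its caches."""

    def __init__(self):
        self.list_free_list = FreeList()


main, sub = Interpreter(), Interpreter()
obj = main.list_free_list.alloc()
main.list_free_list.release(obj)

reused_by_main = main.list_free_list.alloc()  # taken from main's free list
fresh_in_sub = sub.list_free_list.alloc()     # sub's free list is empty
```

With one free list per interpreter, an object released in one interpreter is never handed out to another one, so no locking between interpreters is needed on this path.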
|
|
msg368177 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-05-05 16:50 |
New changeset 607b1027fec7b4a1602aab7df57795fbcec1c51b by Victor Stinner in branch 'master':
bpo-40521: Disable Unicode caches in isolated subinterpreters (GH-19933)
https://github.com/python/cpython/commit/607b1027fec7b4a1602aab7df57795fbcec1c51b
|
|
msg368187 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-05-05 17:55 |
New changeset b4b53868d7d6cd13505321d3802fd00865b25e05 by Victor Stinner in branch 'master':
bpo-40521: Disable free lists in subinterpreters (GH-19937)
https://github.com/python/cpython/commit/b4b53868d7d6cd13505321d3802fd00865b25e05
|
|
msg368278 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-05-06 16:24 |
New changeset 89fc4a34cf7a01df9dd269d32d3706c68a72d130 by Victor Stinner in branch 'master':
bpo-40521: Disable method cache in subinterpreters (GH-19960)
https://github.com/python/cpython/commit/89fc4a34cf7a01df9dd269d32d3706c68a72d130
|
|
msg368283 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-05-06 17:05 |
New changeset b7aa23d29fa48238dab3692d02e1f0a7e8a5af9c by Victor Stinner in branch 'master':
bpo-40521: Disable list free list in subinterpreters (GH-19959)
https://github.com/python/cpython/commit/b7aa23d29fa48238dab3692d02e1f0a7e8a5af9c
|
|
msg368807 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-05-13 23:35 |
I wrote a draft PR to make interned strings per-interpreter. It currently crashes, because the method cache and _PyUnicode_FromId() (bpo-39465) must first be made compatible with subinterpreters.
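
For context, this is what interning looks like from Python code, using the public sys.intern() function. It is a small illustration of the behavior, not a reproduction of the draft PR.

```python
import sys

# Build the strings at runtime so they are not merged at compile time.
a = "".join(["per_", "interpreter"])
b = "".join(["per_", "interpreter"])

ia = sys.intern(a)
ib = sys.intern(b)

# Equal strings map to one interned object, so identity can replace
# a character-by-character comparison.
same_object = ia is ib
```

The table behind sys.intern() is the kind of cache that this issue wants to move from process-wide state into each interpreter.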
|
|
msg368808 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-05-13 23:48 |
New changeset 3d17c045b4c3d09b72bbd95ed78af1ae6f0d98d2 by Victor Stinner in branch 'master':
bpo-40521: Add PyInterpreterState.unicode (GH-20081)
https://github.com/python/cpython/commit/3d17c045b4c3d09b72bbd95ed78af1ae6f0d98d2
|
|
msg369407 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-05-19 23:57 |
New changeset 0509c4547fc95cc32a91ac446a26192c3bfdf157 by Victor Stinner in branch 'master':
bpo-40521: Fix update_slot() when INTERN_NAME_STRINGS is not defined (#20246)
https://github.com/python/cpython/commit/0509c4547fc95cc32a91ac446a26192c3bfdf157
|
|
msg370636 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-06-02 22:43 |
Microbenchmark for the tuple free list, to measure the overhead of PR 20247: microbench_tuple.py. It requires applying bench_tuple.patch first.
|
|
msg370733 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-06-04 21:38 |
New changeset 69ac6e58fd98de339c013fe64cd1cf763e4f9bca by Victor Stinner in branch 'master':
bpo-40521: Make tuple free list per-interpreter (GH-20247)
https://github.com/python/cpython/commit/69ac6e58fd98de339c013fe64cd1cf763e4f9bca
|
|
msg370734 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-06-04 22:50 |
New changeset 2ba59370c3dda2ac229c14510e53a05074b133d1 by Victor Stinner in branch 'master':
bpo-40521: Make float free list per-interpreter (GH-20636)
https://github.com/python/cpython/commit/2ba59370c3dda2ac229c14510e53a05074b133d1
|
|
msg370735 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-06-04 23:14 |
New changeset 7daba6f221e713f7f60c613b246459b07d179f91 by Victor Stinner in branch 'master':
bpo-40521: Make slice cache per-interpreter (GH-20637)
https://github.com/python/cpython/commit/7daba6f221e713f7f60c613b246459b07d179f91
|
|
msg370737 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-06-04 23:39 |
New changeset 3744ed2c9c0b3905947602fc375de49533790cb9 by Victor Stinner in branch 'master':
bpo-40521: Make frame free list per-interpreter (GH-20638)
https://github.com/python/cpython/commit/3744ed2c9c0b3905947602fc375de49533790cb9
|
|
msg370740 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-06-05 00:05 |
New changeset 88ec9190105c9b03f49aaef601ce02b242a75273 by Victor Stinner in branch 'master':
bpo-40521: Make list free list per-interpreter (GH-20642)
https://github.com/python/cpython/commit/88ec9190105c9b03f49aaef601ce02b242a75273
|
|
msg370741 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-06-05 00:34 |
New changeset 78a02c2568714562e23e885b6dc5730601f35226 by Victor Stinner in branch 'master':
bpo-40521: Make async gen free lists per-interpreter (GH-20643)
https://github.com/python/cpython/commit/78a02c2568714562e23e885b6dc5730601f35226
|
|
msg370742 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-06-05 00:56 |
New changeset e005ead49b1ee2b1507ceea94e6f89c28ecf1f81 by Victor Stinner in branch 'master':
bpo-40521: Make context free list per-interpreter (GH-20644)
https://github.com/python/cpython/commit/e005ead49b1ee2b1507ceea94e6f89c28ecf1f81
|
|
msg370754 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-06-05 09:47 |
> bpo-40521: Make list free list per-interpreter (GH-20642)
> https://github.com/python/cpython/commit/88ec9190105c9b03f49aaef601ce02b242a75273
This change contains an interesting fix:
* _PyGC_Fini() clears gcstate->garbage list which can be stored in
the list free list. Call _PyGC_Fini() before _PyList_Fini() to
prevent leaking this list.
Maybe "Fini" functions should disable free lists, to prevent subsequent code from adding something to a free list during Python finalization.
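
A toy Python sketch of that idea, assuming a hypothetical "finalized" flag on the free list (this is not the actual CPython code, and the class and method names are invented for the example):

```python
class FreeList:
    """Toy free list that can be disabled at finalization."""

    def __init__(self, maxfree=4):
        self.maxfree = maxfree
        self.items = []
        self.finalized = False

    def release(self, obj):
        """Return True if obj was cached, False if it was dropped."""
        if self.finalized or len(self.items) >= self.maxfree:
            return False
        self.items.append(obj)
        return True

    def fini(self):
        # Drop the cached objects and refuse any later addition.
        self.items.clear()
        self.finalized = True


free_list = FreeList()
cached_before = free_list.release([])  # accepted while Python is running
free_list.fini()
cached_after = free_list.release([])   # rejected after finalization
```

With such a flag, the order in which the "Fini" functions are called matters less: an object released late is simply freed instead of being parked in a free list that nobody will clear again.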
|
|
msg370755 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-06-05 09:50 |
bench_dict.patch: Microbenchmark on the C function PyDict_New() to measure the overhead of PR 20645.
|
|
msg370756 - (view) |
Author: Mark Shannon (Mark.Shannon) *  |
Date: 2020-06-05 09:57 |
I'm worried about the performance impact of these changes, especially as many of the changes haven't been reviewed.
Have you done any performance analysis or tests of the cumulative effect of all these changes?
|
|
msg370757 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-06-05 09:59 |
> Have you done any performance analysis or tests of the cumulative effect of all these changes?
No. It would be interesting to measure that using pyperformance.
|
|
msg370771 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-06-05 17:32 |
pyperformance comparison between:
* commit dc24b8a2ac32114313bae519db3ccc21fe45c982 (before the "Make tuple free list per-interpreter" change)
* PR 20645 (dict free lists), which accumulates all free list changes (those already committed plus the PR)
Extract of the tested patch, new PyInterpreterState members:
--------------------
diff --git a/Include/internal/pycore_interp.h b/Include/internal/pycore_interp.h
index f04ea330d0..b1a25e0ed4 100644
--- a/Include/internal/pycore_interp.h
+++ b/Include/internal/pycore_interp.h
(...)
@@ -157,6 +233,18 @@ struct _is {
     */
     PyLongObject* small_ints[_PY_NSMALLNEGINTS + _PY_NSMALLPOSINTS];
 #endif
+    struct _Py_unicode_state unicode;
+    struct _Py_float_state float_state;
+    /* Using a cache is very effective since typically only a single slice is
+       created and then deleted again. */
+    PySliceObject *slice_cache;
+
+    struct _Py_tuple_state tuple;
+    struct _Py_list_state list;
+    struct _Py_dict_state dict_state;
+    struct _Py_frame_state frame;
+    struct _Py_async_gen_state async_gen;
+    struct _Py_context_state context;
 };
--------------------
Results:
--------------------
$ python3 -m pyperf compare_to 2020-06-04_20-10-master-dc24b8a2ac32.json.gz 2020-06-04_20-10-master-dc24b8a2ac32-patch-free_lists.json.gz -G
Slower (10):
- chameleon: 20.1 ms +- 0.4 ms -> 23.1 ms +- 4.0 ms: 1.15x slower (+15%)
- logging_silent: 334 ns +- 51 ns -> 371 ns +- 70 ns: 1.11x slower (+11%)
- spectral_norm: 274 ms +- 37 ms -> 302 ms +- 55 ms: 1.10x slower (+10%)
- logging_format: 22.5 us +- 0.4 us -> 24.5 us +- 2.7 us: 1.09x slower (+9%)
- json_dumps: 26.6 ms +- 4.0 ms -> 28.7 ms +- 5.5 ms: 1.08x slower (+8%)
- sympy_sum: 390 ms +- 3 ms -> 415 ms +- 45 ms: 1.06x slower (+6%)
- float: 217 ms +- 3 ms -> 231 ms +- 30 ms: 1.06x slower (+6%)
- pidigits: 306 ms +- 32 ms -> 323 ms +- 47 ms: 1.06x slower (+6%)
- python_startup_no_site: 8.71 ms +- 0.77 ms -> 8.94 ms +- 0.91 ms: 1.03x slower (+3%)
- xml_etree_process: 130 ms +- 1 ms -> 133 ms +- 2 ms: 1.02x slower (+2%)
Faster (9):
- pickle_pure_python: 1.05 ms +- 0.16 ms -> 964 us +- 19 us: 1.09x faster (-9%)
- scimark_sparse_mat_mult: 11.4 ms +- 2.1 ms -> 10.5 ms +- 1.7 ms: 1.09x faster (-8%)
- hexiom: 19.5 ms +- 4.1 ms -> 18.0 ms +- 3.0 ms: 1.08x faster (-7%)
- telco: 15.7 ms +- 3.1 ms -> 14.5 ms +- 0.4 ms: 1.08x faster (-7%)
- unpickle: 31.8 us +- 5.7 us -> 29.5 us +- 4.9 us: 1.08x faster (-7%)
- scimark_lu: 292 ms +- 60 ms -> 274 ms +- 34 ms: 1.07x faster (-6%)
- django_template: 123 ms +- 16 ms -> 119 ms +- 2 ms: 1.04x faster (-3%)
- xml_etree_generate: 160 ms +- 4 ms -> 156 ms +- 3 ms: 1.02x faster (-2%)
- xml_etree_iterparse: 178 ms +- 3 ms -> 177 ms +- 2 ms: 1.01x faster (-1%)
Benchmark hidden because not significant (41): (...)
--------------------
If we ignore differences smaller than 5%:
--------------------
$ python3 -m pyperf compare_to 2020-06-04_20-10-master-dc24b8a2ac32.json.gz 2020-06-04_20-10-master-dc24b8a2ac32-patch-free_lists.json.gz -G --min-speed=5
Slower (8):
- chameleon: 20.1 ms +- 0.4 ms -> 23.1 ms +- 4.0 ms: 1.15x slower (+15%)
- logging_silent: 334 ns +- 51 ns -> 371 ns +- 70 ns: 1.11x slower (+11%)
- spectral_norm: 274 ms +- 37 ms -> 302 ms +- 55 ms: 1.10x slower (+10%)
- logging_format: 22.5 us +- 0.4 us -> 24.5 us +- 2.7 us: 1.09x slower (+9%)
- json_dumps: 26.6 ms +- 4.0 ms -> 28.7 ms +- 5.5 ms: 1.08x slower (+8%)
- sympy_sum: 390 ms +- 3 ms -> 415 ms +- 45 ms: 1.06x slower (+6%)
- float: 217 ms +- 3 ms -> 231 ms +- 30 ms: 1.06x slower (+6%)
- pidigits: 306 ms +- 32 ms -> 323 ms +- 47 ms: 1.06x slower (+6%)
Faster (6):
- pickle_pure_python: 1.05 ms +- 0.16 ms -> 964 us +- 19 us: 1.09x faster (-9%)
- scimark_sparse_mat_mult: 11.4 ms +- 2.1 ms -> 10.5 ms +- 1.7 ms: 1.09x faster (-8%)
- hexiom: 19.5 ms +- 4.1 ms -> 18.0 ms +- 3.0 ms: 1.08x faster (-7%)
- telco: 15.7 ms +- 3.1 ms -> 14.5 ms +- 0.4 ms: 1.08x faster (-7%)
- unpickle: 31.8 us +- 5.7 us -> 29.5 us +- 4.9 us: 1.08x faster (-7%)
- scimark_lu: 292 ms +- 60 ms -> 274 ms +- 34 ms: 1.07x faster (-6%)
Benchmark hidden because not significant (46): (...)
--------------------
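
The "N.NNx slower" figures can be reproduced from the rounded means shown in the output. The sketch below recomputes them for the "Slower" list and applies a 5% threshold. It assumes the filter simply drops entries whose relative change is below the threshold; it does not use pyperf itself, and pyperf computes its ratios from unrounded data.

```python
# (name, mean before, mean after), copied from the "Slower" list above.
# Both means of a pair use the same unit, so the ratio is unit-free.
slower = [
    ("chameleon", 20.1, 23.1),
    ("logging_silent", 334, 371),
    ("spectral_norm", 274, 302),
    ("logging_format", 22.5, 24.5),
    ("json_dumps", 26.6, 28.7),
    ("sympy_sum", 390, 415),
    ("float", 217, 231),
    ("pidigits", 306, 323),
    ("python_startup_no_site", 8.71, 8.94),
    ("xml_etree_process", 130, 133),
]

MIN_SPEED = 5  # percent, mirroring --min-speed=5


def ratio(before, after):
    """How many times slower the patched build is."""
    return after / before


kept = [name for name, before, after in slower
        if (ratio(before, after) - 1) * 100 >= MIN_SPEED]
dropped = [name for name, _, _ in slower if name not in kept]

for name, before, after in slower:
    print(f"{name}: {ratio(before, after):.2f}x slower")
```

Under that assumption, python_startup_no_site and xml_etree_process fall below the threshold, which matches the list going from 10 to 8 entries.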
Honestly, I'm surprised by these results. I don't see how these free list changes can make between 6 and 9 benchmarks faster (e.g. 1.08x faster for telco!?). To me, it sounds like the speed.python.org runner has some trouble. You can notice it if you look at the last 3 runs at https://speed.python.org/ : there are some spikes (in both directions, faster or slower) which are very surprising.
Pablo recently upgraded Ubuntu on the benchmark runner server. I don't know if it's related.
I plan to recompute all benchmarks run on the benchmark runner server, since over the last years pyperf and pyperformance were upgraded multiple times (old data were computed with old versions) and the system (Ubuntu) was upgraded (again, old data were computed with older Ubuntu packages).
|
|
msg370928 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-06-07 23:38 |
See also bpo-40887: "Free lists are still used after being finalized (cleared)".
|
|
msg370969 - (view) |
Author: Mark Shannon (Mark.Shannon) *  |
Date: 2020-06-08 09:24 |
I'd be interested to see if you can get more consistent results.
Performance of modern hardware is very sensitive to memory layout, so some sort of address randomization might be needed to remove artifacts of layout.
It is possible that the objects on the free lists for telco are better aligned with cache lines, or fit in cache better in some way.
And conversely, in chameleon, objects fit cache in a worse way.
Just a guess, of course.
Thanks for trying to get some benchmark results.
|
|
msg372146 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-06-23 09:33 |
New changeset b4e85cadfbc2b1b24ec5f3159e351dbacedaa5e0 by Victor Stinner in branch 'master':
bpo-40521: Make dict free lists per-interpreter (GH-20645)
https://github.com/python/cpython/commit/b4e85cadfbc2b1b24ec5f3159e351dbacedaa5e0
|
|
msg372148 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-06-23 09:38 |
All free lists are now per-interpreter! See Modules/gcmodule.c:
static void
clear_freelists(PyThreadState *tstate)
{
    _PyFrame_ClearFreeList(tstate);
    _PyTuple_ClearFreeList(tstate);
    _PyFloat_ClearFreeList(tstate);
    _PyList_ClearFreeList(tstate);
    _PyDict_ClearFreeList(tstate);
    _PyAsyncGen_ClearFreeLists(tstate);
    _PyContext_ClearFreeList(tstate);
}
I'm still working on the Unicode caches.
|
|
msg372161 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-06-23 12:08 |
New changeset 261cfedf7657a515e04428bba58eba2a9bb88208 by Victor Stinner in branch 'master':
bpo-40521: Make the empty frozenset per interpreter (GH-21068)
https://github.com/python/cpython/commit/261cfedf7657a515e04428bba58eba2a9bb88208
|
|
msg372168 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2020-06-23 13:50 |
New changeset 32f2eda85957365d208f499b730d30b7eb419741 by Raymond Hettinger in branch 'master':
bpo-40521: Remove freelist from collections.deque() (GH-21073)
https://github.com/python/cpython/commit/32f2eda85957365d208f499b730d30b7eb419741
|
|
msg372169 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-06-23 13:54 |
New changeset c41eed1a874e2f22bde45c3c89418414b7a37f46 by Victor Stinner in branch 'master':
bpo-40521: Make bytes singletons per interpreter (GH-21074)
https://github.com/python/cpython/commit/c41eed1a874e2f22bde45c3c89418414b7a37f46
|
|
msg372176 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-06-23 14:40 |
New changeset 522691c46e2ae51faaad5bbbce7d959dd61770df by Victor Stinner in branch 'master':
bpo-40521: Cleanup code of free lists (GH-21082)
https://github.com/python/cpython/commit/522691c46e2ae51faaad5bbbce7d959dd61770df
|
|
msg372181 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-06-23 15:43 |
New changeset f9bd05e83e32bece49de5af0c9a232325c57648a by Raymond Hettinger in branch 'master':
bpo-40521: Empty frozenset is no longer a singleton (GH-21085)
https://github.com/python/cpython/commit/f9bd05e83e32bece49de5af0c9a232325c57648a
|
|
msg372207 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-06-23 20:55 |
New changeset 281cce1106568ef9fec17e3c72d289416fac02a5 by Victor Stinner in branch 'master':
bpo-40521: Make MemoryError free list per interpreter (GH-21086)
https://github.com/python/cpython/commit/281cce1106568ef9fec17e3c72d289416fac02a5
|
|
msg372209 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-06-23 22:10 |
New changeset f363d0a6e9cfa50677a6de203735fbc0d06c2f49 by Victor Stinner in branch 'master':
bpo-40521: Make empty Unicode string per interpreter (GH-21096)
https://github.com/python/cpython/commit/f363d0a6e9cfa50677a6de203735fbc0d06c2f49
|
|
msg372216 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-06-23 22:34 |
New changeset 90ed8a6d71b2d6e0853c14e8e6f85fe730a4329a by Victor Stinner in branch 'master':
bpo-40521: Optimize PyUnicode_New(0, maxchar) (GH-21099)
https://github.com/python/cpython/commit/90ed8a6d71b2d6e0853c14e8e6f85fe730a4329a
|
|
msg372220 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-06-24 00:22 |
New changeset 2f9ada96e0d420fed0d09a032b37197f08ef167a by Victor Stinner in branch 'master':
bpo-40521: Make Unicode latin1 singletons per interpreter (GH-21101)
https://github.com/python/cpython/commit/2f9ada96e0d420fed0d09a032b37197f08ef167a
|
|
msg372223 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-06-24 01:21 |
New changeset cde283d16d87024f455e45c6f1b4e4f7d8905836 by Victor Stinner in branch 'master':
bpo-40521: Fix _PyContext_Fini() (GH-21103)
https://github.com/python/cpython/commit/cde283d16d87024f455e45c6f1b4e4f7d8905836
|
|
msg372250 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-06-24 13:22 |
New changeset 0430dfac629b4eb0e899a09b899a494aa92145f6 by Victor Stinner in branch 'master':
bpo-40521: Always create the empty tuple singleton (GH-21116)
https://github.com/python/cpython/commit/0430dfac629b4eb0e899a09b899a494aa92145f6
|
|
msg372357 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-06-25 12:07 |
New changeset 91698d8caa4b5bb6e8dbb64b156e8afe9e32cac1 by Victor Stinner in branch 'master':
bpo-40521: Optimize PyBytes_FromStringAndSize(str, 0) (GH-21142)
https://github.com/python/cpython/commit/91698d8caa4b5bb6e8dbb64b156e8afe9e32cac1
|
|
msg372795 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-07-01 21:21 |
New changeset 90db4653ae37ef90754cfd2cd6ec6857b87a88e6 by Victor Stinner in branch 'master':
bpo-40521: Cleanup finalize_interp_types() (GH-21265)
https://github.com/python/cpython/commit/90db4653ae37ef90754cfd2cd6ec6857b87a88e6
|
|
msg377368 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2020-09-23 12:05 |
New changeset 7f413a5d95e6d7ddddd6e2c9844c33594d6288f4 by Victor Stinner in branch 'master':
bpo-40521: Fix PyUnicode_InternInPlace() (GH-22376)
https://github.com/python/cpython/commit/7f413a5d95e6d7ddddd6e2c9844c33594d6288f4
|