Propose to add a function heapIndex #23204
SamUnimelb wants to merge 1 commit into python:master from SamUnimelb:patch-2
For a given element in a heap, we can use the heap property to find
the element faster than a plain linear scan. Therefore, instead of
h.index(x), which performs a linear search of the list, I propose a specially
written index method for looking up the index of a heap element.
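
The idea in this description can be sketched roughly as follows. This is only an illustration, not the code from the PR: the name `heap_index` and the iterative traversal are assumptions made for the example. It skips any subtree whose root is already larger than the target, since in a min-heap every descendant of that root is at least as large.

```python
import heapq


def heap_index(heap, x):
    """Return an index of x in a min-heap list, or -1 if x is absent.

    Hypothetical helper for illustration; not part of the heapq module.
    """
    stack = [0]  # indices of subtree roots still to be examined
    while stack:
        i = stack.pop()
        if i >= len(heap) or heap[i] > x:
            # Past the end of the list, or every node in this subtree
            # is >= heap[i] > x, so x cannot be found below i.
            continue
        if heap[i] == x:
            return i
        # Children of node i live at 2*i + 1 and 2*i + 2.
        stack.append(2 * i + 2)
        stack.append(2 * i + 1)
    return -1


h = [5, 3, 8, 1, 9, 2, 7]
heapq.heapify(h)
print(heap_index(h, 9), heap_index(h, 4))
```

The pruning only helps when the target is small relative to the subtree roots it meets. Searching for a large element can still visit every node, so the worst case remains linear.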
|
Please open an issue at bugs.python.org for discussion. There is also a similar PR in #23203 |
|
Thanks, @tirkarthi, for your reply; the issue has already been created. Please ignore #23203, since #23204 here is a better implementation of heapIndex. |
|
How did you get Θ(log n)? Given a heap Also, the proposed search function has considerably more overhead than list.index() |
|
Hi @rhettinger, good questions on the use cases and the time complexity analysis. For use cases: we all agree that the added function can at least be used to find the index of a given element in a list that has been heapified, which is its first use case. Second, extracting an element has at least one important use case that I know of: optimizing Dijkstra's algorithm from O(VE) to O((V+E)lgV), which requires extracting (popping) from a heap the element that equals the value of a node (p. 111 of Algorithms Illuminated Part II; also p. 662 of Introduction to Algorithms). For the time complexity analysis: to be honest, after reading the two books above I have the same concern as you, and I still believe this won't be O(lgn) as they state, since your response gives a worst case that leads to O(n). However, I believe it can be Theta(lgn), for the following reason. In my version of the code, if during the search x is found to be bigger than h[checkIdx] and smaller than h[2*checkIdx + 1], you skip the left branch entirely, and in extreme cases this saves you from looking at half of the branches. I can't see list.index doing that (it just looks at the elements one by one). This is why I claim this is Theta(lgn) instead of O(lgn). I hope this makes sense and helps the community. I acknowledge that the O(n) case does happen when you look for a big element in the array, and I have no idea how to optimize it (though I believe there is a way, since both Algorithms Illuminated and Introduction to Algorithms say so); maybe by using some back-tracing? I don't know, but I am sincerely open to this discussion. |
|
There are some cases where heapIndex() would do fewer comparisons than list.index(), and there are cases where it gives no improvement at all. Unlike bisecting a sorted list, which always cuts the search space in half, heapIndex() only benefits when a particular pattern arises where the target node is smaller than the left child and larger than the right child. This occurs less often than you would expect. I suggest that you run some empirical analysis. Try shuffling a list of 127 integers, heapifying them, doing a search with both heapIndex() and list.index() for every possible value, and counting the number of comparisons. That will show substantially less benefit than you expect. Also, run timeit() so you become aware of how much overhead heapIndex() adds compared to list.index(). The latter searches consecutive memory locations, never has to compute child locations, and has no recursion overhead. Also look at heap APIs for other languages to see if you can find any examples of index searches being offered as part of the core API. My quick search did not find any examples. The shortest path algorithm does remove the minimum node, but we already have heappop() to efficiently support that operation. |
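
The experiment suggested above could be set up along these lines. This is a sketch only: `linear_index`, `pruned_index` and `run_experiment` are hypothetical names, the pruned search stands in for heapIndex() and is not the code from the PR, and the way comparisons are counted (one per element comparison performed) is an assumption.

```python
import heapq
import random


def linear_index(heap, x):
    """Left-to-right scan, as list.index() does; returns (index, comparisons)."""
    comparisons = 0
    for i, item in enumerate(heap):
        comparisons += 1
        if item == x:
            return i, comparisons
    return -1, comparisons


def pruned_index(heap, x):
    """Heap-aware search that skips subtrees whose root exceeds x."""
    comparisons = 0
    stack = [0]
    while stack:
        i = stack.pop()
        if i >= len(heap):
            continue
        comparisons += 1  # the heap[i] > x test
        if heap[i] > x:
            continue
        comparisons += 1  # the heap[i] == x test
        if heap[i] == x:
            return i, comparisons
        stack.append(2 * i + 2)
        stack.append(2 * i + 1)
    return -1, comparisons


def run_experiment(n=127, seed=0):
    """Search for every value in a shuffled, heapified list of n integers."""
    rng = random.Random(seed)
    heap = list(range(n))
    rng.shuffle(heap)
    heapq.heapify(heap)
    linear_total = pruned_total = 0
    for x in range(n):
        li, lc = linear_index(heap, x)
        pi, pc = pruned_index(heap, x)
        assert li == pi  # values are distinct, so both must agree
        linear_total += lc
        pruned_total += pc
    return linear_total, pruned_total


linear_total, pruned_total = run_experiment()
print("linear scan comparisons: ", linear_total)
print("pruned search comparisons:", pruned_total)
```

Comparison counts say nothing about wall-clock cost, so timing both searches with timeit on the same heap would be the natural next step.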
|
It cannot be made sub-linear time. An optimal algorithm for searching a heap takes https://dl.acm.org/doi/10.5555/139404.139483 Raymond, this appears to be a part of implementing the increase-key or decrease-key operations in the Wikipedia article you referenced. In graph search algorithms seeking to minimize (or maximize) a function of the path taken, a priority queue is usually used to keep the paths traversed so far ordered by lowest (highest) path value seen so far, or in "A star" by a guaranteed upper (lower) bound on all possible completions of a path so far to a goal node. When expanding the next (best-so-far) node to look at its successors, it's possible that we stumble into a node already on the queue, but with a better actual path-cost (or completion bound) than it had the last time it was encountered. That node can be anywhere in the queue. In effect, we want to remove it from the queue and then immediately re-add it with the new (better) value. Hence "increase-key" or "decrease-key". While we have no code to do so directly, the primitives our heap implementation builds on can be used to delete a node from anywhere in the heap in But this approach isn't a solution to finding the index efficiently either. So it doesn't really address any part of the increase/decrease-key puzzle in a satisfying way. I can't think of a real use for it. BTW, when I've implemented large "A star" algorithms, rather than a heap I've used a |
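
As a rough illustration of deleting a node from an arbitrary position once its index is already known, the sketch below leans on `heapq._siftup` and `heapq._siftdown`. Both are private, undocumented helpers in CPython's `heapq.py`, so relying on them is an assumption that may not hold across versions or implementations, and `heap_remove_at` is a hypothetical name. Finding the index in the first place is still the linear-time part.

```python
import heapq


def heap_remove_at(heap, i):
    """Remove and return heap[i], restoring the min-heap invariant.

    Hypothetical helper; assumes 0 <= i < len(heap). Uses private heapq
    functions, which are not part of the documented API.
    """
    removed = heap[i]  # raises IndexError if i is past the end
    last = heap.pop()  # detach the final leaf
    if i < len(heap):
        heap[i] = last  # fill the hole with the former last leaf
        heapq._siftup(heap, i)       # fix the subtree rooted at i
        heapq._siftdown(heap, 0, i)  # then fix the path toward the root
    return removed


h = [9, 4, 7, 1, 8, 2, 6, 3, 5]
heapq.heapify(h)
i = h.index(7)  # locating the node is still a linear search
heap_remove_at(h, i)
heapq.heappush(h, 0)  # re-add with a new (better) key
print(h[0])
```

Removing and re-adding in this way gives the effect of a decrease-key once the position is known, which is why the index lookup remains the bottleneck.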
|
Sam, thank you for the submission, but we're going to decline. Algorithmically, it doesn't do much to help over list.index() and we can't really find much use for it. |
No worries. I'll leave it as it is until I have a better solution (I'll probably contribute some new implementations of the current heap objects later). Thanks. |