Is there a built-in data structure in Python with BST semantics? [closed]

Question

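In other languages there are built-in data structures with binary search tree semantics, for example C++'s std::map.

Note that I use C++ as an example, but not because I am trying to make a direct comparison between C++ itself and Python. This question has received close votes as "opinion-based", but there is nothing opinion-based about it.

By binary search tree semantics, I mean:

I can insert an element into the structure in sublinear time, i.e. O(log n) for std::map
I can iterate over all elements in sorted order in linear time, i.e. O(n) for std::map

Is there a data structure in Python with binary search tree semantics?

To illustrate, can the following program be written efficiently using built-in Python data structures?

Imagine I am a fund manager at a bank. Bank accounts are frequently added to my fund. A bank account has an ID and a value. A smaller ID means the account was created earlier than an account with a larger ID. Every time an account is added to my fund, I want to know the ID of the 1000th-oldest account in my fund, for record-keeping.

Here are implementations in Python 3 and C++: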

from random import randint

dist = lambda: randint(0, 2147483647)

def main():
    my_map = {}

    # fills a map with 1000000 random mappings of type (<int>: <int>)
    for i in range(0, 1000000):
        my_map[dist()] = dist()

    # prints out the 1000th smallest key and its value
    target_key = nth_smallest_key(1000, my_map)
    print("({}: {})".format(target_key, my_map[target_key]))

    # writes a new random mapping to the map
    # then prints out the 1000th smallest key and its value if that key
    # has changed
    # 100000 times
    for i in range(100000):
        my_map[dist()] = dist()

        test_key = nth_smallest_key(1000, my_map)
        if target_key != test_key:
            target_key = test_key
            print("({}: {})".format(target_key, my_map[target_key]))

        # print an indicator every 100 iterations for comparison
        if i % 100 == 0:
            print("iteration: {}".format(i))

def nth_smallest_key(n, m):
    return sorted(m.keys())[n]

if __name__ == "__main__":
    main()

#include <cstdio>
#include <map>
#include <random>
using namespace std;

int nth_smallest_key(int n, map<int, int>& m);

int main() {
    random_device rd;
    mt19937 psrng(rd());
    uniform_int_distribution<> dist(0, 2147483647);

    map<int, int> my_map;

    // fills a map with 1000000 random mappings of type (<int>: <int>)
    for (int i = 0; i < 1000000; i++) {
        my_map[dist(psrng)] = dist(psrng);
    }

    // prints out the 1000th smallest key and its value
    int target_key = nth_smallest_key(1000, my_map);
    printf("(%d: %d)\n", target_key, my_map[target_key]);

    // writes a new random mapping to the map
    // then prints out the 1000th smallest key and its value if that key
    // has changed
    // 100000 times
    for (int i = 0; i <= 100000; i++) {
        my_map[dist(psrng)] = dist(psrng);

        int test_key = nth_smallest_key(1000, my_map);
        if (target_key != test_key) {
            target_key = test_key;
            printf("(%d: %d)\n", target_key, my_map[target_key]);
        }
    }

    return 0;
}

int nth_smallest_key(int n, map<int, int>& m) {
    map<int, int>::iterator curr = m.begin();
    for (int i = 0; i < n; i++) {
        curr = next(curr);
    }
    return curr->first;
}

Makefile:

buildcpp:
        g++ -o main bst_semantics_cpp.cpp

runcpp: buildcpp
        ./main

runpython:
        python3 bst_semantics_python.py

In C++, this program runs in about 5 seconds on my machine:

$ time ./main
(2211193: 2021141747)
(2208771: 1079444337)
(2208700: 1187119077)
(2208378: 1447743503)
...
(1996019: 1378401665)
(1989217: 1457497754)
(1989042: 1336229409)

real    0m4.915s
user    0m4.750s
sys     0m0.094s
$

In Python, however, the program has still not reached 1% completion after 120 seconds:

$ time make runpython
python3 bst_semantics_python.py
(2158070: 1498305903)
iteration: 0
iteration: 100
iteration: 200
^CTraceback (most recent call last):
  File "bst_semantics_python.py", line 36, in <module>
    main()
  File "bst_semantics_python.py", line 23, in main
    test_key = nth_smallest_key(1000, my_map)
  File "bst_semantics_python.py", line 33, in nth_smallest_key
    return sorted(m.keys())[n]
KeyboardInterrupt
Makefile:8: recipe for target 'runpython' failed
make: *** [runpython] Error 1


real    2m2.040s
user    1m59.063s
sys     0m0.375s
$

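To write this program efficiently, you need a data structure with binary search tree semantics.

Is there such a data structure in Python? Can you write this program efficiently using built-in Python data structures?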

Answer 1

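First, I wouldn't say this is an apples-to-apples comparison. I can't say whether your C++ is idiomatic, so I won't comment on it here. I can say that the Python is not. The biggest problem I see is that the code that re-determines the smallest key can be written in a much faster way: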

# this is the second for loop, not the first
for i in range(100000):
    k, v = dist(), dist()
    my_map[k] = v

    # this avoids any new iteration through min or sorted
    min_key = min_key if min_key <= k else k

This avoids unnecessarily iterating again over a data structure that you have already iterated over with min.
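The running-minimum update above tracks only the single smallest key, whereas the question asks for the 1000th smallest. One possible extension, which is not part of the original answer, is to keep the k smallest keys in a bounded max-heap built on the standard-library heapq module. The class name KthSmallestTracker is hypothetical, and the sketch assumes keys are only ever inserted, never removed.

```python
import heapq
from random import Random


class KthSmallestTracker:
    """Hypothetical helper: tracks the k-th smallest (1-based) distinct key added."""

    def __init__(self, k):
        self.k = k
        # heapq is a min-heap, so keys are stored negated; -self._heap[0]
        # is then the largest of the k smallest keys seen so far.
        self._heap = []

    def add(self, key):
        """Register a key that was not present before. Costs O(log k)."""
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -key)
        elif key < -self._heap[0]:
            # The new key displaces the current k-th smallest.
            heapq.heapreplace(self._heap, -key)

    def kth(self):
        """Return the k-th smallest key, or None if fewer than k keys exist."""
        return -self._heap[0] if len(self._heap) == self.k else None


rng = Random(0)  # fixed seed so the sketch is repeatable
my_map = {}
tracker = KthSmallestTracker(5)
for _ in range(200):
    k, v = rng.randint(0, 1000), rng.randint(0, 1000)
    if k not in my_map:  # dict keys are unique, so count each key once
        tracker.add(k)
    my_map[k] = v

# Cross-check against the sort-based approach from the question.
assert tracker.kth() == sorted(my_map)[4]
```

Here k is 1-based, while sorted(m.keys())[1000] in the question is zero-based, so matching that lookup exactly would take k = 1001.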

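You also have an extra if check for printing at the bottom of the Python code that is not present at the bottom of the C++ code.

With this change I see an unscientific run time of about 4 seconds using Python 3.6, with no other optimizations.

Why avoid sorted? Isn't it faster?

Yes, sorted is technically faster in big-O terms, but that is not the whole story. With sorted you also pay the cost of creating a new data structure and populating it, which seems to make any gains on paper disappear in practice:

$ python -m timeit -s "from random import randint; x = [randint(0, 100000) for i in range(10000)]" "min(x)"
2000 loops, best of 5: 126 usec per loop
$ python -m timeit -s "from random import randint; x = [randint(0, 100000) for i in range(10000)]" "sorted(x)"
200 loops, best of 5: 953 usec per loop

You might point out that this test is somewhat random and that repeated runs may give different results, but I can tell you that min is consistently and noticeably faster at finding the smallest element of a collection.

The actual question

Why doesn't Python have a mapping analogous to C++'s map? Probably because this is Python and not C++. That may sound glib, but people who are good at writing libraries have either already implemented it and I just don't know about it (possible, so take this with a grain of salt), or it simply isn't worth the time. If speed really matters and you know C++, I'm not sure what you gain by using Python.

Edit: Reading through the comments, the commenter Evil Sheep suggested the sortedcontainers library, which appears to meet the requirement for a sorted dictionary (map).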

Answer 2

Python has no built-in data structure that can efficiently express and implement this type of program.
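For comparison, the closest standard-library approximation is probably a plain dict paired with a sorted key list maintained by bisect.insort. This is only a sketch under stated assumptions: the class name SortedKeyMap is hypothetical, deletion is not handled, and insertion still shifts list elements, so it is O(n) per insert rather than the O(log n) of a real tree. Lookup of the n-th smallest key and in-order iteration avoid re-sorting, which is the main cost in the question's Python program.

```python
import bisect


class SortedKeyMap:
    """Hypothetical helper: a dict plus a list of its keys kept in sorted order."""

    def __init__(self):
        self._data = {}
        self._keys = []

    def __setitem__(self, key, value):
        if key not in self._data:
            # Binary search for the slot, then an O(n) shift to insert.
            bisect.insort(self._keys, key)
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def __len__(self):
        return len(self._keys)

    def nth_smallest_key(self, n):
        """Zero-based, like sorted(m.keys())[n], but without re-sorting."""
        return self._keys[n]

    def items(self):
        """Yield (key, value) pairs in ascending key order."""
        for key in self._keys:
            yield key, self._data[key]


m = SortedKeyMap()
for key, value in [(30, "c"), (10, "a"), (20, "b"), (10, "z")]:
    m[key] = value

print(m.nth_smallest_key(1))  # 20
print(list(m.items()))        # [(10, 'z'), (20, 'b'), (30, 'c')]
```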
