在Python中找出具有一定长度的列表数量

Question

给出以下列表清单

arrayNumbers = [[32,3154,53,13],[44,34,25,67], [687,346,75], [57,154]]

如何才能有效地获得只有4个项目的列表数量？

在这种情况下，这将是arrayNumbers_len = 2。我可以使用循环来做到这一点，但这根本不是有效的。由于我的真实阵列的长度是数百万，我需要一种方法来非常快速地完成这项工作。

这是我目前的解决方案：

batchSize = 4
counter = 0
for i in range(len(arrayNumbers)):
    if (len(arrayNumbers[i]) == batchSize):
        counter += 1

有什么建议？

Answer 1

我继续并做了一些时间来展示这些不同的方法是如何变化的。

Note: These are in Python 3:

注意：var_arr有一百万个随机大小的子列表：

In [31]: def for_loop(var_arr, batchsize):
    ...:     count = 0
    ...:     for x in var_arr:
    ...:         if len(x) == batchsize:
    ...:             count += 1
    ...:     return count
    ...:

In [32]: def with_map_count(var_arr, batchsize):
    ...:     return list(map(len, var_arr)).count(batchsize)
    ...:

In [33]: def lambda_filter(var_arr, batchsize):
    ...:     len(list(filter(lambda l: len(l) == batchsize, var_arr)))
    ...:

In [34]: def sum_gen(var_arr, batchsize):
    ...:     sum(len(x) == batchsize for x in var_arr)
    ...:

In [35]: from collections import Counter
    ...: def with_counter(var_arr, batchsize):
    ...:     Counter(map(len, var_arr)).get(batchsize, 0)
    ...:

In [36]: %timeit for_loop(var_arr, 4)
82.9 ms ± 1.23 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [37]: %timeit with_map_count(var_arr, 4)
48 ms ± 873 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [38]: %timeit lambda_filter(var_arr, 4)
172 ms ± 3.76 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [39]: %timeit sum_gen(var_arr, 4)
150 ms ± 3.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [40]: %timeit with_counter(var_arr, 4)
75.6 ms ± 1.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

更多时间：

In [50]: def with_list_comp_filter(var_arr, batchsize):
    ...:     return len([x for x in var_arr if len(x) == batchsize])
    ...:
    ...: def with_list_comp_filter_map(var_arr, batchsize):
    ...:     return len([x for x in map(len, var_arr) if x == batchsize])
    ...:
    ...: def loop_with_map(var_arr, batchsize):
    ...:     count = 0
    ...:     for x in map(len, var_arr):
    ...:         count += x == batchsize
    ...:     return count
    ...:

In [51]: %timeit with_list_comp_filter(var_arr, 4)
87.8 ms ± 4.35 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [52]: %timeit with_list_comp_filter_map(var_arr, 4)
62.7 ms ± 1.63 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [53]: %timeit loop_with_map(var_arr, 4)
91.9 ms ± 1.43 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Answer 2

抱歉，只是在原始信息科学术语中，您遇到了O（N）问题，其中N是列表中元素的数量。你必须访问每个长度来测试它对batchSize。然而，有了它，我们可以将它填充到一个单行程中，使Python有机会尽可能地进行优化：

map(len, arraynumbers).count(4)

Answer 3

这是否可以接受？

len([l for l in arrayNumbers if len(l) == 4])

如果这仍然太慢，您可以用C或C ++编写算法，并从Python代码中调用它。详情请见此处：https://docs.python.org/3.6/extending/extending.html

Answer 4

我在python 2中运行了自己的测试，看起来列表理解（@ DBedrenko的更新解决方案）是@ Prune的map(len, arraynumbers).count(4)排在第二位的最快：

nLists = 1000000
arrayNumbers = [[np.random.randint(0, 10)]*np.random.randint(0, 10) for _ in range(nLists)]
batchSize = 4
In [67]:

%%timeit
counter = 0
for i in range(len(arrayNumbers)):
    if (len(arrayNumbers[i]) == batchSize):
        counter += 1
10 loops, best of 3: 108 ms per loop
In [68]:

%%timeit
map(len, arrayNumbers).count(4)
10 loops, best of 3: 65.7 ms per loop
In [69]:

%%timeit
len(list(filter(lambda l: len(l) == 4, arrayNumbers)))
10 loops, best of 3: 121 ms per loop
In [70]:

%%timeit
len([l for l in arrayNumbers if len(l) == 4])
10 loops, best of 3: 58.6 ms per loop
In [71]:

%%timeit
sum(len(i)==4 for i in arrayNumbers)
10 loops, best of 3: 97.8 ms per loop

Answer 5

这是一个numpy解决方案。它只是略微慢于非numpy答案的最佳。一个优点是，除非存在可笑的大型子列表，否则可以以最小的额外成本获得所有长度的计数：

>>> import numpy as np
>>> from timeit import timeit
>>> from collections import Counter
>>> 
>>> lengths = np.random.randint(0, 100, (100_000))
>>> lists = [l * ['x'] for l in lengths]
>>> 
>>> 
# count one
# best Python
>>> list(map(len, lists)).count(16)
974
# numpy
>>> np.count_nonzero(16==np.fromiter(map(len, lists), int, len(lists)))
974
>>> 
# count all
# best Python
>>> [cc for c, cc in sorted(Counter(map(len, lists)).items())]
[973, 1007, 951, 962, 1039, 962, 1028, 999, 970, 997,
 ... 
 1039, 997, 976, 1028, 1026, 969, 1106, 994, 1002, 1022]
>>> 
# numpy
>>> np.bincount(np.fromiter(map(len, lists), int, len(lists)))
array([ 973, 1007,  951,  962, 1039,  962, 1028,  999,  970,  997,
       ...
       1039,  997,  976, 1028, 1026,  969, 1106,  994, 1002, 1022])

时序：

>>> kwds = dict(globals=globals(), number=100)
>>> 
>>> timeit('list(map(len, lists)).count(16)', **kwds)
0.38265155197586864
>>> timeit('np.count_nonzero(16==np.fromiter(map(len, lists), int, len(lists)))', **kwds)
0.4332483590114862
>>> 
>>> timeit('Counter(map(len, lists))', **kwds)
0.673214758047834
>>> timeit('np.bincount(np.fromiter(map(len, lists), int, len(lists)))', **kwds)
0.43800772598478943

Answer 6

使用带有匿名函数的filter：

>>> Numbers = [[32,3154,53,13],[44,34,25,67],[687,346,75],[57,154]]
>>> filter(lambda x: len(x) == 4, Numbers)
[[32, 3154, 53, 13], [44, 34, 25, 67]]
>>> len(filter(lambda x: len(x) == 4, Numbers))
2

在Python中找出具有一定长度的列表数量

问题描述投票：3回答：6

6个回答

Note: These are in Python 3:

最新问题

在Python中找出具有一定长度的列表数量

问题描述 投票：3回答：6

6个回答

Note: These are in Python 3:

最新问题

问题描述投票：3回答：6