Creating a pandas DataFrame from a generator instead of a list to improve performance

Question (votes: 0, answers: 1)

This question is related to a previous question posted here:

list comprehensions with break

I want to create a pandas DataFrame like this:

   0  1  2   3   4   5   6   7   8   9
0  4  5  6   9  10  16  21  23  25  27
1  5  7  7  11  11  17  24  24  26  56

Here is the code I have written so far.

import pandas as pd
import timeit
from bisect import bisect_left

list1 = [4, 5, 6, 9, 10, 16, 21, 23, 25, 27]
list2 = [1, 3, 5, 7, 8, 11, 12, 13, 14, 15, 17, 20, 24, 26, 56]

# List comprehension with []
list3 = [list2[bisect_left(list2,i+1)] for i in list1]
print(list3)

# List comprehension with ()
list3_with_gen = (list2[bisect_left(list2,i+1)] for i in list1)
print(list3_with_gen)

# Timing the []
print(timeit.timeit('''

import timeit
from bisect import bisect_left

list1 = [4, 5, 6, 9, 10, 16, 21, 23, 25, 27]
list2 = [1, 3, 5, 7, 8, 11, 12, 13, 14, 15, 17, 20, 24, 26, 56]

list3 = [list2[bisect_left(list2,i+1)] for i in list1]

'''))

# Timing the ()
print(timeit.timeit('''

import timeit
from bisect import bisect_left

list1 = [4, 5, 6, 9, 10, 16, 21, 23, 25, 27]
list2 = [1, 3, 5, 7, 8, 11, 12, 13, 14, 15, 17, 20, 24, 26, 56]

list3 = (list2[bisect_left(list2,i+1)] for i in list1)

'''))

df = pd.DataFrame([list1, list3])
print(df)



# # original for loops
# list3 = []
# for i in list1:
#     for j in list2:
#         if j>i:
#             # print(i,j)
#             list3.append(j)
#             break
# # print(list1)
# # print(list3)

The output of the code is:

[5, 7, 7, 11, 11, 17, 24, 24, 26, 56]
<generator object <genexpr> at 0x0000016618C46548>
3.8416419
1.3952507
   0  1  2   3   4   5   6   7   8   9
0  4  5  6   9  10  16  21  23  25  27
1  5  7  7  11  11  17  24  24  26  56

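What does the code do?

  • It compares the time taken to create list3 with the "list comprehension with []" against the time taken to create list3_with_gen with the "list comprehension with ()" (strictly speaking, a generator expression).

  • The "list comprehension with ()" is about 3 times faster.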

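I am having trouble fully understanding generators, and not for lack of trying. My goal is to create the DataFrame as quickly and efficiently as possible, because this is only a small example and the real lists are much larger.

Is there a way to create this DataFrame from the list3_with_gen generator object, given that creating it is about 3 times faster?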

python pandas dataframe generator list-comprehension
1 Answer
0 votes

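This answer does not use a generator, but its running time is linear in the lengths of the input lists list1 and list2 (both are assumed to be sorted, as in the example). For larger lists it should outperform both the naive comprehension and the bisect-based comprehension: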

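def merge(lst1, lst2):
    # Assumes both lists are sorted and that lst2 contains a value
    # greater than every element of lst1 (otherwise lst2[j] raises IndexError).
    ret = []
    j = 0  # position in lst2; it only ever moves forward
    for elem in lst1:
        # Skip the values of lst2 that are not greater than elem
        while lst2[j] <= elem:
            j += 1
        ret.append(lst2[j])
    return ret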

list1 = [4, 5, 6, 9, 10, 16, 21, 23, 25, 27]
list2 = [1, 3, 5, 7, 8, 11, 12, 13, 14, 15, 17, 20, 24, 26, 56]

print(merge(list1, list2))

Prints:

[5, 7, 7, 11, 11, 17, 24, 24, 26, 56]
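
As a sanity check on larger inputs, the sketch below compares the two-pointer merge with a bisect-based comprehension on randomly generated sorted lists. The names with_bisect, big1 and big2, the list sizes, and the sentinel value are assumptions made for illustration and are not part of the original answer. For integers, bisect_right(lst2, elem) finds the same position as the question's bisect_left(lst2, elem + 1).

```python
import random
from bisect import bisect_right


def merge(lst1, lst2):
    """Two-pointer scan over two sorted lists (same logic as the answer)."""
    ret = []
    j = 0
    for elem in lst1:
        while lst2[j] <= elem:
            j += 1
        ret.append(lst2[j])
    return ret


def with_bisect(lst1, lst2):
    # Hypothetical helper: bisect_right returns the index of the first
    # element of lst2 that is strictly greater than elem.
    return [lst2[bisect_right(lst2, elem)] for elem in lst1]


random.seed(0)  # fixed seed so the run is repeatable
big1 = sorted(random.sample(range(1_000_000), 10_000))
big2 = sorted(random.sample(range(1_000_000), 20_000))
big2.append(1_000_000)  # sentinel: guarantees a larger value always exists

same = merge(big1, big2) == with_bisect(big1, big2)
print(same)
```

Both functions should agree on any sorted input where lst2 ends with a value larger than everything in lst1. Wrapping each call in timeit on data of a realistic size would show which one is faster in practice.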

To get the requested DataFrame:

import pandas as pd

def merge(lst1, lst2):
    # Same scan as above, but each result is an [elem, match] pair
    ret = []
    j = 0
    for elem in lst1:
        while lst2[j] <= elem:
            j += 1
        ret.append([elem, lst2[j]])
    return ret

list1 = [4, 5, 6, 9, 10, 16, 21, 23, 25, 27]
list2 = [1, 3, 5, 7, 8, 11, 12, 13, 14, 15, 17, 20, 24, 26, 56]

merged = merge(list1, list2)
df = pd.DataFrame(zip(*merged))
print(df)

Prints:

   0  1  2   3   4   5   6   7   8   9
0  4  5  6   9  10  16  21  23  25  27
1  5  7  7  11  11  17  24  24  26  56
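
On the generator part of the question: a generator expression is lazy, so the roughly 3x difference in the question's timings reflects only the creation of the generator object, not the computation of its values. The minimal sketch below materializes the generator with list() when building the DataFrame. The variable leftover is a name introduced here for illustration.

```python
from bisect import bisect_left

import pandas as pd

list1 = [4, 5, 6, 9, 10, 16, 21, 23, 25, 27]
list2 = [1, 3, 5, 7, 8, 11, 12, 13, 14, 15, 17, 20, 24, 26, 56]

# Creating the generator performs no lookups yet; the values are
# computed one by one only when something iterates over it.
list3_with_gen = (list2[bisect_left(list2, i + 1)] for i in list1)

# list() consumes the generator, which is where the actual work happens.
df = pd.DataFrame([list1, list(list3_with_gen)])
print(df)

# A generator can be consumed only once, so nothing is left afterwards.
leftover = list(list3_with_gen)
print(leftover)
```

Because the values have to be computed at some point, consuming the generator is unlikely to be faster than the list comprehension. A fair comparison would time list(generator) rather than the creation of the generator alone.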