This question is related to an earlier question posted here:
list comprehensions with break
I would like to create a pandas DataFrame like the following:
    0  1  2   3   4   5   6   7   8   9
0   4  5  6   9  10  16  21  23  25  27
1   5  7  7  11  11  17  24  24  26  56
This is the code I have written so far.
import pandas as pd
import timeit
from bisect import bisect_left
list1 = [4, 5, 6, 9, 10, 16, 21, 23, 25, 27]
list2 = [1, 3, 5, 7, 8, 11, 12, 13, 14, 15, 17, 20, 24, 26, 56]
# List comprehension with []
list3 = [list2[bisect_left(list2,i+1)] for i in list1]
print(list3)
# List comprehension with ()
list3_with_gen = (list2[bisect_left(list2,i+1)] for i in list1)
print(list3_with_gen)
# Timing the []
print(timeit.timeit('''
import timeit
from bisect import bisect_left
list1 = [4, 5, 6, 9, 10, 16, 21, 23, 25, 27]
list2 = [1, 3, 5, 7, 8, 11, 12, 13, 14, 15, 17, 20, 24, 26, 56]
list3 = [list2[bisect_left(list2,i+1)] for i in list1]
'''))
# Timing the ()
print(timeit.timeit('''
import timeit
from bisect import bisect_left
list1 = [4, 5, 6, 9, 10, 16, 21, 23, 25, 27]
list2 = [1, 3, 5, 7, 8, 11, 12, 13, 14, 15, 17, 20, 24, 26, 56]
list3 = (list2[bisect_left(list2,i+1)] for i in list1)
'''))
df = pd.DataFrame([list1, list3])
print(df)
# # original for loops
# list3 = []
# for i in list1:
#     for j in list2:
#         if j>i:
#             # print(i,j)
#             list3.append(j)
#             break
# # print(list1)
# # print(list3)
The output of the code is:
[5, 7, 7, 11, 11, 17, 24, 24, 26, 56]
<generator object <genexpr> at 0x0000016618C46548>
3.8416419
1.3952507
    0  1  2   3   4   5   6   7   8   9
0   4  5  6   9  10  16  21  23  25  27
1   5  7  7  11  11  17  24  24  26  56
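What does the code do?
It compares the time taken to create list3 with a "list comprehension with []" against the time taken to create list3_with_gen with a "list comprehension with ()".
The "list comprehension with ()" is about 3 times faster.
I have some questions, because I do not fully understand generators, and not for lack of trying. My goal is to create the DataFrame as quickly and efficiently as possible: this is only a small example, and the real lists are much larger.
Is there a way to create the DataFrame from the list3_with_gen generator object, given that creating it is about 3 times faster?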
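This answer does not use generators, but its running time is linear in the lengths of the input lists list1 and list2. For larger lists it should outperform both the naive comprehension and the bisect comprehension: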
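def merge(lst1, lst2):
    # Assumes both lists are sorted in ascending order and that lst2
    # contains an element greater than every element of lst1.
    ret = []
    j = 0
    for elem in lst1:
        # Advance j to the first element of lst2 strictly greater than elem.
        while lst2[j] <= elem:
            j += 1
        ret.append(lst2[j])
    return ret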
list1 = [4, 5, 6, 9, 10, 16, 21, 23, 25, 27]
list2 = [1, 3, 5, 7, 8, 11, 12, 13, 14, 15, 17, 20, 24, 26, 56]
print(merge(list1, list2))
Prints:
[5, 7, 7, 11, 11, 17, 24, 24, 26, 56]
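If a lazy variant is wanted, the same linear scan could also be written as a generator. The following is only a sketch under assumptions, not part of the answer's code: `merge_iter` is a hypothetical name, both inputs are assumed to be sorted in ascending order, and the bounds check (which stops early once lst2 has no larger element left) is an addition of this sketch.

```python
def merge_iter(lst1, lst2):
    """Yield, for each element of lst1, the first element of lst2 that is
    strictly greater. Both lists are assumed to be sorted ascending."""
    j = 0
    n = len(lst2)
    for elem in lst1:
        # Advance j to the first element of lst2 strictly greater than elem.
        while j < n and lst2[j] <= elem:
            j += 1
        if j == n:
            # lst2 is exhausted: no larger element exists for this or any
            # later elem, so stop instead of raising IndexError.
            return
        yield lst2[j]


list1 = [4, 5, 6, 9, 10, 16, 21, 23, 25, 27]
list2 = [1, 3, 5, 7, 8, 11, 12, 13, 14, 15, 17, 20, 24, 26, 56]
result = list(merge_iter(list1, list2))  # values are computed only here
print(result)
```

Nothing is computed until the generator is consumed, here by `list(...)`, so the total work stays the same as in `merge`; only the moment at which it happens changes.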
To get the requested DataFrame:
import pandas as pd
def merge(lst1, lst2):
    ret = []
    j = 0
    for elem in lst1:
        while lst2[j] <= elem:
            j += 1
        ret.append([elem, lst2[j]])
    return ret
list1 = [4, 5, 6, 9, 10, 16, 21, 23, 25, 27]
list2 = [1, 3, 5, 7, 8, 11, 12, 13, 14, 15, 17, 20, 24, 26, 56]
merged = merge(list1, list2)
df = pd.DataFrame(zip(*merged))
print(df)
Prints:
    0  1  2   3   4   5   6   7   8   9
0   4  5  6   9  10  16  21  23  25  27
1   5  7  7  11  11  17  24  24  26  56
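On the timing comparison in the question: the version with `()` is a generator expression, and creating one only builds the generator object without performing any of the lookups. That is the likely reason it appears about 3 times faster. A fairer comparison also includes consuming the generator, which is what has to happen before the values can go into a DataFrame (for example via `list(list3_with_gen)`). The sketch below illustrates this with the question's own lists; the function names and the repeat count are assumptions chosen for illustration, and the timings will vary by machine.

```python
from bisect import bisect_left
from timeit import timeit

list1 = [4, 5, 6, 9, 10, 16, 21, 23, 25, 27]
list2 = [1, 3, 5, 7, 8, 11, 12, 13, 14, 15, 17, 20, 24, 26, 56]


def as_list():
    # Eager: every lookup runs immediately.
    return [list2[bisect_left(list2, i + 1)] for i in list1]


def generator_only():
    # Lazy: only the generator object is created; no lookup runs yet.
    return (list2[bisect_left(list2, i + 1)] for i in list1)


def generator_consumed():
    # The lookups run here, while list() drains the generator.
    return list(generator_only())


runs = 20_000  # arbitrary repeat count, chosen only to keep the run short
for func in (as_list, generator_only, generator_consumed):
    print(f"{func.__name__:20s} {timeit(func, number=runs):.4f} s")

# Materialized values, ready to be used as the second row,
# as in the question's pd.DataFrame([list1, list3]).
list3 = generator_consumed()
print(list3)
```

Once the generator is consumed, it does the same work as the list comprehension, so the generator by itself is not expected to make building the DataFrame faster.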