Is it possible to use Python's itertools combinations on objects instead of lists?
For example, how would I use it on the following data?
Rahul - 20,000 - Mumbai
Shivani - 30,000 - Mumbai
Akash - 40,000 - Bangalore
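I want every possible combination of names together with their combined salary.
How can I do this with combinations?
Assume the data has been read and stored with pd.read_csv.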
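stack.csv itself is not shown, so here is a minimal sketch that rebuilds the same DataFrame from an in-memory string. This lets the snippets below run as-is. The column names Name, Salary and City are taken from the printed DataFrame further down. The file layout is an assumption: comma-separated, with no thousands separators in the salaries.

```python
import io

import pandas as pd

# Hypothetical stand-in for stack.csv (assumed layout: header row
# Name,Salary,City and salaries written without thousands separators).
csv_text = """Name,Salary,City
Rahul,20000,Mumbai
Shivani,30000,Mumbai
Akash,40000,Bangalore
"""

# read_csv accepts any file-like object, so StringIO replaces the file path.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```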
Code so far:

```python
import pandas as pd
import itertools

df = pd.read_csv('stack.csv')
print(df)

for L in range(0, len(df) + 1):
    for subset in itertools.combinations(df['Name'], L):
        print(subset)
```
Output:

```
      Name  Salary       City
0    Rahul   20000     Mumbai
1  Shivani   30000     Mumbai
2    Akash   40000  Bangalore
()
('Rahul',)
('Shivani',)
('Akash',)
('Rahul', 'Shivani')
('Rahul', 'Akash')
('Shivani', 'Akash')
('Rahul', 'Shivani', 'Akash')
```
How can I add the salaries to these combinations?
First, get the row indices of every combination:
```python
idx = [j for i in range(1, len(df) + 1) for j in itertools.combinations(df.index, i)]
# [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
```
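The same list of row-label combinations can also be built by flattening the per-size generators with itertools.chain.from_iterable, similar to the powerset recipe in the itertools documentation. Here is a minimal sketch, with the DataFrame rebuilt inline from the sample data:

```python
from itertools import chain, combinations

import pandas as pd

df = pd.DataFrame({
    "Name": ["Rahul", "Shivani", "Akash"],
    "Salary": [20000, 30000, 40000],
    "City": ["Mumbai", "Mumbai", "Bangalore"],
})

# Every non-empty combination of row labels, smallest groups first.
# chain.from_iterable flattens the per-size generators into one stream.
idx = list(chain.from_iterable(
    combinations(df.index, r) for r in range(1, len(df) + 1)
))
print(idx)
```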
Get a DataFrame for each group:

```python
dfs = [df.iloc[list(i)] for i in idx]
```
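Finally, join the names and sum the salaries:

```python
# The question's columns are 'Name' and 'Salary', so use bracket access
out = [(', '.join(d['Name']), sum(d['Salary'])) for d in dfs]
```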
Output:

```
[('Rahul', 20000),
 ('Shivani', 30000),
 ('Akash', 40000),
 ('Rahul, Shivani', 50000),
 ('Rahul, Akash', 60000),
 ('Shivani, Akash', 70000),
 ('Rahul, Shivani, Akash', 90000)]
```
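Here are the three steps combined into one self-contained sketch, using the question's capitalized column names (Name, Salary). The inline DataFrame stands in for the CSV file.

```python
import itertools

import pandas as pd

df = pd.DataFrame({
    "Name": ["Rahul", "Shivani", "Akash"],
    "Salary": [20000, 30000, 40000],
    "City": ["Mumbai", "Mumbai", "Bangalore"],
})

# 1. Row-label combinations of every size from 1 to len(df).
idx = [c for r in range(1, len(df) + 1)
       for c in itertools.combinations(df.index, r)]

# 2. One sub-DataFrame per combination; iloc needs a list here,
#    because a tuple would be read as (row, column).
dfs = [df.iloc[list(c)] for c in idx]

# 3. Join the names and add up the salaries of each sub-DataFrame.
out = [(", ".join(d["Name"]), int(d["Salary"].sum())) for d in dfs]
print(out)
```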
If you want the result as a DataFrame, that is simple:
```python
df1 = pd.DataFrame(out, columns=['names', 'salaries'])
print(df1)
```

```
                   names  salaries
0                  Rahul     20000
1                Shivani     30000
2                  Akash     40000
3         Rahul, Shivani     50000
4           Rahul, Akash     60000
5         Shivani, Akash     70000
6  Rahul, Shivani, Akash     90000
```
To query this DataFrame for the combination whose total salary is closest to a given value, we can write a helper function:
```python
def return_closest(val):
    # idxmin() returns an index label, so look the row up with .loc
    return df1.loc[(df1.salaries - val).abs().idxmin()]

>>> return_closest(55000)
names       Rahul, Shivani
salaries             50000
Name: 3, dtype: object
```
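A slightly more general variant takes the frame and column as parameters instead of relying on a global df1. The name closest_row is hypothetical. With a target of 55000, the totals 50000 and 60000 are equally distant. idxmin() returns the first label that reaches the minimum, so the tie resolves to "Rahul, Shivani".

```python
import pandas as pd

df1 = pd.DataFrame({
    "names": ["Rahul", "Shivani", "Akash", "Rahul, Shivani",
              "Rahul, Akash", "Shivani, Akash", "Rahul, Shivani, Akash"],
    "salaries": [20000, 30000, 40000, 50000, 60000, 70000, 90000],
})


def closest_row(frame, column, target):
    """Return the row whose `column` value is nearest to `target`.

    idxmin() gives an index label, so the row is fetched with .loc;
    on ties the first matching label wins.
    """
    return frame.loc[(frame[column] - target).abs().idxmin()]


print(closest_row(df1, "salaries", 55000))
```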
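I deliberately broke this into steps so you can see what happens at each one. Once you understand it, you can combine everything into a single expression that builds the DataFrame: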
```python
pd.DataFrame(
    [(', '.join(d['Name']), sum(d['Salary']))
     for i in [j for i in range(1, len(df) + 1)
               for j in itertools.combinations(df.index, i)]
     for d in [df.iloc[list(i)]]],
    columns=['names', 'salaries']
)
```
You can use zip to iterate over both columns at the same time, and a list comprehension to build the output DataFrame, for example:
```python
df_output = pd.DataFrame(
    [[', '.join(subset), sum(salaries)]
     for L in range(1, len(df) + 1)
     for subset, salaries in zip(itertools.combinations(df['Name'], L),
                                 itertools.combinations(df['Salary'], L))],
    columns=['Names', 'Sum Salaries'])
```
You get:

```
                   Names  Sum Salaries
0                  Rahul         20000
1                Shivani         30000
2                  Akash         40000
3         Rahul, Shivani         50000
4           Rahul, Akash         60000
5         Shivani, Akash         70000
6  Rahul, Shivani, Akash         90000
```
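Another way to keep names and salaries aligned is to zip the two columns into (name, salary) pairs first. You then take combinations of those pairs. This also speaks to the original question: itertools.combinations accepts any iterable, including a list of tuples, not only a list of plain strings. Here is a minimal sketch with the sample data inlined:

```python
import itertools

import pandas as pd

df = pd.DataFrame({
    "Name": ["Rahul", "Shivani", "Akash"],
    "Salary": [20000, 30000, 40000],
    "City": ["Mumbai", "Mumbai", "Bangalore"],
})

# Pair each name with its salary first, so every combination carries
# matching (name, salary) tuples and the two columns cannot drift apart.
pairs = list(zip(df["Name"], df["Salary"]))

rows = []
for r in range(1, len(pairs) + 1):
    for combo in itertools.combinations(pairs, r):
        names, salaries = zip(*combo)  # unzip the tuples into two sequences
        rows.append([", ".join(names), sum(salaries)])

df_output = pd.DataFrame(rows, columns=["Names", "Sum Salaries"])
print(df_output)
```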
How about this?
```python
nameList = list()
sumList = list()
for L in range(0, len(df) + 1):
    for x in itertools.combinations(df['Name'], L):
        nameList.append(x)
    for y in itertools.combinations(df['Salary'], L):
        sumList.append(sum(y))

newDf = pd.DataFrame()
newDf['Names'] = nameList
newDf['Salary Sum'] = sumList
```
Output:

```
                     Names  Salary Sum
0                       ()           0
1                 (Rahul,)       20000
2               (Shivani,)       30000
3                 (Akash,)       40000
4         (Rahul, Shivani)       50000
5           (Rahul, Akash)       60000
6         (Shivani, Akash)       70000
7  (Rahul, Shivani, Akash)       90000
```
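The two separate loops can also be merged into one by taking combinations of whole rows. df.itertuples(index=False) yields one namedtuple per row, so each combination holds row objects. Both the names and the salary sum then come from the same combination. Here is a sketch, with the DataFrame rebuilt inline from the sample data. Like the loop it mirrors, it keeps the empty combination at r = 0.

```python
import itertools

import pandas as pd

df = pd.DataFrame({
    "Name": ["Rahul", "Shivani", "Akash"],
    "Salary": [20000, 30000, 40000],
    "City": ["Mumbai", "Mumbai", "Bangalore"],
})

# itertuples(index=False) yields one namedtuple per row, so the
# combinations are made of row objects with .Name, .Salary and .City.
rows = list(df.itertuples(index=False))

name_list = []
sum_list = []
for r in range(0, len(rows) + 1):  # r = 0 keeps the empty combination
    for combo in itertools.combinations(rows, r):
        name_list.append(tuple(row.Name for row in combo))
        sum_list.append(sum(row.Salary for row in combo))

newDf = pd.DataFrame({"Names": name_list, "Salary Sum": sum_list})
print(newDf)
```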