How to compute per-item value differences in a pandas DataFrame

Question

I have this DataFrame:

index	N1	N2	N3	N4	N5	time	CountN1	CountN2	CountN3	CountN4	CountN5	resultN2	resultN4	resultN5	RhoN1	RhoN2	RhoN3	RhoN4
0	chocolate	sugar	milk	eggs	flour	1	1	1	1	1	1	0.0	0.0	0.0	1.4142135623730951	1.4142135623730951	1.4142135623730951	1.4142135623730951
1	bread	pizza	soda	water	battery	2	1	1	1	1	1	0.0	0.0	0.0	2.23606797749979	2.23606797749979	2.23606797749979	2.23606797749979
2	plant	tea	coffee	sausage	pasta	3	1	1	1	1	1	0.0	0.0	0.0	3.1622776601683795	3.1622776601683795	3.1622776601683795	3.1622776601683795
3	tomato	bread	cheese	pasta	soda	4	1	2	1	2	2	2.0	1.0	2.0	4.123105625617661	4.898979485566356	4.123105625617661	4.58257569495584
4	garlic	onion	rice	bacon	water	5	1	1	1	1	2	0.0	0.0	3.0	5.0990195135927845	5.0990195135927845	5.0990195135927845	5.0990195135927845

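The N columns are the items a customer bought, time is a sequential timestamp, the CountN columns are the cumulative number of times each item has been bought, the resultN columns are the time elapsed between one customer buying an item and the next customer buying the same item, and the RhoN columns are angles.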

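All I want are the columns RhoN1_diff, RhoN2_diff, RhoN3_diff, RhoN4_diff and RhoN5_diff, which give the difference in Rho for each item across the DataFrame. For example, bread has a rho value of 2.23606797749979 at time 2 and 4.898979485566356 at time 4. The tricky part is that an item such as bread can appear in any of the N columns, but only once per row.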
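To make the target concrete, here is a small sketch of the bread example. It assumes the wanted difference is simply the item's current rho minus its previous rho; the `bread` frame and its column names are made up for illustration, and only the two rho values come from the table.

```python
import pandas as pd

# Bread's two appearances, copied from the table:
# in N1 at time 2 (RhoN1) and in N2 at time 4 (RhoN2).
bread = pd.DataFrame({
    "time": [2, 4],
    "column": ["N1", "N2"],
    "rho": [2.23606797749979, 4.898979485566356],
})

# Assumed definition: current rho minus the previous rho of the same item.
# The first appearance has no predecessor, so its difference is NaN.
bread["rho_diff"] = bread["rho"].diff()
print(bread)
```

Under that assumption, the second value (about 2.6629) is what should end up in RhoN2_diff at time 4, because bread sits in N2 in that row.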

Trust me, ChatGPT is not ready to replace us yet.

Let me know if you need more details.

Thanks.

Answer 1

After a few hours, I finally found the answer. In case it saves some time for anyone facing a similar problem, here is the code:

import pandas as pd

# new_df is the DataFrame shown in the question.
# Melt the DataFrame to create a more convenient format
melted_df = pd.melt(new_df, id_vars=['time'], value_vars=['N1', 'N2', 'N3', 'N4', 'N5'], var_name='N', value_name='item')

# Merge the melted DataFrame with the corresponding RhoN columns
rho_columns = ['time', 'RhoN1', 'RhoN2', 'RhoN3', 'RhoN4', 'RhoN5']
melted_df = melted_df.merge(new_df[rho_columns], on='time')

# Create a new column containing the corresponding RhoN values for each item
melted_df['RhoN'] = melted_df.apply(lambda x: x[f"RhoN{x['N'][1:]}"], axis=1)

# Drop unnecessary columns
melted_df.drop(columns=['RhoN1', 'RhoN2', 'RhoN3', 'RhoN4', 'RhoN5'], inplace=True)

# Calculate the RhoN differences
melted_df = melted_df.sort_values(['item', 'time'])
melted_df['RhoN_diff'] = melted_df.groupby('item')['RhoN'].diff()

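# Pivot the table back to the original format.
# dropna=False keeps N columns whose differences are all NaN; by default
# pivot_table drops them, so some RhoN*_diff columns would be missing.
pivoted_df = melted_df.pivot_table(index='time', columns='N', values='RhoN_diff', aggfunc='first', dropna=False)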
pivoted_df.columns = [f'Rho{col}_diff' for col in pivoted_df.columns]

# Fill NaN values with zeros
pivoted_df.fillna(0, inplace=True)

# Index the original DataFrame by `time`; pivoted_df is already indexed by `time`
new_df.set_index('time', inplace=True)

# Merge the pivoted table back with the original DataFrame using an 'outer' merge
final_df = new_df.merge(pivoted_df, left_index=True, right_index=True, how='outer')
# Reset the index of the final DataFrame
final_df.reset_index(inplace=True)

# Fill NaN values with zeros in the final DataFrame
final_df.fillna(0, inplace=True)

Let me know if you need more detail or an explanation.
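For anyone who wants to try the idea end to end, the following is a compact, self-contained sketch of the same reshape, group, diff and pivot approach. It is not the code above verbatim: it is limited to N1..N4 because the table in the question does not list RhoN5, the item names are English renderings of the sample rows, and `slots`, `long_df` and `diff_wide` are names chosen here for illustration.

```python
import pandas as pd

# Sample rows copied from the question, restricted to the columns the
# table actually shows (N1..N4 with RhoN1..RhoN4).
new_df = pd.DataFrame({
    "time": [1, 2, 3, 4, 5],
    "N1": ["chocolate", "bread", "plant", "tomato", "garlic"],
    "N2": ["sugar", "pizza", "tea", "bread", "onion"],
    "N3": ["milk", "soda", "coffee", "cheese", "rice"],
    "N4": ["eggs", "water", "sausage", "pasta", "bacon"],
    "RhoN1": [1.4142135623730951, 2.23606797749979, 3.1622776601683795,
              4.123105625617661, 5.0990195135927845],
    "RhoN2": [1.4142135623730951, 2.23606797749979, 3.1622776601683795,
              4.898979485566356, 5.0990195135927845],
    "RhoN3": [1.4142135623730951, 2.23606797749979, 3.1622776601683795,
              4.123105625617661, 5.0990195135927845],
    "RhoN4": [1.4142135623730951, 2.23606797749979, 3.1622776601683795,
              4.58257569495584, 5.0990195135927845],
})

slots = ["N1", "N2", "N3", "N4"]

# Long format: one row per (time, slot) holding the item and its matching rho.
long_df = pd.concat(
    [
        pd.DataFrame({
            "time": new_df["time"],
            "N": s,
            "item": new_df[s],
            "rho": new_df[f"Rho{s}"],
        })
        for s in slots
    ],
    ignore_index=True,
)

# Difference to the previous appearance of the same item, in time order.
long_df = long_df.sort_values(["item", "time"])
long_df["rho_diff"] = long_df.groupby("item")["rho"].diff()

# Back to wide format; first appearances (NaN) become 0, as in the answer.
diff_wide = long_df.pivot(index="time", columns="N", values="rho_diff").fillna(0)
diff_wide.columns = [f"Rho{c}_diff" for c in diff_wide.columns]

# Attach the difference columns to the original rows via the time column.
final_df = new_df.join(diff_wide, on="time")
print(final_df[["time"] + list(diff_wide.columns)])
```

With only these four slots, bread is the one item that repeats, so the only nonzero entry is RhoN2_diff at time 4. Building the long frame directly avoids the row-wise `apply`, and `pivot` (unlike `pivot_table` with its defaults) keeps columns that contain no differences at all.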
