Fastest minimization algorithm to define the optimal network

Question

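I have a large set of clients, each identified by an id. They are linked to one another, and each link carries a "Value", as shown below: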

import pandas as pd
import numpy as np

# Example with 7 clients
df = pd.DataFrame(np.random.randint(7, size=(20, 2)), columns=['start id', 'end id'])
df = df[df['start id'] != df['end id']]
df = df.drop_duplicates()
df['Value'] = np.random.uniform(low=0.0, high=1.0, size=len(df))
print(df)

	start id	end id	Value
0	1	4	0.315743
1	4	3	0.449567
2	4	2	0.945336
3	3	0	0.556950
5	3	4	0.002412
6	2	1	0.976020
8	4	0	0.480784
9	4	1	0.798300

A client's score is the sum of the values over the rows where it is the "start id":

print(df.groupby('start id').agg(Score=('Value', 'sum')))

start id	Score
1	0.315743
2	0.976020
3	0.559362
4	2.673987

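My goal is to remove the smallest number of clients such that every remaining client's score is below a threshold. Let's assume the threshold is 0.7. Removing clients 2 and 4 is enough, after which the highest score is 0.55695: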

df = df[~df['start id'].isin([2,4])]
df = df[~df['end id'].isin([2,4])]
print(df)

	start id	end id	Value
3	3	0	0.55695

df_2 = df.groupby('start id').agg(Score=('Value', 'sum'))
print(df_2['Score'].max())
0.55695

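I have thousands of connected clients, so a brute-force approach (trying the removal of every possible combination) is not feasible. Which optimization algorithm would you recommend?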

Answer 1

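I think what you want to do is find which ids produce a score above the threshold, drop from the original dataframe every row that involves those ids (as either start or end), and then redo the group-by.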

threshold = 0.7

# Pre-compute group-by
grouped_df = df.groupby('start id').agg(Score=('Value', 'sum'))

# Isolate ids to remove
ids_to_remove = grouped_df[grouped_df["Score"] > threshold].index

# Edit the dataframe
df = df[(~df['start id'].isin(ids_to_remove))&(~df['end id'].isin(ids_to_remove))]

# Compute final group-by
grouped_df = df.groupby('start id').agg(Score=('Value', 'sum'))

print(grouped_df['Score'].max())
>> 0.55695
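
For reuse, the same one-pass filter can be wrapped in a function. This is only a minimal sketch: the helper name `drop_over_threshold` is hypothetical, and the rows are copied by hand from the example table in the question so that the result is reproducible.

```python
import pandas as pd

def drop_over_threshold(df, threshold):
    """Hypothetical helper: one pass of the filter shown above.

    Returns the filtered dataframe and the list of removed ids.
    """
    # Score of each client = sum of 'Value' over rows where it is the start id
    scores = df.groupby('start id')['Value'].sum()
    # Ids whose score exceeds the threshold
    ids_to_remove = scores[scores > threshold].index
    # Keep only rows that involve none of those ids, as start or as end
    keep = ~df['start id'].isin(ids_to_remove) & ~df['end id'].isin(ids_to_remove)
    return df[keep], ids_to_remove.tolist()

# Rows copied from the example table in the question
edges = pd.DataFrame({
    'start id': [1, 4, 4, 3, 3, 2, 4, 4],
    'end id':   [4, 3, 2, 0, 4, 1, 0, 1],
    'Value':    [0.315743, 0.449567, 0.945336, 0.556950,
                 0.002412, 0.976020, 0.480784, 0.798300],
})

filtered, removed = drop_over_threshold(edges, threshold=0.7)
max_score = filtered.groupby('start id')['Value'].sum().max()
print(removed)    # ids that were dropped
print(max_score)  # highest remaining score
```

Because the values are non-negative, dropping rows can only lower the remaining scores, so a single pass leaves every score at or below the threshold. It does not by itself show that the removed set is the smallest possible one.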
