Why does my code return the same output (grouped sums and percentage of total) for different datasets?

Question

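The database I am working with is private, but it consists of farm data. The DataFrame, which I call Pome, was shown in a screenshot. I want to load a new CSV for each commodity, because the file is pre-filtered by commodity when I download the main data. I then want to group by the 'B_Desc' column (the description of the farm boundary). Finally, I want the percentage of the total electricity used by each boundary.

However, when I load a new CSV file (for a different commodity), my code returns the same grouped sums and percentages of the total.

Here is my code (at the top I import pandas and numpy and read my CSV file, of course):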

# Split the column into separate columns & new column names
Pome = Pome['Total Input Description;Primary Value;B_Desc;Boundary Value;UOM'].str.split(';', expand=True)
Pome.columns = ['Total Input Description', 'Primary Value', 'B_Desc', 'Boundary Value', 'UOM']


# Replace non-numeric values with NaN
Pome['Boundary Value'] = pd.to_numeric(Pome['Boundary Value'], errors='coerce')
# Replace NaN with a default value (e.g., 0)
Pome['Boundary Value'] = Pome['Boundary Value'].fillna(0)
# Convert to integers
Pome['Boundary Value'] = Pome['Boundary Value'].astype(int)

# Filter the DataFrame to include only rows with 'Total GRID Electricity' in 'Total Input Description'
pome_grid_electricity = Pome[Pome['Total Input Description'].str.contains('Total GRID Electricity', case=False)]

# Group the filtered DataFrame by 'B_Desc' and calculate the sum of 'Boundary Value'
sum_boundary_value = pome_grid_electricity.groupby('B_Desc')['Boundary Value'].sum().reset_index()

# Calculate the total of all groups
total_boundary_value = sum_boundary_value['Boundary Value'].sum()

# Calculate the percentage of each group relative to the total
sum_boundary_value['Percentage of Total'] = (sum_boundary_value['Boundary Value'] / total_boundary_value) * 100

# Rename the 'Boundary Value' column in the resulting DataFrame
sum_boundary_value = sum_boundary_value.rename(columns={'Boundary Value': 'Sum Boundary Value'})

# Print the grouped sums and percentages
print(sum_boundary_value)

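So when I run it on a different dataset, I expect different boundary totals and percentages, but the output stays exactly the same.

Any feedback would be much appreciated <3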

Answer 1

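You can add some print statements to your code to debug it and see what is happening. Alternatively, if you can, make a few sample CSV files from known data that should produce different results; this helps you isolate whether the problem comes from the code or from the data.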
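As a rough sketch of that suggestion, the snippet below wraps the question's steps in a function and runs it on two tiny in-memory CSVs. The function name `grid_electricity_share`, the sample rows, and the boundary names are all made up for illustration; only the column names and the 'Total GRID Electricity' filter come from the question. It assumes the file is semicolon-delimited and read with the default comma separator, which is what produces the single wide column the question splits. The `astype(int)` step from the question is left out here.

```python
import io
import pandas as pd

RAW_COLUMN = 'Total Input Description;Primary Value;B_Desc;Boundary Value;UOM'
COLUMNS = RAW_COLUMN.split(';')


def grid_electricity_share(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: the question's steps as a function of its input."""
    # Debug print: confirm which data actually reached this step
    print('rows:', len(raw), '| first row:', raw.iloc[0, 0])

    # Split the single wide column into the five named columns
    df = raw[RAW_COLUMN].str.split(';', expand=True)
    df.columns = COLUMNS

    # Non-numeric values become NaN, then 0
    df['Boundary Value'] = pd.to_numeric(df['Boundary Value'], errors='coerce').fillna(0)

    # Keep only the grid electricity rows (na=False treats missing text as no match)
    mask = df['Total Input Description'].str.contains(
        'Total GRID Electricity', case=False, na=False)
    out = df[mask].groupby('B_Desc')['Boundary Value'].sum().reset_index()

    # Share of each boundary in the overall total
    out['Percentage of Total'] = out['Boundary Value'] / out['Boundary Value'].sum() * 100
    return out.rename(columns={'Boundary Value': 'Sum Boundary Value'})


# Two made-up datasets that must give different results
csv_a = """Total Input Description;Primary Value;B_Desc;Boundary Value;UOM
Total GRID Electricity;Electricity;Orchard;25;kWh
Total GRID Electricity;Electricity;Packhouse;75;kWh
Total Diesel;Fuel;Orchard;999;l
"""
csv_b = """Total Input Description;Primary Value;B_Desc;Boundary Value;UOM
Total GRID Electricity;Electricity;Orchard;10;kWh
Total GRID Electricity;Electricity;Packhouse;10;kWh
Total GRID Electricity;Electricity;Cold store;20;kWh
"""

result_a = grid_electricity_share(pd.read_csv(io.StringIO(csv_a)))
result_b = grid_electricity_share(pd.read_csv(io.StringIO(csv_b)))
print(result_a)
print(result_b)
```

If the two results differ here but not with the real files, the processing steps are probably fine, and it is worth checking that each run really reads a different file (for example, that the path passed to `pd.read_csv` changed and that `Pome` is not a leftover variable from an earlier run). That is only one possible cause, not a diagnosis.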
