Why is my numpy mean result different from the mean calculated in Excel?

Question

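My dataset ("filtered_test.xlsx") contains information about certain hotels in the US, including things like ratings. I want to calculate the average rating grouped by hotel name. I have already removed the entries with 0 and "NaN" values in the "filtered_test.xlsx" file. My code is as follows: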

   import numpy as np
   import pandas as pd
   import matplotlib.pyplot as plt
   import seaborn as sns 

   from matplotlib import rcParams #for chart


   df = pd.read_excel("filtered_test.xlsx")

   #Creates the average hotel ratings by hotel name
   hotel_rating = df.groupby('name') ['reviews.rating'].mean().reset_index()
   hotel_rating.to_excel("hotel_ratings.xlsx", index = False)
   hotelRating_df = pd.read_excel("hotel_ratings.xlsx")

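Basically, my numpy mean values do not match the correct values I get when I calculate them in Excel.

An example:

This image shows that one hotel has a single value, 4. So shouldn't the average rating for this hotel, "Extend Stay America", in my hotel_ratings.xlsx be 4? But in hotel_ratings.xlsx the average for this hotel is different from what it should be...

How can I fix this?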

Answer 1

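Filters applied in Excel do not carry over to Python: pd.read_excel reads every row in the sheet, including rows that a filter has only hidden. Perhaps what you are looking for is closer to: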

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns 

from matplotlib import rcParams #for chart


df = pd.read_excel("filtered_test.xlsx")
df.fillna(-1,inplace=True)#Replace null values with -1
valid_ratings = df[df['reviews.rating']>-1] #Filter out rows that have their rating as -1
#if you want to filter out zeros, just change fillna and valid_ratings to use 0

#Creates the average hotel ratings by hotel name
hotel_rating = valid_ratings.groupby('name') ['reviews.rating'].mean().reset_index()
hotel_rating.to_excel("hotel_ratings.xlsx", index = False)
hotelRating_df = pd.read_excel("hotel_ratings.xlsx")
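
To check whether hidden rows are what skews an average, it can help to look at how many ratings feed each mean. The following is a minimal sketch, not the asker's actual data: the small in-memory DataFrame and the hotel names "Hotel A" and "Hotel B" are made up for illustration, and only the column names are taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for what pandas reads from the workbook,
# including rows that an Excel filter would only hide from view.
df = pd.DataFrame({
    "name": ["Hotel A", "Hotel A", "Hotel A", "Hotel B"],
    "reviews.rating": [4.0, 0.0, np.nan, 5.0],
})

# Show the number of non-null ratings next to each mean.
# mean() already skips NaN, but the 0 rating still pulls the average down.
summary = df.groupby("name")["reviews.rating"].agg(["count", "mean"])
print(summary)

# Drop missing and zero ratings explicitly before averaging.
valid = df[df["reviews.rating"].notna() & (df["reviews.rating"] > 0)]
clean_mean = valid.groupby("name")["reviews.rating"].mean()
print(clean_mean)
```

If the count for a hotel is larger than the number of rows visible in Excel, the extra rows are the ones the filter was hiding.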
