I have some data that looks like this:
allgroups = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
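I want to calculate the sums of squares between groups and within groups, but I don't know how to do it. I have already computed the group means, the variances, and the grand mean. As far as I know: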
sswithin = sum((variance-1)*size of each list)
ssbetween = sum(((mean-grandmean)**2)*size of each list)
Here is my code:
def avg(allgroups):  # Mean of each group
    return [float(sum(i)) / len(i) for i in allgroups]

def variance(allgroups):  # Sample variance of each group
    # float() keeps the group mean from being truncated under Python 2 integer division
    return [sum((x - float(sum(group)) / len(group)) ** 2 for x in group) / (len(group) - 1) for group in allgroups]

def calcavg(allgroups):  # Grand average (mean of the group means)
    return float(sum(avg(allgroups)) / len(avg(allgroups)))

def size(allgroups):  # Size of the sample in each group
    return [len(group) for group in allgroups]

TheAvg=avg(allgroups)
print(TheAvg)
Variance=variance(allgroups)
print(Variance)
calcAvg=calcavg(allgroups)
print(calcAvg)
sizeSample=size(allgroups)
print(sizeSample)
Any help would be appreciated. P.S.: for this problem I can't use any libraries such as numpy or statistics.
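The sum of squares is defined as the sum of squared residuals, i.e. the squared deviations of each value from the mean: SS = sum((y_i - ybar)**2).
To get the sample variance, we divide that value by n-1. Below, a sum of squares divided by its degrees of freedom is also called a mean square.
So, taking ybar to be the overall mean, we compute the total sum of squares (TSS), which can be partitioned into the explained sum of squares (ESS) and the residual sum of squares (RSS).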
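Written out, the partition looks like the following. This is the standard one-way ANOVA identity rather than something taken from the question; the symbols k (number of groups), n_j (size of group j), \bar{y}_j (mean of group j) and \bar{y} (overall mean) are notation introduced here for illustration.

```latex
\underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(y_{ij}-\bar{y}\right)^2}_{\text{TSS}}
=
\underbrace{\sum_{j=1}^{k} n_j\left(\bar{y}_j-\bar{y}\right)^2}_{\text{ESS (between groups)}}
+
\underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(y_{ij}-\bar{y}_j\right)^2}_{\text{RSS (within groups)}}
```

For the example data this gives 143 = 135 + 8.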
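ESS is your between-group SS (the part explained by group membership), and RSS is your within-group SS. So we can use something similar to the functions you already have: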
def avg(group):  # Mean of a single list
    return float(sum(group)) / len(group)

def sum_of_squares(group):  # Squared deviations from the list's own mean
    mean = avg(group)
    return sum([(i - mean) ** 2 for i in group])

flat_list = [y for x in allgroups for y in x]
TSS = sum_of_squares(flat_list)                     # total
RSS = sum([sum_of_squares(i) for i in allgroups])   # within groups
ESS = TSS - RSS                                     # between groups
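As a cross-check, the between-group sum of squares can also be computed directly from the group means instead of by subtraction. The sketch below is only an illustration: the helper names `mean`, `ss_between`, `ss_within` and `ss_total` are made up for this example. It takes the grand mean over all observations; the mean of the group means gives the same value only when every group has the same size.

```python
# Example data from the question.
allgroups = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]

def mean(values):
    return float(sum(values)) / len(values)

def ss_between(groups):
    # Squared distance of each group mean from the grand mean,
    # weighted by the number of observations in that group.
    flat = [y for g in groups for y in g]
    grand_mean = mean(flat)
    return sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

def ss_within(groups):
    # Squared distance of each observation from its own group mean.
    return sum(sum((y - mean(g)) ** 2 for y in g) for g in groups)

def ss_total(groups):
    # Squared distance of each observation from the grand mean.
    flat = [y for g in groups for y in g]
    grand_mean = mean(flat)
    return sum((y - grand_mean) ** 2 for y in flat)

print(ss_between(allgroups), ss_within(allgroups), ss_total(allgroups))
```

For this data the between and within parts should add up to the total, which is the same identity that the subtraction relies on.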
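A mean square is a sum of squares divided by its degrees of freedom: k-1 for the explained (between-group) part, where k is the number of groups, and N-k for the residual (within-group) part, where N is the total number of observations: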
MS_explained = ESS/(len(allgroups)-1)
MS_residuals = RSS/(len(flat_list)-len(allgroups))
[MS_explained,MS_residuals]
[45.0, 1.0]
I think the between-group variance you are after is MS_explained, and the within-group variance is MS_residuals.
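If you already have the per-group sample variances, the within-group sum of squares can be recovered from them, because each variance is that group's sum of squares divided by its n-1. The following is a rough sketch using only built-ins; the names `sample_variance`, `ms_between`, `ms_within` and `f_stat` are invented for this illustration.

```python
# Example data from the question.
allgroups = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]

def mean(values):
    return float(sum(values)) / len(values)

def sample_variance(group):
    m = mean(group)
    return sum((x - m) ** 2 for x in group) / (len(group) - 1)

k = len(allgroups)                        # number of groups
n_total = sum(len(g) for g in allgroups)  # total number of observations
grand_mean = mean([y for g in allgroups for y in g])

# Undo the division by (n - 1) to get each group's sum of squares back.
ss_within = sum(sample_variance(g) * (len(g) - 1) for g in allgroups)
# Size-weighted squared distance of each group mean from the grand mean.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in allgroups)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_stat = ms_between / ms_within           # one-way ANOVA F ratio

print(ms_between, ms_within, f_stat)
```

The p-value for the F ratio needs the F distribution, which is not covered by this sketch.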
We can also cross-check this with statsmodels:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# One row per observation, labelled with its group index (0..3)
data = pd.DataFrame({'group': np.repeat(np.arange(4), 3), 'value': flat_list})
moore_lm = ols('value ~ C(group)', data=data).fit()
sm.stats.anova_lm(moore_lm, typ=1)
            df  sum_sq  mean_sq     F    PR(>F)
C(group)   3.0   135.0     45.0  45.0  0.000024
Residual   8.0     8.0      1.0   NaN       NaN