Converting age ranges to numeric values to calculate the correlation between CD consumption and age range

Question

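I did sort the values. But the problem is 'до 25' (up to 25). How can I change it to "0-25" and calculate the correlation coefficient between age group and overall rating?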

Some of my data is shown below:

Age group	Overall rating
65 and older	38.45
55-64	17.66
up to 25	46.56
45-54	24.95
35-44	33.54
25-34	37.21

Answer 1

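Here is how you can do what you asked. I converted your age categories to a mean age, because correlation requires two numeric variables; categories do not work for correlation. Your data also has a few other issues. It is unclear what the actual upper bound is for the 65-and-older group. I made it 65-100, but that may not be the case. You also have categories written as, for example, 25-34. It should be 25-35, because 25-35 is exclusive of 35: it contains 25, 26, 27, 28, 29, 30, 31, 32, 33, and 34, which is what I think you are trying to achieve. I did not change this, but you should if that is what you are trying to achieve.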
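As a side note on the half-open convention described above, and separate from the answer's own code that follows, here is a minimal sketch of how `pd.cut` with `right=False` assigns raw ages to bins that include the lower edge and exclude the upper one. The sample ages, the bin edges, the labels, and the cap of 120 are all assumptions made for illustration; they are not taken from the asker's data.

```python
import pandas as pd

# Hypothetical sample ages, chosen to sit right on the bin edges
ages = pd.Series([24, 25, 34, 35, 64, 65])

# Assumed bin edges and labels; 120 is an arbitrary cap for the open-ended group
edges = [0, 25, 35, 45, 55, 65, 120]
labels = ['0-25', '25-35', '35-45', '45-55', '55-65', '65+']

# right=False makes every bin closed on the left and open on the right,
# so the '25-35' bin holds ages 25 through 34 and excludes 35
groups = pd.cut(ages, bins=edges, labels=labels, right=False)

print(pd.DataFrame({'age': ages, 'group': groups}))
```

With this convention an age of exactly 25 or 35 falls into one group only, so neighbouring groups do not overlap.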

import pandas as pd
from scipy.stats import pearsonr

age_groups = ['65 and older', '55-64', 'up to 25', '45-54', '35-44', '25-34']
ratings = [38.45, 17.66, 46.56, 24.95, 33.54, 37.21]

df = pd.DataFrame({'Age_Group': age_groups, 'Overall_Rating': ratings})
print(df)

# Change 'up to 25' to '0-25', and give the open-ended group an assumed upper bound of 100
df['Age_Group'] = df['Age_Group'].replace({'up to 25': '0-25', '65 and older': '65-100'})
print(df)

# You will need a numeric age to use for correlation.
# Split each 'lower-upper' string in 'Age_Group' into two integer columns.
bounds = df['Age_Group'].str.split('-', expand=True).astype(int)
df['Lower_Age'] = bounds[0]
df['Upper_Age'] = bounds[1]

# Sort the df by the lower bound of each group
df = df.sort_values(by='Lower_Age').reset_index(drop=True)
print(df)

# Add a mean age column (midpoint of each range) to use for correlation
df['Mean_Age'] = (df['Lower_Age'] + df['Upper_Age']) / 2
print(df)

# Calculate Pearson's correlation; the result holds the coefficient and the p-value
result = pearsonr(df['Mean_Age'], df['Overall_Rating'])
print(result)
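The midpoint approach above depends on the assumed bounds (0 for the youngest group, 100 for the oldest). If those assumptions feel too arbitrary, one alternative, which is not part of the original answer, is to treat the age groups as ordered categories and compute a rank correlation with `scipy.stats.spearmanr`. The sketch below is self-contained; the column names `Rating` and `Age_Rank` and the `order` list are hypothetical names chosen for illustration.

```python
import pandas as pd
from scipy.stats import spearmanr

df = pd.DataFrame({
    'Age_Group': ['65 and older', '55-64', 'up to 25', '45-54', '35-44', '25-34'],
    'Rating': [38.45, 17.66, 46.56, 24.95, 33.54, 37.21],
})

# Youngest-to-oldest ordering of the labels (written out by hand; an assumption
# about how the groups should be ranked)
order = ['up to 25', '25-34', '35-44', '45-54', '55-64', '65 and older']

# Map each label to its position in that ordering: 0 for the youngest group, 5 for the oldest
df['Age_Rank'] = df['Age_Group'].map({label: i for i, label in enumerate(order)})

# Spearman's rho uses only the ordering of the groups, not their numeric bounds
rho, p_value = spearmanr(df['Age_Rank'], df['Rating'])
print(rho, p_value)
```

Because only the order of the groups matters here, the result does not change if the oldest group is capped at 80 instead of 100. With six data points, either coefficient should be read with caution.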
