Can more than one variable determine the groups when splitting a DataFrame in two? (pandas)

Question (votes: 0, answers: 3)

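I am studying COVID-19 deaths by state, and looking at whether a high state population makes it more likely that people who catch COVID-19 die.

I am currently splitting the DataFrame into two groups, but the way I have set it up, the split depends on two factors rather than just one. The first group is highpopulation_highdeath (state population above the median and death rate above the median); the other is highpopulation_lowdeath (state population above the median and death rate at or below the median). My current code is below, but I keep getting an invalid syntax error. So I am wondering: is it not possible to split a DataFrame into two groups based on two variables?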

# Split the deaths_to_case dataset into two groups

highpop_highdeath = df.iloc[(df'StatePopulation' > 4342705.0), (df'deaths_to_cases' > 0.012143070253953211).values]
highpop_highdeath.name = 'States with a high population and high death rate'
highpop_lowdeath = df.iloc[(df'StatePopulation'> 4342705.0), (df'deaths_to_cases' <= 0.012143070253953211).values]
highpop_lowdeath.name = 'States with a high population and low death rate'
python pandas dataframe split
3 Answers
0
votes

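To combine several conditions in one filter, wrap each condition in parentheses and join them with the element-wise boolean operator &:

highpop_highdeath = df.loc[(df['StatePopulation'] > 4342705.0) & (df['deaths_to_cases'] > 0.012143070253953211), :]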
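As a minimal sketch of this approach, the example below builds a small made-up DataFrame and computes both medians with `Series.median()` instead of hard-coding them. The column names follow the question; the rows and the helper names `high_pop` and `high_rate` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: column names follow the question, values are invented.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F"],
    "StatePopulation": [1_000_000, 2_000_000, 5_000_000,
                        8_000_000, 10_000_000, 20_000_000],
    "deaths_to_cases": [0.010, 0.030, 0.005, 0.020, 0.008, 0.040],
})

# Compute the thresholds from the data rather than pasting in literals.
pop_median = df["StatePopulation"].median()
rate_median = df["deaths_to_cases"].median()

# Each comparison yields a boolean Series with one entry per row.
high_pop = df["StatePopulation"] > pop_median
high_rate = df["deaths_to_cases"] > rate_median

# & combines the masks element-wise; ~ negates a mask.
highpop_highdeath = df.loc[high_pop & high_rate]
highpop_lowdeath = df.loc[high_pop & ~high_rate]

print(highpop_highdeath)
print(highpop_lowdeath)
```

Naming the masks first keeps the two groups consistent with each other: both reuse the same `high_pop` condition and differ only in the death-rate test.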

0
votes

Yes, you can use two variables. By the way, could you share the error message? Also, try this (note that it uses df.loc, since df.iloc does not accept a boolean Series as a mask):

highpop_highdeath = df.loc[(df['StatePopulation'] > 4342705.0) & (df['deaths_to_cases'] > 0.012143070253953211)]
highpop_highdeath.name = 'States with a high population and high death rate'
highpop_lowdeath = df.loc[(df['StatePopulation'] > 4342705.0) & (df['deaths_to_cases'] <= 0.012143070253953211)]
highpop_lowdeath.name = 'States with a high population and low death rate'
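One possible alternative to attaching a description to each DataFrame is to store the group label as a column and split with `groupby`. This is only a sketch under assumptions: the toy rows, the `death_group` column, and the label strings are hypothetical, and the thresholds are the literals from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column names follow the question, values are invented.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "StatePopulation": [3_000_000, 5_000_000, 7_000_000, 9_000_000],
    "deaths_to_cases": [0.02, 0.02, 0.01, 0.03],
})

# Keep only the high-population states; copy so we can add a column safely.
high_pop = df.loc[df["StatePopulation"] > 4342705.0].copy()

# Label each remaining row by its death rate (hypothetical column and labels).
high_pop["death_group"] = np.where(
    high_pop["deaths_to_cases"] > 0.012143070253953211,
    "high_death",
    "low_death",
)

# Split into one DataFrame per label.
groups = {label: frame for label, frame in high_pop.groupby("death_group")}

print(groups["high_death"])
print(groups["low_death"])
```

With this layout the label travels with the rows, so it is still there after further filtering or after writing the data to a file.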

0
votes

You want to combine two boolean vectors. That way, pandas evaluates both conditions for every row of the DataFrame and keeps a row only when both conditions are true.

highpop_highdeath = df.loc[(df['StatePopulation'] > 4342705.0) & (df['deaths_to_cases'] > 0.012143070253953211)]

highpop_lowdeath = df.loc[(df['StatePopulation'] > 4342705.0) & (df['deaths_to_cases'] <= 0.012143070253953211)]

More concisely, if you only need the 'name' column:

highpop_highdeath_names = df.loc[(df['StatePopulation'] > 4342705.0) & (df['deaths_to_cases'] > 0.012143070253953211), 'name']
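To make the row-by-row evaluation concrete, here is a small sketch that prints the intermediate boolean vectors and shows why Python's `and` keyword cannot stand in for `&`. The three rows are made up; the thresholds are the literals from the question.

```python
import pandas as pd

# Hypothetical toy data with three rows.
df = pd.DataFrame({
    "StatePopulation": [3_000_000, 6_000_000, 9_000_000],
    "deaths_to_cases": [0.02, 0.01, 0.03],
})

high_pop = df["StatePopulation"] > 4342705.0               # False, True, True
high_rate = df["deaths_to_cases"] > 0.012143070253953211   # True, False, True

# & works element-wise: a row is True only if both masks are True there.
both = high_pop & high_rate                                # False, False, True
print(both.tolist())

# `and` asks for a single truth value of the whole Series, which pandas refuses.
try:
    high_pop and high_rate
    and_error = None
except ValueError as exc:
    and_error = exc
print(type(and_error).__name__)
```

The parentheses around each comparison matter for the same reason: `&` binds more tightly than `>`, so leaving them out changes what is being compared.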