How to create columns with grouped value counts based on existing columns [closed]

Question (votes: 0, answers: 3)

I have the following data table and want to get counts by applying some conditions to the existing columns. A solution would be very helpful.

Input:

   Key1    id1-age     id2-age     id3-age    id4-age   id5-age  id1-gender id2-gender   id3-gender    id4-gender    id5-gender
0   a          6          32          61         22       23         M       F               M               F           F
1   b         36          25          52         16       33         M       M               F               F           M
2   c         12          21          45         15       66         F       M               M               M           F
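To make the example reproducible, the input table can be rebuilt as a pandas DataFrame. This is a sketch that only copies the values and column names shown above; the variable name `df` is an assumption that matches what the answers below use.

```python
import pandas as pd

# Rebuild the input table: one age column and one gender column per id (id1..id5)
df = pd.DataFrame({
    'Key1': ['a', 'b', 'c'],
    'id1-age': [6, 36, 12],
    'id2-age': [32, 25, 21],
    'id3-age': [61, 52, 45],
    'id4-age': [22, 16, 15],
    'id5-age': [23, 33, 66],
    'id1-gender': ['M', 'M', 'F'],
    'id2-gender': ['F', 'M', 'M'],
    'id3-gender': ['M', 'F', 'M'],
    'id4-gender': ['F', 'F', 'M'],
    'id5-gender': ['F', 'M', 'F'],
})

print(df)
```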

Problem statement

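Each key holds the ages and genders of several individuals (id1 to id5) belonging to that key. In Python, I want to create columns that contain, for each row, the count of each age group, with respect to gender.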

The expected output is as follows:

      Key1  id1-age id2-age id3-age id4-age id5-age  age(02-15)  age(16-21)  age(21-30)  age(31-40) age(41-50)   age(51-60)  age(61+)
0      a     6        32       61     22      23       1            0            2          1        0               0        1
1      b    36        25       52     16      33       0            1            1          2        0               1        0
2      c    12        21       45     15      66       2            1            0          0        1               0        1

I hope I have explained my problem clearly. Looking forward to a positive response, and thanks in advance.

python python-3.x count logic
3 answers

2 votes

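You can go through the age columns and count the age groups for each row. The counts can be stored in separate lists, which are added to the DataFrame as new columns once every row has been processed.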

Here is my approach. It is not the shortest code and could be improved.

import pandas as pd

df = pd.DataFrame([['a', 6, 32, 61, 22, 23],
                   ['b', 36, 25, 52, 16, 33],
                   ['c', 12, 21, 45, 15, 66],                   
                   ],
                  columns=['Key1', 'id1-age', 'id2-age', 'id3-age', 'id4-age', 'id5-age'])

# One list per age group; each list collects one count per row
age_15 = []
age_21 = []
age_30 = []
age_40 = []
age_50 = []
age_60 = []
age_61 = []

search_columns = ['id1-age', 'id2-age', 'id3-age', 'id4-age', 'id5-age']

# For every row, tally how many of its ages fall into each age group
for index, record in df.iterrows():
    count_15 = 0
    count_21 = 0
    count_30 = 0
    count_40 = 0
    count_50 = 0
    count_60 = 0
    count_61 = 0
    for search_column in search_columns:
        age = record[search_column]
        if age>=2 and age <= 15:
            count_15 += 1
        elif age>=16 and age <= 21:
            count_21 += 1
        elif age>21 and age <= 30:
            count_30 += 1
        elif age>=31 and age <= 40:
            count_40 += 1
        elif age>=41 and age <= 50:
            count_50 += 1
        elif age>=51 and age <= 60:
            count_60 += 1
        elif age>=61:
            count_61 += 1                
    age_15.append(count_15)
    age_21.append(count_21)
    age_30.append(count_30)
    age_40.append(count_40)
    age_50.append(count_50)
    age_60.append(count_60)
    age_61.append(count_61)

df['age(02-15)'] = age_15
df['age(16-21)'] = age_21
df['age(21-30)'] = age_30
df['age(31-40)'] = age_40
df['age(41-50)'] = age_50
df['age(51-60)'] = age_60
df['age(61+)'] = age_61
print(df[['age(02-15)', 'age(16-21)', 'age(21-30)', 'age(31-40)', 'age(41-50)', 'age(51-60)', 'age(61+)']])

Output:

   age(02-15)  age(16-21)  age(21-30)  age(31-40)  age(41-50)  age(51-60)  age(61+)
0           1           0           2           1           0           0         1
1           0           1           1           2           0           1         0
2           2           1           0           0           1           0         1
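Since the loop above is, as noted, not the shortest version, here is a more compact sketch of the same idea: the seven ranges live in one table and a helper counts them per row. The names `age_cols`, `buckets` and `count_buckets` are illustrative, not from the original answer, and integer ages are assumed so that "greater than 21" is the same as "at least 22".

```python
import pandas as pd

df = pd.DataFrame([['a', 6, 32, 61, 22, 23],
                   ['b', 36, 25, 52, 16, 33],
                   ['c', 12, 21, 45, 15, 66]],
                  columns=['Key1', 'id1-age', 'id2-age', 'id3-age', 'id4-age', 'id5-age'])

age_cols = ['id1-age', 'id2-age', 'id3-age', 'id4-age', 'id5-age']

# (column label, inclusive lower bound, inclusive upper bound);
# same ranges as the if/elif chain, assuming integer ages
buckets = [
    ('age(02-15)', 2, 15),
    ('age(16-21)', 16, 21),
    ('age(21-30)', 22, 30),
    ('age(31-40)', 31, 40),
    ('age(41-50)', 41, 50),
    ('age(51-60)', 51, 60),
    ('age(61+)', 61, float('inf')),
]

def count_buckets(ages):
    """Return {label: count} for the ages of one row."""
    return {label: sum(1 for a in ages if lo <= a <= hi)
            for label, lo, hi in buckets}

# One dict per row, turned into a DataFrame that shares df's row index
counts = pd.DataFrame([count_buckets(ages) for ages in df[age_cols].to_numpy()],
                      index=df.index)
df = df.join(counts)
print(df[[label for label, _, _ in buckets]])
```

The design choice is simply to keep the ranges as data, so adding or changing an age group means editing one line of `buckets` instead of a counter, a list and an `elif` branch.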

1 vote

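There may be less verbose solutions, but applying a conditional sum across the age columns and assigning each result to a new column, as shown below, should help: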

import pandas as pd
df = pd.DataFrame({
  'Key1': ['a', 'b', 'c'],
  'id1-age': [6, 36, 12],
  'id2-age': [32, 25, 21],
  'id3-age': [61, 52, 45],
  'id4-age': [22, 16, 15],
  'id5-age': [23, 33, 66]
})

# df.columns[1:6] selects id1-age .. id5-age; upper bounds are inclusive to match the labels
df['age(02-15)'] = ((df[df.columns[1:6]] >= 2) & (df[df.columns[1:6]] <= 15)).sum(1)
df['age(16-21)'] = ((df[df.columns[1:6]] >= 16) & (df[df.columns[1:6]] <= 21)).sum(1)
df['age(21-30)'] = ((df[df.columns[1:6]] > 21) & (df[df.columns[1:6]] <= 30)).sum(1)
df['age(31-40)'] = ((df[df.columns[1:6]] >= 31) & (df[df.columns[1:6]] <= 40)).sum(1)
df['age(41-50)'] = ((df[df.columns[1:6]] >= 41) & (df[df.columns[1:6]] <= 50)).sum(1)
df['age(51-60)'] = ((df[df.columns[1:6]] >= 51) & (df[df.columns[1:6]] <= 60)).sum(1)
df['age(61+)'] = (df[df.columns[1:6]] >= 61).sum(1)

print(df)

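If you prefer a list of column names over a positional slice of df.columns, you can replace the slice with ['id1-age', 'id2-age', 'id3-age', 'id4-age', 'id5-age'], or even store the selection in a variable to avoid repeating it over and over. It could then look like this: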

import pandas as pd
df = pd.DataFrame({
  'Key1': ['a', 'b', 'c'],
  'id1-age': [6, 36, 12],
  'id2-age': [32, 25, 21],
  'id3-age': [61, 52, 45],
  'id4-age': [22, 16, 15],
  'id5-age': [23, 33, 66]
})

range_cols = df[['id1-age', 'id2-age', 'id3-age', 'id4-age', 'id5-age']]

# Upper bounds are inclusive; 21 is counted in 16-21, as in the expected output
df['age(02-15)'] = ((range_cols >= 2) & (range_cols <= 15)).sum(1)
df['age(16-21)'] = ((range_cols >= 16) & (range_cols <= 21)).sum(1)
df['age(21-30)'] = ((range_cols > 21) & (range_cols <= 30)).sum(1)
df['age(31-40)'] = ((range_cols >= 31) & (range_cols <= 40)).sum(1)
df['age(41-50)'] = ((range_cols >= 41) & (range_cols <= 50)).sum(1)
df['age(51-60)'] = ((range_cols >= 51) & (range_cols <= 60)).sum(1)
df['age(61+)'] = (range_cols >= 61).sum(1)

print(df)
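The seven near-identical assignments can also be driven by a dictionary. This is a sketch, assuming every age column name contains `-age` so that `df.filter(like='-age')` selects them; the dictionary name `buckets` is illustrative, and the inclusive bounds are chosen to reproduce the expected output, where 21 lands in the 16-21 group.

```python
import pandas as pd

df = pd.DataFrame({
  'Key1': ['a', 'b', 'c'],
  'id1-age': [6, 36, 12],
  'id2-age': [32, 25, 21],
  'id3-age': [61, 52, 45],
  'id4-age': [22, 16, 15],
  'id5-age': [23, 33, 66]
})

# Select every column whose name contains '-age' (assumes that naming pattern)
range_cols = df.filter(like='-age')

# label -> (inclusive lower bound, inclusive upper bound)
buckets = {
    'age(02-15)': (2, 15),
    'age(16-21)': (16, 21),
    'age(21-30)': (22, 30),
    'age(31-40)': (31, 40),
    'age(41-50)': (41, 50),
    'age(51-60)': (51, 60),
    'age(61+)': (61, float('inf')),
}

for label, (lo, hi) in buckets.items():
    # Boolean frame: True where an age is in range; summing along axis=1 counts per row
    df[label] = ((range_cols >= lo) & (range_cols <= hi)).sum(axis=1)

print(df)
```

`range_cols` is taken once before the loop, so the newly added count columns are never compared against the bounds.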

0
投票

You can use pandas.cut(). Given that your DataFrame is called df, it looks like this:

bins = [1, 15, 21, 30, 40, 50, 60, float('inf')]
df[['id1-age', 'id2-age', 'id3-age', 'id4-age', 'id5-age']].apply(lambda r: pd.cut(r, bins).value_counts(sort=False), axis=1)

Then merge the result with the original DataFrame.
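An end-to-end sketch of the pd.cut approach, including the merge step. It relies on pandas' default right-closed bins, so `(1, 15]` covers ages 2-15 and `(21, 30]` covers 22-30; the `labels` list reuses the column names from the expected output and is an assumption, as is using `df.join` (which aligns on the row index) for the merge.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Key1': ['a', 'b', 'c'],
    'id1-age': [6, 36, 12],
    'id2-age': [32, 25, 21],
    'id3-age': [61, 52, 45],
    'id4-age': [22, 16, 15],
    'id5-age': [23, 33, 66],
})
age_cols = ['id1-age', 'id2-age', 'id3-age', 'id4-age', 'id5-age']

# Right-closed bins: (1, 15], (15, 21], (21, 30], ..., (60, inf]
bins = [1, 15, 21, 30, 40, 50, 60, np.inf]
labels = ['age(02-15)', 'age(16-21)', 'age(21-30)', 'age(31-40)',
          'age(41-50)', 'age(51-60)', 'age(61+)']

# Bin each row's ages; value_counts on a categorical also reports empty bins as 0
counts = df[age_cols].apply(
    lambda r: pd.cut(r, bins=bins, labels=labels).value_counts(sort=False), axis=1)

# Turn the categorical column labels into plain strings and fix their order
counts.columns = counts.columns.astype(str)
counts = counts[labels]

# Attach the counts to the original frame, aligned on the shared row index
result = df.join(counts)
print(result)
```

Restricting `apply` to the age columns matters: passing the whole frame would hand the string `Key1` values to `pd.cut`, which cannot bin them.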
