Getting the mean and median age from age-group counts


Given the following population:

          0-9  10-19  20-29  30-39  40-49  50-59  60-69  70-79  80-89  90-99  100-109  110-119
Male     -692   -772   -741   -698   -707   -511   -371   -203    -95    -17       -8     -1.0
Female    676    771    808    865    815    581    400    226    102     15        8      0.0

I want to plot an age pyramid and compute the mean and median age of the population from these age classes.

I can build the pyramid manually with the pandas DataFrame below:

age_p = pd.DataFrame({
    'Age': ['100+', '90-99', '80-89', '70-79', '60-69', '50-59', '40-49', '30-39', '20-29', '10-19', '0-9'],
    'Male': [-9, -17, -95, -203, -371, -511, -707, -698, -741, -772, -692],
    'Female': [8, 15, 102, 226, 400, 581, 815, 865, 808, 771, 676],
})

age_classes = ['100+', '90-99', '80-89', '70-79', '60-69', '50-59', '40-49', '30-39', '20-29', '10-19', '0-9']

I feel this could be less manual and more Pythonic.

pandas numpy matplotlib group-by seaborn
1 Answer
0 votes

The first part builds the age pyramid.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

age_classes = ['100+', '90-99', '80-89', '70-79', '60-69', '50-59', '40-49', '30-39', '20-29', '10-19', '0-9']
male_counts = [-9, -17, -95, -203, -371, -511, -707, -698, -741, -772, -692]
female_counts = [8, 15, 102, 226, 400, 581, 815, 865, 808, 771, 676]

age_p = pd.DataFrame({'Age': age_classes, 'Male': male_counts, 'Female': female_counts})

fig, ax = plt.subplots()
index = np.arange(len(age_classes))
bar_width = 0.35

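# Male counts are negative, so their bars extend left of zero and female bars extend right.
# Both series share the same y position so each age class forms one row of the pyramid.
male_bars = ax.barh(index, male_counts, bar_width, label='Male')
female_bars = ax.barh(index, female_counts, bar_width, label='Female')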

ax.set_xlabel('Population Count')
ax.set_ylabel('Age')
ax.set_yticks(index)
ax.set_yticklabels(age_classes)
ax.legend()

plt.title('Age Pyramid')
plt.show()
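
Since the question asks for something less manual, here is a minimal sketch of one way to derive the pyramid frame from the raw table instead of typing the lists by hand. It assumes the raw counts are already available as a wide table; the helper name `fold_open_class` and its `cutoff` parameter are made up for illustration, and the last two male/female values are written as integers here.

```python
import pandas as pd

# Raw counts copied from the question; male counts are stored as negatives
# so that their bars extend to the left of zero in the pyramid.
raw = pd.DataFrame(
    {
        "Male": [-692, -772, -741, -698, -707, -511, -371, -203, -95, -17, -8, -1],
        "Female": [676, 771, 808, 865, 815, 581, 400, 226, 102, 15, 8, 0],
    },
    # Builds the labels '0-9', '10-19', ..., '110-119'
    index=[f"{low}-{low + 9}" for low in range(0, 120, 10)],
)


def fold_open_class(df, cutoff=100):
    """Sum every class that starts at `cutoff` or above into one '<cutoff>+' row."""

    def to_class(label):
        low = int(label.split("-")[0])
        return label if low < cutoff else f"{cutoff}+"

    # A function passed to groupby is applied to each index label;
    # sort=False keeps the classes in their original youngest-first order.
    return df.groupby(to_class, sort=False).sum()


# Reverse the rows so the oldest class comes first, as in the lists above.
age_p = fold_open_class(raw).iloc[::-1].rename_axis("Age").reset_index()
```

The resulting `age_p['Age']`, `age_p['Male']` and `age_p['Female']` columns can then be passed to `ax.barh` in place of the hand-written lists.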

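The second part computes the mean and median age from the DataFrame. There are two things to handle. First, the Male column holds negative values, so we take the absolute value to get the real counts. Second, the '100+' class has no upper bound, so an if statement treats it as a special case. The final code for computing the median and mean age is shown below. Hope this helps.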

age_values = []
for age_class in age_classes:
    if '+' in age_class:
        age_values.append(float(age_class[:-1]) + 0.5)  # Handling '100+' as a special case
    else:
        age_range = age_class.split('-')
        age_values.append((int(age_range[0]) + int(age_range[1])) / 2)

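# Total people per class; male counts are stored as negatives, so take the absolute value
age_counts = np.abs(np.array(male_counts)) + np.array(female_counts)

# Mean: class midpoints weighted by the number of people in each class
mean_age_class = np.average(age_values, weights=age_counts)
# Median: repeat each midpoint by its count and take the middle value
median_age_class = np.median(np.repeat(age_values, age_counts))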
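
The same calculation can also be sketched with pandas alone, without expanding the data with `np.repeat`. This is only an illustration under a few assumptions. The helper `class_midpoint` is a made-up name, and the '100+' class is mapped to 100.5 exactly as in the code above. The median is taken as the first class whose cumulative count reaches half the total, so if the halfway point falls exactly between two classes it returns the lower class, whereas `np.median` would average the two.

```python
import numpy as np
import pandas as pd

age_p = pd.DataFrame({
    "Age": ["100+", "90-99", "80-89", "70-79", "60-69", "50-59",
            "40-49", "30-39", "20-29", "10-19", "0-9"],
    "Male": [-9, -17, -95, -203, -371, -511, -707, -698, -741, -772, -692],
    "Female": [8, 15, 102, 226, 400, 581, 815, 865, 808, 771, 676],
})


def class_midpoint(label):
    """Representative age for a class label such as '40-49' or '100+'."""
    if label.endswith("+"):
        # Assumption carried over from the answer: open class -> lower bound + 0.5
        return float(label[:-1]) + 0.5
    low, high = map(int, label.split("-"))
    return (low + high) / 2


# One row per class, sorted from youngest to oldest
stats = pd.DataFrame({
    "age": age_p["Age"].map(class_midpoint),
    "count": age_p["Male"].abs() + age_p["Female"],
}).sort_values("age", ignore_index=True)

# Mean: midpoints weighted by class size
mean_age = np.average(stats["age"], weights=stats["count"])

# Median: first class whose cumulative count reaches half of the total
cumulative = stats["count"].cumsum()
median_position = cumulative.searchsorted(cumulative.iloc[-1] / 2)
median_age = stats.loc[median_position, "age"]
```

With the counts from the question, the median lands in the 30-39 class, so `median_age` is the midpoint 34.5.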