Generate combinations in a pandas DataFrame

I have a dataset with the columns ["Uni", "Region", "Profession", "Level_Edu", "Financial_Base", "Learning_Time", "GENDER"]. Every value in ["Uni", "Region", "Profession"] is filled in, while ["Level_Edu", "Financial_Base", "Learning_Time", "GENDER"] always contain NA.

Each of the NA columns has a small set of possible values:

Level_Edu = ['undergrad', 'grad', 'PhD']
Financial_Base = ['personal', 'grant']
Learning_Time = ["morning", "day", "evening"]
GENDER = ['Male', 'Female']

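For every observation in the initial data, I want to generate all possible combinations of ["Level_Edu", "Financial_Base", "Learning_Time", "GENDER"], so that each initial observation is represented by 36 new observations. The count follows from the product rule: N1 * N2 * N3 * N4 = 3 * 2 * 3 * 2 = 36, where Ni is the number of possible values of the i-th column.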
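As a quick check of that count, here is a minimal sketch that multiplies the lengths of the value lists. The dictionary name `possible_values` is only an illustrative choice, and `math.prod` assumes Python 3.8 or later.

```python
from math import prod

# candidate values for each all-NA column, as listed in the question
possible_values = {
    'Level_Edu': ['undergrad', 'grad', 'PhD'],
    'Financial_Base': ['personal', 'grant'],
    'Learning_Time': ['morning', 'day', 'evening'],
    'GENDER': ['Male', 'Female'],
}

# N1 * N2 * N3 * N4: number of new rows generated per original row
n_combinations = prod(len(values) for values in possible_values.values())
print(n_combinations)  # 36
```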

Here is Python code that recreates two initial observations and an approximation of the result I want (for each initial observation it shows only 3 of the 36 combinations):

import pandas as pd
import numpy as np
sample_data_as_is = pd.DataFrame([["X1", "Y1", "Z1", np.nan, np.nan, np.nan, np.nan], ["X2", "Y2", "Z2", np.nan, np.nan, np.nan, np.nan]], columns=["Uni", 'Region', "Profession", "Level_Edu", 'Financial_Base', 'Learning_Time', 'GENDER'])

sample_data_to_be = pd.DataFrame([["X1", "Y1", "Z1", "undergrad", "personal", "morning", 'Male'], ["X2", "Y2", "Z2", "undergrad", "personal", "morning", 'Male'],
                                  ["X1", "Y1", "Z1", "grad", "personal", "morning", 'Male'], ["X2", "Y2", "Z2", "grad", "personal", "morning", 'Male'],
                                  ["X1", "Y1", "Z1", "undergrad", "grant", "morning", 'Male'], ["X2", "Y2", "Z2", "undergrad", "grant", "morning", 'Male']], columns=["Uni", 'Region', "Profession", "Level_Edu", 'Financial_Base', 'Learning_Time', 'GENDER'])

1 Answer

You can combine itertools.product with a cross merge:

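from itertools import product
import pandas as pd

# possible values for each column that is currently all NA
data = {'Level_Edu': ['undergrad', 'grad', 'PhD'],
        'Financial_Base': ['personal', 'grant'],
        'Learning_Time': ['morning', 'day', 'evening'],
        'GENDER': ['Male', 'Female']}

# all 3*2*3*2 = 36 combinations, one per row
combos = pd.DataFrame(product(*data.values()), columns=list(data))

# cross merge (pandas >= 1.2): pair every original row with every combination;
# sample_data_as_is is the DataFrame defined in the question
out = (sample_data_as_is[['Uni', 'Region', 'Profession']]
       .merge(combos, how='cross')
      )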
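If `how='cross'` is not available (it was added in pandas 1.2), a common workaround is to give both frames the same constant key and merge on it. The sketch below assumes that approach and also swaps in `pd.MultiIndex.from_product` to build the combinations; `_key` is a hypothetical helper column name, and any unused name would do.

```python
import numpy as np
import pandas as pd

# the two example rows from the question
sample_data_as_is = pd.DataFrame(
    [["X1", "Y1", "Z1", np.nan, np.nan, np.nan, np.nan],
     ["X2", "Y2", "Z2", np.nan, np.nan, np.nan, np.nan]],
    columns=["Uni", "Region", "Profession", "Level_Edu",
             "Financial_Base", "Learning_Time", "GENDER"])

data = {'Level_Edu': ['undergrad', 'grad', 'PhD'],
        'Financial_Base': ['personal', 'grant'],
        'Learning_Time': ['morning', 'day', 'evening'],
        'GENDER': ['Male', 'Female']}

# same 36-row Cartesian product as itertools.product (last column varies fastest)
combos = (pd.MultiIndex.from_product(list(data.values()), names=list(data))
          .to_frame(index=False))

# emulate a cross join: constant key on both sides, merge on it, then drop it
# ('_key' is a hypothetical helper column name)
out = (sample_data_as_is[['Uni', 'Region', 'Profession']]
       .assign(_key=1)
       .merge(combos.assign(_key=1), on='_key')
       .drop(columns='_key'))

print(out.shape)  # (72, 7)
```

Because every row shares the same key value, the inner merge pairs each original row with all 36 combinations, which is exactly what a cross join does.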

Output:

   Uni Region Profession  Level_Edu Financial_Base Learning_Time  GENDER
0   X1     Y1         Z1  undergrad       personal       morning    Male
1   X1     Y1         Z1  undergrad       personal       morning  Female
2   X1     Y1         Z1  undergrad       personal           day    Male
3   X1     Y1         Z1  undergrad       personal           day  Female
4   X1     Y1         Z1  undergrad       personal       evening    Male
..  ..    ...        ...        ...            ...           ...     ...
67  X2     Y2         Z2        PhD          grant       morning  Female
68  X2     Y2         Z2        PhD          grant           day    Male
69  X2     Y2         Z2        PhD          grant           day  Female
70  X2     Y2         Z2        PhD          grant       evening    Male
71  X2     Y2         Z2        PhD          grant       evening  Female

[72 rows x 7 columns]
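To confirm the result meets the requirement from the question (36 distinct combinations per original row and no remaining NA), a few checks like the following could be run. This sketch rebuilds `out` with the cross merge so it is self-contained; it assumes pandas 1.2 or later, and the names `id_cols` and `value_cols` are illustrative.

```python
from itertools import product
import numpy as np
import pandas as pd

sample_data_as_is = pd.DataFrame(
    [["X1", "Y1", "Z1", np.nan, np.nan, np.nan, np.nan],
     ["X2", "Y2", "Z2", np.nan, np.nan, np.nan, np.nan]],
    columns=["Uni", "Region", "Profession", "Level_Edu",
             "Financial_Base", "Learning_Time", "GENDER"])

data = {'Level_Edu': ['undergrad', 'grad', 'PhD'],
        'Financial_Base': ['personal', 'grant'],
        'Learning_Time': ['morning', 'day', 'evening'],
        'GENDER': ['Male', 'Female']}

out = (sample_data_as_is[['Uni', 'Region', 'Profession']]
       .merge(pd.DataFrame(product(*data.values()), columns=list(data)), how='cross'))

id_cols = ['Uni', 'Region', 'Profession']
value_cols = list(data)

# 2 original rows x 36 combinations = 72 rows
print(len(out))  # 72

# distinct rows per original observation: should be 36 for each
per_row = out.drop_duplicates().groupby(id_cols).size().tolist()
print(per_row)  # [36, 36]

# the previously all-NA columns are now fully populated
print(out[value_cols].notna().all().all())  # True
```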
© www.soinside.com 2019 - 2024. All rights reserved.