Creating multiple pivot tables in a loop and assigning each to its own dataframe


Sample dataset:

   ID  1  2  3  X  Y  Z
0   1  2  1  2  3  3  4
1   2  1  3  1  4  3  4
2   3  2  2  1  2  4  3
3   4  3  2  1  2  3  3
4   5  1  2  2  1  3  2
5   6  2  3  2  4  4  2

cross1 = pd.crosstab(sample["1"], sample["X"])
cross2 = pd.crosstab(sample["2"], sample["X"])
cross3 = pd.crosstab(sample["3"], sample["X"])
cross4 = pd.crosstab(sample["1"], sample["Y"])
cross5 = pd.crosstab(sample["2"], sample["Y"])
cross6 = pd.crosstab(sample["3"], sample["Y"])
cross7 = pd.crosstab(sample["1"], sample["Z"])
# etc.

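I want to run this code in a loop, swapping "column 1" and "column X" for new columns (for example "column 2" and "column Y") to produce a new crosstab each time and assign it to a new dataframe. Doing it once by hand is simple. The result gives the count of answers to a survey question per category (in this case, business type).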

1 = Large Business
2 = Small Business
3 = Non-profit

cross1 = pd.crosstab(sample["1"], sample["X"])
print(cross1)
X  1  2  3  4
1            
1  1  0  0  1
2  0  1  1  1
3  0  1  0  0

I need to iterate so that I end up with multiple dataframes:

cross1 cross2 cross3 cross4 ... etc.

demo_questions = ['1', '2', '3']

survey_questions = ['X', 'Y', 'Z']

for d, s in [demo_questions, survey_questions]:
    cross[d] = pd.crosstab(sample[d], sample[s])

I tried the above, but got the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[37], line 1
----> 1 for d, s in [demo_questions, survey_questions]:
      2     cross[d] = pd.crosstab(sample[d], sample[s])

ValueError: too many values to unpack (expected 2)

python pandas loops pivot-table
1 Answer

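Create a dictionary to store the pivot tables, then iterate over every combination of demo question and survey question, building the frequency tables in a dict comprehension: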

# `sample` is the dataframe from the question
cross = {
    f'{d}_{s}': pd.crosstab(sample[d], sample[s])
    for d in demo_questions
    for s in survey_questions
}
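
For anyone who prefers an explicit loop, here is a minimal self-contained sketch of the same idea using `itertools.product`. It assumes the dataframe is called `sample`, as in the question, and rebuilds it from the sample dataset shown there. The key format `"1_X"` simply follows the comprehension above.

```python
from itertools import product

import pandas as pd

# Sample data copied from the question's table
sample = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "1": [2, 1, 2, 3, 1, 2],
    "2": [1, 3, 2, 2, 2, 3],
    "3": [2, 1, 1, 1, 2, 2],
    "X": [3, 4, 2, 2, 1, 4],
    "Y": [3, 3, 4, 3, 3, 4],
    "Z": [4, 4, 3, 3, 2, 2],
})

demo_questions = ["1", "2", "3"]
survey_questions = ["X", "Y", "Z"]

# product() yields every (demo, survey) pair, so 3 x 3 = 9 crosstabs
cross = {}
for d, s in product(demo_questions, survey_questions):
    cross[f"{d}_{s}"] = pd.crosstab(sample[d], sample[s])

# Each value is an ordinary DataFrame, looked up by its key
for key, table in cross.items():
    print(key, table.shape)
```

A dictionary is used instead of separately named variables (`cross1`, `cross2`, ...) because the keys can be generated in the loop and looked up later.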

You can now access the results by indexing the dictionary:

print(cross['1_X'])

X  1  2  3  4
1            
1  1  0  0  1
2  0  1  1  1
3  0  1  0  0
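
As for the `ValueError` in the question: `for d, s in [demo_questions, survey_questions]` iterates over a list of two items, each of which is a three-element list, so every item has one value too many to unpack into `d, s`. The sketch below reproduces that, and contrasts `zip` (positional pairs only) with a nested comprehension (all combinations). The variable names `error_message`, `paired` and `all_pairs` are only illustrative.

```python
demo_questions = ["1", "2", "3"]
survey_questions = ["X", "Y", "Z"]

# The outer list has two items; each is a 3-element list,
# so unpacking one of them into (d, s) raises ValueError.
error_message = None
try:
    for d, s in [demo_questions, survey_questions]:
        pass
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# zip() pairs the lists position by position: 1 with X, 2 with Y, 3 with Z
paired = list(zip(demo_questions, survey_questions))
print(paired)

# A nested comprehension gives every combination, which is what the
# dictionary of crosstabs needs
all_pairs = [(d, s) for d in demo_questions for s in survey_questions]
print(len(all_pairs))
```

`zip` would be the right choice only if each demo question belonged to exactly one survey question; for the full grid of crosstabs, all combinations are needed.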