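I have two columns, phones and emails, that need to be exploded into rows. I've worked out how to do this for one of them, but not for both at once. The main difficulty is that a customer can have anywhere from zero to many phones and zero to many emails. So if a customer has three emails and no phones, I need 3 rows. If they have four phones and three emails, I need 4 rows: one per phone, with the three emails spread across those four rows.

Sample data: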
| many columns | phones | emails |
|:-------------|:------:|:-------|
| row 1 | A,B,C | A,B |
| row 2 | | D,E,F |
Desired result:
| many columns | phones | emails |
|:-------------|:------:|:-------|
| row 1 | A | A |
| row 1 | B | B |
| row 1 | C | |
| row 2 | | D |
| row 2 | | E |
| row 2 | | F |
# Convert cell contents into lists rather than strings
df0['phones'] = df0['phones'].str.split(";", expand=False)
df0['emails'] = df0['emails'].str.split(",", expand=False)
df0 = df0.apply(pd.Series.explode) # DOES NOT WORK
When I try the code above, I get this error:
ValueError: cannot reindex on an axis with duplicate labels
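A likely reason for the error, sketched below on the sample data: once each column is exploded on its own, the two results have different lengths and both carry repeated index labels, so pandas has no unambiguous way to line them back up into one frame. The frame `df0` here is a hypothetical reconstruction of the sample table (with `,` as the separator for both columns), not the real data.

```python
import pandas as pd

# Hypothetical reconstruction of the sample data; None = no phones
df0 = pd.DataFrame({"phones": ["A,B,C", None], "emails": ["A,B", "D,E,F"]})

# Turn each cell into a list, then explode the columns one at a time
lists = df0.apply(lambda col: col.str.split(","))
phones = lists["phones"].explode()
emails = lists["emails"].explode()

# Different lengths, and each index label appears more than once
print(phones.index.tolist())  # [0, 0, 0, 1]
print(emails.index.tolist())  # [0, 0, 1, 1, 1]
```

Because label 0 occurs three times in one result and twice in the other, aligning on the index alone cannot decide which phone goes with which email.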
I'm assuming the index of the original data frame is unique. If it isn't, run `df = df.reset_index()` before the following snippet:
columns = ["phones", "emails"]
# Explode each column individually, but instead of using `explode`, we will
# use `stack` to give us a second index level
exploded = [
df[col].str.split(",", expand=True).stack().rename(col)
for col in columns
]
# Align the exploded columns
exploded = pd.concat(exploded, axis=1).droplevel(-1)
# Merge it with the original data frame
result = pd.concat([df.drop(columns=columns), exploded], axis=1)
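For reference, here is a minimal sketch of the stack-based approach above, run against the sample data from the question. A few things are my own assumptions rather than part of the answer: the `name` column stands in for the "many columns", `.dropna()` and `.sort_index()` are added so the result does not depend on how a given pandas version handles missing values in `stack` or ordering in `concat`, and the last step uses `join` instead of `pd.concat` as one alternative way to attach the remaining columns.

```python
import pandas as pd

# Sample data from the question; "name" is a stand-in for the other columns
df = pd.DataFrame(
    {
        "name": ["row 1", "row 2"],
        "phones": ["A,B,C", None],
        "emails": ["A,B", "D,E,F"],
    }
)
columns = ["phones", "emails"]

# One Series per column, indexed by (original row, position within the cell)
exploded = [
    df[col].str.split(",", expand=True).stack().dropna().rename(col)
    for col in columns
]

# Align on (row, position), then drop the position level
exploded = pd.concat(exploded, axis=1).sort_index().droplevel(-1)

# Attach the untouched columns; a left join repeats them for each new row
result = df.drop(columns=columns).join(exploded).reset_index(drop=True)
print(result)
```

The second index level is what makes the alignment work: the first phone is paired with the first email, the second with the second, and any position that exists in only one column is filled with a missing value in the other.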
import pandas as pd
df = pd.DataFrame({"x":[1,3,7],"y":["A","B","C"],
"z":["p1,p2,p3","p4","p5,p6"],"package_code":["111,222,333","444","555,666"]})
print(df)
"""
x y z package_code
0 1 A p1,p2,p3 111,222,333
1 3 B p4 444
2 7 C p5,p6 555,666
"""
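# Multi-column explode needs pandas >= 1.3, and the lists in each row
# must have the same length in every exploded column
aa = (
df.set_index(['x','y'])
.apply(lambda col : col.str.split(','))  # strings -> lists
.explode(['z','package_code'])
.reset_index()
.reindex(df.columns,axis=1)  # restore the original column order
)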
print(aa)
"""
x y z package_code
0 1 A p1 111
1 1 A p2 222
2 1 A p3 333
3 3 B p4 444
4 7 C p5 555
5 7 C p6 666
"""
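Note that the multi-column `explode` above requires the lists in each row to have the same number of elements, which is not the case in the original question (three phones, two emails). One possible workaround, sketched here under my own assumptions, is to pad the shorter lists with `None` first. The helper `to_list` and the `name` column are hypothetical, introduced only for illustration.

```python
import pandas as pd

# Sample data from the question; "name" is a stand-in for the other columns
df = pd.DataFrame(
    {
        "name": ["row 1", "row 2"],
        "phones": ["A,B,C", None],
        "emails": ["A,B", "D,E,F"],
    }
)
columns = ["phones", "emails"]


def to_list(value):
    # Hypothetical helper: missing or empty cells become an empty list
    return value.split(",") if isinstance(value, str) and value else []


lists = {col: [to_list(v) for v in df[col]] for col in columns}

# Longest list in each row; at least 1 so rows with no data are kept
widths = [
    max([1] + [len(lists[col][i]) for col in columns])
    for i in range(len(df))
]

out = df.copy()
for col in columns:
    padded = [items + [None] * (w - len(items)) for items, w in zip(lists[col], widths)]
    out[col] = pd.Series(padded, index=df.index)

# Every row now has equal-length lists, so multi-column explode is allowed
result = out.explode(columns, ignore_index=True)
print(result)
```

After padding, each row expands to as many rows as its longest list, with missing values where a column ran out of entries.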