I have a DataFrame:
import numpy as np
import pandas as pd

data = [
    [1, "2022-04-29", 123, "circle", 1, 3, 6, 7.3],
    [1, "2022-02-10", 456, "square", 4, np.nan, 3, 9],
]
df = pd.DataFrame(
    data,
    columns=[
        "ID",
        "date",
        "code",
        "shape",
        "circle_X_rating",
        "circle_Y_rating",
        "square_X_rating",
        "square_Y_rating",
    ],
)
df
ID        date  code   shape  circle_X_rating  circle_Y_rating  square_X_rating  square_Y_rating
 1  2022-04-29   123  circle                1              3.0                6              7.3
 1  2022-02-10   456  square                4              NaN                3              9.0
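I want to melt this DataFrame so that there is one shape column and two rating columns, X_rating and Y_rating, but I can't work out how. This is how I'm melting it at the moment, and this is what I get: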
test = (
    pd.melt(
        df,
        id_vars=[
            "ID",
            "date",
            "code",  # the column is named "code" in df, not "bar_code"
            "shape",
        ],
        value_vars=[
            "circle_X_rating",
            "circle_Y_rating",
            "square_X_rating",
            "square_Y_rating",
        ],
        var_name="shape_for_rating",
        value_name="shape_rating",
    )
    .assign(
        shape_for_rating=lambda df: df["shape"].apply(lambda a_str: a_str.split("_")[0])
    )
    .query("shape == shape")
    .drop(columns=["shape_for_rating"])
)
test
   ID        date  code   shape  shape_rating
0   1  2022-04-29   123  circle           1.0
1   1  2022-02-10   456  square           4.0
2   1  2022-04-29   123  circle           3.0
3   1  2022-02-10   456  square           NaN
4   1  2022-04-29   123  circle           6.0
5   1  2022-02-10   456  square           3.0
6   1  2022-04-29   123  circle           7.3
7   1  2022-02-10   456  square           9.0
But what I actually want is:
   ID        date  code   shape  X_rating  Y_rating
0   1  2022-04-29   123  circle       1.0         3
1   1  2022-04-29   123  square       6.0       7.3
2   1  2022-02-10   456  circle         4       NaN
3   1  2022-02-10   456  square         3         9
...
Does anyone know the best way to do this? I've been spinning my wheels.
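Try wide_to_long:
# Move the shape prefix to the end of each name: "circle_X_rating" -> "X_rating_circle".
# Names without an underscore (e.g. "ID") are left unchanged.
df.columns = df.columns.str.split('_', n=1).map(lambda x: '_'.join(x[::-1]))
df = pd.wide_to_long(df,
                     stubnames=['X_rating', 'Y_rating'],
                     i=['ID', 'date', 'code', 'shape'],
                     j='shape1',
                     suffix=r'\w+').reset_index()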
df
Out[84]:
   ID        date  code   shape   shape1  X_rating  Y_rating
0   1  2022-04-29   123  circle  _circle         1       3.0
1   1  2022-04-29   123  circle  _square         6       7.3
2   1  2022-02-10   456  square  _circle         4       NaN
3   1  2022-02-10   456  square  _square         3       9.0
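Note that shape1 keeps a leading underscore because no separator is passed, and the row's original shape column is still present. Below is a minimal sketch of a variant closer to the table asked for in the question. It assumes the original shape column can be discarded (which is what the desired output suggests), and the names wide and out are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        [1, "2022-04-29", 123, "circle", 1, 3, 6, 7.3],
        [1, "2022-02-10", 456, "square", 4, np.nan, 3, 9],
    ],
    columns=[
        "ID", "date", "code", "shape",
        "circle_X_rating", "circle_Y_rating",
        "square_X_rating", "square_Y_rating",
    ],
)

# Assumption: the per-row "shape" value is not needed in the result.
# Move the shape prefix to the end: "circle_X_rating" -> "X_rating_circle".
wide = df.drop(columns="shape").rename(
    columns=lambda c: "_".join(c.split("_", 1)[::-1])
)

out = pd.wide_to_long(
    wide,
    stubnames=["X_rating", "Y_rating"],
    i=["ID", "date", "code"],
    j="shape",       # the new column is named "shape" directly
    sep="_",         # strips the underscore between stub and suffix
    suffix=r"\w+",   # suffixes are words ("circle", "square"), not digits
).reset_index()

print(out)
```

Passing sep="_" is what removes the underscore from the suffix, so no extra string cleanup should be needed afterwards.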
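pivot_longer from pyjanitor makes this straightforward:
# pip install pyjanitor
import janitor

# Drop the original "shape" column so it does not clash with the
# new "shape" column that is built from the column names.
out = df.drop(columns='shape').pivot_longer(
    index=['ID', 'date', 'code'],
    names_to=('shape', '.value'),
    names_pattern=r'([^_]*)_(.*)',
    sort_by_appearance=True,
)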
Output:
   ID        date  code   shape  X_rating  Y_rating
0   1  2022-04-29   123  circle         1       3.0
1   1  2022-04-29   123  square         6       7.3
2   1  2022-02-10   456  circle         4       NaN
3   1  2022-02-10   456  square         3       9.0
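If you would rather stay with melt, as in the question, and not add a dependency, roughly the same reshaping can be sketched in plain pandas: melt everything, split the old column name into a shape part and a rating part, then pivot the rating part back into columns. This is only a sketch. It assumes pandas 1.1 or later (for a list-valued index in pivot) and that the original shape column can be dropped. The names long, name, and kind are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        [1, "2022-04-29", 123, "circle", 1, 3, 6, 7.3],
        [1, "2022-02-10", 456, "square", 4, np.nan, 3, 9],
    ],
    columns=[
        "ID", "date", "code", "shape",
        "circle_X_rating", "circle_Y_rating",
        "square_X_rating", "square_Y_rating",
    ],
)

# Melt all rating columns into one long column of values.
long = df.drop(columns="shape").melt(
    id_vars=["ID", "date", "code"],
    var_name="name",
    value_name="rating",
)

# "circle_X_rating" -> shape="circle", kind="X_rating".
long[["shape", "kind"]] = long["name"].str.split("_", n=1, expand=True)

# Spread the rating kinds back out into separate columns.
out = (
    long.pivot(index=["ID", "date", "code", "shape"], columns="kind", values="rating")
    .reset_index()
    .rename_axis(columns=None)
)

print(out)
```

Because every rating passes through a single melted column, both rating columns come out as floats here, and pivot sorts the rows by the index columns rather than by order of appearance.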