鉴于此数据框:
data = [['Tom', 'run', '2022-01-26', 'run', '2027-01-26'], ['Max', 'stop', '2020-11-16', 'run', '2022-04-26'], ['Bob', 'run', '2021-10-03', 'stop', '2022-01-26'], ['Ben', 'run', '2020-03-11', 'stop', '2013-01-26'], ['Eva', 'stop', '2017-11-16', 'run', '2015-01-26']]
df = pd.DataFrame(data, columns=['person', 'action_1', 'time_1', 'action_2', 'time_2'])
person action_1 time_1 action_2 time_2
0 Tom run 2022-01-26 run 2027-01-26
1 Max stop 2020-11-16 run 2022-04-26
2 Bob run 2021-10-03 stop 2022-01-26
3 Ben run 2020-03-11 stop 2013-01-26
4 Eva stop 2017-11-16 run 2015-01-26
我希望它看起来像:
person action time
0 Tom run 2022-01-26
1 Max stop 2020-11-16
2 Bob run 2021-10-03
3 Ben run 2020-03-11
4 Eva stop 2017-11-16
5 Tom run 2027-01-26
6 Max run 2022-04-26
7 Bob stop 2022-01-26
8 Ben stop 2013-01-26
9 Eva run 2015-01-26
pd.wide_to_long
来完成:
df = pd.wide_to_long(df,
stubnames=['action', 'time'],
i='person',
j='num',
sep='_').reset_index()
输出:
person num action time
0 Tom 1 run 2022-01-26
1 Max 1 stop 2020-11-16
2 Bob 1 run 2021-10-03
3 Ben 1 run 2020-03-11
4 Eva 1 stop 2017-11-16
5 Tom 2 run 2027-01-26
6 Max 2 run 2022-04-26
7 Bob 2 stop 2022-01-26
8 Ben 2 stop 2013-01-26
9 Eva 2 run 2015-01-26
来自 pyjanitor 的 pivot_longer 的一个选项 - 它可以处理非唯一索引:
# pip install pyjanitor
import janitor
(df
.pivot_longer(
index='person',
names_to = ['action', 'time'],
names_pattern = ['action', 'time'])
)
person action time
0 Tom run 2022-01-26
1 Max stop 2020-11-16
2 Bob run 2021-10-03
3 Ben run 2020-03-11
4 Eva stop 2017-11-16
5 Tom run 2027-01-26
6 Max run 2022-04-26
7 Bob stop 2022-01-26
8 Ben stop 2013-01-26
9 Eva run 2015-01-26
另一种选择,使用
.value
:
(df
.pivot_longer(
index='person',
names_to = '.value',
names_pattern = r"(.+)_.+")
)
person action time
0 Tom run 2022-01-26
1 Max stop 2020-11-16
2 Bob run 2021-10-03
3 Ben run 2020-03-11
4 Eva stop 2017-11-16
5 Tom run 2027-01-26
6 Max run 2022-04-26
7 Bob stop 2022-01-26
8 Ben stop 2013-01-26
9 Eva run 2015-01-26