将 Pandas DataFrame 从宽格式转换为长格式

问题描述 投票:0回答:2

鉴于此数据框:

data = [['Tom', 'run', '2022-01-26', 'run', '2027-01-26'], ['Max', 'stop', '2020-11-16', 'run', '2022-04-26'], ['Bob', 'run', '2021-10-03', 'stop', '2022-01-26'], ['Ben', 'run', '2020-03-11', 'stop', '2013-01-26'], ['Eva', 'stop', '2017-11-16', 'run', '2015-01-26']]
df = pd.DataFrame(data, columns=['person', 'action_1', 'time_1', 'action_2', 'time_2'])
  person action_1      time_1 action_2      time_2
0    Tom      run  2022-01-26      run  2027-01-26
1    Max     stop  2020-11-16      run  2022-04-26
2    Bob      run  2021-10-03     stop  2022-01-26
3    Ben      run  2020-03-11     stop  2013-01-26
4    Eva     stop  2017-11-16      run  2015-01-26

我希望它看起来像:

  person action        time
0    Tom    run  2022-01-26
1    Max   stop  2020-11-16
2    Bob    run  2021-10-03
3    Ben    run  2020-03-11
4    Eva   stop  2017-11-16
5    Tom    run  2027-01-26
6    Max    run  2022-04-26
7    Bob   stop  2022-01-26
8    Ben   stop  2013-01-26
9    Eva    run  2015-01-26
python pandas
2个回答
3
投票

这可以使用

pd.wide_to_long
来完成:

df = pd.wide_to_long(df, 
                     stubnames=['action', 'time'],
                     i='person',
                     j='num',
                     sep='_').reset_index()

输出:

  person  num action        time
0    Tom    1    run  2022-01-26
1    Max    1   stop  2020-11-16
2    Bob    1    run  2021-10-03
3    Ben    1    run  2020-03-11
4    Eva    1   stop  2017-11-16
5    Tom    2    run  2027-01-26
6    Max    2    run  2022-04-26
7    Bob    2   stop  2022-01-26
8    Ben    2   stop  2013-01-26
9    Eva    2    run  2015-01-26

0
投票

来自 pyjanitorpivot_longer 的一个选项 - 它可以处理非唯一索引:

# pip install pyjanitor
import janitor

(df
.pivot_longer(
    index='person', 
    names_to = ['action', 'time'], 
    names_pattern = ['action', 'time'])
)
  person action        time
0    Tom    run  2022-01-26
1    Max   stop  2020-11-16
2    Bob    run  2021-10-03
3    Ben    run  2020-03-11
4    Eva   stop  2017-11-16
5    Tom    run  2027-01-26
6    Max    run  2022-04-26
7    Bob   stop  2022-01-26
8    Ben   stop  2013-01-26
9    Eva    run  2015-01-26

另一种选择,使用

.value
:

(df
.pivot_longer(
    index='person', 
    names_to = '.value', 
    names_pattern = r"(.+)_.+")
) 
  person action        time
0    Tom    run  2022-01-26
1    Max   stop  2020-11-16
2    Bob    run  2021-10-03
3    Ben    run  2020-03-11
4    Eva   stop  2017-11-16
5    Tom    run  2027-01-26
6    Max    run  2022-04-26
7    Bob   stop  2022-01-26
8    Ben   stop  2013-01-26
9    Eva    run  2015-01-26
© www.soinside.com 2019 - 2024. All rights reserved.