Reshaping a DataFrame with Polars `.pivot()` (like `pivot_longer` in R)


Coming from R, I'm redoing some exercises that helped me a lot, so I'm trying to recreate this R code:

```r
library(tidyverse)

wide_data <- read_csv('https://raw.githubusercontent.com/rafalab/dslabs/master/inst/extdata/life-expectancy-and-fertility-two-countries-example.csv')

new_tidy_data <- pivot_longer(wide_data, `1960`:`2015`, names_to = "year", values_to = "fertility")
```

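I don't know how to paste the output here, but the data has 113 columns: first `country`, then `1960_fertility`, `1960_life_expectancy`, `1961_fertility`, `1961_life_expectancy`, ..., `2015_fertility`, `2015_life_expectancy`.

It has two rows: Germany and South Korea.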

Expected result:

```r
head(new_tidy_data)
#> # A tibble: 6 × 3
#>   country year  fertility
#>   <chr>   <chr>     <dbl>
#> 1 Germany 1960       2.41
#> 2 Germany 1961       2.44
#> 3 Germany 1962       2.47
#> 4 Germany 1963       2.49
#> 5 Germany 1964       2.49
#> # ℹ 1 more row
```

So far, my code looks like this:

```python
import polars as pl
import polars.selectors as cs

df = pl.read_csv('https://raw.githubusercontent.com/rafalab/dslabs/master/inst/extdata/life-expectancy-and-fertility-two-countries-example.csv')
df.pivot()  # This is where not even chat gpt helped me
```

Thanks!!

python pivot python-polars

Using only the pandas library, I came up with the following solution:

```python
import pandas as pd

wide_data = pd.read_csv('https://raw.githubusercontent.com/rafalab/dslabs/master/inst/extdata/life-expectancy-and-fertility-two-countries-example.csv')

# Wide -> long; the value column holds both measures until it is filtered below
new_tidy_data = wide_data.melt(id_vars='country', var_name='year', value_name='fertility')

# Split "1960_fertility" / "1960_life_expectancy" on the FIRST underscore only,
# so that "life_expectancy" stays in one piece
new_tidy_data[['year', 'metric']] = new_tidy_data['year'].str.split('_', n=1, expand=True)

# Keep only the fertility rows, drop the helper column, and order like the R output
new_tidy_data = (
    new_tidy_data[new_tidy_data['metric'] == 'fertility']
    .drop(columns='metric')
    .sort_values(['country', 'year'])
    .reset_index(drop=True)
)

print(new_tidy_data)
```
© www.soinside.com 2019 - 2024. All rights reserved.