Tidying data: collecting multiple rows into columns using a pattern

Question (0 votes)

My data frame is not tidy.

id                            16 
pol_pup1.irf_pol1_pub1          0.0186380741
pol_pup1.lower_pol1_pub1        0.0092071786
pol_pup1.upper_pol1_pub1        0.0289460145
pol_pup10.irf_pol10_pub10       0.0061496499
pol_pup10.lower_pol10_pub10     0.0030948510
pol_pup10.upper_pol10_pub10     0.0080107893
pol_pup105.irf_pol105_pub105    0.0377057491
pol_pup105.lower_pol105_pub105  0.0157756274
pol_pup105.upper_pol105_pub105  0.0610782151
pol_pup111.irf_pol111_pub111    0.0169799646
pol_pup111.lower_pol111_pub111  0.0111885580
pol_pup111.upper_pol111_pub111  0.0217701354
pol_pup112.irf_pol112_pub112    0.0156278416
pol_pup112.lower_pol112_pub112  -0.0043273923
pol_pup112.upper_pol112_pub112  0.0342078865
pol_pup113.irf_pol113_pub113    0.0280868673
pol_pup113.lower_pol113_pub113  0.0203300863
pol_pup113.upper_pol113_pub113  0.0366594965
pol_pup114.irf_pol114_pub114    0.0086282368

And so on, with different numbers.

How can I build a data frame with separate IRF, Lower, and Upper columns, where each number embedded in the "id" column is one observation, like this:

Observation IRF      Lower   Upper 
1           0.018    0.009   0.028 
10          0.006    0.003   0.008
105         0.037    0.015   0.061
111         0.016    0.011   0.021
r dplyr tidyr tidy
Answers

2 votes

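Here is one approach using separate() from tidyr:

Once the "id" column has been split into separate columns, we can clean them up with regular expressions and str_extract() from stringr. The pattern "[a-z]+$" matches a run of one or more lowercase letters at the end of the string, and "[0-9]+$" does the same for digits.

Then we can reshape the result with pivot_wider() from tidyr.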
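To make the two patterns concrete, here is a minimal sketch on a single id taken from the question. It uses str_split() only to mimic what separate() does with sep = "_"; the variable names are illustrative and not part of the original answer.

```r
library(stringr)

id <- "pol_pup1.irf_pol1_pub1"

# Splitting on "_" yields the four pieces that separate() would place in
# the Pol, Value, Observation and Pub columns.
parts <- str_split(id, "_")[[1]]
parts
# "pol" "pup1.irf" "pol1" "pub1"

# Trailing lowercase letters of the second piece give the statistic name.
value <- str_extract(parts[2], "[a-z]+$")        # "irf"

# Trailing digits of the third piece give the observation number.
observation <- str_extract(parts[3], "[0-9]+$")  # "1"
```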

library(tidyr)
library(dplyr)
library(stringr)
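data %>% 
  # split the id on "_" into four pieces
  separate(id,sep = "_", into = c("Pol","Value","Observation","Pub")) %>%
  # keep only the statistic name (irf/lower/upper) and the observation number
  mutate(Value = str_extract(Value,"[a-z]+$"),
         Observation = str_extract(Observation,"[0-9]+$")) %>%
  dplyr::select(-Pol,-Pub) %>%
  # spread the statistic names into columns; last_col() is the value column
  pivot_wider(names_from = Value, values_from = last_col())

# A tibble: 7 x 4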
  Observation     irf    lower    upper
  <chr>         <dbl>    <dbl>    <dbl>
1 1           0.0186   0.00921  0.0289 
2 10          0.00615  0.00309  0.00801
3 105         0.0377   0.0158   0.0611 
4 111         0.0170   0.0112   0.0218 
5 112         0.0156  -0.00433  0.0342 
6 113         0.0281   0.0203   0.0367 
7 114         0.00863 NA       NA      
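The question does not include a reproducible data set, so the following is only a sketch under stated assumptions: it rebuilds the first six rows by hand, assumes the value column is literally named "16", and runs the same pipeline on them. It also converts Observation to an integer, which the answer above does not do, so that the rows sort numerically as in the requested output.

```r
library(tidyr)
library(dplyr)
library(stringr)

# Hypothetical reconstruction of the first rows of the question's data;
# the value column is assumed to be named "16".
data <- tibble(
  id = c("pol_pup1.irf_pol1_pub1",
         "pol_pup1.lower_pol1_pub1",
         "pol_pup1.upper_pol1_pub1",
         "pol_pup10.irf_pol10_pub10",
         "pol_pup10.lower_pol10_pub10",
         "pol_pup10.upper_pol10_pub10"),
  `16` = c(0.0186380741, 0.0092071786, 0.0289460145,
           0.0061496499, 0.0030948510, 0.0080107893)
)

result <- data %>%
  separate(id, sep = "_", into = c("Pol", "Value", "Observation", "Pub")) %>%
  mutate(Value = str_extract(Value, "[a-z]+$"),
         # as.integer() is an addition so that 10 sorts after 1, not before 2
         Observation = as.integer(str_extract(Observation, "[0-9]+$"))) %>%
  select(-Pol, -Pub) %>%
  pivot_wider(names_from = Value, values_from = last_col()) %>%
  arrange(Observation)

result
```

With these six rows the result should have one row per observation (1 and 10) and the columns Observation, irf, lower and upper.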

2 votes

I don't know how consistent your data frame is, but a variation like this may be useful. I'm assuming the numeric column is named "16".

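library(dplyr)
library(tidyr)
library(stringr)

df %>% 
  mutate(
    # first run of digits in the id is the observation number
    obs = str_extract(id, '[0-9]+'),
    # which statistic the row holds
    group = str_extract(id, 'irf|lower|upper')
  ) %>% 
  select(-id) %>% 
  pivot_wider(
    names_from = group,
    values_from = `16`
  )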
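Since this approach depends on every id containing a digit run and one of the three keywords, it may be worth checking for ids that match neither before pivoting. The sketch below is an illustration, not part of the answer: the sample rows are rebuilt by hand, the last id ("pol_pupX.mean_polX_pubX") is a made-up inconsistent entry, and the final rename() to the headers requested in the question is an optional extra.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical sample; the last row is deliberately malformed.
df <- tibble(
  id = c("pol_pup105.irf_pol105_pub105",
         "pol_pup105.lower_pol105_pub105",
         "pol_pup105.upper_pol105_pub105",
         "pol_pupX.mean_polX_pubX"),
  `16` = c(0.0377057491, 0.0157756274, 0.0610782151, 0.5)
)

labelled <- df %>%
  mutate(
    obs = str_extract(id, "[0-9]+"),
    group = str_extract(id, "irf|lower|upper")
  )

# Ids that fail either pattern come back as NA and are collected here.
unmatched <- labelled %>% filter(is.na(obs) | is.na(group))

tidy <- labelled %>%
  filter(!is.na(obs), !is.na(group)) %>%
  select(-id) %>%
  pivot_wider(names_from = group, values_from = `16`) %>%
  rename(Observation = obs, IRF = irf, Lower = lower, Upper = upper)

tidy
```

Inspecting unmatched first makes it easier to see whether the patterns need adjusting, rather than finding unexpected NA columns after the reshape.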