使用灵活的调用(从循环中使用,将数据的宽度从宽变长)(在循环中使用)

问题描述 投票:2回答:1

我需要 ---或可能的 ---一些宽的时间序列数据,宽度随宽度而变化,以便能够将其循环导入。

[数据点是每季度一次,但是我以年份为单位(四个季度)和六个月为一个季度(仅两个季度)接收数据,即数据是关于宽度的变化

我想找到一个简单而灵活的解决方案,该解决方案可以循环使用,导入许多年零六个月的数据块(而且,由于我需要说服我的研究小组使用R,因此我在这里要求提供简单的/ (最好使用)的智能/清洁溶液。

在年份块中,数据看起来像这样,

dta_wide1 <- structure(list(V1 = c("", "", "", "", "", "", "", "peach", "dragonfruit", "honeydew", "huckleberry", "", ""), V2 = c("ABC", "some info", "Store A", "", "As of 31/03/2019", "label1", "", "7", "5", "6", "1", "(a) some useless clutter", "(b) more not relevent information"), V3 = c("", "", "", "", "", "", "label2", "0.5", "0.4", "0.8", "0.3", "", ""), V4 = c("", "", "", "", "", "label4", "label4a", "21", "21", "87", "21", "", ""), V5 = c("", "", "", "", "", "", "label4b", "0.3", "0.1", "0.4", "0.2", "", ""), V6 = c("", "", "", "", "As of 30/06/2019", "label1", "", "5", "2", "3", "7", "", ""), V7 = c("", "", "", "", "", "", "label2", "0.46", "0.72", "0.7", "0.8", "", ""), V8 = c("", "", "", "", "", "label4", "label4a", "19", "22", "85", "25", "", ""), V9 = c("", "", "", "", "", "", "label4b", "0.4", "0.1", "0.3", "0.2", "", ""), V10 = c("", "", "", "", "As of 30/09/2019", "label1", "", "4", "1", "4", "8", "", ""), V11 = c("", "", "", "", "", "", "label2", "0.1", "0.3", "0.6", "0.22", "", ""), V12 = c("", "", "", "", "", "label4", "label4a", "21", "23", "71", "27", "", ""), V13 = c("", "", "", "", "", "", "label4b", "0.3", "0.1", "0.4", "0.2", "", ""), V14 = c("", "", "", "", "As of 31/12/2019", "label1", "", "8", "6", "9", "9", "", ""), V15 = c("", "", "", "", "", "", "label2", "0.7", "0.87", "0.55", "0.33", "", ""), V16 = c("", "", "", "", "", "label4", "label4a", "24", "25", "99", "35", "", ""), V17 = c("", "", "", "", "", "", "label4b", "0.3", "0.1", "0.4", "0.2", "", "")), class = "data.frame", row.names = c(NA, -13L))

并且在六个月内都是这样,

dta_wide2 <- structure(list(V1 = c("", "", "", "", "", "", "", "peach", "dragonfruit", "honeydew", "huckleberry", "", ""), V2 = c("ABC", "some info", "Store A", "", "As of 31/03/2020", "label1", "", "2", "3", "4", "8", "(a) some useless clutter", "(b) more not relevent information"), V3 = c("", "", "", "", "", "", "label2", "0.1", "0.2", "0.3", "0.8", "", ""), V4 = c("", "", "", "", "", "label4", "label4a", "10", "11", "12", "9", "", ""), V5 = c("", "", "", "", "", "", "label4b", "0.3", "0.1", "0.4", "0.2", "", ""), V6 = c("", "", "", "", "As of 30/06/2020", "label1", "", "4", "6", "8", "16", "", ""), V7 = c("", "", "", "", "", "", "label2", "0.22", "0.33", "0.44", "0.55", "", ""), V8 = c("", "", "", "", "", "label4", "label4a", "11", "12", "13", "10", "", ""), V9 = c("", "", "", "", "", "", "label4b", "0.4", "0.1", "0.3", "0.2", "", "")), class = "data.frame", row.names = c(NA, -13L))

即(六个月内)

# install.packages(c("tidyverse"), dependencies = TRUE)
library(tidyverse)
dta_wide2 %>% as_tibble
# A tibble: 13 x 9
V1       V2             V3     V4      V5     V6       V7    V8     V9    
<chr>    <chr>          <chr>  <chr>   <chr>  <chr>    <chr> <chr>  <chr> 
   1 ""       "ABC"          ""     ""      ""     ""       ""    ""     ""    
2 ""       "some info"    ""     ""      ""     ""       ""    ""     ""    
3 ""       "Store A"      ""     ""      ""     ""       ""    ""     ""    
4 ""       ""             ""     ""      ""     ""       ""    ""     ""    
5 ""       "As of 31/03/~ ""     ""      ""     "As of ~ ""    ""     ""    
6 ""       "label1"       ""     "label~ ""     "label1" ""    "labe~ ""    
7 ""       ""             "labe~ "label~ "labe~ ""       "lab~ "labe~ "labe~
8 "peach"  "2"            "0.1"  "10"    "0.3"  "4"      "0.2~ "11"   "0.4" 
9 "dragon~ "3"            "0.2"  "11"    "0.1"  "6"      "0.3~ "12"   "0.1" 
10 "honeyd~ "4"            "0.3"  "12"    "0.4"  "8"      "0.4~ "13"   "0.3" 
11 "huckle~ "8"            "0.8"  "9"     "0.2"  "16"     "0.5~ "10"   "0.2" 
12 ""       "(a) some use~ ""     ""      ""     ""       ""    ""     ""    
13 ""       "(b) more not~ ""     ""      ""     ""       ""    ""     ""    

dta_wide2中,日期键像这样浮动

> dta_wide2[5,] %>% str_sub(start= -10) %>% lubridate::dmy()
[1] NA           "2020-03-31" NA           NA           NA          
[6] "2020-06-30" NA           NA           NA      

所以我尝试这样整理一下

dta_wide2 %>% 
   add_column(date1 = dta_wide2[5,2] %>% str_sub(start= -10) %>% lubridate::dmy(), .before = 2)  %>% 
   add_column(date2 = dta_wide2[5,6] %>% str_sub(start= -10) %>% lubridate::dmy(), .before = 6) %>% 
   add_column(store = dta_wide2[3,2], .before = 2) %>% as_tibble

# A tibble: 13 x 12
   V1    store date1      V2    V3    V4    date2      V5    V6    V7   
   <chr> <chr> <date>     <chr> <chr> <chr> <date>     <chr> <chr> <chr>
 1 ""    Stor~ 2020-03-31 "ABC" ""    ""    2020-06-30 ""    ""    ""   
 2 ""    Stor~ 2020-03-31 "som~ ""    ""    2020-06-30 ""    ""    ""   
 3 ""    Stor~ 2020-03-31 "Sto~ ""    ""    2020-06-30 ""    ""    ""   
 4 ""    Stor~ 2020-03-31 ""    ""    ""    2020-06-30 ""    ""    ""   
 5 ""    Stor~ 2020-03-31 "As ~ ""    ""    2020-06-30 ""    "As ~ ""   
 6 ""    Stor~ 2020-03-31 "lab~ ""    "lab~ 2020-06-30 ""    "lab~ ""   
 7 ""    Stor~ 2020-03-31 ""    "lab~ "lab~ 2020-06-30 "lab~ ""    "lab~
 8 "pea~ Stor~ 2020-03-31 "2"   "0.1" "10"  2020-06-30 "0.3" "4"   "0.2~
 9 "dra~ Stor~ 2020-03-31 "3"   "0.2" "11"  2020-06-30 "0.1" "6"   "0.3~
10 "hon~ Stor~ 2020-03-31 "4"   "0.3" "12"  2020-06-30 "0.4" "8"   "0.4~
11 "huc~ Stor~ 2020-03-31 "8"   "0.8" "9"   2020-06-30 "0.2" "16"  "0.5~
12 ""    Stor~ 2020-03-31 "(a)~ ""    ""    2020-06-30 ""    ""    ""   
13 ""    Stor~ 2020-03-31 "(b)~ ""    ""    2020-06-30 ""    ""    ""   
# ... with 2 more variables: V8 <chr>, V9 <chr>

现在,我需要使用pivot_longer进行更长时间的[[pivot它]],但是我的挑战是如何-当我还获得看起来像dta_wide1的数据(即四分之四)时- -是否以一种可以同时用于dta_wide1dta_wide2的灵活方式进行操作。我已经为此工作了一段时间,任何帮助使其工作,简单地清理或清理它的工作都将得到极大的重视。

[我现在在这里,但不正确,不灵活且不简单

dta_wide2_foo <- dta_wide2 names(dta_wide2_foo) <- c('goods', paste0(dta_wide2[6,2:5], dta_wide2[7,2:5], sep = '_1'), paste0(dta_wide2[6,2:5], dta_wide2[7,2:5], sep = '_2')) dta_wide2_foo %>% add_column(date1 = dta_wide2[5,2] %>% str_sub(start= -10) %>% lubridate::dmy(), .before = 2) %>% add_column(date2 = dta_wide2[5,6] %>% str_sub(start= -10) %>% lubridate::dmy(), .before = 6) %>% add_column(store = dta_wide2[3,2], .before = 2) %>% as_tibble %>% .[8:11,] %>% pivot_longer(-c(goods, store, date1, date2), values_to = "Value", names_to = "variable") %>% print(n = 100)

或者是一些通用的代码段,这些代码段既不简单,也不聪明,但可以用来获取循环中两个示例数据中日期的位置

dta <- dta_wide2 dta[5,] %>% str_sub(start= -10) %>% lubridate::dmy() %>% { which(!is.na(.)) } [1] 2 6

或更干净的],>

dta <- dta_wide1 dta[5,] %>% grep("As ",.) [1] 2 6 10 14

更新2020-06-08 07:45:18Z

我的目标是合并较长的数据集以绘制数据,(Wimpel suggest below我使用lapply合并了各种不同的宽数据集,即dta_wide1dta_wide2...dta_widen()调用)我想象数据看起来像这样,

> dta_long # A tibble: 96 x 5 product label value date store <chr> <chr> <dbl> <date> <chr> 1 peach label1 7 2019-03-31 Store A 2 peach label2 0.5 2019-03-31 Store A 3 peach label4a 21 2019-03-31 Store A 4 peach label4b 0.3 2019-03-31 Store A 5 peach label1 5 2019-06-30 Store A 6 peach label2 0.46 2019-06-30 Store A 7 peach label4a 19 2019-06-30 Store A 8 peach label4b 0.4 2019-06-30 Store A 9 peach label1 4 2019-09-30 Store A 10 peach label2 0.1 2019-09-30 Store A # ... with 86 more rows

然后是 /用类似这样的方式绘制日期,

dta_long %>% filter(label == 'label1') %>% ggplot(aes(date, value, colour = product)) + geom_line() + scale_x_date(date_breaks = "3 months", date_labels = "%b-%y", limits = c((min(dta_long$date)-34), max = max(dta_long$date)))

enter image description here    

我需要旋转---或可能收集---一些宽的时间序列数据,宽度随宽度而变化,以便能够将其循环导入。数据点是每季度一次,但我接收到数据...

r time-series tidyverse reshape tidyr
1个回答
0
投票
我保存了两个样本数据集,并将它们存储在单独的.xlsb文件中。数据看起来像这样:

enter image description here

© www.soinside.com 2019 - 2024. All rights reserved.