Convert nested, unequal-length, multi-level JSON columns in a data frame into new columns and rows of the same data frame in R


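I need to process this data frame, which has two columns of nested JSON whose levels have unequal lengths, and I need to expand everything those columns may contain into wide form.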

 jsonblob1 = c("{\"basestep_id\":\"BSa7pt5o7xx8dxaderflht3o\",\"step_act\":\"SET_NEW\",\"step_count\":0,\"custm\":false,\"edited\":false,\"name\":null,\"text\":\"Lorem ipsum dolor sit.\",\"crits\":[\"Lorem ipsum dolor sit.\",\"Lorem ipsum dolor sit. \\u2018Lorem ipsum dolor sit.\\u2019Lorem ipsum dolor sit. \\u2018Lorem ipsum dolor sit.\\u2019.\",\"Lorem ipsum dolor sit.\",\"Lorem ipsum dolor sit. \\u2018Lorem ipsum dolor sit.\\u2019 Lorem ipsum dolor sit. \\u2018Lorem ipsum dolor sit..\\u2019\",\"Lorem ipsum dolor sit.\"],\"tasks\":[]}","{\"basestep_id\":\"BSa7pt5o7xx8dxaderflht3o\",\"step_act\":\"SET_NEW\",\"step_count\":0,\"custm\":false,\"edited\":false,\"name\":null,\"text\":\"Lorem ipsum dolor sit.\",\"crits\":[\"Lorem ipsum dolor sit.\",\"Lorem ipsum dolor sit. \\u2018Lorem ipsum dolor sit.\\u2019Lorem ipsum dolor sit. \\u2018Lorem ipsum dolor sit.\\u2019.\",\"Lorem ipsum dolor sit.\",\"Lorem ipsum dolor sit. \\u2018Lorem ipsum dolor sit.\\u2019 Lorem ipsum dolor sit. \\u2018Lorem ipsum dolor sit..\\u2019\",\"Lorem ipsum dolor sit.\"],\"tasks\":[]}")
 jsonblob2 = c("{\"basestep_id\":\"BSa7pt5o7xx8dxaderflht3o\",\"step_act\":\"SET_NEW\",\"step_count\":0,\"custm\":false,\"edited\":false,\"name\":null,\"text\":\"Lorem ipsum dolor sit.\",\"crits\":[\"Lorem ipsum dolor sit.\",\"Lorem ipsum dolor sit. \\u2018Lorem ipsum dolor sit.\\u2019Lorem ipsum dolor sit. \\u2018Lorem ipsum dolor sit.\\u2019.\",\"Lorem ipsum dolor sit.\",\"Lorem ipsum dolor sit. \\u2018Lorem ipsum dolor sit.\\u2019 Lorem ipsum dolor sit. \\u2018Lorem ipsum dolor sit..\\u2019\",\"Lorem ipsum dolor sit.\"],\"tasks\":[]}","{\"basestep_id\":\"BSa7pt5o7xx8dxaderflht3o\",\"step_act\":\"SET_NEW\",\"step_count\":0,\"custm\":false,\"edited\":false,\"name\":null,\"text\":\"Lorem ipsum dolor sit.\",\"crits\":[\"Lorem ipsum dolor sit.\",\"Lorem ipsum dolor sit. \\u2018Lorem ipsum dolor sit.\\u2019Lorem ipsum dolor sit. \\u2018Lorem ipsum dolor sit.\\u2019.\",\"Lorem ipsum dolor sit.\",\"Lorem ipsum dolor sit. \\u2018Lorem ipsum dolor sit.\\u2019 Lorem ipsum dolor sit. \\u2018Lorem ipsum dolor sit..\\u2019\",\"Lorem ipsum dolor sit.\"],\"tasks\":[\"Lorem ipsum dolor sit.\",\"Lorem ipsum dolor sit. \\u2018Lorem ipsum dolor sit.\\u2019Lorem ipsum dolor sit. \\u2018Lorem ipsum dolor sit.\\u2019.\",\"Lorem ipsum dolor sit.\",\"Lorem ipsum dolor sit. \\u2018Lorem ipsum dolor sit.\\u2019 Lorem ipsum dolor sit. \\u2018Lorem ipsum dolor sit..\\u2019\",\"Lorem ipsum dolor sit.\"]}")
 otherc1 = c(1,2)
 otherc2 = c(2,5)

 df = data.frame(jsonblob1,jsonblob2,otherc1,otherc2)

So far I have tried this code, which I previously used to expand other multi-level JSON:

 df |> 
   mutate(
     jsonblob1 = map(jsonblob1, ~ jsonlite::fromJSON(.x) |> 
                       unnest_wider(everything()) |> 
                       unnest_wider(everything(), names_sep = "_") |> 
                       mutate(across(everything(), as.character)))) |>
   mutate(
     jsonblob2 = map(jsonblob2, ~ jsonlite::fromJSON(.x) |> 
                       unnest_wider(everything()) |> 
                       unnest_wider(everything(), names_sep = "_") |> 
                       mutate(across(everything(), as.character)))) |>
       unnest(jsonblob1,jsonblob2) |>  
       type_convert() 

But it returns this error, which I have not been able to fix.

Error in `mutate()`:
ℹ In argument: `jsonblob1 = map(...)`.
Caused by error in `map()`:
ℹ In index: 1.
Caused by error in `unnest_wider()`:
! `data` must be a data frame, not a list.
Run `rlang::last_trace()` to see where the error occurred.
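
A likely reading of this error, offered as a minimal sketch rather than a definitive diagnosis: `jsonlite::fromJSON()` returns a plain named list for a JSON object, while `unnest_wider()` expects a data frame as its first argument. The JSON string below is a hypothetical, shortened stand-in for the blobs in the question. Wrapping the parsed list in a one-row tibble with a list-column is one way to give `unnest_wider()` the input it requires.

```r
library(tibble)
library(tidyr)

# Hypothetical, shortened stand-in for one JSON string from the question
json <- '{"step_act":"SET_NEW","step_count":0,"crits":["a","b"],"tasks":[]}'

# For a JSON object, fromJSON() returns a named list, not a data frame
parsed <- jsonlite::fromJSON(json)
is.list(parsed)
is.data.frame(parsed)

# Passing that list straight to unnest_wider() fails, as in the question;
# tryCatch() captures the error instead of stopping the script
err <- tryCatch(unnest_wider(parsed, everything()), error = function(e) e)
conditionMessage(err)

# Wrapping the list in a one-row tibble (as a list-column) makes it unnestable
wide <- tibble(json = list(parsed)) |>
  unnest_wider(json)
wide
```

In this sketch the multi-valued `crits` and the empty `tasks` remain list-columns after the first `unnest_wider()`, so they still need a further wide or long unnesting step.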

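To make the goal clear: wherever possible, I want every level that holds a single value expanded into the same row. For elements with multiple values (such as crits in the example), I would like to know how to expand them either wider or longer (repeating the single observations to match the number of rows in the unequal elements). Also, the jsonblob columns deliberately contain a zero-length element (tasks), but sometimes it does hold data, as in the second row of jsonblob2, so the solution should be flexible.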

I would appreciate your help.

Tags: r, json

1 Answer

Perhaps consider pivoting to a longer format first.

library(purrr)
library(dplyr)
library(tidyr)

as_tibble(df)
#> # A tibble: 2 × 4
#>   jsonblob1                                            jsonblob2 otherc1 otherc2
#>   <chr>                                                <chr>       <dbl>   <dbl>
#> 1 "{\"basestep_id\":\"BSa7pt5o7xx8dxaderflht3o\",\"st… "{\"base…       1       2
#> 2 "{\"basestep_id\":\"BSa7pt5o7xx8dxaderflht3o\",\"st… "{\"base…       2       5

flat <- df |>
  pivot_longer(starts_with("json"), names_to = "blob_id", values_to = "json") |>
  mutate(json = map(json, jsonlite::parse_json)) |>
  unnest_wider(json) |>
  unnest_longer(crits) |>
  unnest_longer(tasks, keep_empty = TRUE)

flat
#> # A tibble: 40 × 12
#>    otherc1 otherc2 blob_id   basestep_id  step_act step_count custm edited name 
#>      <dbl>   <dbl> <chr>     <chr>        <chr>         <int> <lgl> <lgl>  <lgl>
#>  1       1       2 jsonblob1 BSa7pt5o7xx… SET_NEW           0 FALSE FALSE  NA   
#>  2       1       2 jsonblob1 BSa7pt5o7xx… SET_NEW           0 FALSE FALSE  NA   
#>  3       1       2 jsonblob1 BSa7pt5o7xx… SET_NEW           0 FALSE FALSE  NA   
#>  4       1       2 jsonblob1 BSa7pt5o7xx… SET_NEW           0 FALSE FALSE  NA   
#>  5       1       2 jsonblob1 BSa7pt5o7xx… SET_NEW           0 FALSE FALSE  NA   
#>  6       1       2 jsonblob2 BSa7pt5o7xx… SET_NEW           0 FALSE FALSE  NA   
#>  7       1       2 jsonblob2 BSa7pt5o7xx… SET_NEW           0 FALSE FALSE  NA   
#>  8       1       2 jsonblob2 BSa7pt5o7xx… SET_NEW           0 FALSE FALSE  NA   
#>  9       1       2 jsonblob2 BSa7pt5o7xx… SET_NEW           0 FALSE FALSE  NA   
#> 10       1       2 jsonblob2 BSa7pt5o7xx… SET_NEW           0 FALSE FALSE  NA   
#> # ℹ 30 more rows
#> # ℹ 3 more variables: text <chr>, crits <chr>, tasks <chr>
glimpse(flat)
#> Rows: 40
#> Columns: 12
#> $ otherc1     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
#> $ otherc2     <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
#> $ blob_id     <chr> "jsonblob1", "jsonblob1", "jsonblob1", "jsonblob1", "jsonb…
#> $ basestep_id <chr> "BSa7pt5o7xx8dxaderflht3o", "BSa7pt5o7xx8dxaderflht3o", "B…
#> $ step_act    <chr> "SET_NEW", "SET_NEW", "SET_NEW", "SET_NEW", "SET_NEW", "SE…
#> $ step_count  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ custm       <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
#> $ edited      <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
#> $ name        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ text        <chr> "Lorem ipsum dolor sit.", "Lorem ipsum dolor sit.", "Lorem…
#> $ crits       <chr> "Lorem ipsum dolor sit.", "Lorem ipsum dolor sit. ‘Lorem i…
#> $ tasks       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
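
The long form above repeats every crits value for every tasks value, which is why the last blob alone contributes 5 × 5 = 25 of the 40 rows. If one row per blob is preferred, a wide variant is possible. The following is only a sketch on a hypothetical miniature data set (`toy`, with made-up `id` values and shortened JSON), assuming the arrays are unnamed so that `names_sep` can number their elements.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical miniature of the question's data: `tasks` is empty in row 1
toy <- tibble(
  id   = c(1, 2),
  json = c(
    '{"step_act":"SET_NEW","crits":["a","b"],"tasks":[]}',
    '{"step_act":"SET_NEW","crits":["a","b","c"],"tasks":["x"]}'
  )
)

wide <- toy |>
  mutate(json = map(json, jsonlite::parse_json)) |>
  # First level: one column per JSON key
  unnest_wider(json) |>
  # Array levels: names_sep numbers the unnamed elements (crits_1, crits_2, ...);
  # rows with fewer elements get NA in the surplus columns
  unnest_wider(crits, names_sep = "_") |>
  unnest_wider(tasks, names_sep = "_")

wide
```

One caveat with this design: the number of `crits_*` and `tasks_*` columns depends on the longest array present in the data, so a column set built from one batch of records may differ from the next.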

Created on 2024-05-01 with reprex v2.1.0
