Writing a function in R to handle multiple if else recode statements

Question (votes: 0, answers: 2)

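I have a large substance-use dataset that records daily use over the past week, and I am trying to write a function that can process it easily. There are two steps I need to complete:

  1. Write a recode function.

The data is currently labelled "sub1", "sub2", "sub3", and so on up to 6, and each has a corresponding "value" variable (e.g. "sub1_value" = Alcohol). These are followed by sub1_day1, sub1_day2, etc., which hold the amount used as a number. I want to create new variables based on the substance (e.g. day1_alcohol, day2_alcohol, day1_cocaine, etc.).

  2. Write a function that covers the 6 different "sub1/2/3" variables and the 15 possible substances.

I don't want to copy, paste, and edit this 15 different times. I am trying to develop a function that can do it for me.

I have set up a reprex with 2 substances per person, 4 possible substances (c("Alcohol", "Cocaine", "Opiates", "Cannabis")), and 3 days of use.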

#Sample Data:

df <-
  data.frame(
    sub1_value = c("Alcohol", "Alcohol", "Cocaine", "Opiates", "Cannabis"),
    sub1_day1 = c(4, 3, 1, 0, 1),
    sub1_day2 = c(4, 7, 1, 0, 0),
    sub1_day3 = c(5, 6, 0, 1, 1),
    sub2_value = c("Cannabis", "Opiates", "Alcohol", "Cocaine", "Alcohol"),
    sub2_day1 = c(7, 2, 0, 0, 0),
    sub2_day2 = c(3, 2, 1, 1, 1),
    sub2_day3 = c(9, 8, 0, 1, 1)
  )

This code lets me handle "sub1" and "sub2". Is there a more efficient way to write this?


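    library(dplyr)

    # For each day, take the value from whichever slot (sub1 or sub2) holds
    # "Alcohol"; rows where neither slot is alcohol get NA.
    # NA_real_ keeps the type consistent with the numeric day columns, which
    # older dplyr versions of if_else() require.
    df <- df %>%
      mutate(
        day1_alc = if_else(
          sub1_value == "Alcohol",
          sub1_day1,
          if_else(sub2_value == "Alcohol", sub2_day1,
                  NA_real_)
        ),
        day2_alc = if_else(
          sub1_value == "Alcohol",
          sub1_day2,
          if_else(sub2_value == "Alcohol", sub2_day2,
                  NA_real_)
        ),
        day3_alc = if_else(
          sub1_value == "Alcohol",
          sub1_day3,
          if_else(sub2_value == "Alcohol", sub2_day3,
                  NA_real_)
        )
      )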

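Question 2: how do I write a function that does this for all days and all substances? As I mentioned, I have many days and substances, so I would like to reduce the work as much as possible.

I expect a dataset that keeps the original data but also has the variables day1_alc, day2_alc, day3_alc, day1_cocaine, day2_cocaine, etc., with the values copied from the "sub" columns into the appropriate new variables.

Thanks for any guidance on the if statements or the function!

Edit: I have worked out a solution for the first part; I would still appreciate help with the function part.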

r dplyr data-cleaning
2 Answers
1
vote

Pivoting to long format might look like this:

library(dplyr)
library(tidyr) # pivot_*
df %>%
  mutate(rn = row_number()) %>%
  pivot_longer(-c(rn, sub1_value, sub2_value), names_pattern = "(.*)_(.*)", names_to = c(".value", "day"))
# # A tibble: 15 × 6
#    sub1_value sub2_value    rn day    sub1  sub2
#    <chr>      <chr>      <int> <chr> <dbl> <dbl>
#  1 Alcohol    Cannabis       1 day1      4     7
#  2 Alcohol    Cannabis       1 day2      4     3
#  3 Alcohol    Cannabis       1 day3      5     9
#  4 Alcohol    Opiates        2 day1      3     2
#  5 Alcohol    Opiates        2 day2      7     2
#  6 Alcohol    Opiates        2 day3      6     8
#  7 Cocaine    Alcohol        3 day1      1     0
#  8 Cocaine    Alcohol        3 day2      1     1
#  9 Cocaine    Alcohol        3 day3      0     0
# 10 Opiates    Cocaine        4 day1      0     0
# 11 Opiates    Cocaine        4 day2      0     1
# 12 Opiates    Cocaine        4 day3      1     1
# 13 Cannabis   Alcohol        5 day1      1     0
# 14 Cannabis   Alcohol        5 day2      0     1
# 15 Cannabis   Alcohol        5 day3      1     1

From here, we can use case_when() to determine the alc column:

df %>%
  mutate(rn = row_number()) %>%
  pivot_longer(-c(rn, sub1_value, sub2_value), names_pattern = "(.*)_(.*)", names_to = c(".value", "day")) %>%
  mutate(
    alc = case_when(
      sub1_value == "Alcohol" ~ sub1, 
      sub2_value == "Alcohol" ~ sub2, 
      .default = NA)
  )
# # A tibble: 15 × 7
#    sub1_value sub2_value    rn day    sub1  sub2   alc
#    <chr>      <chr>      <int> <chr> <dbl> <dbl> <dbl>
#  1 Alcohol    Cannabis       1 day1      4     7     4
#  2 Alcohol    Cannabis       1 day2      4     3     4
#  3 Alcohol    Cannabis       1 day3      5     9     5
#  4 Alcohol    Opiates        2 day1      3     2     3
#  5 Alcohol    Opiates        2 day2      7     2     7
#  6 Alcohol    Opiates        2 day3      6     8     6
#  7 Cocaine    Alcohol        3 day1      1     0     0
#  8 Cocaine    Alcohol        3 day2      1     1     1
#  9 Cocaine    Alcohol        3 day3      0     0     0
# 10 Opiates    Cocaine        4 day1      0     0    NA
# 11 Opiates    Cocaine        4 day2      0     1    NA
# 12 Opiates    Cocaine        4 day3      1     1    NA
# 13 Cannabis   Alcohol        5 day1      1     0     0
# 14 Cannabis   Alcohol        5 day2      0     1     1
# 15 Cannabis   Alcohol        5 day3      1     1     1

If you need it back in wide format (I don't recommend it, but just in case):

df %>%
  mutate(rn = row_number()) %>%
  pivot_longer(-c(rn, sub1_value, sub2_value), names_pattern = "(.*)_(.*)", names_to = c(".value", "day")) %>%
  mutate(alc = case_when(sub1_value == "Alcohol" ~ sub1, sub2_value == "Alcohol" ~ sub2, .default = NA)) %>%
  pivot_wider(id_cols = c(rn, sub1_value, sub2_value), names_from = day, values_from = c(sub1, sub2, alc))
# # A tibble: 5 × 12
#      rn sub1_value sub2_value sub1_day1 sub1_day2 sub1_day3 sub2_day1 sub2_day2 sub2_day3 alc_day1 alc_day2 alc_day3
#   <int> <chr>      <chr>          <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>    <dbl>    <dbl>    <dbl>
# 1     1 Alcohol    Cannabis           4         4         5         7         3         9        4        4        5
# 2     2 Alcohol    Opiates            3         7         6         2         2         8        3        7        6
# 3     3 Cocaine    Alcohol            1         1         0         0         1         0        0        1        0
# 4     4 Opiates    Cocaine            0         0         1         0         1         1       NA       NA       NA
# 5     5 Cannabis   Alcohol            1         0         1         0         1         1        0        1        1

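The rn column is only needed if you have to bring the data back to the original wide format. If your data has another field, not included here, that works as a good id-like field, it may be more appropriate, but I don't see anything wrong with rn here.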
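The case_when() step above handles alcohol only. One way to extend the same pivoting idea to every substance and any number of "sub" slots is sketched below. This is a minimal sketch, not the answerer's code: the helper name recode_by_substance() is hypothetical, it assumes the columns follow the sub<N>_value / sub<N>_day<K> naming from the question, that the data has no existing column called rn, and that no person lists the same substance in two slots (otherwise pivot_wider() would return list-columns).

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  sub1_value = c("Alcohol", "Alcohol", "Cocaine", "Opiates", "Cannabis"),
  sub1_day1 = c(4, 3, 1, 0, 1),
  sub1_day2 = c(4, 7, 1, 0, 0),
  sub1_day3 = c(5, 6, 0, 1, 1),
  sub2_value = c("Cannabis", "Opiates", "Alcohol", "Cocaine", "Alcohol"),
  sub2_day1 = c(7, 2, 0, 0, 0),
  sub2_day2 = c(3, 2, 1, 1, 1),
  sub2_day3 = c(9, 8, 0, 1, 1)
)

# Hypothetical helper: adds one day<K>_<Substance> column per substance and
# day, keeping the original columns.
recode_by_substance <- function(data) {
  # Temporary row id so the reshaped data can be joined back
  with_id <- data %>% mutate(rn = row_number())

  per_substance <- with_id %>%
    # One row per person and slot: columns rn, slot, value, day1, day2, ...
    pivot_longer(
      cols = matches("^sub\\d+_"),
      names_pattern = "^sub(\\d+)_(.*)$",
      names_to = c("slot", ".value")
    ) %>%
    # Back to one row per person, spreading the day columns by substance
    pivot_wider(
      id_cols = rn,
      names_from = value,
      values_from = starts_with("day"),
      names_glue = "{.value}_{value}"
    )

  with_id %>%
    left_join(per_substance, by = "rn") %>%
    select(-rn)
}

out <- recode_by_substance(df)
out[, c("day1_Alcohol", "day2_Alcohol", "day3_Alcohol")]
```

The new columns are named after the substance as it is spelled in the data (day1_Alcohol rather than day1_alc). Because the slot number and day names are read from the column names, the same call should work unchanged with 6 slots and 7 days, as long as the naming pattern holds.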


0
votes

Here is a tidyverse approach:

library(dplyr)
library(tidyr)
library(stringr)

df1 <- bind_rows(df[, 1:4], df[, 5:8] %>% rename_with(~colnames(df[1:4]))) %>% 
  rename_with(~str_replace(., ".*\\_", "")) 
 
bind_cols(df1[1], df1 %>% 
            pivot_longer(-value,
               names_to = "name", 
               values_to = "day") %>% 
            mutate(id =as.integer(gl(n(),3,n()))) %>% 
            pivot_wider(names_from = c(name, value), 
              values_from = day,
              names_glue = "{name}_{value}")
          ) %>% 
  select(-id)
#       value day1_Alcohol day2_Alcohol day3_Alcohol day1_Cocaine day2_Cocaine day3_Cocaine day1_Opiates day2_Opiates day3_Opiates day1_Cannabis day2_Cannabis day3_Cannabis
# 1   Alcohol            4            4            5           NA           NA           NA           NA           NA           NA            NA            NA            NA
# 2   Alcohol            3            7            6           NA           NA           NA           NA           NA           NA            NA            NA            NA
# 3   Cocaine           NA           NA           NA            1            1            0           NA           NA           NA            NA            NA            NA
# 4   Opiates           NA           NA           NA           NA           NA           NA            0            0            1            NA            NA            NA
# 5  Cannabis           NA           NA           NA           NA           NA           NA           NA           NA           NA             1             0             1
# 6  Cannabis           NA           NA           NA           NA           NA           NA           NA           NA           NA             7             3             9
# 7   Opiates           NA           NA           NA           NA           NA           NA            2            2            8            NA            NA            NA
# 8   Alcohol            0            1            0           NA           NA           NA           NA           NA           NA            NA            NA            NA
# 9   Cocaine           NA           NA           NA            0            1            1           NA           NA           NA            NA            NA            NA
# 10  Alcohol            0            1            1           NA           NA           NA           NA           NA           NA            NA            NA            NA
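For readers who would rather keep the data wide, as in the question, the nested if_else() logic can also be wrapped in a small base R function and looped over substances. The following is a rough sketch under stated assumptions, not part of either answer: the helper name add_substance_days(), its arguments, and the short labels (alc, coc, opi, can) are all hypothetical, and the columns are assumed to follow the sub<N>_value / sub<N>_day<K> naming from the question. Unlike if_else(), a missing sub<N>_value is treated here as "no match" rather than producing NA straight away.

```r
# Sample data from the question
df <- data.frame(
  sub1_value = c("Alcohol", "Alcohol", "Cocaine", "Opiates", "Cannabis"),
  sub1_day1 = c(4, 3, 1, 0, 1),
  sub1_day2 = c(4, 7, 1, 0, 0),
  sub1_day3 = c(5, 6, 0, 1, 1),
  sub2_value = c("Cannabis", "Opiates", "Alcohol", "Cocaine", "Alcohol"),
  sub2_day1 = c(7, 2, 0, 0, 0),
  sub2_day2 = c(3, 2, 1, 1, 1),
  sub2_day3 = c(9, 8, 0, 1, 1)
)

# Hypothetical helper: for one substance, copy each day's value from the first
# slot (sub1, sub2, ...) whose *_value matches it; rows with no match stay NA.
add_substance_days <- function(data, substance, label, n_slots = 2, days = 1:3) {
  for (d in days) {
    out <- rep(NA_real_, nrow(data))
    # Walk the slots from last to first so earlier slots overwrite later ones,
    # matching the precedence of the nested if_else() in the question.
    for (s in rev(seq_len(n_slots))) {
      hit <- data[[paste0("sub", s, "_value")]] == substance
      hit[is.na(hit)] <- FALSE  # assumption: a missing substance is "no match"
      out[hit] <- data[[paste0("sub", s, "_day", d)]][hit]
    }
    data[[paste0("day", d, "_", label)]] <- out
  }
  data
}

# Names are the short labels used in the new column names; values are the
# substances as they are spelled in the data.
substances <- c(alc = "Alcohol", coc = "Cocaine", opi = "Opiates", can = "Cannabis")

out <- df
for (label in names(substances)) {
  out <- add_substance_days(out, substances[[label]], label)
}

out[, c("day1_alc", "day2_alc", "day3_alc")]
```

For the full dataset described in the question, the call would presumably use n_slots = 6 and days = 1:7, with the substances vector extended to all 15 substances.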