Create a Status column based on conditions on different states in R

Question (votes: 1, answers: 3)

I have a data frame like this:

ID <- c(1,2,3,4,5,5,5,6,6)
States <- c(NA,NA,"All Locked","All Not Locked","All Locked","All Locked"
                   ,"All Not Locked","All Not Locked","All Not Locked")
ToolID <- c(NA,NA,"SWP","SWP","SWP","SWP","SWP","SWP","SWP")
Measurement <- c("Length","Breadth","Width","Height","Time","Time"
                   ,"Time","Mass","Mass")
Location <- c("US","US","UK","UK","US","US","US","UK","UK")

df1 <- data.frame(ID,States,ToolID,Measurement,Location)

I am trying to do some data manipulation on this data frame using the following conditions:

For each ID (grouped),     
    if States = NA, then the Status = "No Status"
    if States column contains at least(count >=) 1 "All Locked", then the Status = "Lock Status"
    if States column doesn't contain (count =0)  "All Locked", then the Status = "No Lock Status"

My desired output is:

  ID ToolID Measurement Location         Status
   1     NA      Length       US      No Status
   2     NA     Breadth       US      No Status
   3    SWP       Width       UK    Lock Status
   4    SWP      Height       UK No Lock Status
   5    SWP        Time       US    Lock Status
   6    SWP        Mass       UK No Lock Status

I tried it this way, but the logic is wrong:

df1$Status <- ifelse(df1$States == NA, "No Status",
                ifelse((count(df1$States == "All Locked") >=1),
                  "Lock Status",
                  ifelse((count(df1$States == "All Locked") <1),
                    "No Lock Status", NA)))

Could someone point me in the right direction? I want to apply this to my larger dataset, so a fast solution would help me a lot.

r dataframe dplyr data.table tidyverse
3 Answers
1 vote

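To test for NA elements, use is.na(); a comparison such as df1$States == NA always returns NA. Also, dplyr::count() works on data.frames/tbls, not on vectors.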
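A quick base-R check, as a minimal sketch, of why the comparison in the question cannot work. The vector `states` and the label "Has State" are illustrative stand-ins, not part of the original data.

```r
# Illustrative stand-in for df1$States (hypothetical name)
states <- c(NA, "All Locked", "All Not Locked")

# Comparing with == propagates NA instead of testing for it
eq_result <- states == NA

# is.na() returns a proper TRUE/FALSE vector
na_result <- is.na(states)

# With is.na(), ifelse() can label the missing entries
# ("Has State" is just a placeholder label for this sketch)
status <- ifelse(na_result, "No Status", "Has State")

print(eq_result)
print(na_result)
print(status)
```

Because `eq_result` contains only NA, an `ifelse()` built on it can never reach its "yes" or "no" branch.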

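Here, we group by 'ID' and check whether there is at least one "All Locked" in the 'States' column; if so, 'States' is set to "All Locked" for the whole group. Instead of doing this with mutate, we do it inside a second group_by that adds a new grouping variable to the existing groups (the add argument, named .add since dplyr 1.0.0). We then get the frequency per group of 'ID' and 'States' and, based on the condition, change the values in 'States'.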

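library(dplyr)
df1 %>% 
  group_by(ID) %>%
  # add States as a second grouping variable, set to "All Locked" for the
  # whole ID group if any row is locked (.add was named add before dplyr 1.0.0)
  group_by(States = if("All Locked" %in% States) "All Locked" 
              else States, .add = TRUE) %>% 
  mutate(n = n()) %>%   # rows per ID/States group
  ungroup() %>% 
  # FALSE -> "No Lock Status", TRUE -> "Lock Status"; NA stays NA, then "No Status"
  mutate(States = c("No Lock Status", "Lock Status")[1+ 
                (States == "All Locked" & n >=1)], 
          States = replace(States, is.na(States), "No Status")) %>%
  select(-n) %>% 
  distinct()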

1 vote

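Here is a short, clean idiom using dplyr::case_when. Note that it labels a group "Lock Status" only when every row is "All Locked"; a mixed group such as ID 5 comes out as NA, whereas the question asks for "Lock Status" as soon as one row is locked. First we compute Status as a summary statistic, the proportion of States that are "All Locked" (0..1, or NA), then we immediately recode the Status column into the corresponding string output: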

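library(dplyr)
df1 %>% group_by(ID) %>%
    # one row per ID; Status = share of rows that are "All Locked" (NA if States is NA)
    summarize(ToolID=ToolID[1], Measurement=Measurement[1], Location=Location[1],
      Status = sum( States=="All Locked")/n() ) %>%
    # recode the share into labels; mixed groups (0 < share < 1) become NA
    mutate(Status = case_when(
      is.na(Status)         ~ "No Status",
      Status == 1           ~ "Lock Status",
      Status == 0           ~ "No Lock Status",
      between(Status, 0, 1) ~ NA_character_ ))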

Output:

     ID ToolID Measurement Location Status        
  <dbl> <fctr> <fctr>      <fctr>   <chr>         
1  1.00 NA     Length      US       No Status     
2  2.00 NA     Breadth     US       No Status     
3  3.00 SWP    Width       UK       Lock Status   
4  4.00 SWP    Height      UK       No Lock Status
5  5.00 SWP    Time        US       NA            
6  6.00 SWP    Mass        UK       No Lock Status
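The output above gives NA for ID 5, while the question asks for "Lock Status" whenever a group has at least one "All Locked" row. The following is a sketch of one possible adjustment, not the answerer's code: it counts the locked rows instead of taking their proportion. The helper column `n_locked` is a name introduced here for illustration, and `stringsAsFactors = FALSE` is set so the sketch behaves the same on older R versions.

```r
library(dplyr)

# Data from the question, rebuilt so the sketch is self-contained
df1 <- data.frame(
  ID = c(1, 2, 3, 4, 5, 5, 5, 6, 6),
  States = c(NA, NA, "All Locked", "All Not Locked", "All Locked",
             "All Locked", "All Not Locked", "All Not Locked",
             "All Not Locked"),
  ToolID = c(NA, NA, "SWP", "SWP", "SWP", "SWP", "SWP", "SWP", "SWP"),
  Measurement = c("Length", "Breadth", "Width", "Height", "Time", "Time",
                  "Time", "Mass", "Mass"),
  Location = c("US", "US", "UK", "UK", "US", "US", "US", "UK", "UK"),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  group_by(ID) %>%
  summarize(ToolID = ToolID[1], Measurement = Measurement[1],
            Location = Location[1],
            # count of "All Locked" rows; NA if any state in the group is NA
            n_locked = sum(States == "All Locked")) %>%
  mutate(Status = case_when(
    is.na(n_locked) ~ "No Status",
    n_locked >= 1   ~ "Lock Status",
    TRUE            ~ "No Lock Status")) %>%
  select(-n_locked)

print(res)
```

One caveat of this sketch: a group that mixes NA with real states would be labelled "No Status", because `sum()` returns NA as soon as one element is NA. The sample data contains no such group.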

1 vote

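The any() function is well suited for the aggregation here. Joining with a lookup table then translates NA, TRUE, and FALSE into the Status values the OP expects.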

The approach can be implemented in data.table syntax as well as in dplyr style.
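The three aggregate values can be reproduced in base R. This small sketch (all variable names are illustrative) also shows how any() treats groups that mix NA with real states, a case that does not occur in the sample data.

```r
target <- "All Locked"

# Only missing states: the comparison is NA, and so is any()
only_na <- any(c(NA, NA) == target)

# At least one locked row: TRUE
has_lock <- any(c("All Locked", "All Not Locked") == target)

# No locked row at all: FALSE
no_lock <- any(c("All Not Locked", "All Not Locked") == target)

# Mixed with NA: one TRUE is enough for TRUE ...
na_and_lock <- any(c(NA, "All Locked") == target)

# ... but NA plus FALSE stays NA, since the missing value could be a match
na_and_nolock <- any(c(NA, "All Not Locked") == target)

print(c(only_na, has_lock, no_lock, na_and_lock, na_and_nolock))
```

If a group with NA and "All Not Locked" should count as "No Lock Status" instead, `any(..., na.rm = TRUE)` would be one option, though an all-NA group would then also return FALSE.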

Create lookup table

This will be used by both the data.table and the dplyr variant.

library(data.table)
lut <- data.table(st = c(NA, TRUE, FALSE), 
                  Status = c("No Status", "Lock Status", "No Lock Status"))

data.table version

library(data.table)
# aggregate by ID
agg <- setDT(df1)[, .(st = any(States == "All Locked")), by = ID][
  #  join with lookup table
  lut, on = "st"][, -"st"]
# join with df1 to prepend other columns
unique(df1[, -"States"])[agg, on = "ID"]
   ID ToolID Measurement Location         Status
1:  1   <NA>      Length       US      No Status
2:  2   <NA>     Breadth       US      No Status
3:  3    SWP       Width       UK    Lock Status
4:  5    SWP        Time       US    Lock Status
5:  4    SWP      Height       UK No Lock Status
6:  6    SWP        Mass       UK No Lock Status

dplyr version

library(dplyr)
agg <- df1 %>% 
  group_by(ID) %>% 
  summarize(st = any(States == "All Locked")) %>%   # TRUE, FALSE, or NA per ID
  left_join(lut, by = "st") %>%                      # translate st into Status
  select(-st)
df1 %>% 
  select(-States) %>%  
  unique() %>% 
  left_join(agg, by = "ID")
  ID ToolID Measurement Location         Status
1  1   <NA>      Length       US      No Status
2  2   <NA>     Breadth       US      No Status
3  3    SWP       Width       UK    Lock Status
4  4    SWP      Height       UK No Lock Status
5  5    SWP        Time       US    Lock Status
6  6    SWP        Mass       UK No Lock Status
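For readers without either package, the same any()-plus-lookup idea can be sketched in base R. This is an illustrative variant under the assumption that the non-State columns are constant within each ID, as in the sample data; the names `st`, `status`, and `res` are introduced here.

```r
# Data from the question, rebuilt so the sketch is self-contained
df1 <- data.frame(
  ID = c(1, 2, 3, 4, 5, 5, 5, 6, 6),
  States = c(NA, NA, "All Locked", "All Not Locked", "All Locked",
             "All Locked", "All Not Locked", "All Not Locked",
             "All Not Locked"),
  ToolID = c(NA, NA, "SWP", "SWP", "SWP", "SWP", "SWP", "SWP", "SWP"),
  Measurement = c("Length", "Breadth", "Width", "Height", "Time", "Time",
                  "Time", "Mass", "Mass"),
  Location = c("US", "US", "UK", "UK", "US", "US", "US", "UK", "UK"),
  stringsAsFactors = FALSE
)

# One logical per ID (TRUE, FALSE, or NA), named by ID
st <- vapply(split(df1$States == "All Locked", df1$ID), any, logical(1))

# Translate NA / TRUE / FALSE into the Status labels
status <- ifelse(is.na(st), "No Status",
                 ifelse(st, "Lock Status", "No Lock Status"))

# Keep the first row per ID and attach the Status by ID name
res <- df1[!duplicated(df1$ID), c("ID", "ToolID", "Measurement", "Location")]
res$Status <- unname(status[as.character(res$ID)])

print(res)
```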