Generating a new variable based on the aging of another variable

Question  Votes: 1  Answers: 2

I have a dataset like the one below:

ID. Invoice. Date of Invoice.  paid or not.  
1    1         09/30/2019       no
1    2         10/30/2019       no
1    3         11/30/2019       yes

2    1         10/31/2019       yes
2    1         10/31/2019       yes
2    2         11/30/2019       no
2    3         12/31/2019       no

3    1         7/31/2019        no
3    2         9/30/2019        yes
3    3         12/31/2019       no

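I want to see whether a customer is willing to pay. As long as the customer has paid a newer invoice while leaving older invoices unpaid, I will give them a good score. So customers 1 and 3 are rated "good", and customer 2 is rated "bad".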

The final data should therefore have one additional column whose values are "good" and "bad".

r dplyr gdata
2 Answers
2
votes

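The logic is not entirely clear. One possibility is to group by 'ID' and then check whether any row other than the first one contains "yes":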

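library(dplyr)
library(lubridate)
df1 %>% 
   # parse the month/day/year strings into Date values
   mutate(Date_of_Invoice = mdy(Date_of_Invoice)) %>% 
   # order each customer's invoices from oldest to newest
   arrange(ID, Date_of_Invoice) %>%
   group_by(ID) %>%
   # 'good' if any invoice after the customer's first one is paid, otherwise 'bad'
   mutate(flag = c('bad', 'good')[1 + any(paid_or_not[-1] == "yes")])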
# A tibble: 9 x 5
# Groups:   ID [3]
#     ID Invoice Date_of_Invoice paid_or_not flag 
#  <int>   <int> <date>          <chr>       <chr>
#1     1       1 2019-09-30      no          good 
#2     1       2 2019-10-30      no          good 
#3     1       3 2019-11-30      yes         good 
#4     2       1 2019-10-31      yes         bad  
#5     2       2 2019-11-30      no          bad  
#6     2       3 2019-12-31      no          bad  
#7     3       1 2019-07-31      no          good 
#8     3       2 2019-09-30      yes         good 
#9     3       3 2019-12-31      no          good 
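
The indexing idiom in the mutate() step can be looked at in isolation. Below is a minimal base R sketch; label() and the two toy vectors are made up for illustration and are not part of the answer. any() returns TRUE or FALSE, adding 1 turns that into 2 or 1, and that number picks an element of c("bad", "good").

```r
# Toy payment histories, assumed to be ordered from oldest to newest invoice.
paid_late  <- c("no", "no", "yes")   # a "yes" appears after the first invoice
paid_first <- c("yes", "no", "no")   # the only "yes" is the first invoice

# Hypothetical helper: drop the first invoice, then ask whether any later one is paid.
# FALSE + 1 is 1 (selects "bad"), TRUE + 1 is 2 (selects "good").
label <- function(paid) c("bad", "good")[1 + any(paid[-1] == "yes")]

label(paid_late)    # "good"
label(paid_first)   # "bad"
```

Note that a customer with a single invoice always comes out as "bad" under this rule, because paid[-1] is then empty and any() of an empty vector is FALSE.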

Data

df1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), Invoice = c(1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), Date_of_Invoice = c("09/30/2019", 
"10/30/2019", "11/30/2019", "10/31/2019", "11/30/2019", "12/31/2019", 
"7/31/2019", "9/30/2019", "12/31/2019"), paid_or_not = c("no", 
"no", "yes", "yes", "no", "no", "no", "yes", "no")), class = "data.frame", row.names = c(NA, 
-9L))
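
If the rule in the question is read literally (a paid invoice has to come after an unpaid one), the check in this answer is slightly looser: a customer with yes, yes, no would still be flagged good. Below is a sketch of the stricter reading in base R. The names inv and paid_after_unpaid() are hypothetical, and whether this is what the asker intended is an assumption.

```r
# Same sample data as in the answer, rebuilt here so the sketch is self-contained.
inv <- data.frame(
  ID = rep(1:3, each = 3),
  Invoice = rep(1:3, times = 3),
  Date_of_Invoice = c("09/30/2019", "10/30/2019", "11/30/2019",
                      "10/31/2019", "11/30/2019", "12/31/2019",
                      "7/31/2019", "9/30/2019", "12/31/2019"),
  paid_or_not = c("no", "no", "yes", "yes", "no", "no", "no", "yes", "no"),
  stringsAsFactors = FALSE
)

# Parse the dates and sort oldest to newest within each customer.
inv$Date_of_Invoice <- as.Date(inv$Date_of_Invoice, format = "%m/%d/%Y")
inv <- inv[order(inv$ID, inv$Date_of_Invoice), ]

# Hypothetical helper: TRUE if some paid invoice comes after at least one unpaid one.
# cumsum(paid == "no") counts the unpaid invoices seen up to each position.
paid_after_unpaid <- function(paid) {
  any(paid == "yes" & cumsum(paid == "no") > 0)
}

# ave() recycles the single label over all rows of the customer.
inv$flag <- ave(inv$paid_or_not, inv$ID,
                FUN = function(v) if (paid_after_unpaid(v)) "good" else "bad")
```

On the sample data this gives the same flags as the answer (good for customers 1 and 3, bad for customer 2); the two rules only differ for histories such as yes, yes, no.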

2
votes

Assuming the rows are already sorted by "Date of Invoice.", here is a base R solution using ave:

df$`good or bad.` <- ave(df$`paid or not.`,df$ID., FUN = function(v) ifelse(which(v=="yes")==1,"bad","good"))

which gives

> df
  ID. Invoice. Date of Invoice. paid or not.  good or bad.
1   1        1       09/30/2019           no          good
2   1        2       10/30/2019           no          good
3   1        3       11/30/2019          yes          good
4   2        1       10/31/2019          yes           bad
5   2        2       11/30/2019           no           bad
6   2        3       12/31/2019           no           bad
7   3        1        7/31/2019           no          good
8   3        2        9/30/2019          yes          good
9   3        3       12/31/2019           no          good
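
One caveat: the ave() call appears to rely on every customer having exactly one "yes". With no "yes" in a group, which() returns an empty vector and the assignment fails; with several, the result no longer has one value per group. Below is a sketch of a variant that always returns a single label per customer. The toy data are made up, the rule is borrowed from the other answer (any paid invoice after the first one), and treating a customer who never paid as "bad" is an assumption.

```r
# Made-up data: customer 1 never paid, customer 2 paid twice.
toy <- data.frame(
  id   = c(1, 1, 2, 2, 2),
  paid = c("no", "no", "yes", "no", "yes"),
  stringsAsFactors = FALSE
)

# Return exactly one label per group; ave() recycles it over the group's rows.
toy$flag <- ave(toy$paid, toy$id,
                FUN = function(v) if (any(v[-1] == "yes")) "good" else "bad")

toy$flag   # "bad" "bad" "good" "good" "good"
```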

Data

df <- structure(list(ID. = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), Invoice. = c(1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), `Date of Invoice.` = c("09/30/2019", 
"10/30/2019", "11/30/2019", "10/31/2019", "11/30/2019", "12/31/2019", 
"7/31/2019", "9/30/2019", "12/31/2019"), `paid or not.` = c("no", 
"no", "yes", "yes", "no", "no", "no", "yes", "no")), class = "data.frame", row.names = c(NA, 
-9L))