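This is my first time using else if. I want to create a new column, mobile$tenuredate (in months), and I am trying to find out what is wrong with my code, which produces NA values.
The result:
Rows where mobile$status == 'active' get NA for mobile$tenuredate (they should not be NA).
Rows where mobile$status == 'stopped' get valid values for mobile$tenuredate.
Here is the code: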
mobile$tenuredate = if (mobile$status=="stopped") {
round(difftime(mobile$EFFECTIVEDATE, mobile$STARTDATE, units="weeks") / 4.348125)
} else if ((mobile$status == "active") && (mobile$difftemp >= 0)) {
round(difftime(mobile$CONTRACTENDDATE, mobile$STARTDATE, units="weeks") / 4.348125)
} else {
round(difftime(mobile$CUTOFFDATE, mobile$STARTDATE, units="weeks") / 4.348125)
}
Data file in CSV available here
Here is a sample data frame.
structure(list(STARTDATE = structure(c(11413, 11639, 11953, 12212,
11335, 12050, 12142, 11225, 12176, 11386), class = "Date"), STOPDATE = structure(c(11436,
12079, NA, 12225, 11345, 12124, 12226, 11999, 12176, 11758), class = "Date"),
EFFECTIVEDATE = structure(c(11436, 12079, NA, 12225, 11345,
12124, 12226, 11999, 12176, 11758), class = "Date"), CONTRACTENDDATE = structure(c(11778,
12004, 12318, 12578, 11700, 12415, 12508, 11977, 12542, 11751
), class = "Date"), CUTOFFDATE = structure(c(12273, 12273,
12273, 12273, 12273, 12273, 12273, 12273, 12273, 12273), class = "Date"),
status = c("stopped", "stopped", "active", "stopped", "stopped",
"stopped", "stopped", "stopped", "stopped", "stopped"), tenuredate = structure(c(1,
14, NA, 0, 0, 2, 3, 25, 0, 12), class = "difftime", units = "weeks")), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
Thanks in advance.
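if requires its condition to have length 1, and you are giving it a vector. The logical replacement is ifelse, but ifelse has a problem that is well known (among veterans): it drops the class, so your Date or difftime columns become numeric and you have to recast them. (That is not the end of the world, but let's keep the classes intact for now.)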
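To make the class drop concrete, here is a minimal base-R sketch. The two dates are made-up illustration values, not taken from your data. It only shows that ifelse hands back plain numbers when given Date inputs, and one way to recast them.

```r
# Made-up dates, for illustration only
d <- as.Date(c("2001-04-01", "2001-11-13"))

# ifelse() builds its result from the test vector, so the Date class is lost
raw <- ifelse(c(TRUE, FALSE), d, d + 1)
class(raw)

# Recast by hand; Date values count days from 1970-01-01
recast <- as.Date(raw, origin = "1970-01-01")
class(recast)
```

The recast step is easy to forget, which is one reason the approach below avoids ifelse and fills a Date column by logical indexing instead.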
mobile$tenuredate <- NULL # just to clean up your previous attempt, otherwise not needed
mobile$usedate <- Sys.Date()[NA] # all NAs are not created equal ...
ind <- mobile$status == "stopped"
mobile$usedate[ind] <- mobile$EFFECTIVEDATE[ind]
ind <- (mobile$status == "active") & (mobile$difftemp >= 0) # vectorized &, not the scalar &&
mobile$usedate[ind] <- mobile$CONTRACTENDDATE[ind]
ind <- is.na(mobile$usedate)
mobile$usedate[ind] <- mobile$CUTOFFDATE[ind]
mobile
# # A tibble: 10 x 7
#    STARTDATE  STOPDATE   EFFECTIVEDATE CONTRACTENDDATE CUTOFFDATE status  usedate
#    <date>     <date>     <date>        <date>          <date>     <chr>   <date>
#  1 2001-04-01 2001-04-24 2001-04-24    2002-04-01      2003-08-09 stopped 2001-04-24
#  2 2001-11-13 2003-01-27 2003-01-27    2002-11-13      2003-08-09 stopped 2003-01-27
#  3 2002-09-23 NA         NA            2003-09-23      2003-08-09 active  2003-08-09
#  4 2003-06-09 2003-06-22 2003-06-22    2004-06-09      2003-08-09 stopped 2003-06-22
#  5 2001-01-13 2001-01-23 2001-01-23    2002-01-13      2003-08-09 stopped 2001-01-23
#  6 2002-12-29 2003-03-13 2003-03-13    2003-12-29      2003-08-09 stopped 2003-03-13
#  7 2003-03-31 2003-06-23 2003-06-23    2004-03-31      2003-08-09 stopped 2003-06-23
#  8 2000-09-25 2002-11-08 2002-11-08    2002-10-17      2003-08-09 stopped 2002-11-08
#  9 2003-05-04 2003-05-04 2003-05-04    2004-05-04      2003-08-09 stopped 2003-05-04
# 10 2001-03-05 2002-03-12 2002-03-12    2002-03-05      2003-08-09 stopped 2002-03-12
It may be useful to pause here and verify that every usedate value comes from the appropriate column.
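One possible way to do that check is sketched below. It runs on a toy three-row data frame, so every value in it, including the difftemp column, is made up for illustration. The sketch fills usedate in the same order as above, then records which source column each usedate matches.

```r
# Toy data: all values are hypothetical, chosen to hit each of the three branches
toy <- data.frame(
  STARTDATE       = as.Date(c("2001-04-01", "2002-09-23", "2002-10-01")),
  EFFECTIVEDATE   = as.Date(c("2001-04-24", NA, NA)),
  CONTRACTENDDATE = as.Date(c("2002-04-01", "2003-09-23", "2003-06-01")),
  CUTOFFDATE      = as.Date(rep("2003-08-09", 3)),
  status          = c("stopped", "active", "active"),
  difftemp        = c(0, 45, -69),
  stringsAsFactors = FALSE
)

# Same fill order as above, using the vectorized &
toy$usedate <- as.Date(NA)
ind <- toy$status == "stopped"
toy$usedate[ind] <- toy$EFFECTIVEDATE[ind]
ind <- (toy$status == "active") & (toy$difftemp >= 0)
toy$usedate[ind] <- toy$CONTRACTENDDATE[ind]
ind <- is.na(toy$usedate)
toy$usedate[ind] <- toy$CUTOFFDATE[ind]

# Record which column each usedate equals (later columns win on ties)
toy$source <- NA_character_
for (col in c("CUTOFFDATE", "CONTRACTENDDATE", "EFFECTIVEDATE")) {
  hit <- !is.na(toy[[col]]) & toy[[col]] == toy$usedate
  toy$source[hit] <- col
}

# Cross-tabulate status against the column that supplied usedate
table(toy$status, toy$source)
```

If two source columns hold the same date in a row, the label is ambiguous, so treat the table as a sanity check rather than proof.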
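I use usedate as an intermediate value for two reasons: (1) to allow that verification; and (2) because you apply the same math in every branch, so rather than maintain the same formula in three places, do it once. There are, of course, other ways to do this.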
mobile$tenuredate <- round(difftime(mobile$usedate, mobile$STARTDATE, units = "weeks") / 4.348125)
mobile
# # A tibble: 10 x 8
#    STARTDATE  STOPDATE   EFFECTIVEDATE CONTRACTENDDATE CUTOFFDATE status  usedate    tenuredate
#    <date>     <date>     <date>        <date>          <date>     <chr>   <date>     <drtn>
#  1 2001-04-01 2001-04-24 2001-04-24    2002-04-01      2003-08-09 stopped 2001-04-24  1 weeks
#  2 2001-11-13 2003-01-27 2003-01-27    2002-11-13      2003-08-09 stopped 2003-01-27 14 weeks
#  3 2002-09-23 NA         NA            2003-09-23      2003-08-09 active  2003-08-09 11 weeks
#  4 2003-06-09 2003-06-22 2003-06-22    2004-06-09      2003-08-09 stopped 2003-06-22  0 weeks
#  5 2001-01-13 2001-01-23 2001-01-23    2002-01-13      2003-08-09 stopped 2001-01-23  0 weeks
#  6 2002-12-29 2003-03-13 2003-03-13    2003-12-29      2003-08-09 stopped 2003-03-13  2 weeks
#  7 2003-03-31 2003-06-23 2003-06-23    2004-03-31      2003-08-09 stopped 2003-06-23  3 weeks
#  8 2000-09-25 2002-11-08 2002-11-08    2002-10-17      2003-08-09 stopped 2002-11-08 25 weeks
#  9 2003-05-04 2003-05-04 2003-05-04    2004-05-04      2003-08-09 stopped 2003-05-04  0 weeks
# 10 2001-03-05 2002-03-12 2002-03-12    2002-03-05      2003-08-09 stopped 2002-03-12 12 weeks
(Once you know you don't need it, mobile$usedate <- NULL.)
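A side note on the printed units: tenuredate still prints as "weeks" because the difftime object keeps units = "weeks" after the division by 4.348125, even though the numbers are now months. If that label is a concern, one option is to convert to plain numbers before dividing. The sketch below uses the start and usedate values from rows 1 and 3 above; the variable names are only for illustration.

```r
# STARTDATE and usedate from rows 1 and 3 of the output above
start <- as.Date(c("2001-04-01", "2002-09-23"))
use   <- as.Date(c("2001-04-24", "2003-08-09"))

# Elapsed time in weeks, still a difftime at this point
weeks <- difftime(use, start, units = "weeks")

# Drop the difftime class first, then convert weeks to months
tenure_months <- round(as.numeric(weeks, units = "weeks") / 4.348125)
tenure_months
```

The result is an ordinary numeric vector, so nothing downstream will mistake the values for weeks.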
If you use any of the tidyverse packages, this can be done more concisely with case_when:
library(dplyr)
as_tibble(mobile) %>%
  mutate(
    # conditions are checked in order; the first match wins
    usedate = case_when(
      status == "stopped" ~ EFFECTIVEDATE,
      (status == "active") & (difftemp >= 0) ~ CONTRACTENDDATE, # vectorized &, not &&
      TRUE ~ CUTOFFDATE
    ),
    tenuredate = round(difftime(usedate, STARTDATE, units = "weeks") / 4.348125)
  )
Or a data.table solution:
library(data.table)
as.data.table(mobile)[
, usedate := Sys.Date()[NA] ][
status == "stopped", usedate := EFFECTIVEDATE ][
(status == "active") & (difftemp >= 0), usedate := CONTRACTENDDATE ][
is.na(usedate), usedate := CUTOFFDATE ][
, tenuredate := round(difftime(usedate, STARTDATE, units = "weeks") / 4.348125) ]
If you combine data.table with magrittr's pipe, it may be easier to follow:
library(data.table)
library(magrittr)
as.data.table(mobile) %>%
.[ , usedate := Sys.Date()[NA] ] %>%
.[ status == "stopped", usedate := EFFECTIVEDATE ] %>%
.[ (status == "active") & (difftemp >= 0), usedate := CONTRACTENDDATE ] %>%
.[ is.na(usedate), usedate := CUTOFFDATE ] %>%
.[ , tenuredate := round(difftime(usedate, STARTDATE, units = "weeks") / 4.348125) ]
In support of my claim that ifelse drops the class: