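I have a question about R. I need to perform some operations on a list and build a new data frame from the values in that list. If I use a for loop, it takes a very long time to finish, and I don't know how to avoid the for loop or how to do the "if + case_when" part without one.
The code below has comments explaining what I did and what happens.
Thanks a lot!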
library(dplyr)    # case_when()
library(stringr)  # str_to_upper()

# Search in all rows of list "total"
for (i in 1:nrow(total)) {
  # Use total$Cad[[i]] to take a value from another list
  val1 <- posdi[posdi$cad == str_to_upper(total$Cad[[i]]), ]
  # Check if the "font" value from val1 is equal to "Taake" and take the value
  val2 <- val1[val1$font == "Taake", ]
  # Format the date value
  thedate <- as.numeric(format(as.Date(total$TheDate[[i]], format = "%Y-%m-%d"), '%Y%m%d'))
  # And here is where I can't easily continue. I want to do an IF and use a different
  # case_when depending on whether the result is between 1 and 5 or between 6 and 7
  if (total$dia[[i]] >= 1 & total$dia[[i]] <= 5) {
    fran = case_when(
      total$secs[[i]] >= 0 & total$secs[[i]] < 1.5 ~ 1,
      total$secs[[i]] >= 1.5 & total$secs[[i]] < 4 ~ 2,
      total$secs[[i]] >= 4 & total$secs[[i]] < 8 ~ 3,
      total$secs[[i]] >= 8 & total$secs[[i]] < 10 ~ 4)
  } else {
    fran = case_when(
      total$secs[[i]] >= 0 & total$secs[[i]] < 1.5 ~ 5,
      total$secs[[i]] >= 1.5 & total$secs[[i]] < 4 ~ 6,
      total$secs[[i]] >= 4 & total$secs[[i]] < 8 ~ 7,
      total$secs[[i]] >= 8 & total$secs[[i]] < 10 ~ 8)
  }
  # And finally, add that "fran" value, the three values from the beginning and some
  # values from the total list to a new data frame
  datosTel[nrow(datosTel) + 1, ] = c(val2$cad, str_to_upper(total$Camp[[i]]), total$numsem[[i]], thedate, total$diasem[[i]], fran, 0)
}
# It works with the "for" loop, but it takes a long time (it goes row by row and the list has more than 200K rows).
# How can I do it without the for loop and make the "if + case_when" work correctly?
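Thanks again, and have a nice day.
As I said above, my problem is the FOR loop and the IF and CASE_WHEN inside it, because I don't know how to do this without a loop.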
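The code inside your loop only ever touches the current element (`[[i]]`), and everything you do there is vectorized by default (except the `if`, but we can replace that directly with `if_else`).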
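A minimal sketch of that difference, using a made-up `dia` vector rather than the real data: a plain `if` expects a single TRUE/FALSE, while `dplyr::if_else()` evaluates the condition element by element and returns one result per element.

```r
library(dplyr)

# Made-up example values for the `dia` column (assumption, not the real data)
dia <- c(1, 3, 5, 6, 7)

# Vectorised: one result per element of dia, no loop needed
offset <- if_else(dia >= 1 & dia <= 5, 0, 4)
print(offset)  # [1] 0 0 0 4 4

# A plain `if` only accepts a single condition; since R 4.2.0 the
# following line is an error ("the condition has length > 1"):
# if (dia >= 1 & dia <= 5) 0 else 4
```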
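So you can replace the whole loop with a single `mutate` or `transmute` statement (they do the same thing, except that `transmute` does not keep the existing columns, so it seems better suited to your case).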
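Furthermore, you can simplify the `if` by merging the two branches and adding an offset that depends on `total$dia`: the `else` branch returns exactly the values of the first branch plus 4.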
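A quick check of that claim on made-up `dia` and `secs` values (assumptions, not the real data): the vectorized two-branch version and the "offset plus one `case_when`" version give the same result.

```r
library(dplyr)

# Made-up example values (assumption)
dia  <- c(2, 6, 4, 7)
secs <- c(0.5, 2, 9, 5)

# Two-branch version from the question, vectorised with if_else()
two_branch <- if_else(
  dia >= 1 & dia <= 5,
  case_when(secs >= 0 & secs < 1.5 ~ 1, secs >= 1.5 & secs < 4 ~ 2,
            secs >= 4 & secs < 8 ~ 3, secs >= 8 & secs < 10 ~ 4),
  case_when(secs >= 0 & secs < 1.5 ~ 5, secs >= 1.5 & secs < 4 ~ 6,
            secs >= 4 & secs < 8 ~ 7, secs >= 8 & secs < 10 ~ 8)
)

# Merged version: one case_when for 1-4, plus 4 when dia is not in 1-5
base   <- case_when(secs >= 0 & secs < 1.5 ~ 1, secs >= 1.5 & secs < 4 ~ 2,
                    secs >= 4 & secs < 8 ~ 3, secs >= 8 & secs < 10 ~ 4)
merged <- if_else(dia >= 1 & dia <= 5, 0, 4) + base

print(merged)  # [1] 1 6 4 7
```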
Finally, your `case_when` expression happens to be expressible as a `findInterval` call.
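For reference, a short base-R sketch of what `findInterval()` returns with the break points used in the answer below: for each value `x` it gives the index `i` such that `breaks[i] <= x < breaks[i + 1]`. The `secs` values here are made up to hit each boundary.

```r
# Break points from the answer; secs values are made up (assumption)
breaks <- c(0, 1.5, 4, 8, 10, Inf)
secs   <- c(0, 1.4, 1.5, 3.9, 4, 7.5, 8, 9.99)

# Left-closed, right-open intervals: [0, 1.5) -> 1, [1.5, 4) -> 2, ...
idx <- findInterval(secs, breaks)
print(idx)  # [1] 1 1 2 2 3 3 4 4
```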
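In the following, I assume that `datosTel` is an empty table before the loop, and I have also made some assumptions about column names that you may need to adjust.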
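library(dplyr)
library(stringr)

datosTel = total %>%
  transmute(
    # Upper-cased Cad if posdi has a "Taake" row with that cad, NA otherwise
    # (a per-row `posdi$cad == str_to_upper(Cad)` would recycle across rows)
    cad = if_else(str_to_upper(Cad) %in% posdi$cad[posdi$font == "Taake"],
                  str_to_upper(Cad), NA_character_),
    Camp = str_to_upper(Camp),
    numsem = numsem,
    thedate = as.numeric(format(as.Date(TheDate, format = "%Y-%m-%d"), '%Y%m%d')),
    diasem = diasem,
    # 0 for dia 1-5, 4 otherwise: shifts fran from 1-4 to 5-8
    offset = if_else(dia >= 1 & dia <= 5, 0, 4),
    fran = offset + findInterval(secs, c(0, 1.5, 4, 8, 10, Inf)),
    LAST_COLUMN = 0
  ) %>%
  select(-offset)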
(Replace `LAST_COLUMN` with the actual column name.)
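If you need more than the matched `cad` value from `posdi`, a join is a common vectorized replacement for the per-row lookup `posdi[posdi$cad == ..., ]`. This is a swapped-in technique, not part of the answer above; the sketch uses small made-up stand-ins for `total` and `posdi`, deduplicates the "Taake" keys with `distinct()`, and leaves `NA` in `cad` for rows without a "Taake" match.

```r
library(dplyr)
library(stringr)

# Made-up stand-ins for the question's `total` and `posdi` (assumption)
total <- tibble(Cad = c("ab1", "cd2", "ef3"))
posdi <- tibble(
  cad  = c("AB1", "AB1", "CD2", "EF3"),
  font = c("Taake", "Other", "Other", "Taake")
)

# Filter posdi once instead of once per row of total; distinct() avoids
# duplicating rows of total when posdi has repeated "Taake" entries
taake <- posdi %>%
  filter(font == "Taake") %>%
  distinct(cad) %>%
  transmute(key = cad, cad)

# One join matches every row of total; unmatched rows get NA in `cad`
res <- total %>%
  mutate(key = str_to_upper(Cad)) %>%
  left_join(taake, by = "key")

print(res)
```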
For values of `secs` in [0, 10), the `findInterval` call is equivalent to:
case_when(
secs >= 0 & secs < 1.5 ~ 1,
secs >= 1.5 & secs < 4 ~ 2,
secs >= 4 & secs < 8 ~ 3,
secs >= 8 & secs < 10 ~ 4
)
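Outside that range the two differ. A quick check on made-up boundary values: `findInterval()` returns 0 below the first break and 5 from 10 upward, while `case_when()` returns `NA` when no condition matches. If `NA` is what you want there (an assumption about the desired behaviour), out-of-range indices can be replaced afterwards.

```r
library(dplyr)

# Made-up boundary values (assumption)
secs   <- c(-1, 0, 9.9, 10, 15)
breaks <- c(0, 1.5, 4, 8, 10, Inf)

via_find <- findInterval(secs, breaks)
via_case <- case_when(
  secs >= 0 & secs < 1.5 ~ 1,
  secs >= 1.5 & secs < 4 ~ 2,
  secs >= 4 & secs < 8 ~ 3,
  secs >= 8 & secs < 10 ~ 4
)
print(via_find)  # [1] 0 1 4 5 5
print(via_case)  # [1] NA  1  4 NA NA

# Map indices outside 1-4 to NA so the result matches case_when()
via_find_na <- replace(via_find, !via_find %in% 1:4, NA)
```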