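I'm currently working on a project and I'm stuck on a problem. I need to replace the values in one column based on conditions on two other columns. Suppose: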
x y m n
1 200P Jan Perm
1 200T Feb Temp
1 300P Jan Perm
2 200T Feb Temp
2 300T Feb Temp
3 300P Jan Perm
3 400P Jan Perm
I want to change the values of column n based on x and y.
for each x
check the value of y and n, if the first value of y with T is
Perm/Temp. Replace the rest of the values of unique x rows to that
value.
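I tried, but when I run my code it replaces all Temp with Perm or all Perm with Temp. I want it to change only the rows of that unique x. Can someone help me with this? I want my output to look like: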
x y m n
1 200P Jan Temp
1 200T Feb Temp
1 300P Jan Temp
2 200T Feb Temp
2 300T Feb Temp
3 300P Jan Perm
3 400P Jan Perm
I also tried practicing on another dataset with a different condition. For example:
a b c d
1 1 0.4 Minor
1 1 0.4 Minor
1 4 0.2 Minor
1 2 2.4 Major
2 4 0.2 Minor
3 1 0.4 Minor
3 4 0.2 Minor
3 4 4.2 Major
I'm trying to replace 4 with 1 in column b if column c is 0.2. If 4 and 0.4 are in the same row, replace 4 with 1.
I believe the following code does what you need.
It creates a new column n2 whose values are taken from n at the first occurrence of T in y.
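fun <- function(DF){
  i <- grep("T", DF$y)[1]   # first row whose y contains "T"; NA if there is none
  DF$n2 <- DF$n
  # i:nrow(DF) also covers i == 1, where [-seq_len(i - 1)] would select nothing
  if(!is.na(i)) DF$n2[i:nrow(DF)] <- DF$n[i]
  DF$n2
}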
res <- dat # work with a copy
res$n2 <- unlist(lapply(split(dat[c(1:2, 4)], dat$x), FUN = fun))
res
# x y m n n2
#1 1 200P Jan Perm Perm
#2 1 200T Feb Temp Temp
#3 1 300P Jan Perm Temp
#4 2 200T Feb Temp Temp
#5 2 300T Feb Temp Temp
#6 3 300P Jan Perm Perm
#7 3 400P Jan Perm Perm
If you don't want that new column, just do
res$n <- res$n2
res <- res[-ncol(res)]
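The `!is.na(i)` guard in `fun` is there because `grep()` returns an empty integer vector when nothing matches, and taking `[1]` of an empty vector gives `NA`. A minimal sketch of that behaviour on two made-up vectors (the variable names are only for illustration):

```r
y_with_T    <- c("200P", "200T", "300P")
y_without_T <- c("300P", "400P")

# Position of the first element containing "T"
i_hit  <- grep("T", y_with_T)[1]
# No match: grep() returns integer(0), and integer(0)[1] is NA
i_none <- grep("T", y_without_T)[1]

i_hit          # 2
is.na(i_none)  # TRUE
```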
Edit.
Apparently my original code was right. The following is what the OP asked for in the last comment.
fun2 <- function(DF){
  i <- grep("T", DF$y)[1]
  DF$n2 <- if(!is.na(i)) DF$n[i] else DF$n
  DF$n2
}
res2 <- dat # work with a copy
res2$n2 <- unlist(lapply(split(dat[c(1:2, 4)], dat$x), FUN = fun2))
res2
# x y m n n2
#1 1 200P Jan Perm Temp
#2 1 200T Feb Temp Temp
#3 1 300P Jan Perm Temp
#4 2 200T Feb Temp Temp
#5 2 300T Feb Temp Temp
#6 3 300P Jan Perm Perm
#7 3 400P Jan Perm Perm
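One thing to keep in mind: `unlist(lapply(split(...)))` returns the values in group order, so assigning the result back only lines up when the rows are already sorted by x, as they are here. As a hedged alternative, here is a sketch using base R's `ave()`, which writes results back to the original row positions. The helper name `first_T` is my own, and I pass `stringsAsFactors = FALSE` so that n is character.

```r
dat <- read.table(text = "
x y m n
1 200P Jan Perm
1 200T Feb Temp
1 300P Jan Perm
2 200T Feb Temp
2 300T Feb Temp
3 300P Jan Perm
3 400P Jan Perm
", header = TRUE, stringsAsFactors = FALSE)

# For every row: the row number of the first y ending in "T" within its
# x group, or NA when the group has no such row.
first_T <- ave(seq_len(nrow(dat)), dat$x, FUN = function(idx) {
  hit <- idx[grepl("T$", dat$y[idx])]
  rep(if (length(hit)) hit[1] else NA_integer_, length(idx))
})

# Keep n where the group has no "T" row, otherwise copy n from that row.
dat$n2 <- ifelse(is.na(first_T), dat$n, dat$n[first_T])
dat
```

On this data it should give the same n2 as `fun2`, without relying on the row order.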
Data.
dat <- read.table(text = "
x y m n
1 200P Jan Perm
1 200T Feb Temp
1 300P Jan Perm
2 200T Feb Temp
2 300T Feb Temp
3 300P Jan Perm
3 400P Jan Perm
", header = TRUE)
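Edit 2.
With the condition in your question's edit it is simpler: use logical indexing.
Note that in your edit you first say you want to change column b from 4 to 1 if column c is 0.2, but then you say to change it if column c is 0.4. The code below uses 0.2.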
inx <- dat2$b == 4 & dat2$c == 0.2
dat2$b[inx] <- 1
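The exact comparison `dat2$c == 0.2` works here because the values are read straight from text. If column c were the result of arithmetic, exact equality on doubles could miss rows, so here is a hedged variant of the same logical indexing with a tolerance. The `tol` value is an assumption of mine, not something from the question, and the data frame is rebuilt inline only to keep the sketch self-contained.

```r
dat2 <- data.frame(
  a = c(1, 1, 1, 1, 2, 3, 3, 3),
  b = c(1, 1, 4, 2, 4, 1, 4, 4),
  c = c(0.4, 0.4, 0.2, 2.4, 0.2, 0.4, 0.2, 4.2),
  d = c("Minor", "Minor", "Minor", "Major", "Minor", "Minor", "Minor", "Major")
)

tol <- 1e-8  # hypothetical tolerance; choose one that suits your data

# TRUE only for rows where b is 4 and c is (approximately) 0.2
inx <- dat2$b == 4 & abs(dat2$c - 0.2) < tol
dat2$b[inx] <- 1
dat2$b
# 1 1 1 2 1 1 1 4
```

The last row keeps b = 4 because its c is 4.2, not 0.2.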
Data 2.
dat2 <- read.table(text = "
a b c d
1 1 0.4 Minor
1 1 0.4 Minor
1 4 0.2 Minor
1 2 2.4 Major
2 4 0.2 Minor
3 1 0.4 Minor
3 4 0.2 Minor
3 4 4.2 Major
", header = TRUE)
We can also try data.table
library(data.table)
i1 <- setDT(df1)[, {i1 <- grepl("T$", y)
if(any(i1)) .I[which.max(i1):.N] } , x]$V1
or
i1 <- setDT(df1)[, .I[cumsum(grepl("T$", y))!=0], x]$V1
df1[i1, n := first(n), x]
df1
# x y m n
#1: 1 200P Jan Perm
#2: 1 200T Feb Temp
#3: 1 300P Jan Temp
#4: 2 200T Feb Temp
#5: 2 300T Feb Temp
#6: 3 300P Jan Perm
#7: 3 400P Jan Perm
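To see why `cumsum(grepl("T$", y)) != 0` selects the first T row and everything after it within a group, it may help to run the pieces on a plain vector. This is just a base R sketch on the y values of group x = 1; the names `hits` and `keep` are mine.

```r
y <- c("200P", "200T", "300P")

hits <- grepl("T$", y)   # FALSE  TRUE FALSE: which values end in "T"
cumsum(hits)             # 0 1 1: running count of matches so far
keep <- cumsum(hits) != 0
keep                     # FALSE  TRUE  TRUE: from the first match onward
```

Inside `[, .I[...], x]` the same logical vector is used to pick the row numbers `.I` per group.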
df1 <- structure(list(x = c(1L, 1L, 1L, 2L, 2L, 3L, 3L), y = c("200P",
"200T", "300P", "200T", "300T", "300P", "400P"), m = c("Jan",
"Feb", "Jan", "Feb", "Feb", "Jan", "Jan"), n = c("Perm", "Temp",
"Perm", "Temp", "Temp", "Perm", "Perm")), .Names = c("x", "y",
"m", "n"), class = "data.frame", row.names = c(NA, -7L))
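You can use dplyr::first to find the first value of y that ends in T, and then replace all values of n in that group with the value of n from the row found.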
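library(dplyr)
df %>% group_by(x) %>%
  # if/else instead of ifelse(): with a length-one test, ifelse() returns only
  # the first element of n, which would overwrite groups that have no "T" row
  mutate(n = if (any(grepl("T$", y))) n[first(grep("T$", y))] else n) %>%
  as.data.frame()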
# x y m n
# 1 1 200P Jan Temp
# 2 1 200T Feb Temp
# 3 1 300P Jan Temp
# 4 2 200T Feb Temp
# 5 2 300T Feb Temp
# 6 3 300P Jan Perm
# 7 3 400P Jan Perm
Data:
df <- read.table(text =
"x y m n
1 200P Jan Perm
1 200T Feb Temp
1 300P Jan Perm
2 200T Feb Temp
2 300T Feb Temp
3 300P Jan Perm
3 400P Jan Perm",
header = TRUE, stringsAsFactors = FALSE)