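I have a data frame with character columns and want to convert it into presence/absence form: any cell that contains characters (such as a variable name) should become 1, and any empty cell should become 0.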
df <- data.frame(
A = c("G1","G2","G3","G4","G5","G6","G7","G8","G9","G10"),
B = c("A", "", "A", "", "G", "B", "C", "", "", "" ),
C = c("B", "", "Z", "", "", "", "", 'B', "C", "" ),
D = c("Z", "D", "", "", "", "", "", "", "", "D"),
E = c("A", "E", "B", "A","", "", "", "", "", "")
)
The output should look like this:
df <- data.frame(
A = c("G1", "G2", "G3", "G4", "G5","G6","G7", "G8", "G9","G10"),
B = c(1, 0, 1, 0, 1, 1, 1, 0, 0, 0),
C = c(1, 0, 1, 0, 0, 0, 0, 1, 1, 0),
D = c(1, 1, 0, 0, 0, 0, 0, 0, 0, 1),
E = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0))
Thanks in advance.
We can compare with != and coerce the logical result to binary (0/1) in base R:
df[-1] <- +(df[-1] != "")
-output
> df
A B C D E
1 G1 1 1 1 1
2 G2 0 0 1 1
3 G3 1 1 0 1
4 G4 0 0 0 1
5 G5 1 0 0 0
6 G6 1 0 0 0
7 G7 1 0 0 0
8 G8 0 1 0 0
9 G9 0 1 0 0
10 G10 0 0 1 0
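To see why this works: comparing a data frame with "" returns a logical matrix, and unary + coerces TRUE/FALSE to the integers 1/0. The sketch below rebuilds the example data from the question and checks the result against the expected output; the names `present` and `expected` are only for illustration.

```r
# Example data from the question
df <- data.frame(
  A = c("G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8", "G9", "G10"),
  B = c("A", "", "A", "", "G", "B", "C", "", "", ""),
  C = c("B", "", "Z", "", "", "", "", "B", "C", ""),
  D = c("Z", "D", "", "", "", "", "", "", "", "D"),
  E = c("A", "E", "B", "A", "", "", "", "", "", "")
)

# Comparing a data frame with a scalar returns a logical matrix:
# TRUE where the cell is non-empty, FALSE where it is ""
present <- df[-1] != ""
is.matrix(present)

# Unary plus turns the logical matrix into an integer matrix (TRUE -> 1L, FALSE -> 0L)
typeof(+present)

df[-1] <- +present

# Expected output from the question (stored as doubles)
expected <- data.frame(
  A = c("G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8", "G9", "G10"),
  B = c(1, 0, 1, 0, 1, 1, 1, 0, 0, 0),
  C = c(1, 0, 1, 0, 0, 0, 0, 1, 1, 0),
  D = c(1, 1, 0, 0, 0, 0, 0, 0, 0, 1),
  E = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
)

# Compare values only: the result holds integers, `expected` holds doubles
stopifnot(all(df[-1] == expected[-1]), identical(df$A, expected$A))
```

Note that this approach yields integer columns, while the expected output in the question uses doubles; the values are the same.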
Or with the tidyverse:
library(dplyr) # version >= 1.1.0
df %>%
mutate(across(-A, ~ case_match(.x, "" ~ 0, .default = 1)))
-output
A B C D E
1 G1 1 1 1 1
2 G2 0 0 1 1
3 G3 1 1 0 1
4 G4 0 0 0 1
5 G5 1 0 0 0
6 G6 1 0 0 0
7 G7 1 0 0 0
8 G8 0 1 0 0
9 G9 0 1 0 0
10 G10 0 0 1 0
With base R, you can replace the columns of the data.frame in place:
df[, -1] <- lapply(df[, -1], function(x) ifelse(nchar(x)>0,1,0))
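If the ID column is not always the first one, a variant that selects columns by name may be handier. The helper `to_presence()` below is hypothetical (it is not from any package): it applies the same nchar(x) > 0 test as the answer above to every column not listed in `id_cols`, and uses as.integer() instead of ifelse() because the test already yields TRUE/FALSE.

```r
# Hypothetical helper (not from any package): 1 = non-empty string, 0 = empty string
to_presence <- function(data, id_cols = "A") {
  value_cols <- setdiff(names(data), id_cols)
  data[value_cols] <- lapply(data[value_cols], function(x) {
    # as.character() guards against factor columns (the data.frame() default
    # before R 4.0.0), on which nchar() raises an error
    as.integer(nchar(as.character(x)) > 0)
  })
  data
}

# Example data from the question
df <- data.frame(
  A = c("G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8", "G9", "G10"),
  B = c("A", "", "A", "", "G", "B", "C", "", "", ""),
  C = c("B", "", "Z", "", "", "", "", "B", "C", ""),
  D = c("Z", "D", "", "", "", "", "", "", "", "D"),
  E = c("A", "E", "B", "A", "", "", "", "", "", "")
)

res <- to_presence(df, id_cols = "A")
res
```

Because the helper returns a modified copy, `df` itself is left unchanged unless you assign `res` back to it.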
Or, with dplyr, you can use mutate() to create a new data.frame (df itself is not modified unless you assign the result back):
library(dplyr)
df %>%
mutate(across(-A, ~if_else(nchar(.)>0, 1, 0)))
We can also use nzchar():
> df[-1] <- +nzchar(as.matrix(df[-1]))
> df
A B C D E
1 G1 1 1 1 1
2 G2 0 0 1 1
3 G3 1 1 0 1
4 G4 0 0 0 1
5 G5 1 0 0 0
6 G6 1 0 0 0
7 G7 1 0 0 0
8 G8 0 1 0 0
9 G9 0 1 0 0
10 G10 0 0 1 0
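The example data only contains empty strings, but real data may also contain NA or whitespace-only cells. Assuming those should count as absent too (an assumption, not something the question states), nzchar() can be combined with trimws() and is.na(). The small data frame `df2` below is made up for illustration.

```r
# Made-up data with NA and whitespace-only cells
df2 <- data.frame(
  A = c("G1", "G2", "G3", "G4"),
  B = c("A", "  ", NA, ""),
  C = c("", "B", " C", NA)
)

m <- as.matrix(df2[-1])

# Assumption: NA and whitespace-only strings count as absent.
# nzchar(NA) is TRUE by default, so the is.na() check is needed.
present <- !is.na(m) & nzchar(trimws(m))

# Unary plus coerces the logical matrix to 1/0 integers
df2[-1] <- +present
df2
```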