I have this data frame:
data<-data.frame(class1=c("A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B"),
class2=c(1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8),
observations=c(444,475, 531,560,650,668,705,717,456,876,123,47,249,180,500,654))
I need to create a new categorical variable "class3" based on 2-unit intervals of "class2". If class2 is between 1 and 2, "class3" is 1, and so on. "class2" is continuous.
I can create a new table with the intervals defined and then join it.
library(dplyr)  # provides left_join() and join_by()
intv<-data.frame(class2=c(1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8),
class3=c(1,1,2,2,3,3,4,4,1,1,2,2,3,3,4,4))
data.2<-left_join(data,intv,by = join_by(class2))
> data.2
class1 class2 observations class3
1 A 1 444 1
2 A 1 444 1
3 A 2 475 1
4 A 2 475 1
5 A 3 531 2
6 A 3 531 2
7 A 4 560 2
8 A 4 560 2
9 A 5 650 3
10 A 5 650 3
11 A 6 668 3
12 A 6 668 3
13 A 7 705 4
14 A 7 705 4
15 A 8 717 4
16 A 8 717 4
17 B 1 456 1
18 B 1 456 1
19 B 2 876 1
20 B 2 876 1
21 B 3 123 2
22 B 3 123 2
23 B 4 47 2
24 B 4 47 2
25 B 5 249 3
26 B 5 249 3
27 B 6 180 3
28 B 6 180 3
29 B 7 500 4
30 B 7 500 4
31 B 8 654 4
32 B 8 654 4
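But the real data frame has many observations, so this takes a lot of time.
Is there a function that does this automatically, given only the interval size?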
Try this:
data$class3 <- cut(data$class2, breaks = seq(0, max(data$class2)+1, by = 2), labels = FALSE)
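If the interval size should be the only input, the same `cut()` call can be wrapped in a small function. This is only a sketch: `bin_by_width` and its `width` argument are hypothetical names, and it assumes positive values with no `NA`s and intervals that start at 0.

```r
# Hypothetical helper (not part of the answer above): bin x into intervals
# of a given width, starting at 0, and return the integer interval index.
bin_by_width <- function(x, width = 2) {
  # Round the upper limit up to a multiple of `width` so max(x) is covered.
  upper <- ceiling(max(x) / width) * width
  breaks <- seq(0, upper, by = width)
  # Intervals are right-closed: (0, width], (width, 2 * width], ...
  # include.lowest = TRUE also puts an exact 0 into the first interval.
  cut(x, breaks = breaks, labels = FALSE, include.lowest = TRUE)
}

class2 <- c(1, 2, 3, 4, 5, 6, 7, 8)
width2 <- bin_by_width(class2, width = 2)
width4 <- bin_by_width(class2, width = 4)
```

With `labels = FALSE` the result is a plain integer vector; wrap it in `factor()` if a categorical column is needed.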
Is this what you want?
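findClass = \(x) {
  n = max(x)
  # round an even maximum up to the next odd number
  if((n %% 2L) == 0L) n = n + 1L
  # breaks at 1, 3, 5, ...; each interval is closed on the left
  findInterval(x, seq(1L, n, 2L))
}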
which gives
> findClass(data$class2)
[1] 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4
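Since "class2" is described as continuous, it may be worth checking how each approach treats non-integer values. The sketch below uses a made-up vector `x` (not from the question) and compares `cut()`, `findInterval()` and `ceiling()`. They agree on whole numbers but differ at 2.5, because `findInterval()` with breaks at 1, 3, 5, ... uses left-closed intervals such as [1, 3), while the other two use right-closed intervals such as (0, 2].

```r
# Made-up test values, including one non-integer (2.5).
x <- c(1, 2, 2.5, 3, 8)

# Right-closed intervals (0, 2], (2, 4], ...: 2.5 falls in interval 2.
by_cut <- cut(x, breaks = seq(0, 10, by = 2), labels = FALSE)

# Left-closed intervals [1, 3), [3, 5), ...: 2.5 falls in interval 1.
by_find <- findInterval(x, seq(1, 9, by = 2))

# Divide by the width and round up: same result as cut() here.
by_ceil <- ceiling(x / 2)

print(by_cut)   # 1 1 2 2 4
print(by_find)  # 1 1 1 2 4
print(by_ceil)  # 1 1 2 2 4
```

Which result is right depends on where the interval boundaries are meant to sit for fractional values, which the question does not state.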
For the included example data, dividing by 2 and rounding up should be enough:
data<-data.frame(class1=c("A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B"),
class2=c(1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8),
observations=c(444,475, 531,560,650,668,705,717,456,876,123,47,249,180,500,654))
data$class3 <- ceiling(data$class2 / 2)
# if you need it to be categorical / factor :
data$class3_fct <- as.factor(data$class3)
head(data, n = 10)
#> class1 class2 observations class3 class3_fct
#> 1 A 1 444 1 1
#> 2 A 2 475 1 1
#> 3 A 3 531 2 2
#> 4 A 4 560 2 2
#> 5 A 5 650 3 3
#> 6 A 6 668 3 3
#> 7 A 7 705 4 4
#> 8 A 8 717 4 4
#> 9 B 1 456 1 1
#> 10 B 2 876 1 1
str(data)
#> 'data.frame': 16 obs. of 5 variables:
#> $ class1 : chr "A" "A" "A" "A" ...
#> $ class2 : num 1 2 3 4 5 6 7 8 1 2 ...
#> $ observations: num 444 475 531 560 650 668 705 717 456 876 ...
#> $ class3 : num 1 1 2 2 3 3 4 4 1 1 ...
#> $ class3_fct : Factor w/ 4 levels "1","2","3","4": 1 1 2 2 3 3 4 4 1 1 ...
Created on 2024-01-19 with reprex v2.0.2
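The divide-and-round-up idea extends to other interval sizes. A minimal sketch, assuming the intervals are (origin, origin + width], (origin + width, origin + 2 * width], and so on; `interval_id`, `width` and `origin` are hypothetical names introduced here for illustration.

```r
# Hypothetical generalization of ceiling(class2 / 2):
# width  = size of each interval
# origin = lower edge of the first interval (excluded from it)
interval_id <- function(x, width = 2, origin = 0) {
  ceiling((x - origin) / width)
}

class2 <- c(1, 2, 3, 4, 5, 6, 7, 8)
id_width2 <- interval_id(class2)             # same as ceiling(class2 / 2)
id_width3 <- interval_id(1:9, width = 3)     # three values per interval
id_as_factor <- as.factor(id_width2)         # categorical version
```

A value exactly equal to `origin` gets index 0 under this scheme, so pick `origin` below the smallest observed value.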