I am using cut in R to generate a set of levels, for example for fractional values between 0 and 1, split into bins of width 0.1:
> frac <- cut(c(0, 1), breaks=10)
> levels(frac)
[1] "(-0.001,0.1]" "(0.1,0.2]" "(0.2,0.3]" "(0.3,0.4]" "(0.4,0.5]"
[6] "(0.5,0.6]" "(0.6,0.7]" "(0.7,0.8]" "(0.8,0.9]" "(0.9,1]"
Given a vector v containing continuous values in [0.0, 1.0], how do I count the frequency of the elements of v that fall within each level in levels(frac)?
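I may customize the number of breaks and/or the interval from which I create the levels, so I am looking for a way to do this with standard R commands, so that I can build a two-column data frame: one column for the levels as factors, and a second column for the fraction or percentage of the total elements of v that fall in that level.
Note: the following does not work:
> table(frac)
frac
(-0.001,0.1]    (0.1,0.2]    (0.2,0.3]    (0.3,0.4]    (0.4,0.5]    (0.5,0.6]
           1            0            0            0            0            0
   (0.6,0.7]    (0.7,0.8]    (0.8,0.9]      (0.9,1]
           0            0            0            1
If I use cut directly on v, then I do not get the same levels when I run cut on different vectors, because the range of values (their minimum and maximum) differs between arbitrary vectors. So although I may have the same number of breaks, the level intervals will not be the same.
My goal is to take different vectors and bin them into the same set of levels. Hopefully this helps clarify my question. Thanks for any help.

Another option is hist with fixed breaks:
frac = seq(0, 1, by = 0.1)
# Label each bin by its lower and upper edge, e.g. "0 - 0.1"
ranges = paste(head(frac, -1), frac[-1], sep = " - ")
# plot = FALSE returns the counts without drawing the histogram
freq = hist(v, breaks = frac, include.lowest = TRUE, plot = FALSE)
data.frame(range = ranges, frequency = freq$counts)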
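
Modify frac so that it actually represents your desired intervals, then use the table function:
x = runif(100) # For example.
# Explicit breaks fix the levels regardless of range(x).
# A value of exactly 0 would become NA here unless include.lowest = TRUE is added.
frac = cut(x, breaks = seq(0, 1, 0.1))
table(frac)
Result:
frac
  (0,0.1] (0.1,0.2] (0.2,0.3] (0.3,0.4] (0.4,0.5] (0.5,0.6] (0.6,0.7] (0.7,0.8]
       14         9         8        10         8        12         7         7
(0.8,0.9]   (0.9,1]
       16         9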
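
The counts from table can be turned into the two-column data frame the question asks for by passing them through prop.table. What follows is only a minimal sketch and not part of the answer above: the helper name bin_fractions and the sample vectors a and b are made up for illustration, and include.lowest = TRUE is an added assumption so that a value of exactly 0 lands in the first bin.

```r
# Hypothetical helper: bin any vector into the same fixed levels and
# return one row per level with the share of elements that fall in it.
bin_fractions <- function(v, breaks = seq(0, 1, by = 0.1)) {
  # Explicit break points make the levels independent of range(v).
  f <- cut(v, breaks = breaks, include.lowest = TRUE)
  counts <- table(f)  # levels with no elements are kept with count 0
  data.frame(
    level    = factor(names(counts), levels = levels(f)),
    fraction = as.vector(prop.table(counts))  # multiply by 100 for percent
  )
}

set.seed(1)
a <- runif(50)            # drawn from the whole [0, 1] range
b <- runif(50, 0.4, 0.6)  # much narrower range

res_a <- bin_fractions(a)
res_b <- bin_fractions(b)

# Both results carry the same ten levels, so they can be compared row by row.
identical(levels(res_a$level), levels(res_b$level))
res_b
```

Because the breaks are passed explicitly, the narrow vector b still reports all ten levels, with a fraction of 0 for the bins it never reaches.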

Introduce the extremes c(0, 1) into v, then use the same cut:
library(dplyr)
# dummy data
set.seed(1)
v <- round(runif(7), 2)
# result: prepend 0 and 1 so that cut sees the full range,
# then drop those two helper entries again with [-c(1, 2)]
data.frame(v,
           vFrac = cut(c(0, 1, v), breaks = 10)[-c(1, 2)]) %>%
  group_by(vFrac) %>%
  mutate(vFreq = n())
# Source: local data frame [7 x 3]
# Groups: vFrac [6]
#
#       v     vFrac vFreq
#   <dbl>    <fctr> <int>
# 1  0.27 (0.2,0.3]     1
# 2  0.37 (0.3,0.4]     1
# 3  0.57 (0.5,0.6]     1
# 4  0.91   (0.9,1]     2
# 5  0.20 (0.1,0.2]     1
# 6  0.90 (0.8,0.9]     1
# 7  0.94   (0.9,1]     2
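
Using findInterval instead of cut:
library(plyr)
v <- data.frame(v = runif(100, 0, 1))
# x is the upper edge of the bin each value falls in (bin index times the bin width 0.1)
v$x <- findInterval(v$v, seq(0, 1, by = 0.1)) * 0.1
# count the rows per bin
ddply(v, .(x), summarize, n = length(x))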
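
The findInterval idea also works without plyr. Below is a minimal base R sketch, assuming the goal is to reproduce the (a, b] intervals that cut uses by default. The arguments left.open = TRUE and rightmost.closed = TRUE are chosen for that reason, and tabulate is used so that empty bins are reported as 0. The variable names idx and counts are illustrative only.

```r
breaks <- seq(0, 1, by = 0.1)

set.seed(42)
v <- rnorm(10, 0.5, 0.2)  # sample data for illustration

# left.open = TRUE gives intervals of the form (a, b], as cut does by default.
# Combined with it, rightmost.closed = TRUE closes the leftmost interval,
# so a value of exactly 0 is assigned to bin 1.
idx <- findInterval(v, breaks, left.open = TRUE, rightmost.closed = TRUE)

# tabulate counts every bin index from 1 to nbins, including empty bins.
# Indices outside 1..nbins (values below 0 or above 1) are ignored.
counts <- tabulate(idx, nbins = length(breaks) - 1)

data.frame(
  range     = paste(head(breaks, -1), breaks[-1], sep = " - "),
  frequency = counts
)
```

Unlike grouping on the observed bin indices, this keeps one row per interval even when no value falls into it.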
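
A base R alternative that counts the values in each interval directly:
frac = seq(0, 1, 0.1)
set.seed(42); v = rnorm(10, 0.5, 0.2)
# For each pair of neighbouring breaks, count the values in (frac[i], frac[i+1]]
sapply(seq_len(length(frac) - 1), function(i) sum(frac[i] < v & v <= frac[i + 1]))
#[1] 0 0 0 1 3 2 1 1 1 1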