Strange behavior of findInterval() in R

Question

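I think I have found a strange behavior in the R function base::findInterval().

I have a long vector p whose cumulative values lie in the range (0, 1]. For simplicity, we can define p as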

ones <- rep(1, 2000)
p <- cumsum(ones)/sum(ones)
head(p)
#> [1] 0.0005 0.0010 0.0015 0.0020 0.0025 0.0030
tail(p)
#> [1] 0.9975 0.9980 0.9985 0.9990 0.9995 1.0000

Created on 2023-09-28 with reprex v2.0.2

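Problem

I need to find the indices of p at which the values move into a new interval, according to a vector of breaks probs:

probs <- seq(1/nbins, 1, 1/nbins)

where nbins (100 in my case) is the number of breaks.

I am using findInterval() for this task, and I found that for some sections of p it identifies the intervals correctly, but for others it does not.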

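Example

In the example below, I create the vectors x, y, and z, which are subsets of p but are enough to show the problem. Note that findInterval() works fine with x and y, but it misidentifies the intervals in z. Does anyone know why? Am I doing something wrong?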

# Number of bins
nbins <- 100

# Cuts in the cumulative distribution
probs  <- seq(1/nbins,1, 1/nbins)

# subsets of the large cumulative distribution
x <- c(0.0095, 0.01, 0.0105, 0.011)
y <- c(0.0495, 0.05, 0.0505, 0.051)
z <- c(0.0595, 0.06, 0.0605, 0.061)

# These two work fine
findInterval(x,probs)
#> [1] 0 1 1 1
findInterval(y,probs)
#> [1] 4 5 5 5

# this should be c(5, 6, 6, 6) NOT c(5, 5, 6, 6)
findInterval(z,probs)
#> [1] 5 5 6 6

Created on 2023-09-28 with reprex v2.0.2

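Implication

Note that in this example p is a simple vector for which we know the actual indices in advance. In a real case, however, we would not know whether findInterval() is working correctly.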

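Side note

As a side note, I use the code below to identify the indices. I think it should work properly as long as findInterval() does. Do you know of any other highly efficient way to find the indices?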

# these are the indexes
ind <-
   findInterval(p, probs) |>
    diff() |> {
    \(.) which(. == 1)
     }() + 1
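
For readers untangling the pipeline above, here is a sketch of what appears to be an equivalent plain composition, without the anonymous function. It rebuilds p and probs as defined earlier in the question; the intermediate name bins is an assumption introduced only for illustration. It inherits the same floating-point sensitivity at the breaks, so it is a readability rewrite, not a fix.

```r
# Rebuild the inputs from the question
ones  <- rep(1, 2000)
p     <- cumsum(ones) / sum(ones)
nbins <- 100
probs <- seq(1/nbins, 1, 1/nbins)

# Interval number for every element of p ('bins' is a hypothetical name)
bins <- findInterval(p, probs)

# Positions where the interval number steps up by one;
# diff() is one element shorter than its input, hence the + 1
ind <- which(diff(bins) == 1) + 1
head(ind)
```

Both findInterval() and diff() are vectorised, so this form runs in a single pass over p.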

Answer 1

I think this is an example of floating-point issues in R. Below you can see that although probs[6] prints as 0.06, it is not equal to that number:

nbins <- 100

# Cuts in the cumulative distribution
probs  <- seq(1/nbins,1, 1/nbins)

probs[6]
#> [1] 0.06
probs[6] == .06
#> [1] FALSE
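
One way to see the discrepancy, as a minimal sketch: print more digits than R shows by default and compare the stored break with the literal. The snippet rebuilds probs as in the question; the 20-digit format is an arbitrary choice for illustration.

```r
nbins <- 100
probs <- seq(1/nbins, 1, 1/nbins)

# Print more digits than the default 7 to expose the stored values
sprintf("%.20f", probs[6])
sprintf("%.20f", 0.06)

# The stored break lies slightly above the literal 0.06, which is why
# findInterval() places 0.06 in interval 5 rather than 6
probs[6] > 0.06
#> [1] TRUE

# A tolerance-based comparison treats the two values as equal
isTRUE(all.equal(probs[6], 0.06))
#> [1] TRUE
```

findInterval() compares values exactly, with no tolerance, so a difference this small is enough to move a value that sits on a break into the neighbouring interval.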

If you round the probabilities in probs to 15 decimal places, you get the expected result.

# subsets of the large cumulative distribution
x <- c(0.0095, 0.01, 0.0105, 0.011)
y <- c(0.0495, 0.05, 0.0505, 0.051)
z <- c(0.0595, 0.06, 0.0605, 0.061)

findInterval(z, probs)
#> [1] 5 5 6 6
findInterval(z, round(probs, 15))
#> [1] 5 6 6 6

Created on 2023-09-28 with reprex v2.0.2
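
An alternative to rounding after the fact, sketched under the assumption that the breaks are meant to be exactly k/nbins: build each break with a single division. seq() with a by argument computes its elements as from + i * by, which involves more than one rounding step, whereas k / nbins is rounded once and should coincide with the double that a literal such as 0.06 parses to. The name probs_div is mine, not from the question.

```r
nbins <- 100

# One division per break instead of from + i * by
probs_div <- seq_len(nbins) / nbins

x <- c(0.0095, 0.01, 0.0105, 0.011)
y <- c(0.0495, 0.05, 0.0505, 0.051)
z <- c(0.0595, 0.06, 0.0605, 0.061)

findInterval(x, probs_div)
findInterval(y, probs_div)
findInterval(z, probs_div)
#> [1] 5 6 6 6
```

This helps only when the values being binned hit the breaks exactly, as p = k/2000 does here. For values that come out of longer computations, a tie at a break remains fragile, and rounding both sides to a common precision is the more general remedy.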
