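The covering metric (CM) evaluates how well the data segments produced by the detected change points align with the ground-truth segments formed by the true change points.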
I want to compute the CM for data generated from 10 realisations, which is obtained by averaging the CM of the individual realisations.
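For reference, the covering metric as it is usually defined can be written as below. The symbols are assumptions introduced here for illustration: $\mathcal{G}$ is the set of ground-truth segments, $\mathcal{G}'$ the set of estimated segments, $T$ the length of the series, $J$ the Jaccard index, and $\mathrm{CM}_r$ the CM of realisation $r$.

```latex
\mathrm{CM}(\mathcal{G}, \mathcal{G}')
  = \frac{1}{T} \sum_{A \in \mathcal{G}} |A| \,
    \max_{A' \in \mathcal{G}'} J(A, A'),
\qquad
J(A, A') = \frac{|A \cap A'|}{|A \cup A'|},
\qquad
\overline{\mathrm{CM}} = \frac{1}{10} \sum_{r=1}^{10} \mathrm{CM}_r .
```

In this form the maximum runs over all estimated segments for each true segment, so the two segmentations do not need to have the same number of segments.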
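I tried to write code that works for one realisation. I also extended it to 10 realisations, but I got stuck on determining the lengths of the segments produced by the estimated and the true change points.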
CM for one realisation:
library(DeCAFS)
set.seed(2001)
x=rep(c(0,2,-2,2,-2), c(200,200,200,200, 200))+rnorm(1000) # data with change points (CP)
TCP=c(200,400,600,800) # True CP locations
splitAt <- function(x, pos) unname(split(x, cumsum(seq_along(x) %in% (pos+1))))
split_by_TCP=splitAt(x, TCP) # partition the data into segments based on TCP
estCP=DeCAFS(x,warningMessage = FALSE)$changepoints #estimate CPs using DeCAFS method
split_by_estCP=splitAt(x, estCP) # partition the data into segments based on estimated CPs
# determine the intersection between segments produced from true CPs and estimated CPs
intersec=(Map(\(split_by_TCP, split_by_estCP) intersect(split_by_TCP, split_by_estCP), split_by_TCP, split_by_estCP))
#computes the cardinality of the intersection
cardinality_intersection=sapply(intersec,length)
# determine the union between segments produced from true CPs and estimated CPs
unionn=(Map(\(split_by_TCP, split_by_estCP) union(split_by_TCP, split_by_estCP), split_by_TCP, split_by_estCP))
# computes the cardinality of the union
cardinality_union=sapply(unionn, length)
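# Calculate CM: true segment lengths weighted by the Jaccard index (intersection / union)
# note: max() collapses the paired Jaccard indices into a single value here
CM=(1/length(x))*sum(sapply(split_by_TCP, length)*(max((cardinality_intersection)/(cardinality_union))))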
print(CM)
My attempt to extend this to 10 realisations using the DeCAFS method, which failed:
set.seed(1111)
y=as.data.frame(replicate(10,rep(c(0,2,-2,2,-2), c(200,200,200,200, 200))+rnorm(1000)))
decafs = function (x) DeCAFS(x, warningMessage = FALSE)$changepoints
mw=sapply(y, decafs) #stores estimated CPs
C=rep(c(200,400,600,800),10) # replicate True CPs for 10 realisations
ps=head(seq(4, 40, by=4),-1) # positions to split True CPs
splitAt <- function(x, pos) unname(split(x, cumsum(seq_along(x) %in% (pos+1))))
s=splitAt(C, ps) # split True CPs for 10 realisations
seg_TCP=mapply(splitAt,x=y, pos=s) #stores segments from true CPs for 10 realisations
seg_estCPs=mapply(splitAt,x=y,pos=mw) # stores segments from estimated CPs for 10 realisations
The function introduced here works only on multiple data series collected column-wise in a data.frame.
Here is your data-generating process:
set.seed(0508)
n = rep(200L, 5L)
tcp = cumsum(n)[-length(n)]
y = as.data.frame(replicate(10L, rep(c(0L, 2L, -2L, 2L, -2L), n) + rnorm(1e3))) |>
`colnames<-`(paste0("data", 1:10))
Note: it is not quite clear how max is meant to work in your code.
The function cm():
cm = \(X, tcp) {
  # X   := data frame, one data series per column
  # tcp := true change points
  stopifnot(is.data.frame(X))
  stopifnot(is.vector(tcp))
  # estimate the change points of every column with DeCAFS
  ecp = lapply(X, \(j) DeCAFS::DeCAFS(j, warningMessage = FALSE)$changepoints)
  lecp = lengths(ecp)
  cat("Number of estimated changepoints per data series:\n", lecp, "\n")
  cat("Calculation will be performed for those where number of (#) estimated change points = # tcp.\n")
  # keep only the columns with as many estimated as true change points
  cols = which(lecp == length(tcp))
  ecp = ecp[cols]
  # https://stackoverflow.com/a/16358095/20002111
  splitAt = \(x, pos) unname(split(x, cumsum(seq_along(x) %in% pos)))
  x_spliton_tcp = lapply(X[cols], \(j) splitAt(j, tcp))
  x_spliton_ecps = Map(\(x, p) splitAt(x, p), X[cols], ecp)
  # cA: lengths of the true segments, one column per data series
  cA = sapply(x_spliton_tcp, \(x) lengths(x))
  # M: Jaccard index of each paired (true, estimated) segment
  M = mapply(\(x, y)
             mapply(\(x, y) length(intersect(x, y)) / length(union(x, y)), x, y),
             x_spliton_tcp, x_spliton_ecps)
  # use nrow(X) rather than nrow(y) so the function does not depend on the global y
  colSums(M * cA) / nrow(X)
}
This gives:
cm(y, tcp)
Number of estimated changepoints per data series:
4 4 3 3 3 4 4 3 4 4
Calculation will be performed for those where number of (#) estimated change points = # tcp.
data1 data2 data6 data7 data9 data10
0.9980050 1.0000000 0.9960100 0.9920495 0.9980050 0.9980050
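DeCAFS::DeCAFS() and its underlying algorithm do not offer an option to fix the number of estimated change points at a given value. For now, data series (columns of y) whose number of estimated change points differs from the number of true change points are excluded. At first glance, this seems like a poor procedure to me.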
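One possible way around the exclusion, sketched here under assumptions rather than offered as a tested replacement for cm(): compare segments as sets of indices and, for each true segment, take the maximum Jaccard index over all estimated segments. The number of estimated change points then no longer has to match the number of true ones. The helpers segments_from_cp(), jaccard() and covering_metric() are hypothetical names written for this illustration and are not part of DeCAFS. The sketch assumes that a change point is the last index of its segment, as in the question's splitAt().

```r
# Turn change points into a list of index segments.
# Assumption: a change point cp is the last index of its segment,
# i.e. the next segment starts at cp + 1 (as in the question's splitAt()).
segments_from_cp = function(cp, n) {
  starts = c(1L, as.integer(cp) + 1L)
  ends   = c(as.integer(cp), as.integer(n))
  Map(seq.int, starts, ends)
}

# Jaccard index of two index sets.
jaccard = function(a, b) length(intersect(a, b)) / length(union(a, b))

# Covering metric of an estimated segmentation relative to the true one:
# each true segment is weighted by its length and matched with the
# estimated segment that overlaps it best.
covering_metric = function(tcp, ecp, n) {
  truth = segments_from_cp(tcp, n)
  est   = segments_from_cp(ecp, n)
  best  = vapply(truth, function(A) {
    max(vapply(est, function(B) jaccard(A, B), numeric(1)))
  }, numeric(1))
  sum(lengths(truth) * best) / n
}

tcp = c(200, 400, 600, 800)
covering_metric(tcp, tcp, 1000)               # identical segmentations: 1
covering_metric(tcp, c(200, 400, 600), 1000)  # one change point missed: 0.8
covering_metric(tcp, integer(0), 1000)        # nothing detected: 0.2

# Average over several realisations; ecp_list is a made-up example,
# in practice it would hold the change points estimated per column.
ecp_list = list(c(200, 400, 600, 800), c(200, 400, 600))
mean(sapply(ecp_list, function(e) covering_metric(tcp, e, 1000)))  # 0.9
```

Because this works on indices rather than on the data values, it also does not rely on the values of a series being unique.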