How to compute the covering metric to evaluate the performance of a change point detection method [closed]


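The covering metric (CM) evaluates how well the data segments produced by the detected change points align with the ground-truth segments formed by the true change points.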

Theory behind the CM

[Image: the formula to calculate CM]
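For reference, the covering metric is usually written as follows. This is a reconstruction that assumes the standard definition from the change point benchmarking literature, not a transcription of the image: $\mathcal{G}$ is the ground-truth partition of the series into segments, $\mathcal{G}'$ is the estimated partition, $T$ is the series length, and $J$ is the Jaccard index. The symbols are assumed notation.

```latex
C(\mathcal{G}', \mathcal{G})
  = \frac{1}{T} \sum_{A \in \mathcal{G}} |A| \cdot \max_{A' \in \mathcal{G}'} J(A, A'),
\qquad
J(A, A') = \frac{|A \cap A'|}{|A \cup A'|}
```

Under this reading, each ground-truth segment $A$ is matched to whichever estimated segment $A'$ overlaps it best, so the two partitions need not contain the same number of segments.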

I want to compute the CM for data generated from 10 realisations. It is obtained by averaging the CM of the individual realisations.

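I tried to write code that works for one realisation. I also extended it to 10 realisations, but it got stuck at determining the lengths of the segments produced by the estimated and the true change points.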
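The segment lengths can be obtained from the change point locations alone, without splitting the data. A minimal sketch, assuming each change point is the last index of its segment (consistent with `TCP = c(200, 400, 600, 800)` for five segments of 200 points); `seg_lengths()` is a hypothetical helper and the values in `ecp` are invented for illustration:

```r
n   <- 1000L                       # series length
tcp <- c(200L, 400L, 600L, 800L)   # true change points from the question
ecp <- c(198L, 405L, 601L)         # hypothetical estimated change points

# Pad with 0 and n so that consecutive differences are the segment lengths
seg_lengths <- function(cp, n) diff(c(0L, cp, n))

len_true <- seg_lengths(tcp, n)
len_est  <- seg_lengths(ecp, n)
```

Because the helper depends only on the change points, it works for any number of estimated change points and can be applied per realisation with `lapply()`.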

Code for the CM of the "DeCAFS" change point detection method, for one realisation

library(DeCAFS)
set.seed(2001)
x=rep(c(0,2,-2,2,-2), c(200,200,200,200, 200))+rnorm(1000) # data with change points (CP)
TCP=c(200,400,600,800)   # True CP locations
splitAt <- function(x, pos) unname(split(x, cumsum(seq_along(x) %in% (pos+1)))) 
split_by_TCP=splitAt(x, TCP) # partition the data into segments based on TCP

estCP=DeCAFS(x,warningMessage = FALSE)$changepoints  #estimate CPs using DeCAFS method
split_by_estCP=splitAt(x, estCP)  # partition the data into segments based on estimated CPs

# determine the intersection between segments produced from true CPs and estimated CPs
intersec=(Map(\(split_by_TCP, split_by_estCP) intersect(split_by_TCP, split_by_estCP), split_by_TCP, split_by_estCP))

#computes the cardinality of the intersection
cardinality_intersection=sapply(intersec,length)

# determine the union between segments produced from true CPs and estimated CPs
unionn=(Map(\(split_by_TCP, split_by_estCP) union(split_by_TCP, split_by_estCP), split_by_TCP, split_by_estCP))

# computes the cardinality of the union
cardinality_union=sapply(unionn, length)

#Calculate CM (Jaccard index = |intersection| / |union|; segments are paired by position)
CM=(1/length(x))*sum(sapply(split_by_TCP, length)*(cardinality_intersection/cardinality_union))

print(CM)

For 10 realisations, I used the following code with the DeCAFS method, but it failed

set.seed(1111)
y=as.data.frame(replicate(10,rep(c(0,2,-2,2,-2), c(200,200,200,200, 200))+rnorm(1000)))
decafs = function (x) DeCAFS(x, warningMessage = FALSE)$changepoints
mw=sapply(y, decafs) #stores estimated CPs
C=rep(c(200,400,600,800),10) # replicate True CPs for 10 realisations 
ps=head(seq(4, 40, by=4),-1)  # positions to split True CPs
splitAt <- function(x, pos) unname(split(x, cumsum(seq_along(x) %in% (pos+1)))) 
s=splitAt(C, ps)  # split True CPs for 10 realisations
seg_TCP=mapply(splitAt,x=y, pos=s) #stores segments from true CPs for 10 realisations
seg_estCPs=mapply(splitAt,x=y,pos=mw) # stores segments from estimated CPs for 10 realisations
1 Answer

The function introduced here works on several data series, collected column-wise in a data.frame.

Setup

This is your data-generating process.

set.seed(0508)
n = rep(200L, 5L)
tcp = cumsum(n)[-length(n)]
y = as.data.frame(replicate(10L, rep(c(0L, 2L, -2L, 2L, -2L), n) + rnorm(1e3))) |> 
  `colnames<-`(paste0("data", 1:10))

The theory as implemented

[Image: the formula as implemented]

Note: it is not quite clear to me how the max is meant to work.
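One reading of the max, assuming the standard definition of the covering metric: for a given ground-truth segment, the Jaccard index is computed against every estimated segment and the largest value is kept. The sketch below illustrates this for a single ground-truth segment; the estimated change points are hypothetical, and `jaccard()` and `est_seg` are illustrative names rather than part of DeCAFS:

```r
n   <- 1000L
ecp <- c(195L, 410L, 600L)   # hypothetical estimated change points
A   <- 201:400               # second ground-truth segment, as an index set

# Estimated segments as index sets: 1..195, 196..410, 411..600, 601..1000
starts  <- c(1L, ecp + 1L)
ends    <- c(ecp, n)
est_seg <- Map(seq, starts, ends)

# Jaccard index of two index sets
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# One Jaccard value per estimated segment; the max picks the best match
J      <- sapply(est_seg, jaccard, a = A)
best   <- which.max(J)
best_J <- max(J)
```

Here only the estimated segment 196..410 overlaps `A`, so it is the one selected and `best_J` equals 200/215. Working with index sets rather than data values avoids any dependence on the observations being distinct.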

Implementation of cm()

cm = \(X, tcp) {
  # X := data frame, one data series per column 
  # tcp := true change points 
  stopifnot(is.data.frame(X))
  stopifnot(is.vector(tcp))
  ecp = lapply(X, \(j) DeCAFS::DeCAFS(j, warningMessage = FALSE)$changepoints) 
  lecp = lengths(ecp)
  cat("Number of estimated changepoints per data series:\n", lecp, "\n")
  cat("Calculation will be performed for those where number of (#) estimated change points = # tcp.\n")
  ecp = ecp[cols <- which(lecp == length(tcp))]
  
  # https://stackoverflow.com/a/16358095/20002111
  splitAt = \(x, pos) unname(split(x, cumsum(seq_along(x) %in% pos)))
  x_spliton_tcp = lapply(X[cols], \(j) splitAt(j, tcp))
  x_spliton_ecps = Map(\(x, p) splitAt(x, p), X[cols], ecp)
  
  cA = sapply(x_spliton_tcp, \(x) lengths(x))
  M = mapply(\(x, y) 
             mapply(\(x, y) length(intersect(x, y)) / length(union(x, y)), x, y), 
             x_spliton_tcp, x_spliton_ecps)
  colSums(M * cA) / nrow(X)
}

First attempt

The call below gives:

cm(y, tcp)
Number of estimated changepoints per data series:
 4 4 3 3 3 4 4 3 4 4 
Calculation will be performed for those where number of (#) estimated change points = # tcp.
    data1     data2     data6     data7     data9    data10 
0.9980050 1.0000000 0.9960100 0.9920495 0.9980050 0.9980050 

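Notes

  • I wrote this from scratch without thinking the procedure through carefully. I expect a first revision to improve and change the code significantly. I will come back to it later.
  • It looks like DeCAFS::DeCAFS() and its underlying algorithm do not offer an option to fix the number of estimated change points. For now, data series (columns of y) whose number of estimated change points deviates from the number of true change points are excluded. At first glance, this seems like a poor procedure to me.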
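A possible way to avoid excluding series: if the max is taken over all estimated segments, as in the standard definition of the covering metric, the two partitions need not have the same number of segments. The following is a minimal sketch under that assumption, with each change point treated as the last index of its segment. `covering()` and `ecp_list` are hypothetical names, and the estimated change points are invented for illustration rather than produced by DeCAFS:

```r
covering <- function(tcp, ecp, n) {
  # Inclusive segment boundaries for the true (t) and estimated (e) partitions
  ts <- c(1L, tcp + 1L); te <- c(tcp, n)
  es <- c(1L, ecp + 1L); ee <- c(ecp, n)
  len_t <- te - ts + 1L
  len_e <- ee - es + 1L
  # Overlap of every true segment (rows) with every estimated segment (columns)
  inter <- pmax(outer(te, ee, pmin) - outer(ts, es, pmax) + 1, 0)
  uni   <- outer(len_t, len_e, `+`) - inter
  J     <- inter / uni
  # Weight the best Jaccard index of each true segment by its length
  sum(len_t * apply(J, 1, max)) / n
}

n   <- 1000L
tcp <- c(200L, 400L, 600L, 800L)
ecp_list <- list(
  c(200L, 400L, 600L, 800L),        # perfect recovery
  c(198L, 405L, 601L),              # one change point missed
  c(200L, 400L, 500L, 600L, 800L)   # one spurious change point
)

cm_each <- vapply(ecp_list, covering, numeric(1), tcp = tcp, n = n)
cm_mean <- mean(cm_each)            # average CM over the realisations
```

With real data, `ecp_list` would hold the `$changepoints` element returned by DeCAFS for each column, and `cm_mean` would be the average over the 10 realisations asked for in the question.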