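I am trying to find the mean, variance, and confidence interval of a circular/wrapped normal (von Mises) distribution, but over a time-of-day interval rather than the conventional interval expressed in terms of π. I looked at a solution on Stack Overflow here; it comes close, but I am not sure exactly what it is computing.
I found exactly what I want here, done in R (see the code excerpt below). I would like to replicate it in Python.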
> data(timestamps)
> head(timestamps)
[1] "20:27:28" "21:08:41" "01:30:16" "00:57:04" "23:12:14" "22:54:16"
> library(lubridate)
> ts <- as.numeric(hms(timestamps)) / 3600
> head(ts)
[1] 20.4577778 21.1447222 1.5044444 0.9511111 23.2038889 22.9044444
> library(circular)
> ts <- circular(ts, units = "hours", template = "clock24")
> head(ts)
Circular Data:
[1] 20.457889 21.144607 1.504422 0.950982 23.203917 4.904397
> estimates <- mle.vonmises(ts)
> p_mean <- estimates$mu %% 24
> concentration <- estimates$kappa
> densities <- dvonmises(ts, mu = p_mean, kappa = concentration)
> alpha <- 0.90
> quantile <- qvonmises((1 - alpha)/2, mu = p_mean, kappa = concentration) %% 24
> cutoff <- dvonmises(quantile, mu = p_mean, kappa = concentration)
> time_feature <- densities >= cutoff
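Like R's circular library, Python has scipy.stats.vonmises, but it works on an interval expressed in π rather than in clock time. Is there another package that could help?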
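For reference, here is a rough Python sketch of what the R excerpt above does, assuming the hours are first rescaled to radians. The helper names `hours_to_radians`, `fit_vonmises`, and `density` are made up for illustration. The concentration is obtained by numerically solving I1(kappa)/I0(kappa) = Rbar, which is the usual maximum-likelihood condition; it is an assumption that this matches what `mle.vonmises` does internally, since that function may apply corrections.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import i0e, i1e
from scipy.stats import vonmises

TWO_PI = 2 * np.pi


def hours_to_radians(hours):
    """Map clock hours in [0, 24) onto angles in [0, 2*pi)."""
    return np.asarray(hours, dtype=float) * TWO_PI / 24


def fit_vonmises(theta):
    """Return (mu, kappa) for angles in radians (hypothetical helper)."""
    s, c = np.sin(theta).mean(), np.cos(theta).mean()
    mu = np.arctan2(s, c) % TWO_PI          # mean direction in [0, 2*pi)
    rbar = np.hypot(s, c)                   # mean resultant length
    # Solve I1(kappa)/I0(kappa) = rbar; the bracket assumes 0 < rbar < ~0.9999.
    kappa = brentq(lambda k: i1e(k) / i0e(k) - rbar, 1e-8, 1e4)
    return mu, kappa


def density(theta, mu, kappa):
    """Von Mises density, with the angle wrapped into [-pi, pi) around mu."""
    wrapped = (np.asarray(theta, dtype=float) - mu + np.pi) % TWO_PI - np.pi
    return vonmises.pdf(wrapped, kappa)


# The six decimal-hour values printed by head(ts) in the question.
ts = np.array([20.4577778, 21.1447222, 1.5044444,
               0.9511111, 23.2038889, 22.9044444])
theta = hours_to_radians(ts)

mu, kappa = fit_vonmises(theta)
mean_hour = mu * 24 / TWO_PI                # mean time of day in hours

alpha = 0.90
densities = density(theta, mu, kappa)
# Lower quantile of the centred distribution, shifted to sit around mu.
quantile = mu + vonmises.ppf((1 - alpha) / 2, kappa)
cutoff = density(quantile, mu, kappa)
time_feature = densities >= cutoff          # True inside the central region

print(mean_hour, kappa, time_feature)
```

Because the density is symmetric about `mu`, comparing each density with the density at the lower quantile flags the points that fall inside the central 90% region, which mirrors the last three lines of the R code.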
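I built a Python function (below) that does what I need, based on the formulas in my PDF.
Hopefully this helps the community. If I got something wrong, please post a correction.
Note: this applies to values in the [0, 2π] range (a full 360 degrees); the function itself expects radians.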
    import numpy as np
    import pandas as pd
    from scipy.stats import chi2


    def random_dates(start, end, n, unit='D', seed=None):
        """Return n random timestamps between start and end."""
        # Use the given seed; fall back to 0 so the example is reproducible.
        np.random.seed(0 if seed is None else seed)
        ndays = (end - start).days + 1
        return pd.to_timedelta(np.random.rand(n) * ndays, unit=unit) + start


    def vonmises(df, field):
        """Add von Mises estimates for the angles (in radians) in df[field]."""
        N = len(df[field])
        s = np.sum(np.sin(df[field]))
        c = np.sum(np.cos(df[field]))
        sbar = s / N
        cbar = c / N

        # Mean direction, mapped into [0, 2*pi)
        if cbar > 0:
            if sbar >= 0:
                df['mu_vm'] = np.arctan(sbar / cbar)
            else:
                df['mu_vm'] = np.arctan(sbar / cbar) + 2 * np.pi
        elif cbar < 0:
            df['mu_vm'] = np.arctan(sbar / cbar) + np.pi
        else:
            df['mu_vm'] = np.nan

        # Resultant length and mean resultant length
        R = np.sqrt(c**2 + s**2)
        Rbar = R / N

        # Piecewise approximation of the concentration parameter kappa
        if Rbar < 0.53:
            kstar = 2 * Rbar + Rbar**3 + 5 * (Rbar**5) / 6
        elif Rbar >= 0.85:
            kstar = 1 / (3 * Rbar - 4 * (Rbar**2) + Rbar**3)
        else:
            kstar = -0.4 + 1.39 * Rbar + 0.43 / (1 - Rbar)

        # Small-sample correction
        if N <= 15:
            if kstar < 2:
                df['kappa_vm'] = np.max([kstar - 2 / (N * kstar), 0])
            else:
                df['kappa_vm'] = ((N - 1)**3) * kstar / (N * (N**2 + 1))
        else:
            df['kappa_vm'] = kstar

        # Chi-square point used for the interval around the mean direction.
        # Note: chi2.isf(0.9, 1) is the lower 10% point of chi-square(1); a 90%
        # interval would usually use chi2.isf(0.1, 1), so check your reference.
        chi2_q = chi2.isf(0.9, 1)
        if Rbar <= 2 / 3:
            half_width = np.arccos(np.sqrt(
                2 * N * (2 * (R**2) - N * chi2_q) / ((R**2) * (4 * N - chi2_q))))
        else:
            half_width = np.arccos(
                np.sqrt((N**2) - ((N**2) - (R**2)) * np.exp(chi2_q / N)) / R)
        df['vm_plus'] = df['mu_vm'] + half_width
        df['vm_minus'] = df['mu_vm'] - half_width

        # True when the observation lies outside [vm_minus, vm_plus]
        df['vm_conft'] = (df['vm_plus'] < df[field]) | (df['vm_minus'] > df[field])
        return df


    # Example: two groups of random timestamps
    df = pd.concat([
        pd.DataFrame({'A': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]}),
        pd.DataFrame({'B': random_dates(pd.to_datetime('2015-01-01'),
                                        pd.to_datetime('2018-01-01'), 10)}),
    ], axis=1)
    # C: seconds since midnight; D: the same time of day as an angle in radians
    df['C'] = (df['B'].dt.hour * 60 + df['B'].dt.minute) * 60 + df['B'].dt.second
    df['D'] = df['C'] * 2 * np.pi / (24 * 60 * 60)
    df = df.groupby('A').apply(lambda x: vonmises(x, 'D'))
To get back to hours, for example, just multiply by 24 and divide by 2π.
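A minimal sketch of that back-conversion, assuming the angles are in radians as in column `D`. The helper names `radians_to_hours` and `hours_to_clock` are made up for illustration. The modulo is there because values such as `vm_plus` and `vm_minus` can fall outside [0, 2π) and need to be wrapped before they are read as a time of day.

```python
import numpy as np


def radians_to_hours(theta):
    """Wrap angles into [0, 2*pi) and rescale them to hours in [0, 24)."""
    theta = np.asarray(theta, dtype=float)
    return (theta % (2 * np.pi)) * 24 / (2 * np.pi)


def hours_to_clock(hours):
    """Format decimal hours as HH:MM:SS, rounded to the nearest second."""
    total = int(round(float(hours) * 3600)) % 86400
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


print(hours_to_clock(radians_to_hours(np.pi)))       # 12:00:00
print(hours_to_clock(radians_to_hours(-np.pi / 2)))  # 18:00:00
```

Applied to the result above, something like `radians_to_hours(df['mu_vm'])` would give the mean time of day in decimal hours.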