How to extract the millisecond part of a timestamp and average the data

Question  Votes: -1  Answers: 1

I'm stuck on a tricky problem with time series data. Let me try to explain it with the table below.

       T1                   D1      T2(roundup ms)     D2(avg of D1)   T3(round up ms)     D3     T4(roundup to second)  D4(avg second)

2020.05.22 11.30.1.200      10
2020.05.22 11.30.1.220      20
2020.05.22 11.30.1.240      30
2020.05.22 11.30.1.260      40
2020.05.22 11.30.1.280      50      2020.05.22 11.30.1.200    30
2020.05.22 11.30.1.300      60
2020.05.22 11.30.1.310      70
2020.05.22 11.30.1.350      80      2020.05.22 11.30.1.300    70
2020.05.22 11.30.1.400      90
2020.05.22 11.30.1.450      11
2020.05.22 11.30.1.470      31      2020.05.22 11.30.1.400    44     2020.05.22 11.30.1    48     
................
2020.05.22 11.30.7.100      22
2020.05.22 11.30.7.120      33
2020.05.22 11.30.7.140      44
2020.05.22 11.30.7.160      55
2020.05.22 11.30.7.180      66      2020.05.22 11.30.7.100    44
2020.05.22 11.30.7.200      77
2020.05.22 11.30.7.210      88
2020.05.22 11.30.7.250      99      2020.05.22 11.30.7.200    88
2020.05.22 11.30.7.300      31
2020.05.22 11.30.7.350      32
2020.05.22 11.30.7.370      33      2020.05.22 11.30.7.300    32     2020.05.22 11.30.7    54.66   2020.05.22 11.30
................

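I import time series data from a CSV file. At the first level, I want to average the data within each part of a second (0-100 ms, 100-200 ms, and so on) and round the timestamp to that 100 ms mark. At the second level, I want to average those values further (the averages for 100, 200, 300 ms, and so on) so that I end up with one value per second (1 s, 2 s, 3 s, and so on), with the timestamp rounded to the second. I'm not sure how to explain it better, but the table tries to describe the situation. Feel free to downvote me, but I'm lost on how to implement this kind of averaging.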

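I guess the logic is to first select a millisecond range, for example 0-100 ms, average all the values between those two points, and round the millisecond value. Repeat that for every millisecond range, then move on. I can't figure out the second and third levels. Even just a pointer on how to select the values between two millisecond marks would help.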

python pandas time-series
1 Answer
0
votes

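See the solution below. I solved this by handling the dates as strings, though I think it could also be done using only the datetime format (I'll think about that). I took the portion of the data shown in the example above and was able to build your expected result. If you have any questions, feel free to ask. Hope this helps.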

import os
import pandas as pd

data_folder, file_name = ".", "data.csv" # hypothetical location, adjust to wherever your CSV lives
df = pd.read_csv(os.path.join(data_folder, file_name)) # reading your file (the portion you provided in the question)
df["T1"] = df["T1"].astype(str) # we work on the time, but in the form of a string
df.sort_values("T1", inplace=True) # note: this is a string sort, not a chronological one
df["hour"] = df["T1"].map(lambda s: ".".join(s.split(".")[:3])) # date plus hour, e.g. "2020.05.22 11"
df["minute"] = df["T1"].map(lambda s: ".".join(s.split(".")[:4])) # everything up to the minute
df["second"] = df["T1"].map(lambda s: ".".join(s.split(".")[:5])) # everything up to the second
df["mili"] = df["T1"].map(lambda s: s.split(".")[-1]).astype(int) # millisecond part as an integer
df["mili"] = (df["mili"].values // 100) * 100 # rounding the milliseconds down to the nearest 100
df["milisecond"] = df["second"] + "." + df["mili"].astype(str) # key for each 100 ms bucket
df["miliRowID"] = df.groupby("milisecond").cumcount() + 1 # row count within each 100 ms bucket, needed to join the data back together
df["secRowID"] = df.groupby("second").cumcount() + 1  # row count within each second
df["minRowID"] = df.groupby("minute").cumcount() + 1  # row count within each minute
df.drop(["mili"], axis=1, inplace=True)

xMili = df.groupby("milisecond")["D1"].mean().reset_index() # mean of D1 per 100 ms bucket
xMili.columns = ["T2(roundup ms)", "D2(avg of D1)"]

xSec = df.groupby("second")["D1"].mean().reset_index() # mean of the raw D1 values per second
xSec.columns = ["T3(round up ms)", "D3"]

xMin = df.groupby("minute")["D1"].mean().reset_index() # mean of the raw D1 values per minute
xMin.columns = ["T4(roundup to second)", "D4(avg second)"]

# in this section we are joining calculated averages back with the main dataframe
dfTemp = df.groupby("milisecond")["miliRowID"].max().reset_index().merge(xMili, 
                                                                     how="left", 
                                                                     left_on="milisecond", 
                                                                     right_on="T2(roundup ms)")
df = df.merge(dfTemp, how="left", on=["milisecond", "miliRowID"])

dfTemp = df.groupby("second")["secRowID"].max().reset_index().merge(xSec, 
                                                                how="left", 
                                                                left_on="second", 
                                                                right_on="T3(round up ms)")
df = df.merge(dfTemp, how="left", on=["second", "secRowID"])

dfTemp = df.groupby("minute")["minRowID"].max().reset_index().merge(xMin, 
                                                                how="left", 
                                                                left_on="minute", 
                                                                right_on="T4(roundup to second)")
df = df.merge(dfTemp, how="left", on=["minute", "minRowID"])
# and finally dropping unnecessary columns
DROP = ["hour", "minute", "second", "milisecond", "miliRowID", "secRowID", "minRowID"]
df.drop(DROP, axis=1, inplace=True)
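
As mentioned above, the same result can probably be reached with real datetimes instead of strings. The following is only a minimal sketch under some assumptions: the format string is guessed from the timestamps shown in the question, and the names `ts`, `t_100ms`, `d2`, `d3` and `d3_raw` are illustrative. It also shows one thing worth checking against your expected table: the D3 value there (48) is the mean of the three D2 values, while averaging the raw D1 values over the same second (which is what the string-based code does for `xSec`) gives about 44.73.

```python
import pandas as pd

# Sample rows copied from the first second of the question's table.
raw = pd.DataFrame({
    "T1": [
        "2020.05.22 11.30.1.200", "2020.05.22 11.30.1.220",
        "2020.05.22 11.30.1.240", "2020.05.22 11.30.1.260",
        "2020.05.22 11.30.1.280", "2020.05.22 11.30.1.300",
        "2020.05.22 11.30.1.310", "2020.05.22 11.30.1.350",
        "2020.05.22 11.30.1.400", "2020.05.22 11.30.1.450",
        "2020.05.22 11.30.1.470",
    ],
    "D1": [10, 20, 30, 40, 50, 60, 70, 80, 90, 11, 31],
})

# Assumed layout: year.month.day hour.minute.second.millisecond
df = raw.assign(ts=pd.to_datetime(raw["T1"], format="%Y.%m.%d %H.%M.%S.%f"))
df = df.sort_values("ts")

# Level 1: floor each timestamp to its 100 ms bucket and average D1 per bucket.
df["t_100ms"] = df["ts"].dt.floor("100ms")
d2 = df.groupby("t_100ms")["D1"].mean()   # 30.0, 70.0, 44.0

# Level 2, variant A: average the 100 ms averages per second (matches the table).
d3 = d2.groupby(d2.index.floor("s")).mean()   # 48.0

# Level 2, variant B: average the raw D1 values per second instead.
d3_raw = df.groupby(df["ts"].dt.floor("s"))["D1"].mean()

print(d2)
print(d3)
print(d3_raw)
```

The minute level would follow the same pattern with `floor("min")`. Which variant you want depends on whether every 100 ms bucket should carry equal weight (variant A) or every raw sample should (variant B).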