I'm in a tricky situation working with time-series data. Let me try to explain it using the table below:
```
T1                      D1  T2(roundup ms)          D2(avg of D1)  T3(round up ms)     D3     T4(roundup to second)  D4(avg second)
2020.05.22 11.30.1.200  10
2020.05.22 11.30.1.220  20
2020.05.22 11.30.1.240  30
2020.05.22 11.30.1.260  40
2020.05.22 11.30.1.280  50  2020.05.22 11.30.1.200  30
2020.05.22 11.30.1.300  60
2020.05.22 11.30.1.310  70
2020.05.22 11.30.1.350  80  2020.05.22 11.30.1.300  70
2020.05.22 11.30.1.400  90
2020.05.22 11.30.1.450  11
2020.05.22 11.30.1.470  31  2020.05.22 11.30.1.400  44             2020.05.22 11.30.1  48
................
2020.05.22 11.30.7.100  22
2020.05.22 11.30.7.120  33
2020.05.22 11.30.7.140  44
2020.05.22 11.30.7.160  55
2020.05.22 11.30.7.180  66  2020.05.22 11.30.7.100  44
2020.05.22 11.30.7.200  77
2020.05.22 11.30.7.210  88
2020.05.22 11.30.7.250  99  2020.05.22 11.30.7.200  88
2020.05.22 11.30.7.300  31
2020.05.22 11.30.7.350  32
2020.05.22 11.30.7.370  33  2020.05.22 11.30.7.300  32             2020.05.22 11.30.7  54.66  2020.05.22 11.30
................
```
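I import the time-series data from a CSV file. At the first level, I want to average the data within each millisecond window (0-100, 100-200, and so on) and round the timestamp down to that window. At the second level, I want to average those values again (the 100, 200, 300, ... windows). I also want to keep averaging up to seconds (1 s, 2 s, 3 s, and so on) and round the timestamp to the second. I'm not sure how to explain it, but the table tries to describe the situation. Feel free to downvote me, but I'm lost on how to implement this kind of averaging.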
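Reading the numbers in the table, each level seems to average the level directly below it rather than the raw D1 values: for second 11.30.1, D3 = 48 equals (30 + 70 + 44) / 3, while the plain mean of all eleven D1 values in that second is about 44.73. A quick check in plain Python, using only numbers copied from the table (the variable names are illustrative):

```python
# D1 values for second 11.30.1, copied from the table, split into 100 ms windows
bucket_200 = [10, 20, 30, 40, 50]  # 1.200 .. 1.280
bucket_300 = [60, 70, 80]          # 1.300 .. 1.350
bucket_400 = [90, 11, 31]          # 1.400 .. 1.470


def mean(xs):
    return sum(xs) / len(xs)


# Level 1: one mean per 100 ms window (the D2 column)
d2 = [mean(b) for b in (bucket_200, bucket_300, bucket_400)]
# Level 2 as the table appears to compute it: mean of the level-1 means
d3_mean_of_means = mean(d2)
# For comparison: mean of all raw D1 values in the second
d3_raw = mean(bucket_200 + bucket_300 + bucket_400)

print(d2)                # [30.0, 70.0, 44.0]
print(d3_mean_of_means)  # 48.0
print(round(d3_raw, 2))  # 44.73
```

The two definitions only agree when every window holds the same number of rows, so it is worth deciding which one you actually want before writing the pandas code.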
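I guess the logic is to first select a millisecond window, for example 0-100 ms, average all the values between those two points, and then round the millisecond value down to the start of the window. Do that for every window and then move on; for the second and third levels I get lost. Even just a pointer on how to select the values between two millisecond boundaries would help.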
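For the direction asked for here, a minimal pandas sketch, assuming `T1` is a string laid out like `2020.05.22 11.30.1.200` as in the table: parse it with `pd.to_datetime` and an explicit format, then either select one window with a boolean mask or floor every timestamp to its 100 ms window with `Series.dt.floor` and group on that. Only the rows for second 11.30.1 are used; the names `window`, `start`, and `level1` are illustrative.

```python
import pandas as pd

# Rows for second 11.30.1, copied from the question's table
df = pd.DataFrame({
    "T1": ["2020.05.22 11.30.1.200", "2020.05.22 11.30.1.220",
           "2020.05.22 11.30.1.240", "2020.05.22 11.30.1.260",
           "2020.05.22 11.30.1.280", "2020.05.22 11.30.1.300",
           "2020.05.22 11.30.1.310", "2020.05.22 11.30.1.350",
           "2020.05.22 11.30.1.400", "2020.05.22 11.30.1.450",
           "2020.05.22 11.30.1.470"],
    "D1": [10, 20, 30, 40, 50, 60, 70, 80, 90, 11, 31],
})

# Parse "YYYY.MM.DD HH.MM.S.mmm"; %f reads "200" as 200 ms
df["ts"] = pd.to_datetime(df["T1"], format="%Y.%m.%d %H.%M.%S.%f")

# One window selected explicitly: [11:30:01.300, 11:30:01.400)
start = pd.Timestamp("2020-05-22 11:30:01.300")
window_rows = df[(df["ts"] >= start) & (df["ts"] < start + pd.Timedelta(milliseconds=100))]

# Every window at once: floor each timestamp to its 100 ms window start, then average
df["window"] = df["ts"].dt.floor("100ms")
level1 = df.groupby("window")["D1"].mean()
print(level1)
```

`dt.floor` always rounds down, so the row at 1.280 lands in the 1.200 window, which is what the T2 column in the table shows even though its header says "roundup".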
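See my solution to the problem below. I solved it by working with the dates as strings, but I think it could also be done using only the datetime format (I'll think about it). I took the part of the data you can see in the example above and was able to build your expected result. If you have any questions, feel free to get in touch. Hope this helps.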
```python
import os

import pandas as pd

# data_folder and file_name are the location of your CSV file
df = pd.read_csv(os.path.join(data_folder, file_name))  # here I am just reading your file (the portion you provided in the question)
df["T1"] = df["T1"].astype(str)  # will work on the time, but in the form of a string
# sort chronologically: a plain string sort would put second "10" before second "2"
df.sort_values("T1", key=lambda s: pd.to_datetime(s, format="%Y.%m.%d %H.%M.%S.%f"), inplace=True)
df["hour"] = df["T1"].map(lambda s: ".".join(s.split(".")[:3]))  # separating the hour portion, e.g. "2020.05.22 11"
df["minute"] = df["T1"].map(lambda s: ".".join(s.split(".")[:4]))  # separating the minute portion, e.g. "2020.05.22 11.30"
df["second"] = df["T1"].map(lambda s: ".".join(s.split(".")[:5]))  # separating the second portion, e.g. "2020.05.22 11.30.1"
df["mili"] = df["T1"].map(lambda s: s.split(".")[-1]).astype(int)  # separating the millisecond portion, e.g. 280
df["mili"] = (df["mili"] // 100) * 100  # rounding the millisecond down to its 100 ms window, e.g. 280 -> 200
df["milisecond"] = df["second"] + "." + df["mili"].astype(str)
df["miliRowID"] = df.groupby("milisecond").cumcount() + 1  # row count within each 100 ms window, needed to join the data back together
df["secRowID"] = df.groupby("second").cumcount() + 1  # row count within each second
df["minRowID"] = df.groupby("minute").cumcount() + 1  # row count within each minute
df.drop(["mili"], axis=1, inplace=True)

xMili = df.groupby("milisecond")["D1"].mean().reset_index()  # averages per 100 ms window
xMili.columns = ["T2(roundup ms)", "D2(avg of D1)"]
xSec = df.groupby("second")["D1"].mean().reset_index()  # averages per second (of the raw D1 values)
xSec.columns = ["T3(round up ms)", "D3"]
xMin = df.groupby("minute")["D1"].mean().reset_index()  # averages per minute (of the raw D1 values)
xMin.columns = ["T4(roundup to second)", "D4(avg second)"]

# in this section we join the calculated averages back onto the main dataframe,
# placing each average on the last row (highest row ID) of its group
dfTemp = df.groupby("milisecond")["miliRowID"].max().reset_index().merge(
    xMili, how="left", left_on="milisecond", right_on="T2(roundup ms)")
df = df.merge(dfTemp, how="left", on=["milisecond", "miliRowID"])
dfTemp = df.groupby("second")["secRowID"].max().reset_index().merge(
    xSec, how="left", left_on="second", right_on="T3(round up ms)")
df = df.merge(dfTemp, how="left", on=["second", "secRowID"])
dfTemp = df.groupby("minute")["minRowID"].max().reset_index().merge(
    xMin, how="left", left_on="minute", right_on="T4(roundup to second)")
df = df.merge(dfTemp, how="left", on=["minute", "minRowID"])

# and finally drop the helper columns
DROP = ["hour", "minute", "second", "milisecond", "miliRowID", "secRowID", "minRowID"]
df.drop(DROP, axis=1, inplace=True)
```
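The answer mentions that this could be done with datetime values instead of strings. Below is a minimal sketch of that route, under the assumption that `T1` is a single string column in the `YYYY.MM.DD HH.MM.S.mmm` layout from the table. It differs from the code above on purpose in one respect: D3 and D4 are means of the level below (mean of means), which matches the table's D3 of 48 for second 11.30.1, whereas `groupby("second")["D1"].mean()` averages the raw D1 values. The function name `hierarchical_means` and the helper columns `w100ms`, `wsec`, `wmin` are illustrative, not part of any library.

```python
import pandas as pd


def hierarchical_means(df: pd.DataFrame) -> pd.DataFrame:
    """Mean of D1 per 100 ms window (D2), mean of those per second (D3),
    mean of those per minute (D4); each result sits on its window's last row."""
    out = df.copy()
    # Parse "YYYY.MM.DD HH.MM.S.mmm" into real timestamps and sort chronologically
    out["ts"] = pd.to_datetime(out["T1"], format="%Y.%m.%d %H.%M.%S.%f")
    out = out.sort_values("ts").reset_index(drop=True)

    # Window start of each row at the three levels
    out["w100ms"] = out["ts"].dt.floor("100ms")
    out["wsec"] = out["ts"].dt.floor("s")
    out["wmin"] = out["ts"].dt.floor("min")

    # Level 1: mean of the raw D1 values in each 100 ms window
    d2 = out.groupby("w100ms")["D1"].mean()
    # Level 2: mean of the level-1 means inside each second
    d3 = d2.groupby(d2.index.floor("s")).mean()
    # Level 3: mean of the level-2 means inside each minute
    d4 = d3.groupby(d3.index.floor("min")).mean()

    # Write each window's value only on the last row that belongs to that window
    for col, key, means in (("D2", "w100ms", d2), ("D3", "wsec", d3), ("D4", "wmin", d4)):
        last = ~out[key].duplicated(keep="last")
        out.loc[last, col] = out.loc[last, key].map(means)

    return out.drop(columns=["ts", "w100ms", "wsec", "wmin"])


# Rows for seconds 11.30.1 and 11.30.7 from the question's table ("...." rows omitted)
t1 = ["2020.05.22 11.30.1." + ms for ms in ("200", "220", "240", "260", "280", "300", "310", "350", "400", "450", "470")]
t1 += ["2020.05.22 11.30.7." + ms for ms in ("100", "120", "140", "160", "180", "200", "210", "250", "300", "350", "370")]
d1 = [10, 20, 30, 40, 50, 60, 70, 80, 90, 11, 31,
      22, 33, 44, 55, 66, 77, 88, 99, 31, 32, 33]

result = hierarchical_means(pd.DataFrame({"T1": t1, "D1": d1}))
print(result)
```

Because the sample skips the "...." rows, the D4 value here only covers seconds 1 and 7 and will not match a full minute of data. If raw-D1 averages per second are what you want after all, compute `d3` as `out.groupby("wsec")["D1"].mean()` (and `d4` likewise on `wmin`); the last-row placement stays the same, and the `duplicated(keep="last")` mask replaces the row-ID merges used in the string version.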