I have a JSON list of timestamps:
[
"2024-03-27 00:30:30.321000",
"2024-03-27 00:34:58.695000",
"2024-03-27 00:37:38.352000",
"2024-03-27 00:37:40.419000",
"2024-03-27 00:43:54.536000",
"2024-03-27 00:49:39.231000",
"2024-03-27 01:03:39.637000",
"2024-03-27 01:05:24.370000",
"2024-03-27 01:17:43.586000",
"2024-03-27 01:17:47.447000",
"2024-03-27 01:17:59.913000",
"2024-03-27 01:18:34.872000",
"2024-03-27 01:18:36.922000",
"2024-03-27 01:18:44.626000",
"2024-03-27 01:19:11.057000",
"2024-03-27 01:19:12.307000",
"2024-03-27 01:21:11.322000",
"2024-03-27 01:26:54.640000",
"2024-03-27 01:26:55.055000",
...
I want to plot their frequency, for example per hour. I can get this working with pandas, but only by adding a dummy column:
[
{
"foo": 1,
"ts": "2024-03-27 00:24:13.132000"
},
{
"foo": 1,
"ts": "2024-03-27 00:30:30.321000"
},
{
"foo": 1,
"ts": "2024-03-27 00:34:58.695000"
},
{
"foo": 1,
"ts": "2024-03-27 00:36:04.166000"
},
{
"foo": 1,
"ts": "2024-03-27 00:37:38.352000"
},
{
"foo": 1,
"ts": "2024-03-27 00:37:40.419000"
},
{
"foo": 1,
"ts": "2024-03-27 00:43:54.536000"
},
....
]
That way I can use sum():
import sys
import pandas as pd
freq = '1d'
df = pd.read_json(sys.stdin)
df['ts'] = pd.to_datetime(df['ts'])
overview = df.resample(freq, on='ts').foo.sum()
print(overview)
This gives me what I am looking for:
2024-03-27 674
2024-03-28 405
2024-03-29 366
2024-03-30 352
2024-03-31 541
2024-04-01 657
2024-04-02 398
2024-04-03 523
2024-04-04 466
2024-04-05 498
2024-04-06 468
2024-04-07 312
2024-04-08 453
2024-04-09 625
2024-04-10 654
2024-04-11 696
2024-04-12 624
2024-04-13 377
2024-04-14 304
2024-04-15 493
2024-04-16 544
2024-04-17 526
Can I do this without the dummy column, using just the plain list of timestamps as input?
With your sample data, you could use:
import pandas as pd

data = [
"2024-03-27 00:30:30.321000",
"2024-03-27 00:34:58.695000",
"2024-03-27 00:37:38.352000",
"2024-03-27 00:37:40.419000",
"2024-03-27 00:43:54.536000",
"2024-03-27 00:49:39.231000",
"2024-03-27 01:03:39.637000",
"2024-03-27 01:05:24.370000",
"2024-03-27 01:17:43.586000",
"2024-03-27 01:17:47.447000",
"2024-03-27 01:17:59.913000",
"2024-03-27 01:18:34.872000",
"2024-03-27 01:18:36.922000",
"2024-03-27 01:18:44.626000",
"2024-03-27 01:19:11.057000",
"2024-03-27 01:19:12.307000",
"2024-03-27 01:21:11.322000",
"2024-03-27 01:26:54.640000",
"2024-03-27 01:26:55.055000"]
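# Build a Series of timestamps directly from the list of strings
ts = pd.Series(pd.to_datetime(data))
# Round each timestamp down to the start of its hour, then count the entries per hour
counts = ts.groupby(ts.dt.floor('1h')).count()
print(counts)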
The code produces:
2024-03-27 00:00:00 6
2024-03-27 01:00:00 13
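One thing to keep in mind when plotting: grouping by the floored timestamp only returns hours that contain at least one entry. If empty hours should show up as 0, a variant based on resample() may be more convenient. The following is only a sketch under some assumptions: the helper names count_per_bin and count_from_json are made up for illustration, and the input is assumed to be a plain JSON array of timestamp strings like the one in the question.

```python
import json

import pandas as pd


def count_per_bin(timestamps, freq="1h"):
    """Count timestamps per time bin; bins without entries are reported as 0.

    `timestamps` is a plain list of timestamp strings, with no dummy column.
    """
    index = pd.to_datetime(timestamps)
    # The values are placeholders and are never read: size() only counts
    # how many rows fall into each bin of the DatetimeIndex.
    series = pd.Series(1, index=index)
    return series.resample(freq).size()


def count_from_json(stream, freq="1h"):
    """Read a JSON array of timestamp strings from a text stream and count per bin."""
    return count_per_bin(json.load(stream), freq)


# Possible use in a script that receives the JSON list on standard input:
#   import sys
#   print(count_from_json(sys.stdin, "1d"))
```

Passing "1d" instead of "1h" should give daily counts comparable to the output in the question, and the resulting Series can be handed to its plot() method if matplotlib is installed.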