Pandas: ValueError when parsing UNIX timestamps

Question

I have a CSV of OHLC data, indexed by seconds since the epoch:

I parse it like this:

df = pd.read_csv(f"{CSV}/{filename}", sep=",", header=0, index_col=0, parse_dates=['time'], date_format='s')

But the timestamps are not parsed as dates:

df.index

Index([ 378943200,  379548000,  380152800,  380757600,  381362400,  381967200,
        382572000,  383176800,  383781600,  384386400,
       ...
       1687726800, 1688331600, 1688936400, 1689541200, 1690146000, 1690750800,
       1691355600, 1691960400, 1692565200, 1693170000],
      dtype='int64', name='time', length=2172)

Moreover, if I try to convert the index to datetime manually, I get a ValueError:

pd.to_datetime(df.index, format='s', utc=True)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[71], line 1
----> 1 pd.to_datetime(df.index, format='s', utc=True)

...

ValueError: time data "378943200" doesn't match format "s", at position 0. You might want to try:
    - passing `format` if your strings have a consistent format;
    - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
    - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.

According to www.unixtimestamp.com, this is a valid UNIX timestamp, so what gives?

Answer 1

You need to use unit, not format, to specify seconds:

pd.to_datetime(df.index, unit='s', utc=True)

format is for strftime-style parsing of date strings, so the integer 378943200 can never match the pattern "s".
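To make the difference concrete, here is a minimal sketch that reaches the same instant both ways. The date string and its strftime pattern are illustrative assumptions and not part of the original data; the integer is the first index value from the question.

```python
import pandas as pd

# format= describes the layout of date *strings* using strftime-style directives.
from_string = pd.to_datetime(
    ["1982-01-03 22:00:00"], format="%Y-%m-%d %H:%M:%S", utc=True
)

# unit= says how to interpret *numbers* counted from the epoch (here: seconds).
from_epoch = pd.to_datetime([378943200], unit="s", utc=True)

print(from_string[0])
print(from_epoch[0])
print(from_string[0] == from_epoch[0])  # both describe the same instant
```

If the values in the column are numbers, unit is the argument to reach for; format only applies when the values are text.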

Test:

import pandas as pd

df = pd.DataFrame({"time": [ 378943200,  379548000,  380152800,  380757600,  381362400,  381967200,
        382572000,  383176800,  383781600,  384386400,]})

df["time_dt"] = pd.to_datetime(df["time"], unit="s", utc=True)
print(df.head(3))

# OUT
        time                   time_dt
0  378943200 1982-01-03 22:00:00+00:00
1  379548000 1982-01-10 22:00:00+00:00
2  380152800 1982-01-17 22:00:00+00:00

One-liner with read_csv

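Since date_parser was deprecated in pandas 2.0 (for performance reasons?), they recommend using parse_dates and date_format instead. As far as I understand, because date_format uses strftime parsing and there is no flag for unix time, there is no "smart" way to do this, so calling to_datetime after read_csv is probably the way to go (as of pandas 2.1.0).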
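A minimal sketch of that two-step approach, applied to an index column as in the question. The in-memory CSV and its OHLC values are made up for illustration; only the two timestamps come from the question.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file; the OHLC values are invented.
csv_file = io.StringIO(
    "time,open,high,low,close\n"
    "378943200,1.0,2.0,0.5,1.5\n"
    "379548000,1.5,2.5,1.0,2.0\n"
)

# Step 1: read the epoch seconds as plain integers (no parse_dates, no date_format).
df = pd.read_csv(csv_file, index_col="time")

# Step 2: convert the whole index in a single vectorized call.
df.index = pd.to_datetime(df.index, unit="s", utc=True)

print(df.index)
```

This keeps read_csv free of date arguments and leaves the conversion to to_datetime, which accepts unit directly.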

That being said, you can still pass custom converters for the column, but I don't know how that performs compared with calling to_datetime after read_csv.

import io
import datetime
import pandas as pd

csv_file = io.StringIO("time\n378943200\n379548000\n380152800\n")

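# utcfromtimestamp is deprecated since Python 3.12; fromtimestamp with tz=UTC returns a timezone-aware datetime instead
df = pd.read_csv(csv_file, converters={"time": lambda s: datetime.datetime.fromtimestamp(int(s), tz=datetime.timezone.utc)})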
print(df.time)
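The open performance question can be checked empirically. The following is a rough sketch, not a benchmark: the synthetic CSV (2,000 weekly timestamps starting from the first value in the question), the helper names with_converter and with_to_datetime, and the repeat count are all assumptions made for illustration. The converter here uses the timezone-aware datetime.fromtimestamp so that both approaches yield UTC values.

```python
import datetime
import io
import timeit

import pandas as pd

# Synthetic data: 2,000 weekly epoch-second values, one per row.
csv_text = "time\n" + "\n".join(
    str(378943200 + 604800 * i) for i in range(2_000)
) + "\n"


def with_converter():
    # Per-row Python callback inside read_csv.
    return pd.read_csv(
        io.StringIO(csv_text),
        converters={
            "time": lambda s: datetime.datetime.fromtimestamp(
                int(s), tz=datetime.timezone.utc
            )
        },
    )


def with_to_datetime():
    # Read integers first, then convert the column in one vectorized call.
    df = pd.read_csv(io.StringIO(csv_text))
    df["time"] = pd.to_datetime(df["time"], unit="s", utc=True)
    return df


# Sanity check: both approaches should describe the same instants.
a = pd.to_datetime(with_converter()["time"], utc=True)
b = with_to_datetime()["time"]
same = list(a) == list(b)
print("same result:", same)

# Timings depend on machine and pandas version, so none are quoted here.
print("converters :", timeit.timeit(with_converter, number=5))
print("to_datetime:", timeit.timeit(with_to_datetime, number=5))
```

Running it on your own data and pandas version is the only reliable way to compare the two.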
