ValueError: Array must be all same time zone after converting ISO timestamps with offsets using dateutil.parser.parse(x)


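I'm trying to format the timestamps in a DataFrame before sending it to postgres. In an otherwise clean dataset, I have a small number of nonsensical timestamps.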

An example nonsensical data point from my timestamp column: 2019-11-11T07:08:09.640-31:00

This makes no sense, because real-world UTC offsets only range from -12:00 to +14:00.
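As a quick illustration of that range, here is a minimal stdlib-only check (assuming Python 3.7+; the helper name offset_is_plausible is hypothetical, not part of any library) that flags values like the one above before they reach the database. Python's own datetime.fromisoformat already refuses offsets of 24 hours or more, and the explicit comparison catches the rest of the implausible range.

```python
from datetime import datetime, timedelta

# Range of UTC offsets in real-world use, as stated above (-12:00 .. +14:00).
MIN_OFFSET = timedelta(hours=-12)
MAX_OFFSET = timedelta(hours=14)


def offset_is_plausible(ts: object) -> bool:
    """Return True if ts is an ISO 8601 string whose UTC offset is within -12:00..+14:00."""
    if not isinstance(ts, str):
        return False  # NaN/None or other non-string cells
    try:
        # Raises ValueError for offsets of 24h or more, such as -31:00.
        dt = datetime.fromisoformat(ts)
    except ValueError:
        return False
    offset = dt.utcoffset()  # None for naive timestamps
    return offset is not None and MIN_OFFSET <= offset <= MAX_OFFSET


print(offset_is_plausible("2019-11-11T07:08:09.640-05:00"))  # True
print(offset_is_plausible("2019-11-11T07:08:09.640-31:00"))  # False
```

In pandas, something like pages['my_iso_timestamp_with_offset'].map(offset_is_plausible) would give a boolean mask for locating the offending rows.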

I tried sending it to postgres as is:

# post to pageviews table
pages.to_sql('pageviews',
             con = engine,
             schema = 'ga__marketing',
             index = False,
             if_exists = 'append')

sqlalchemy.exc.DataError: (psycopg2.errors.InvalidTimeZoneDisplacementValue) time zone displacement out of range: "2019-11-11T07:08:09.640-31:00" LINE 1: ...page, sampled) VALUES ('1573567730187.hf7k8jc7','2019-11-1 ...

So I tried changing the date format in pandas before sending it to postgres:

import dateutil.parser

# parse each ISO string into a tz-aware datetime (each row keeps its own offset)
pages['my_iso_timestamp_with_offset'] = pages['my_iso_timestamp_with_offset'].apply(dateutil.parser.parse)

This runs fine and returns a sensible-looking DataFrame. However, when I try to send it to postgres, I get:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1979, in objects_to_datetime64ns
    values, tz_parsed = conversion.datetime_to_datetime64(data)
  File "pandas/_libs/tslibs/conversion.pyx", line 185, in pandas._libs.tslibs.conversion.datetime_to_datetime64
ValueError: Array must be all same time zone

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<input>", line 19, in <module>
  File "/usr/local/lib/python3.7/site-packages/pandas/core/generic.py", line 2712, in to_sql
    method=method,
  File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 518, in to_sql
    method=method,
  File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 1317, in to_sql
    dtype=dtype,
  File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 619, in __init__
    self.table = self._create_table_setup()
  File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 868, in _create_table_setup
    column_names_and_types = self._get_column_names_and_types(self._sqlalchemy_type)
  File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 860, in _get_column_names_and_types
    for i in range(len(self.frame.columns))
  File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 860, in <listcomp>
    for i in range(len(self.frame.columns))
  File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 970, in _sqlalchemy_type
    if col.dt.tz is not None:
  File "/usr/local/lib/python3.7/site-packages/pandas/core/accessor.py", line 79, in _getter
    return self._delegate_property_get(name)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/accessors.py", line 65, in _delegate_property_get
    values = self._get_values()
  File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/accessors.py", line 55, in _get_values
    return DatetimeIndex(data, copy=False, name=self.name)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/datetimes.py", line 334, in __new__
    int_as_wall_time=True,
  File "/usr/local/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 446, in _from_sequence
    int_as_wall_time=int_as_wall_time,
  File "/usr/local/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1866, in sequence_to_dt64ns
    data, dayfirst=dayfirst, yearfirst=yearfirst
  File "/usr/local/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1984, in objects_to_datetime64ns
    raise e
  File "/usr/local/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1975, in objects_to_datetime64ns
    require_iso8601=require_iso8601,
  File "pandas/_libs/tslib.pyx", line 465, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 543, in pandas._libs.tslib.array_to_datetime
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
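To see why the dateutil route still fails, here is a small sketch (assuming python-dateutil 2.x on Python 3) of what parser.parse returns. Each value gets its own fixed-offset tzinfo, so the column has no single time zone, which is what pandas objects to in the traceback unless utc=True is used. The -31:00 value also parses, but the resulting datetime cannot report its offset, because Python only allows offsets strictly between -24 and +24 hours.

```python
from datetime import timedelta

from dateutil import parser

# Two well-formed timestamps with different offsets, plus the bad one from the question.
good_a = parser.parse("2019-11-11T07:08:09.640-05:00")
good_b = parser.parse("2019-11-11T07:08:09.640+02:00")
bad = parser.parse("2019-11-11T07:08:09.640-31:00")  # parses without complaint

# Each value carries its own fixed-offset tzinfo, so the column has no single time zone.
print(good_a.tzinfo, good_b.tzinfo)

# The bad value only fails once something asks for its offset,
# e.g. when pandas or the database driver converts it.
try:
    bad.utcoffset()
except ValueError as exc:
    print("unusable offset:", exc)
```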

How can I get around this and get my dataset into Postgres?

python pandas postgresql python-dateutil
1 Answer (0 votes)

You could try this:

# convert to UTC so the whole column shares one time zone
pages['my_iso_timestamp_with_offset'] = pd.to_datetime(pages['my_iso_timestamp_with_offset'], utc=True)

Then check whether the timestamps were converted correctly. Instead of UTC you can use any time zone, e.g. 'America/Denver' for Colorado.
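A fuller version of the convert-to-UTC idea is sketched below, under these assumptions: the column still holds the original ISO 8601 strings (not the dateutil-parsed values), Python 3.7+, and the few bad rows can simply be stored as NULL. The helper names to_utc_or_none and normalize_timestamps are hypothetical. Values whose offset is missing or outside -12:00..+14:00 become NaT, and everything else is converted to UTC, so pandas sees a single time zone.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

import pandas as pd

# Plausible real-world UTC offsets, as stated in the question (-12:00 .. +14:00).
MIN_OFFSET = timedelta(hours=-12)
MAX_OFFSET = timedelta(hours=14)


def to_utc_or_none(value: object) -> Optional[datetime]:
    """Parse one ISO 8601 string and return it in UTC, or None if it is unusable."""
    if not isinstance(value, str):
        return None  # NaN/None and other non-strings become NULL
    try:
        # fromisoformat raises ValueError for offsets of 24h or more (e.g. -31:00)
        dt = datetime.fromisoformat(value)
    except ValueError:
        return None
    offset = dt.utcoffset()
    if offset is None or not MIN_OFFSET <= offset <= MAX_OFFSET:
        return None  # assumption: naive or out-of-range values are treated as bad data
    return dt.astimezone(timezone.utc)


def normalize_timestamps(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of df whose `column` is a single-time-zone (UTC) datetime column."""
    out = df.copy()
    # Every remaining value is already in UTC; None becomes NaT.
    out[column] = pd.to_datetime(out[column].map(to_utc_or_none), utc=True)
    return out
```

After pages = normalize_timestamps(pages, 'my_iso_timestamp_with_offset'), the same to_sql call from the question should go through: the bad rows are sent as NULL, and because the column is tz-aware, pandas creates it as TIMESTAMP WITH TIME ZONE when it has to create the table (the col.dt.tz check visible in the traceback). Nulling keeps the row count intact; if you would rather drop those rows, pages.dropna(subset=['my_iso_timestamp_with_offset']) after the conversion is one option.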
