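I'm trying to format the timestamps in a data frame before sending it to Postgres. In an otherwise good data set, I have a small number of nonsensical timestamps.
An example of a nonsensical value in my timestamp column: 2019-11-11T07:08:09.640-31:00
This is meaningless because a time zone offset can only be between -12 and +14 hours.
I tried sending the data to Postgres as is: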
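To see how many rows are affected before touching the database, the -12 to +14 range can be checked directly on the raw strings. The following is only a minimal sketch: the helper name `offset_in_range` and the sample values are made up for illustration, and it assumes the offset is always written at the end of the string as `±HH:MM` or `±HHMM`.

```python
import re

# Matches a trailing UTC offset such as +05:30, -0700 or -31:00.
OFFSET_RE = re.compile(r"([+-])(\d{2}):?(\d{2})$")


def offset_in_range(ts: str, min_hours: int = -12, max_hours: int = 14) -> bool:
    """Return True if ts ends in 'Z' or in an offset within [min_hours, max_hours]."""
    if ts.endswith("Z"):
        return True
    match = OFFSET_RE.search(ts)
    if match is None:
        return False  # no offset found: treat the value as suspicious
    sign = 1 if match.group(1) == "+" else -1
    minutes = sign * (int(match.group(2)) * 60 + int(match.group(3)))
    return min_hours * 60 <= minutes <= max_hours * 60


# Hypothetical sample values; the last one is the bad timestamp from the question.
samples = [
    "2019-11-11T07:08:09.640-07:00",
    "2019-11-11T07:08:09.640+14:00",
    "2019-11-11T07:08:09.640-31:00",
]
bad = [s for s in samples if not offset_in_range(s)]
```

On a data frame, the same function could be applied with something like `pages['my_iso_timestamp_with_offset'].map(offset_in_range)` to get a boolean mask of the usable rows.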
# post to pageviews table
pages.to_sql('pageviews',
con = engine,
schema = 'ga__marketing',
index = False,
if_exists = 'append')
sqlalchemy.exc.DataError: (psycopg2.errors.InvalidTimeZoneDisplacementValue) time zone displacement out of range: "2019-11-11T07:08:09.640-31:00" LINE 1: ...page, sampled) VALUES ('1573567730187.hf7k8jc7', '2019-11-1...
So I tried parsing the dates in pandas before sending them to Postgres:
import dateutil
pages['my_iso_timestamp_with_offset'] = pages['my_iso_timestamp_with_offset'].apply(lambda x: dateutil.parser.parse(x))
This runs fine and returns a sound data frame. However, when I try to send it to Postgres, I get:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1979, in objects_to_datetime64ns
values, tz_parsed = conversion.datetime_to_datetime64(data)
File "pandas/_libs/tslibs/conversion.pyx", line 185, in pandas._libs.tslibs.conversion.datetime_to_datetime64
ValueError: Array must be all same time zone
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<input>", line 19, in <module>
File "/usr/local/lib/python3.7/site-packages/pandas/core/generic.py", line 2712, in to_sql
method=method,
File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 518, in to_sql
method=method,
File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 1317, in to_sql
dtype=dtype,
File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 619, in __init__
self.table = self._create_table_setup()
File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 868, in _create_table_setup
column_names_and_types = self._get_column_names_and_types(self._sqlalchemy_type)
File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 860, in _get_column_names_and_types
for i in range(len(self.frame.columns))
File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 860, in <listcomp>
for i in range(len(self.frame.columns))
File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 970, in _sqlalchemy_type
if col.dt.tz is not None:
File "/usr/local/lib/python3.7/site-packages/pandas/core/accessor.py", line 79, in _getter
return self._delegate_property_get(name)
File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/accessors.py", line 65, in _delegate_property_get
values = self._get_values()
File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/accessors.py", line 55, in _get_values
return DatetimeIndex(data, copy=False, name=self.name)
File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/datetimes.py", line 334, in __new__
int_as_wall_time=True,
File "/usr/local/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 446, in _from_sequence
int_as_wall_time=int_as_wall_time,
File "/usr/local/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1866, in sequence_to_dt64ns
data, dayfirst=dayfirst, yearfirst=yearfirst
File "/usr/local/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1984, in objects_to_datetime64ns
raise e
File "/usr/local/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1975, in objects_to_datetime64ns
require_iso8601=require_iso8601,
File "pandas/_libs/tslib.pyx", line 465, in pandas._libs.tslib.array_to_datetime
File "pandas/_libs/tslib.pyx", line 543, in pandas._libs.tslib.array_to_datetime
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
How can I get past this and load the data set into Postgres?
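You could try the following:
pages.set_index('my_iso_timestamp_with_offset').tz_convert('utc').reset_index()
and see whether it converts them correctly. Note that tz_convert needs a time-zone-aware DatetimeIndex, so the column has to be parsed into a single time zone first (for example with pd.to_datetime(..., utc=True)). You can use any tz database name in the conversion, e.g. 'America/Denver' for Colorado.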
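The "Array must be all same time zone" error in the traceback comes from the column holding datetimes with different offsets, so one way to approach this is to normalise every value to UTC and blank out the ones whose offset is out of range. The sketch below uses only the standard library. The helper name `to_utc` and the sample strings are assumptions for illustration, and it assumes the timestamps are ISO 8601 strings that `datetime.fromisoformat` accepts (a trailing `Z` is only understood from Python 3.11 on).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Offset range stated in the question.
MIN_OFFSET = timedelta(hours=-12)
MAX_OFFSET = timedelta(hours=14)


def to_utc(ts: object) -> Optional[datetime]:
    """Parse an ISO 8601 string and return it in UTC, or None if it is unusable."""
    if not isinstance(ts, str):
        return None  # e.g. NaN or None in the column
    try:
        parsed = datetime.fromisoformat(ts)
    except ValueError:
        return None  # malformed string, or an offset of 24 hours or more
    offset = parsed.utcoffset()
    if offset is None or not (MIN_OFFSET <= offset <= MAX_OFFSET):
        return None  # no offset at all, or outside -12:00..+14:00
    return parsed.astimezone(timezone.utc)


# Hypothetical sample values; the last one is the bad timestamp from the question.
samples = [
    "2019-11-11T07:08:09.640-07:00",
    "2019-11-11T07:08:09.640+01:00",
    "2019-11-11T07:08:09.640-31:00",
]
converted = [to_utc(s) for s in samples]
```

In the data frame this could be used as `pages[col] = pd.to_datetime(pages[col].map(to_utc), utc=True)`, where `col` is the timestamp column name. The bad rows then become NaT, and the column has a single time zone, which is what `to_sql` and `tz_convert` expect. Whether to drop those rows or keep them as NULL is a separate decision.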