I'm using the read_sql_query function to read some SQL Server data into pandas. The problem is that I'm losing one digit of precision on a DATETIME2 column.
Code example:

query = pd.read_sql_query(
    '''
    SELECT CAST('2021-05-06 15:44:29.1234567' AS DATETIME2(7)) AS ServiceDate
    ''', source_engine)
df = pd.DataFrame(query)
df
Result:
Out[71]:
0 2021-05-06 15:44:29.123456
This causes problems when comparing data, so I need the precision to match.
How can I stop this from happening?
I just ran into this. With both pyodbc and pymssql, the datetime values are truncated to microseconds (the maximum precision of Python's datetime.datetime), even though the DataFrame column is NumPy datetime64[ns].
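To see the driver-independent part of this limit, compare Python's datetime.datetime (whole microseconds only) with pandas' Timestamp (nanosecond resolution) — a minimal sketch, no database involved:

```python
from datetime import datetime

import pandas as pd

# datetime.datetime stores at most whole microseconds (six digits),
# so the seventh digit of a DATETIME2(7) value has nowhere to go.
dt = datetime(2001, 1, 1, 0, 0, 0, 123456)
print(dt.microsecond)  # 123456

# pandas' Timestamp has nanosecond resolution and can hold all
# seven digits; .nanosecond exposes the sub-microsecond part.
ts = pd.Timestamp("2001-01-01 00:00:00.1234567")
print(ts.nanosecond)  # 700
```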
import pandas as pd
import sqlalchemy as sa

engine = sa.create_engine("mssql+pyodbc://scott:tiger^5HHH@mssql_199")

with engine.begin() as conn:
    conn.exec_driver_sql("drop table if exists zzz")
    conn.exec_driver_sql("create table zzz (dt2 datetime2)")
    conn.exec_driver_sql(
        "insert into zzz (dt2) values ('2001-01-01 00:00:00.1234567')"
    )

df = pd.read_sql_query("select dt2 from zzz", engine)
print(df.info())
"""
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   dt2     1 non-null      datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 140.0 bytes
None
"""
print(df)
"""
dt2
0 2001-01-01 00:00:00.123456
"""
The workaround is to retrieve the column as varchar and convert the type with dtype=
df = pd.read_sql_query(
    "select cast(dt2 as varchar(30)) as dt2 from zzz",
    engine,
    dtype=dict(dt2="datetime64[ns]"),
)
print(df.info())
"""
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   dt2     1 non-null      datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 140.0 bytes
None
"""
print(df)
"""
dt2
0 2001-01-01 00:00:00.123456700
"""