pyodbc error: 'HY090', Invalid string or buffer length (0)

Question · Votes: 0 · Answers: 2

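Objective:

Using the read_sql_query() and to_sql() methods from pandas, my Python 3.7 script aims to perform ETL of multiple tables from one server to another by reading a .sql file. The connection arguments for both methods use create_engine from the sqlalchemy module.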

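Error encountered:

After the first sets of tables + transactions are extracted and loaded successfully, the fourth one raises an error.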

 sqlalchemy.exc.DBAPIError: (pyodbc.Error) ('HY090', '[HY090] [Microsoft][ODBC Driver Manager] Invalid string or buffer length (0) (SQLExecDirectW)')

See below for more details.

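Procedure:

  • Each table to extract is written to a single SQL file, with the batches separated by ; so that the script can split on it.

    'ExtractTables.sql'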

    SET NOCOUNT ON
    SELECT 
        [ID1]       
    ,   [Name]          
    ,   [LastUpdated]   
    ,   [UpdatedBy]     

    INTO #table1
    FROM DB1.dbo.table1 

    SELECT * FROM #table1

    ;

    SET NOCOUNT ON
    SELECT 
        [ID1]
    ,   [ID2]
    ,   [Descr]

    INTO #table2 FROM DB1.dbo.table2

    SELECT * FROM #table2

  • The ODBC parameters are set per server through create_engine. Both servers are MS-SQL servers. Based on my research, I believe my error comes from the fast_executemany parameter.

    'connection.py'

    import pyodbc
    import urllib
    from sqlalchemy import create_engine

    #Use trusted connection to connect to server. fast_executemany is mssql specific. Allows for large data loads.

    params_H = urllib.parse.quote_plus("DRIVER=ODBC Driver 17 for SQL Server;SERVER=SERVER1;DATABASE=DB1;Trusted_Connection=yes")
    engine_H = create_engine(f'mssql+pyodbc:///?odbc_connect={params_H}', fast_executemany=True)


    params_b = urllib.parse.quote_plus("DRIVER=ODBC Driver 17 for SQL Server;SERVER=SERVER2;DATABASE=DB2;Trusted_Connection=yes")
    engine_b= create_engine(f'mssql+pyodbc:///?odbc_connect={params_b}', fast_executemany=True)

  • A Python script iterates over the commands, extracting each one with read_sql_query() and loading each table with to_sql().

    'LoadTables.py'

    import pandas as pd
    import conn


    def readSQLFile_makeTables(filename):
        # Open and read file
        open_file = open(filename, 'r')
        sql_file = open_file.read()
        open_file.close()

        #all SQL commands (split on ';')
        sql_commands = sql_file.split(';')


        # Execute every command from file
        sql_tables = ['stg_table1', 'stg_table2', 'stg_table3']

        i = 0
        for command in sql_commands:

            table = pd.read_sql_query(command, con=conn.engine_H)
            print(table)
            table.to_sql(sql_tables[i], con=conn.engine_b, chunksize=5000, index=False, if_exists='append')
            i += 1

        print('think this ran')
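
A side note on the splitting step: str.split(';') returns an empty trailing fragment when the file ends with ;, and the traceback further down shows the exception surfacing from read_sql_query(). Whether an empty fragment is what triggers HY090 here is an assumption, not something the post confirms. A minimal sketch that skips blank fragments and checks the batch count against the table list (split_sql_batches and pair_batches_with_tables are hypothetical helper names):

```python
def split_sql_batches(sql_text, delimiter=";"):
    """Split SQL text on a delimiter, dropping empty or whitespace-only fragments.

    Note: this is a naive split. It will also break on a delimiter that
    appears inside a string literal or a comment.
    """
    return [part.strip() for part in sql_text.split(delimiter) if part.strip()]


def pair_batches_with_tables(batches, table_names):
    """Pair each SQL batch with its target table, failing early on a mismatch."""
    if len(batches) != len(table_names):
        raise ValueError(
            f"{len(batches)} SQL batches but {len(table_names)} target tables"
        )
    return list(zip(table_names, batches))


if __name__ == "__main__":
    # Illustrative text only; a trailing ';' would otherwise add an empty batch.
    text = "SELECT 1 AS a\n;\nSELECT 2 AS b;\n"
    for table, command in pair_batches_with_tables(
        split_sql_batches(text), ["stg_table1", "stg_table2"]
    ):
        print(table, "<-", command)
```

In the loop from LoadTables.py, iterating over these pairs would replace the manual counter i, so an unexpected extra fragment fails with a clear message instead of reaching the driver.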


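Error:

For the purposes of StackOverflow I limited the SQL code to 2 batches, but there are actually 4. The first 3 batches pass through the read and the write successfully. The fourth, however, raises an error on the write. The main difference between the failing batch and the others is its size (7 million rows x 8 columns); the next largest is 1.5 million rows x 6 columns.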

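Troubleshooting:

Every error I have researched on this topic points to a problem with the ODBC connection. Both servers are 64-bit, I am using pyodbc 4.025, and I have also tested extracting only fields with integer values. The fact that the first transactions load successfully tells me that most of the process works, but something about the last transaction prevents it from loading. I assume it is the size, which I believed was handled by chunksize=5000 and fast_executemany=True, and the error points to the bound parameters.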

https://github.com/mkleehammer/pyodbc/issues/548
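
On the size hypothesis: in the script above, chunksize=5000 is passed only to to_sql(), so read_sql_query() still builds the whole result as a single DataFrame before anything is written. A rough sketch of streaming the transfer in chunks instead, assuming pandas is installed; copy_in_chunks is a hypothetical helper, and whether this has any effect on HY090 is not established by the post:

```python
import pandas as pd


def copy_in_chunks(query, source_con, target_con, target_table, chunksize=5000):
    """Read a query result in chunks and append each chunk to the target table.

    Returns the total number of rows copied.
    """
    total = 0
    # With chunksize set, read_sql_query returns an iterator of DataFrames
    # rather than one large DataFrame.
    for chunk in pd.read_sql_query(query, con=source_con, chunksize=chunksize):
        chunk.to_sql(target_table, con=target_con, index=False, if_exists="append")
        total += len(chunk)
    return total
```

With the engines from connection.py this would be called as copy_in_chunks(command, conn.engine_H, conn.engine_b, 'stg_table1'). It also keeps memory use bounded for the 7-million-row batch.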

Traceback:


    [6721864 rows x 8 columns]
    Traceback (most recent call last):
      File "C:\Users\x\AppData\Local\Continuum\anaconda3\envs\envname\lib\site-packages\sqlalchemy\engine\base.py", line 1244, in _execute_context
        cursor, statement, parameters, context
      File "C:\Users\x\AppData\Local\Continuum\anaconda3\envs\envname\lib\site-packages\sqlalchemy\engine\default.py", line 552, in do_execute
        cursor.execute(statement, parameters)
    pyodbc.Error: ('HY090', '[HY090] [Microsoft][ODBC Driver Manager] Invalid string or buffer length (0) (SQLExecDirectW)')

    The above exception was the direct cause of the following exception:
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "PYTHON\Testing\ETL\LoadTables.py", line 21, in readSQLFile_makeTables
        table = pd.read_sql_query(command, con=conn.engine_H)
      File "\lib\site-packages\pandas\io\sql.py", line 314, in read_sql_query
        parse_dates=parse_dates, chunksize=chunksize)
      File "\lib\site-packages\pandas\io\sql.py", line 1063, in read_query
        result = self.execute(*args)
      File "\lib\site-packages\pandas\io\sql.py", line 954, in execute
        return self.connectable.execute(*args, **kwargs)
      File "\lib\site-packages\sqlalchemy\engine\base.py", line 2166, in execute
        return connection.execute(statement, *multiparams, **params)
      File "\lib\site-packages\sqlalchemy\engine\base.py", line 982, in execute
        return self._execute_text(object_, multiparams, params)
      File "\lib\site-packages\sqlalchemy\engine\base.py", line 1155, in _execute_text
        parameters,
      File "\lib\site-packages\sqlalchemy\engine\base.py", line 1248, in _execute_context
        e, statement, parameters, cursor, context
      File "\lib\site-packages\sqlalchemy\engine\base.py", line 1466, in _handle_dbapi_exception
        util.raise_from_cause(sqlalchemy_exception, exc_info)
      File "\lib\site-packages\sqlalchemy\util\compat.py", line 383, in raise_from_cause
        reraise(type(exception), exception, tb=exc_tb, cause=cause)
      File "\lib\site-packages\sqlalchemy\util\compat.py", line 128, in reraise
        raise value.with_traceback(tb)
      File "\lib\site-packages\sqlalchemy\engine\base.py", line 1244, in _execute_context
        cursor, statement, parameters, context
      File "\lib\site-packages\sqlalchemy\engine\default.py", line 552, in do_execute
        cursor.execute(statement, parameters)
    sqlalchemy.exc.DBAPIError: (pyodbc.Error) ('HY090', '[HY090] [Microsoft][ODBC Driver Manager] Invalid string or buffer length (0) (SQLExecDirectW)')
    (Background on this error at: http://sqlalche.me/e/dbapi)

sql sql-server python-3.x pandas sqlalchemy
2 Answers

0 votes

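I ran into a similar problem. What happens is that, when the data is passed to pyodbc, some of it is malformed or in a format that pyodbc does not support. The solution was to export the data to CSV and have pyodbc load it from the CSV.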


0 votes

Try using cursor.setinputsizes to set the data types of the columns, and set the VARCHAR type for the INT, FLOAT, or DateTime columns.

cursor.setinputsizes([(pyodbc.SQL_VARCHAR, 50, 0), ...])

This error is caused by fast_executemany = True. It has to do with some memory magic. These links should help:

https://github.com/mkleehammer/pyodbc/issues/520
https://github.com/mkleehammer/pyodbc/pull/729
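
Since the question goes through SQLAlchemy and pandas rather than a raw cursor, one way to reach cursor.setinputsizes is a before_cursor_execute event listener. The sketch below is an assumption about how the suggestion could be wired up, not something tested against SQL Server: set_varchar_inputsizes and make_inputsizes are hypothetical names, the size of 50 is taken from the snippet above, and the value 12 is the ODBC constant that pyodbc exposes as pyodbc.SQL_VARCHAR (hard-coded so the sketch can be read without pyodbc installed).

```python
SQL_VARCHAR = 12  # ODBC type code; pyodbc exposes it as pyodbc.SQL_VARCHAR


def make_inputsizes(n_columns, size=50):
    """Build one (sql_type, size, decimal_digits) tuple per bound parameter."""
    return [(SQL_VARCHAR, size, 0)] * n_columns


def set_varchar_inputsizes(conn, cursor, statement, parameters, context, executemany):
    """Listener with the SQLAlchemy 'before_cursor_execute' signature.

    Only acts on executemany calls (the path used for bulk inserts), and
    assumes positional parameters: a sequence of row tuples.
    """
    if not executemany or not parameters:
        return
    cursor.setinputsizes(make_inputsizes(len(parameters[0])))


# Registration (assumes the engine_b object from connection.py):
#
#     from sqlalchemy import event
#     event.listen(engine_b, "before_cursor_execute", set_varchar_inputsizes)
```

Binding everything as VARCHAR(50) relies on the server converting the values to the target column types, so it may not suit every column.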
