How do I construct the column name when an aggregate function is applied to a column, using the pyparsing library in Python?


I'm trying to parse SQL queries with Python's pyparsing library. As part of that, I implemented the following code:

from pyparsing import *

ParserElement.enablePackrat()

select_stmt = Forward().setName("select statement")

# Define keywords and symbols
SELECT = CaselessKeyword("SELECT")
FROM = CaselessKeyword("FROM")
JOIN = CaselessKeyword("JOIN")
ON = CaselessKeyword("ON")
WHERE = CaselessKeyword("WHERE")
GROUP_BY = CaselessKeyword("GROUP BY")
HAVING = CaselessKeyword("HAVING")
ORDER_BY = CaselessKeyword("ORDER BY")
AS = CaselessKeyword("AS")

COMMA = Suppress(",")
LPAREN = Suppress("(")
RPAREN = Suppress(")")
DOT, STAR = map(Literal, ".*")

# Define SQL language elements
ParserElement.setDefaultWhitespaceChars(" \t")
identifier = Regex(r"[_a-zA-Z][_a-zA-Z0-9]*")
columnName = identifier + ZeroOrMore("." + identifier)
tableName = identifier + ZeroOrMore("." + identifier)
columnNameList = delimitedList(columnName)
tableNameList = delimitedList(tableName)
alias = identifier
aliasExpression = AS + alias
column = columnName + Optional(aliasExpression)

selectStatement = SELECT + columnNameList + FROM + tableNameList + \
                  ZeroOrMore(JOIN + tableName + ON + columnName + "=" + columnName) + \
                  Optional(WHERE + columnName + Regex(r"[><=]") + columnName) + \
                  Optional(GROUP_BY + columnNameList + Optional(HAVING + columnName + Regex(r"[><=]") + Regex(r"[0-9]+"))) + \
                  Optional(ORDER_BY + columnNameList + Optional(CaselessKeyword("ASC") | CaselessKeyword("DESC")))


# Define a function to extract column names and table names from a SQL query
def extract_columns_and_tables(query):
    column_aliases = {}
    table_aliases = {}
    try:
        result = selectStatement.parseString(query)  # fixed variable name
        # Extract column names and aliases
        for col in result[1:]:
            if AS in col:
                col_name, col_alias = col.split(AS)
                column_aliases[col_name.strip()] = col_alias.strip()
            else:
                column_aliases[col.strip()] = col.strip()
        
        # Extract table names and aliases
        for tbl in result[4:]:
            if AS in tbl:
                tbl_name, tbl_alias = tbl.split(AS)
                table_aliases[tbl_name.strip()] = tbl_alias.strip()
            else:
                table_aliases[tbl.strip()] = tbl.strip()
    
    except ParseException as e:
        print("Error: ", e)
    
    return column_aliases, table_aliases


if __name__ == '__main__':
    # Test the function with the given SQL query
    query = "SELECT employees.employee_id, employees.first_name, employees.last_name, departments.department_name, AVG(salaries.salary) AS average_salary FROM employees JOIN salaries ON employees.employee_id = salaries.employee_id JOIN dept_emp ON employees.employee_id = dept_emp.employee_id JOIN departments ON dept_emp.dept_no = departments.dept_no WHERE salaries.to_date > NOW() AND dept_emp.to_date > NOW() GROUP BY employees.employee_id, departments.department_name HAVING AVG(salaries.salary) > 50000 ORDER BY average_salary DESC"
    column_aliases, table_aliases = extract_columns_and_tables(query)

    # Print the results
    print("Column aliases: ", column_aliases)
    print("Table aliases: ", table_aliases)

All I'm trying to do here is parse the query and get the table names and column names. But when I run it, I get this error:

Error:  Expected CaselessKeyword 'FROM', found '('  (at char 105), (line:1, col:106)
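The reported position (char 105) is the `(` right after `AVG`. A likely explanation: `columnName` only matches dotted identifiers, so `AVG` is accepted as an ordinary column name, `delimitedList` then finds no comma after it, and the grammar requires `FROM` but sees `(`. The minimal sketch below reproduces this with a cut-down version of the question's grammar. It assumes pyparsing 3.x (for the snake_case `parse_string`), and `first_error` is a hypothetical helper used only for illustration.

```python
from pyparsing import CaselessKeyword, ParseException, Regex, Suppress, ZeroOrMore

SELECT = CaselessKeyword("SELECT")
FROM = CaselessKeyword("FROM")
COMMA = Suppress(",")

identifier = Regex(r"[_a-zA-Z][_a-zA-Z0-9]*")
# Same shape as the question's columnName: dotted identifiers only, no "(...)"
columnName = identifier + ZeroOrMore("." + identifier)
# Hand-written equivalent of delimitedList(columnName)
columnNameList = columnName + ZeroOrMore(COMMA + columnName)
select_stmt = SELECT + columnNameList + FROM + identifier


def first_error(sql):
    """Hypothetical helper: return (column, message) of the parse error, or None."""
    try:
        select_stmt.parse_string(sql)
    except ParseException as err:
        return err.col, str(err)
    return None


# "AVG" is consumed as a column name, so FROM is expected where "(" appears
print(first_error("SELECT a.x, AVG(b.y) FROM t"))
```

Once the parser reaches `(`, nothing in the select-list rule can match it, which is consistent with the error above.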

My query doesn't contain any syntax errors. Also, the selectStatement I created follows the pattern SELECT + COLUMNS + FROM, so I don't understand where I went wrong.

The query I'm trying to parse:

SELECT employees.employee_id, employees.first_name, employees.last_name, departments.department_name, AVG(salaries.salary) AS average_salary FROM employees JOIN salaries ON employees.employee_id = salaries.employee_id JOIN dept_emp ON employees.employee_id = dept_emp.employee_id JOIN departments ON dept_emp.dept_no = departments.dept_no WHERE salaries.to_date > NOW() AND dept_emp.to_date > NOW() GROUP BY employees.employee_id, departments.department_name HAVING AVG(salaries.salary) > 50000 ORDER BY average_salary DESC
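The select list is not the only part of this query that falls outside the grammar: the WHERE rule accepts a single `column op column` comparison, so `NOW()` and `AND` do not fit either. Because `parseString` is called without `parseAll=True`, pyparsing would likely stop at the first text it cannot match and return a partial result without raising. The sketch below isolates the question's WHERE rule to show the difference; it assumes pyparsing 3.x, and `parse_where` is a hypothetical helper.

```python
from pyparsing import CaselessKeyword, ParseException, Regex, ZeroOrMore

WHERE = CaselessKeyword("WHERE")
identifier = Regex(r"[_a-zA-Z][_a-zA-Z0-9]*")
columnName = identifier + ZeroOrMore("." + identifier)
# Same shape as the question's WHERE rule: column, one comparison operator, column
where_clause = WHERE + columnName + Regex(r"[><=]") + columnName

TEXT = "WHERE salaries.to_date > NOW() AND dept_emp.to_date > NOW()"


def parse_where(text, parse_all):
    """Hypothetical helper: return the tokens as a list, or None on ParseException."""
    try:
        return where_clause.parse_string(text, parse_all=parse_all).as_list()
    except ParseException:
        return None


# Without parse_all the match silently ends at "NOW"; "() AND ..." is ignored
print(parse_where(TEXT, parse_all=False))
# With parse_all=True the unmatched remainder is reported as an error
print(parse_where(TEXT, parse_all=True))
```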

Edit 1: The error occurs exactly where the parser reaches the opening parenthesis of the aggregate function AVG in the SELECT clause:

SELECT employees.employee_id, employees.first_name, employees.last_name, departments.department_name, **AVG(**salaries.salary) AS average_salary FROM 
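One common way to make the select list accept `AVG(salaries.salary)` is to add a function-call alternative and try it before the plain column name. pyparsing's `original_text_for` can then return the matched source text, which gives a ready-made column name such as `AVG(salaries.salary)`. This is a minimal sketch assuming pyparsing 3.x; `func_call`, `select_item` and `column_aliases` are hypothetical names, and a function call here takes at most one argument (`*` or a dotted column), not nested expressions.

```python
from pyparsing import (CaselessKeyword, Combine, Group, Literal, Opt, Regex,
                       Suppress, ZeroOrMore, original_text_for)

AS = CaselessKeyword("AS")
LPAREN, RPAREN, COMMA = map(Suppress, "(),")

identifier = Regex(r"[_a-zA-Z][_a-zA-Z0-9]*")
# Combine glues "employees" "." "employee_id" back into one token
column_name = Combine(identifier + ZeroOrMore("." + identifier))

# Hypothetical function-call rule: AVG(salaries.salary), COUNT(*), NOW()
func_call = identifier + LPAREN + Opt(Literal("*") | column_name) + RPAREN

# Try func_call first; original_text_for returns the matched source text
# as one string, e.g. "AVG(salaries.salary)"
column_expr = original_text_for(func_call | column_name)

# One select-list entry: expression plus optional "AS alias" (AS itself dropped)
select_item = Group(column_expr + Opt(Suppress(AS) + identifier))
select_list = select_item + ZeroOrMore(COMMA + select_item)


def column_aliases(select_clause):
    """Map each select-list expression to its alias, or to itself if it has none."""
    aliases = {}
    for item in select_list.parse_string(select_clause, parse_all=True):
        expr = item[0]
        aliases[expr] = item[1] if len(item) > 1 else expr
    return aliases


print(column_aliases("employees.employee_id, AVG(salaries.salary) AS average_salary"))
```

The order of the alternatives matters: with `column_name | func_call`, the `MatchFirst` would accept `AVG` as a column and stop at `(` again.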

Can anyone tell me what I'm doing wrong here and how to fix it? Any help is greatly appreciated.
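Separately from the parse error, `extract_columns_and_tables` walks one flat token list with `result[1:]` and `result[4:]`, so columns and tables are not kept apart, and `AS in col` tests a pyparsing element against a string token, which Python rejects with a `TypeError` on the first token. A sketch that keeps the two lists apart with `Group` and results names follows, under these assumptions: pyparsing 3.x; only the SELECT list and the FROM/JOIN table names are extracted; each ON condition and everything from WHERE onwards is skipped as raw text with `SkipTo`; a FROM clause followed by anything other than JOIN or WHERE is not handled; `columns_and_tables` is a hypothetical name.

```python
from pyparsing import (CaselessKeyword, Combine, Group, Literal, Opt, Regex,
                       SkipTo, StringEnd, Suppress, ZeroOrMore, original_text_for)

SELECT, FROM, JOIN, ON, AS, WHERE = map(
    CaselessKeyword, ["SELECT", "FROM", "JOIN", "ON", "AS", "WHERE"])
COMMA, LPAREN, RPAREN = map(Suppress, ",()")

identifier = Regex(r"[_a-zA-Z][_a-zA-Z0-9]*")
dotted_name = Combine(identifier + ZeroOrMore("." + identifier))
func_call = identifier + LPAREN + Opt(Literal("*") | dotted_name) + RPAREN
column_expr = original_text_for(func_call | dotted_name)

# SELECT list: keep each expression's text, discard an optional "AS alias"
select_item = column_expr + Suppress(Opt(AS + identifier))
columns = Group(select_item + ZeroOrMore(COMMA + select_item))

# FROM list plus JOIN targets; each ON condition is skipped, not analysed
join_clause = Suppress(JOIN) + dotted_name + Suppress(ON + SkipTo(JOIN | WHERE | StringEnd()))
tables = Group(dotted_name + ZeroOrMore(COMMA + dotted_name) + ZeroOrMore(join_clause))

# WHERE ... to the end is consumed as raw text so that parse_all=True can succeed
query = (Suppress(SELECT) + columns("columns") + Suppress(FROM) + tables("tables")
         + Suppress(Opt(WHERE + SkipTo(StringEnd()))))


def columns_and_tables(sql):
    """Hypothetical helper: return (column expressions, table names) for a query."""
    result = query.parse_string(sql, parse_all=True)
    return result["columns"].as_list(), result["tables"].as_list()


SQL = (
    "SELECT employees.employee_id, employees.first_name, employees.last_name, "
    "departments.department_name, AVG(salaries.salary) AS average_salary "
    "FROM employees JOIN salaries ON employees.employee_id = salaries.employee_id "
    "JOIN dept_emp ON employees.employee_id = dept_emp.employee_id "
    "JOIN departments ON dept_emp.dept_no = departments.dept_no "
    "WHERE salaries.to_date > NOW() AND dept_emp.to_date > NOW() "
    "GROUP BY employees.employee_id, departments.department_name "
    "HAVING AVG(salaries.salary) > 50000 ORDER BY average_salary DESC"
)

cols, tabs = columns_and_tables(SQL)
print(cols)
print(tabs)
```

Skipping the tail with `SkipTo` is a shortcut; a grammar that needs to understand WHERE, GROUP BY, HAVING and ORDER BY would replace it with real rules for those clauses.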

python pyparsing