I'm trying to parse SQL queries with Python's pyparsing library. As part of that, I've implemented the following code:
from pyparsing import *
ParserElement.enablePackrat()
select_stmt = Forward().setName("select statement")
# Define keywords and symbols
SELECT = CaselessKeyword("SELECT")
FROM = CaselessKeyword("FROM")
JOIN = CaselessKeyword("JOIN")
ON = CaselessKeyword("ON")
WHERE = CaselessKeyword("WHERE")
GROUP_BY = CaselessKeyword("GROUP BY")
HAVING = CaselessKeyword("HAVING")
ORDER_BY = CaselessKeyword("ORDER BY")
AS = CaselessKeyword("AS")
COMMA = Suppress(",")
LPAREN = Suppress("(")
RPAREN = Suppress(")")
DOT, STAR = map(Literal, ".*")
# Define SQL language elements
ParserElement.setDefaultWhitespaceChars(" \t")
identifier = Regex(r"[_a-zA-Z][_a-zA-Z0-9]*")
columnName = identifier + ZeroOrMore("." + identifier)
tableName = identifier + ZeroOrMore("." + identifier)
columnNameList = delimitedList(columnName)
tableNameList = delimitedList(tableName)
alias = identifier
aliasExpression = AS + alias
column = columnName + Optional(aliasExpression)
selectStatement = SELECT + columnNameList + FROM + tableNameList + \
    ZeroOrMore(JOIN + tableName + ON + columnName + "=" + columnName) + \
    Optional(WHERE + columnName + Regex(r"[><=]") + columnName) + \
    Optional(GROUP_BY + columnNameList + Optional(HAVING + columnName + Regex(r"[><=]") + Regex(r"[0-9]+"))) + \
    Optional(ORDER_BY + columnNameList + Optional(CaselessKeyword("ASC") | CaselessKeyword("DESC")))
# Define a function to extract column names and table names from a SQL query
def extract_columns_and_tables(query):
    column_aliases = {}
    table_aliases = {}
    try:
        result = selectStatement.parseString(query) # fixed variable name
        # Extract column names and aliases
        for col in result[1:]:
            if AS in col:
                col_name, col_alias = col.split(AS)
                column_aliases[col_name.strip()] = col_alias.strip()
            else:
                column_aliases[col.strip()] = col.strip()
        # Extract table names and aliases
        for tbl in result[4:]:
            if AS in tbl:
                tbl_name, tbl_alias = tbl.split(AS)
                table_aliases[tbl_name.strip()] = tbl_alias.strip()
            else:
                table_aliases[tbl.strip()] = tbl.strip()
    except ParseException as e:
        print("Error: ", e)
    return column_aliases, table_aliases
if __name__ == '__main__':
    # Test the function with the given SQL query
    query = "SELECT employees.employee_id, employees.first_name, employees.last_name, departments.department_name, AVG(salaries.salary) AS average_salary FROM employees JOIN salaries ON employees.employee_id = salaries.employee_id JOIN dept_emp ON employees.employee_id = dept_emp.employee_id JOIN departments ON dept_emp.dept_no = departments.dept_no WHERE salaries.to_date > NOW() AND dept_emp.to_date > NOW() GROUP BY employees.employee_id, departments.department_name HAVING AVG(salaries.salary) > 50000 ORDER BY average_salary DESC"
    column_aliases, table_aliases = extract_columns_and_tables(query)
    # Print the results
    print("Column aliases: ", column_aliases)
    print("Table aliases: ", table_aliases)
All I'm trying to do here is parse the query and get the table names and column names. But when I run the code, I get this error message:
Error: Expected CaselessKeyword 'FROM', found '(' (at char 105), (line:1, col:106)
My query doesn't contain any syntax errors. Also, the selectStatement I created follows the order
SELECT
+ COLUMNS
+ FROM
so I don't understand where I've gone wrong.
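The SELECT + COLUMNS + FROM order is not the issue on its own. A minimal reproduction, assuming only the SELECT/column-list/FROM part of the grammar matters for this error, suggests what happens: `columnName` accepts `AVG` as an ordinary identifier, `delimitedList` then ends because `(` is not a comma, and `FROM` is tried at the `(`. The `sample` string is a hypothetical, shortened input, so its offset (39) differs from the 105 in the real query.

```python
from pyparsing import CaselessKeyword, ParseException, Regex, ZeroOrMore, delimitedList

# Subset of the grammar from the question: SELECT + column list + FROM.
identifier = Regex(r"[_a-zA-Z][_a-zA-Z0-9]*")
columnName = identifier + ZeroOrMore("." + identifier)
stmt = CaselessKeyword("SELECT") + delimitedList(columnName) + CaselessKeyword("FROM")

# Hypothetical, shortened input with the same AVG(...) pattern.
sample = "SELECT departments.department_name, AVG(salaries.salary) FROM employees"

error_loc = None
try:
    stmt.parseString(sample)
except ParseException as e:
    # "AVG" was consumed as a plain column name; the list stopped at "(",
    # so FROM was expected at that position and the match failed there.
    error_loc = e.loc
    print(e)

print(error_loc, repr(sample[error_loc]))  # 39 '('
```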
The query I'm trying to parse:
SELECT employees.employee_id, employees.first_name, employees.last_name, departments.department_name, AVG(salaries.salary) AS average_salary FROM employees JOIN salaries ON employees.employee_id = salaries.employee_id JOIN dept_emp ON employees.employee_id = dept_emp.employee_id JOIN departments ON dept_emp.dept_no = departments.dept_no WHERE salaries.to_date > NOW() AND dept_emp.to_date > NOW() GROUP BY employees.employee_id, departments.department_name HAVING AVG(salaries.salary) > 50000 ORDER BY average_salary DESC
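As a quick check, the position in the error message can be mapped back onto this query. The sketch assumes pyparsing's `char` is a 0-based offset and `col` a 1-based column (consistent with "char 105" / "col:106" in the message) and uses only the SELECT list of the query; `select_list_text` is a name introduced here for illustration.

```python
# Only the SELECT list of the query (hypothetical variable name).
select_list_text = (
    "SELECT employees.employee_id, employees.first_name, employees.last_name, "
    "departments.department_name, AVG(salaries.salary) AS average_salary"
)

char = 105      # "at char 105": 0-based offset into the input string
col = char + 1  # "col:106": 1-based column on line 1

print(col)                                     # 106
print(select_list_text[char - 3 : char + 1])   # AVG(
```

So the parser stopped at the parenthesis right after `AVG`, well before the `FROM` keyword.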
Edit 1: The error occurs exactly where the aggregate function AVG in the SELECT clause meets the opening parenthesis:
SELECT employees.employee_id, employees.first_name, employees.last_name, departments.department_name, **AVG(**salaries.salary) AS average_salary FROM
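One possible direction, sketched under assumptions rather than as a complete SQL grammar: let a select-list item be either a function call or a dotted column name, with the function call tried first, followed by an optional `AS alias`. Note that the original `selectStatement` uses `columnNameList`, so the `column` rule with `aliasExpression` is never used, and `AS average_salary` would also fail even once `AVG(...)` is accepted. The names `func_call`, `select_item`, and `select_list` are hypothetical, and only calls with zero or one column argument are handled.

```python
import pyparsing as pp

# Hypothetical, simplified grammar for the SELECT list only (not full SQL).
identifier = pp.Word(pp.alphas + "_", pp.alphanums + "_")
# Dotted name such as employees.employee_id, returned as one token.
column_ref = pp.Regex(r"[_a-zA-Z][_a-zA-Z0-9]*(?:\.[_a-zA-Z][_a-zA-Z0-9]*)*")
LPAREN, RPAREN, COMMA = map(pp.Suppress, "(),")
SELECT, AS, FROM = map(pp.CaselessKeyword, ["SELECT", "AS", "FROM"])

# Function call with zero or one column argument, e.g. AVG(salaries.salary) or NOW().
# Listed before column_ref so "AVG" is not accepted as a bare column name.
func_call = pp.Group(identifier("func") + LPAREN + pp.Optional(column_ref("arg")) + RPAREN)
# One select-list item: an expression plus an optional "AS alias" (AS is dropped).
select_item = pp.Group(
    (func_call | column_ref)("expr") + pp.Optional(AS.suppress() + identifier("alias"))
)
select_list = pp.Group(select_item + pp.ZeroOrMore(COMMA + select_item))
select_clause = SELECT + select_list("columns") + FROM + column_ref("table")

result = select_clause.parseString(
    "SELECT employees.employee_id, AVG(salaries.salary) AS average_salary FROM employees"
)
print(result.asList())
# ['SELECT', [['employees.employee_id'], [['AVG', 'salaries.salary'], 'average_salary']], 'FROM', 'employees']
```

The same limitation appears later in the query: the WHERE and HAVING rules expect a plain `columnName`, so `NOW()` and `AVG(salaries.salary)` there would need a similar alternative, and the WHERE rule has no `AND`. Because `parseString` is called without `parseAll=True`, a mismatch inside one of the `Optional(...)` clauses may make pyparsing stop early instead of raising an error.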
Can anyone tell me what mistake I'm making here and how to fix it? Any help is greatly appreciated.