Inserting a tree-like dataset into MySQL with a Python script

Hello, I am new to MySQL. I am having trouble inserting a large tree-like dataset into a database.

Data: I have tree-like data made up of topics, subtopics, sub-subtopics, and so on. I have written it to a YAML file. Here is a small part of that file.

    mathematics:
      - algebra:
          - linear_and_multilinear_algebra:
              - vector_spaces
              - matrix_operations
          - group_theory
      - analysis:
          - real_analysis
          - complex_analysis
          - functional_analysis:
              - integral_equations
              - differential_equation:
                  - ode
                  - pde
              - operator_theory
              - operator_algebra:
                  - c*-algebras:
                      - c*-algebras_and_their_representations
                      - operator_spaces_associated_with_c*-algebras
                      - kk-theory_and_k-homology_of_c*-algebras
                  - non-selfadjoint_operator_algebras
                  - banach_algebras
                  - von_neumann_algebras:
                      - non-commutative_geometry
                      - factors
                      - tomita_takesaki_theory
                  - operator_spaces
                  - non-commutative_algebras
    natural_sciences:
      - biology:
          - botany
          - zoology
          - microbiology:
              - virology
              - bacteriology
      - chemistry:
          - organic_chemistry
          - inorganic_chemistry
          - physical_chemistry
      - physics:
          - classical_mechanics
          - quantum_mechanics:
              - quantum_entanglement
              - quantum_field_theory
    technology_and_engineering:
      - computer_science:
          - algorithms
          - data_structures
          - artificial_intelligence:
              - machine_learning

MySQL database: To store this data, I created the following tables in MySQL.

    CREATE DATABASE IF NOT EXISTS knowledge;
    USE knowledge;

    -- Table to store topics
    CREATE TABLE IF NOT EXISTS topics (
        id INT AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(255) NOT NULL
    );

    -- Closure table
    CREATE TABLE IF NOT EXISTS closure (
        ancestor INT NOT NULL,
        descendant INT NOT NULL,
        length INT NOT NULL,
        PRIMARY KEY (ancestor, descendant),
        FOREIGN KEY (ancestor) REFERENCES topics(id),
        FOREIGN KEY (descendant) REFERENCES topics(id)
    );

Problem: I want to write a Python script that reads the YAML file and populates both tables.

I wrote the following script, but it raises an error.

Code

    # Load the YAML data from your file
    with open(cat_config_file, 'r') as yaml_file:
        data = yaml.load(yaml_file, Loader=yaml.FullLoader)

    # Establish a connection to the MySQL database
    db_connection = mysql.connector.connect(**db_config)

    # Create a cursor object to interact with the database
    cursor = db_connection.cursor()

    # Recursive function to insert topics and build the closure table
    def insert_topic(topic, parent_id=None, level=0):
        # Insert the topic into the topics table
        cursor.execute("INSERT INTO topics (name) VALUES (%s)", (topic,))
        topic_id = cursor.lastrowid

        # Insert the closure record for the current topic and its parent with level
        if parent_id is not None:
            cursor.execute("INSERT INTO closure (ancestor, descendant, length) VALUES (%s, %s, %s)",
                           (parent_id, topic_id, level))

        # Recursively insert subtopics
        if isinstance(data[topic], list):
            for subtopic in data[topic]:
                insert_topic(subtopic, topic_id, level + 1)

    # Iterate through the top-level topics and insert them
    for top_level_topic in data.keys():
        insert_topic(top_level_topic)

    # Commit changes and close the cursor and connection
    db_connection.commit()
    cursor.close()
    db_connection.close()

    print("Data inserted into the MySQL database, with hierarchy levels in the 'closure' table.")

Error

    Traceback (most recent call last):
      File "book_category.py", line 54, in <module>
        insert_topic(top_level_topic)
      File "book_category.py", line 50, in insert_topic
        insert_topic(subtopic, topic_id, level + 1)
      File "book_category.py", line 40, in insert_topic
        cursor.execute("INSERT INTO topics (name) VALUES (%s)", (topic,))
      File "/home/indrajit/Documents/hello_world/mysql/env/lib/python3.8/site-packages/mysql/connector/cursor_cext.py", line 317, in execute
        prepared = self._cnx.prepare_for_mysql(params)
      File "/home/indrajit/Documents/hello_world/mysql/env/lib/python3.8/site-packages/mysql/connector/connection_cext.py", line 802, in prepare_for_mysql
        result = self._cmysql.convert_to_mysql(*params)
    _mysql_connector.MySQLInterfaceError: Python type dict cannot be converted

Please help me fix the code. Thank you.

P.S. I should add that I will be querying the database for all ancestors of a given node. For example, if I enter pde, it should return:

    mathematics > analysis > functional_analysis > differential_equation > pde


mysql python-3.x tree hierarchical-data
1 Answer
The error has nothing to do with closure tables, or even with SQL. It is purely a Python problem in how you traverse the YAML structure.

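You start inserting the hierarchy with:

    for top_level_topic in data.keys():
        insert_topic(top_level_topic)

data.keys() returns the keys at the top of the hierarchy, and those keys are strings. Strings are scalars, so they can be used as parameters to the INSERT statement.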

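However, when you take the recursive step to insert the next level of the hierarchy, you pass the Python dictionaries referenced by the topic you just inserted:

    for subtopic in data[topic]:
        insert_topic(subtopic, topic_id, level + 1)

You are not using subtopic.keys(), so subtopic is a dictionary, not a string. That dictionary is what ends up as the parameter to the INSERT statement, which is why the connector reports "Python type dict cannot be converted".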
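To make the type mismatch concrete, here is a minimal sketch. The `data` literal is a hand-written stand-in for a small slice of what the YAML loader would return for this file (an assumption, not the real file), and the variable names are only illustrative.

```python
# Hand-written stand-in for a slice of the loaded YAML (assumption).
data = {
    "mathematics": [
        {"algebra": ["group_theory"]},
        {"analysis": ["real_analysis", "complex_analysis"]},
    ]
}

# Top-level keys are strings, so they bind to the %s placeholder fine.
top_level = list(data.keys())

# The children of a top-level key are dictionaries, not strings.
children = data["mathematics"]
kinds = [type(child).__name__ for child in children]

# Only the leaves are plain strings.
leaf_kinds = [type(leaf).__name__ for leaf in children[1]["analysis"]]

print(top_level, kinds, leaf_kinds)
```

Each element of `children` is what the original loop passes to `insert_topic` as `subtopic`.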

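In addition, the recursive function always looks up data[topic] at the top of the hierarchy. Even if you passed the subtopic's key, it would try to find that key at the top level, where it does not exist.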
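A small sketch of that second problem, again using a hand-written stand-in for the loaded data (an assumption): a subtopic name is not a top-level key, so it can only be reached through its parent.

```python
# Hand-written stand-in for a slice of the loaded YAML (assumption).
data = {"mathematics": [{"algebra": ["group_theory"]}]}

try:
    data["algebra"]  # top-level lookup, as data[topic] does in the recursion
    found = True
except KeyError:
    found = False

# The subtree is only reachable by walking down from its parent.
subtree = data["mathematics"][0]["algebra"]

print(found, subtree)
```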

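I would design the recursive function to accept a dictionary and to inspect the keys of that dictionary itself, rather than the global data.

Then you would start the first step with:

    insert_topic(data)

I will leave it to you to design the code in that function that traverses the YAML structure, since it contains both dictionaries and lists. YAML is a bit more complex than a simple tree.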
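For reference, one possible shape for such a traversal is sketched below. It is not the only design, and several things in it are assumptions rather than part of the question: the helper name `build_rows`, the hand-written `data` literal standing in for the loaded YAML, IDs assigned in Python (which only matches AUTO_INCREMENT on an empty table; in the MySQL script `cursor.lastrowid` could be used instead), and a self row of length 0 for every topic, which is a common closure-table convention. The demo uses the standard-library `sqlite3` module in place of MySQL so it runs without a server; with `mysql.connector` the placeholders would be `%s` rather than `?`.

```python
import sqlite3
from itertools import count


def build_rows(data):
    """Flatten the loaded YAML into (topics, closure) row lists."""
    topics, closure, ids = [], [], count(1)

    def add(name, ancestor_ids):
        topic_id = next(ids)
        topics.append((topic_id, name))
        closure.append((topic_id, topic_id, 0))  # self row, length 0
        # Nearest ancestor first, so the distance starts at 1.
        for length, ancestor in enumerate(reversed(ancestor_ids), start=1):
            closure.append((ancestor, topic_id, length))
        return topic_id

    def walk(node, ancestor_ids):
        if isinstance(node, dict):      # {"topic": [children]}
            for name, children in node.items():
                walk(children, ancestor_ids + [add(name, ancestor_ids)])
        elif isinstance(node, list):    # list of children
            for child in node:
                walk(child, ancestor_ids)
        elif node is not None:          # leaf topic (a plain scalar)
            add(str(node), ancestor_ids)

    walk(data, [])
    return topics, closure


# Hand-written stand-in for a slice of the loaded YAML (assumption).
data = {"mathematics": [{"analysis": ["real_analysis", {"functional_analysis": [
    {"differential_equation": ["ode", "pde"]}]}]}]}

topics, closure = build_rows(data)

# sqlite3 stands in for MySQL here; the schema mirrors the question's tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE topics (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE closure (ancestor INTEGER NOT NULL, "
             "descendant INTEGER NOT NULL, length INTEGER NOT NULL, "
             "PRIMARY KEY (ancestor, descendant))")
conn.executemany("INSERT INTO topics (id, name) VALUES (?, ?)", topics)
conn.executemany(
    "INSERT INTO closure (ancestor, descendant, length) VALUES (?, ?, ?)", closure)

# All ancestors of a node, farthest first, joined into a path.
rows = conn.execute(
    "SELECT a.name FROM closure c "
    "JOIN topics d ON d.id = c.descendant "
    "JOIN topics a ON a.id = c.ancestor "
    "WHERE d.name = ? ORDER BY c.length DESC", ("pde",)).fetchall()
path = " > ".join(name for (name,) in rows)
print(path)
```

Storing one closure row per ancestor, rather than only the parent, is what lets a single query return the whole path. Note that the lookup by name assumes topic names are unique; if they are not, the query would need to start from a topic id instead.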
