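I need to extract candidates' scores from assessment .doc files, which contain the scores of the tests each candidate took. These files are located in each candidate's folder. I need to extract the "Wonderlic" score and create a dataframe containing the candidate's name and score. I have the following code, which walks through the folders, finds the right document, then reads the .doc file and extracts the required information. I tested this code on one specific Word .doc file: it opened the file, read through it, and extracted the information, so the code works in that case. Now, when I try to run it over the folders, it gives me the following error: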
Candidate folder 1 of 4 A LastName07 2022r
Candidate folder 2 of 4 B LastName
Candidate folder 3 of 4 C LastName
Candidate folder 4 of 4 D LastName
D LastName Assessment 1.2.doc
Error opening document: C:\Users\Mine\OneDrive\Escritorio\CompanyData\Test Folder\D LastName\D LastName 1.2.doc
list index out of range
D LastName Assessment.doc
Error opening document: C:\Users\Mine\OneDrive\Escritorio\CompanyData\Test Folder\D LastName\D LastName Assessment.doc
list index out of range
Can you help me understand what the "list index out of range" error means and what I can do to fix it?
import win32com.client
from docx import Document
import os
import re
import zipfile
import xml.dom.minidom
import pandas as pd
word = win32com.client.Dispatch("Word.Application")
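# `directory`, `directory_list`, and `tot` are assumed to be defined earlier (not shown here)
i = 0
for folder in directory_list:
    i += 1
    print("Candidate folder " + str(i) + " of " + str(tot), end=' ')
    print(folder)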
    folder_path = os.path.join(directory, folder)  # Get the full path to the folder
    for filename in os.listdir(folder_path):  # Iterate through files in the folder
        file_path = os.path.join(folder_path, filename)  # Get the full path to the file
        if (filename.endswith('.doc') or filename.endswith('.docx')) and "assessment" in filename.lower():
            try:
                print(filename)
                doc = word.Documents.Open(file_path)
                # Extract Wonderlic Scores
                for paragraph in doc.paragraphs:
                    text = paragraph.Range.Text
                    wonderlic_keyword = "THE WONDERLIC"
                    wiesen_keyword = "WIESEN TEST"
                    # Set a flag to indicate if we are inside the Wonderlic section
                    inside_wonderlic_section = False
                    wonderlic_section_lines = []  # Store the lines of the Wonderlic section
                    # Check if we are entering the Wonderlic section
                    if wonderlic_keyword.lower() in text.lower():
                        inside_wonderlic_section = True
                    # Check if we are exiting the Wonderlic section
                    elif inside_wonderlic_section and wiesen_keyword.lower() in text.lower():
                        break
                    # Add the lines of the Wonderlic section to the list
                    if inside_wonderlic_section:
                        wonderlic_section_lines.append(text)
                WonderlicScore = wonderlic_section_lines[4]
                WonderlicScore = re.sub("[^0-9]", "", WonderlicScore)  # Replace everything that is NOT a digit with nothing
                # Create the dataframe for WONDERLIC
                WonderlicData = {
                    "Wonderlic Score": [int(WonderlicScore)]
                }
                Wonderlicdf = pd.DataFrame(WonderlicData)
            except Exception as e:
                print(f"Error opening document: {file_path}")
                print(e)
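
For what it's worth, the section-collecting logic can be tried without Word at all. The sketch below is only an illustration under some assumptions: `extract_section` and `score_from_section` are hypothetical helper names, the paragraphs are plain strings rather than COM paragraph objects, and the sample data is made up. It differs from the code above in two ways. The flag and the list are initialized once, before the loop: in the code above both are reset on every paragraph, so `wonderlic_section_lines` can never hold more than one element, and `wonderlic_section_lines[4]` then raises "list index out of range". The length is also checked before indexing.

```python
import re


def extract_section(paragraphs, start_keyword, end_keyword):
    """Return the paragraphs from the one containing start_keyword up to
    (but not including) the one containing end_keyword."""
    inside_section = False  # initialized once, before the loop
    section_lines = []      # likewise, so lines accumulate across iterations
    for text in paragraphs:
        lowered = text.lower()
        if not inside_section and start_keyword.lower() in lowered:
            inside_section = True
        elif inside_section and end_keyword.lower() in lowered:
            break
        if inside_section:
            section_lines.append(text)
    return section_lines


def score_from_section(section_lines, index=4):
    """Return the digits on line `index` as an int, or None when the
    section is too short or that line contains no digits."""
    if len(section_lines) <= index:
        return None  # indexing here would raise IndexError
    digits = re.sub(r"[^0-9]", "", section_lines[index])
    return int(digits) if digits else None


# Made-up paragraphs standing in for a document; a real file may be laid out differently.
sample = ["Intro", "THE WONDERLIC", "a", "b", "c", "Score: 27", "WIESEN TEST", "x"]
lines = extract_section(sample, "THE WONDERLIC", "WIESEN TEST")
print(lines)
print(score_from_section(lines))
print(score_from_section(["THE WONDERLIC"]))
```

Returning `None` for a section that is too short keeps one oddly formatted document from aborting the whole run. Whether index 4 is the right line for every assessment file is an assumption carried over from the code above.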