Error opening document - list index out of range [closed]


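I need to extract candidates' scores from assessment .doc files, which contain the scores for the tests each candidate took. These files are located in each candidate's folder. I need to extract the "Wonderlic" score and create a dataframe containing the candidate's name and score.

I have the following code, which goes through the folders, finds the right document, then reads it and extracts the required information from the .doc file. I tested this code on one specific Word .doc file: it opened the file, read through it, and extracted the information, so the code works in that case. Now, when I try to run it over the folders, it gives me the following error: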

A LastName07 2022r 1 of 4
B LastName folder 2 of 4
C LastName folder 3 of 4
D LastName folder 4 of 4

D LastName Assessment 1.2.doc
Error opening document: C:\Users\Mine\OneDrive\Escritorio\CompanyData\Test Folder\D LastName\D LastName 1.2.doc
list index out of range
D LastName Assessment.doc
Error opening document: C:\Users\Mine\OneDrive\Escritorio\CompanyData\Test Folder\D LastName\D LastName Assessment.doc
list index out of range

Can you help me understand what the "list index out of range" error means and what I can do to fix it?

This is the code I am using in Jupyter Notebook:

import win32com.client
from docx import Document
import os
import re
import zipfile
import xml.dom.minidom
import pandas as pd

word = win32com.client.Dispatch("Word.Application")

i = 0
for folder in directory_list:
    i += 1
    print("Candidate folder " + str(i) + " of " + str(tot), end=' ')
    print(folder)

    folder_path = os.path.join(directory, folder)  # Get the full path to the folder

    for filename in os.listdir(folder_path):  # Iterate through files in the folder
        file_path = os.path.join(folder_path, filename)  # Get the full path to the file
        if (filename.endswith('.doc') or filename.endswith('.docx')) and "assessment" in filename.lower():
            try:
                print(filename)

                doc = word.Documents.Open(file_path)

                # Extract Wonderlic Scores
                for paragraph in doc.paragraphs:
                    text = paragraph.Range.Text

                    wonderlic_keyword = "THE WONDERLIC"
                    wiesen_keyword = "WIESEN TEST"

                    # Set a flag to indicate if we are inside the Wonderlic section
                    inside_wonderlic_section = False
                    wonderlic_section_lines = []  # Store the lines of the Wonderlic section

                    # Check if we are entering the Wonderlic section
                    if wonderlic_keyword.lower() in text.lower():
                        inside_wonderlic_section = True
                    # Check if we are exiting the Wonderlic section
                    elif inside_wonderlic_section and wiesen_keyword.lower() in text.lower():
                        break

                    # Add the lines of the Wonderlic section to the list
                    if inside_wonderlic_section:
                        wonderlic_section_lines.append(text)

                WonderlicScore = wonderlic_section_lines[4]
                WonderlicScore = re.sub("[^0-9]", "", WonderlicScore)  # substituting everything that is NOT a digit to nothing

                # creating the dataframe for WONDERLIC
                WonderlicData = {
                    "Wonderlic Score": [int(WonderlicScore)]
                }
                Wonderlicdf = pd.DataFrame(WonderlicData)

            except Exception as e:
                print(f"Error opening document: {file_path}")
                print(e)
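A note on the error itself: "list index out of range" is Python's IndexError, raised when a list is indexed at a position it does not have. In the code as posted, inside_wonderlic_section and wonderlic_section_lines are re-created on every pass of the paragraph loop, so by the time wonderlic_section_lines[4] runs the list holds at most one element. The except block then prints "Error opening document" even though the document opened fine. Below is a minimal sketch of the same section logic with the state initialised once, before the loop, and with a length check before indexing. It works on a plain list of paragraph strings so it can be tried without Word; the function names collect_section and extract_score, the sample text, and the assumption that the score sits on the fifth line of the section (index 4, taken from the posted code) are illustrative only.

```python
import re
from typing import Iterable, List, Optional

START_KEYWORD = "THE WONDERLIC"  # keyword from the posted code
END_KEYWORD = "WIESEN TEST"      # keyword from the posted code
SCORE_LINE_INDEX = 4             # assumption carried over from wonderlic_section_lines[4]


def collect_section(paragraphs: Iterable[str],
                    start: str = START_KEYWORD,
                    end: str = END_KEYWORD) -> List[str]:
    """Return the paragraphs from the start keyword up to (not including) the end keyword."""
    inside = False          # initialised once, NOT inside the loop
    lines: List[str] = []   # initialised once, NOT inside the loop
    for text in paragraphs:
        lowered = text.lower()
        if start.lower() in lowered:
            inside = True
        elif inside and end.lower() in lowered:
            break
        if inside:
            lines.append(text)
    return lines


def extract_score(paragraphs: Iterable[str],
                  index: int = SCORE_LINE_INDEX) -> Optional[int]:
    """Return the digits on the chosen line of the section, or None if they are missing."""
    lines = collect_section(paragraphs)
    if len(lines) <= index:  # guard: this is where IndexError would otherwise be raised
        return None
    digits = re.sub(r"[^0-9]", "", lines[index])
    return int(digits) if digits else None


# Hypothetical paragraph texts standing in for a real assessment document
sample = [
    "Candidate report",
    "THE WONDERLIC PERSONNEL TEST",
    "line a",
    "line b",
    "line c",
    "Score: 27\r",
    "WIESEN TEST OF MECHANICAL APTITUDE",
    "Score: 99",
]
print(extract_score(sample))
print(extract_score(["no matching section here"]))
```

With Word automation, the paragraph strings could presumably be gathered first with something like [p.Range.Text for p in doc.Paragraphs] and passed to extract_score; a None result then marks a document whose layout differs from the expected one, instead of raising inside the try block.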
python jupyter-notebook extract win32com .doc