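I have a Word document that serves as the standard template for our wider meetings. It contains a two-column table: the first column holds all the headings and the second column holds the corresponding details. I have attached a screenshot showing the structure of the Word document.
I now want to use Python to extract the text from both columns and store it in a DataFrame. The resulting DataFrame should look like this: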
Title  In Force?  Date        Who attended the event?
Test   Yes        03/10/1999  X, Y
How can I achieve this?
Using the parser from abdulsaboor's answer:
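import pandas as pd

def get_table_from_docx(document):
    """Return a list with one DataFrame per table in the document."""
    tables = []
    for table in document.tables:
        # Pre-fill a rows x columns grid with empty strings.
        df = [['' for i in range(len(table.columns))] for j in range(len(table.rows))]
        for i, row in enumerate(table.rows):
            for j, cell in enumerate(row.cells):
                # Copy the cell text; empty cells keep the '' default.
                if cell.text:
                    df[i][j] = cell.text
        tables.append(pd.DataFrame(df))
    return tables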
Then:
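from docx import Document  # provided by the python-docx package
document = Document("meeting_template.docx")  # placeholder path, use your own file
df = get_table_from_docx(document)[0]  # first table in the document
df = df.set_index(0).T  # headings become columns, details become a single row
df["Who attended the event?"] = df["Who attended the event?"].str.replace("\n", ", ")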
Output:
0 Title In Force?        Date Who attended the event?
1  Test       Yes  03/10/1999                    X, Y
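
To see what the set_index(0).T step does without needing a Word file, here is a minimal sketch. It assumes the parsed table looks like the hand-built two-column frame below: the values are taken from the question, and the newline between the attendees is an assumption inferred from the str.replace call. The names raw, wide and col are only illustrative.

```python
import pandas as pd

# Stand-in for get_table_from_docx(document)[0]: column 0 holds the
# headings and column 1 holds the details (assumed sample data).
raw = pd.DataFrame([
    ["Title", "Test"],
    ["In Force?", "Yes"],
    ["Date", "03/10/1999"],
    ["Who attended the event?", "X\nY"],
])

# Use the heading column as the index, then transpose so that each
# heading becomes a column and the details form a single row.
wide = raw.set_index(0).T

# Attendees are assumed to sit on separate lines inside one cell;
# join them with commas (plain string match, not a regex).
col = "Who attended the event?"
wide[col] = wide[col].str.replace("\n", ", ", regex=False)

print(wide.to_string())
```

If the template contained several such tables, the same reshaping could be applied to each frame returned by the parser and the resulting one-row frames combined with pd.concat.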