Python pyparsing: parsing an unstructured text file

Question (votes: 1, answers: 1)

I have the text file below, and I would like to convert it to a CSV file.

+---------------------+--------------+---------------+
| column_date         | column_id    | column_desc   |
+---------------------+--------------+---------------+
| 2001-01-01 00:00:00 | 12345        | abc bar       |
| 2001-01-01 00:00:00 | 4567         | defg          |
+---------------------+--------------+---------------+

The expected output I am looking for is:

column_date,column_id,column_desc
2001-01-01 00:00:00,12345,abc bar
2001-01-01 00:00:00,4567,defg

Is there an example of doing this with pyparsing? Thanks.

python pyparsing
1 answer
0 votes

A possible solution, using a regular expression rather than pyparsing:

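import re

# "file.csv" here holds the text table; it is only read, so "r" is enough.
with open("file.csv", "r") as myFile:
    content = myFile.read()

# One capture group per column. Border lines do not match and are skipped.
regex = r'^\|\s+(.+)\s+\|\s+(\w+)\s+\|\s+(.+)\s+\|$'
print(content)
matches = re.findall(regex, content, re.MULTILINE)
for row in matches:
    print(row[0] + "," + row[1] + "," + row[2])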

Output

|---------------------+-----------+-------------|
| column_date         | column_id | column_desc |
|---------------------+-----------+-------------|
| 2001-01-01 00:00:00 |     12345 | abc bar     |
| 2001-01-01 00:00:00 |      4567 | defg        |
|---------------------+-----------+-------------|

column_date        ,column_id,column_desc
2001-01-01 00:00:00,12345,abc bar    
2001-01-01 00:00:00,4567,defg    

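The greedy (.+) groups capture the padding inside each cell, so you will probably want to strip that whitespace before printing.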
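One way to do that stripping, as a rough sketch rather than a definitive fix: split each row on "|", call strip() on every cell, and let the standard csv module write the line. The function name table_to_csv and the ROW pattern are my own names for illustration, not part of the question or of pyparsing, and the sketch assumes no cell contains a "|" character.

```python
import csv
import io
import re

# Hypothetical helper pattern: a row is anything of the form "|...|".
ROW = re.compile(r'^\|(.+)\|\s*$')


def table_to_csv(text):
    """Convert an ASCII-art table to CSV text, trimming cell padding."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        m = ROW.match(line.strip())
        # Skip "+---+---+" borders (no match) and "|---+---|" borders
        # (match, but the body contains only '-' and '+').
        if m is None or set(m.group(1)) <= set("-+"):
            continue
        writer.writerow(cell.strip() for cell in m.group(1).split("|"))
    return out.getvalue()


sample = """\
+---------------------+--------------+---------------+
| column_date         | column_id    | column_desc   |
+---------------------+--------------+---------------+
| 2001-01-01 00:00:00 | 12345        | abc bar       |
| 2001-01-01 00:00:00 | 4567         | defg          |
+---------------------+--------------+---------------+
"""

print(table_to_csv(sample), end="")
```

Splitting on "|" instead of using one capture group per column means the same code works for any number of columns, and csv.writer takes care of quoting if a cell ever contains a comma.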
