How can I make a simple text-preparation program more elegant?

Question

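I'm very new to Python and have no formal training, and I don't think I'm very good at writing programs that are simple, elegant, efficient, or logically structured.

Here is an example, to see what kind of suggestions people can offer. Mostly through copy and paste and a lot of trial and error, I've put together a few programs that prepare text documents for processing. In this one, I want to strip out things like timestamps and identifying information, leaving only the extracted message text.

It does what I want, but it looks clunky, and I'm sure there are other ways to do the same thing and make it more flexible (maybe with pandas?).

Thanks in advance!

PS: The text of the input document is at the end; it's a small test version with only 6 lines.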

import re, codecs

def regex(txt):
    txt = re.sub(r'(LOC\d\d\d\d: )', "", txt)
    txt = re.sub(r'(\d\d:\d\d - )', "", txt)
    txt = re.sub(r'(\d\d/\d\d/\d\d\d\d, )', "", txt)
    return(txt)

stopwords=["Alexandra", "Gomez:"]
fj="(fichier joint)\r\n"
app="Votre code de sécurité"
mo="<Médias omis>"

corpus=[]

file="Lat_small.txt"
with codecs.open(file, 'r', 'utf8') as f:
    text=f.readlines()

    with codecs.open("Latinos_clean.txt", 'w', 'utf8') as l:
        for line in text:
            line=regex(line)
            if not line.endswith(fj) and not line.startswith(app) and not line.startswith(mo):
                corpus.append(line)

        for lines in corpus:
            lines =lines.split()
            words=[]
            for word in lines:
                if word not in stopwords:
                    words.append(word)

            final=' '.join(words)
            l.write(final+"\n")

Text of Lat_small.txt:

19/04/2022, 20:49 - Alexandra Gomez: Gracias por sus mensajes :)
19/04/2022, 21:45 - LOC0006: PTT-20220419-WA0037.opus (fichier joint)
19/04/2022, 21:47 - LOC0007: Publicado también en un grupo que se llama wanted en facebook
19/04/2022, 21:51 - LOC0008: Listo paisano, exitos!
19/04/2022, 21:51 - LOC0008: <Médias omis>
24/04/2022, 19:42 - Votre code de sécurité avec LOC0082 a changé. Appuyez pour en savoir plus.
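
One possible direction, offered as a rough sketch rather than a definitive rewrite: fold the three substitutions into a single anchored pattern, keep the skip markers in one place, and put the per-line logic in a small function that can be tested without touching any files. The names `PREFIX`, `SKIP_STARTS`, `SKIP_END`, `clean_line`, `clean_lines` and `clean_file` are hypothetical. The sketch assumes every line starts with `DD/MM/YYYY, HH:MM - ` followed by either a `LOCnnnn: ` sender, `Alexandra Gomez: `, or a system message. Unlike the code above, it only removes the name when it appears as the sender, and it does not collapse repeated whitespace inside a message.

```python
import re

# Assumed line shape: "DD/MM/YYYY, HH:MM - " then an optional known sender.
PREFIX = re.compile(
    r"^\d{2}/\d{2}/\d{4}, \d{2}:\d{2} - (?:(?:LOC\d{4}|Alexandra Gomez): )?"
)
# Messages that start or end with these markers are dropped entirely.
SKIP_STARTS = ("Votre code de sécurité", "<Médias omis>")
SKIP_END = "(fichier joint)"


def clean_line(line):
    """Return the bare message text, or None if the line should be dropped."""
    # strip() removes the trailing newline, so "\r\n" vs "\n" no longer matters.
    text = PREFIX.sub("", line.strip())
    if not text or text.startswith(SKIP_STARTS) or text.endswith(SKIP_END):
        return None
    return text


def clean_lines(lines):
    """Clean an iterable of raw lines, keeping only real messages."""
    return [text for text in map(clean_line, lines) if text is not None]


def clean_file(src, dst):
    """Read src, write one cleaned message per line to dst (both UTF-8)."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for text in clean_lines(fin):
            fout.write(text + "\n")
```

With this layout, the whole job would be one call such as `clean_file("Lat_small.txt", "Latinos_clean.txt")`. For line-by-line filtering like this, the standard library is probably enough; pandas might become more attractive if the date, time and sender need to be kept as columns instead of thrown away.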
