Replacing diacritics in Python

Question (votes: 0, answers: 4)

How can I replace the diacritics in an srt file with their plain (normalized) letters?

I have a Romanian srt file and I'm trying to stream a movie with Jellyfin, but the app doesn't support special characters like ĂăÂâÎîȘșȚț, so I'm trying to get rid of them.

I tried unidecode, but the characters were replaced oddly: ț -> th, ș -> o.

I also tried using sed to replace the characters, but some characters (such as ș) show up as º, so the following function does not replace them:

import re

def strip_accents(s):
    d = 'ĂăÂâÎîȘșȚț'
    n = 'AaAaIiSsTt'
    # Replace each accented character with its plain counterpart
    for accented, plain in zip(d, n):
        s = re.sub(accented, plain, s)
    return s
python-3.x diacritics
4 Answers
0
votes

For the subtitles, first open the file in Notepad, then choose Save As. In that dialog set Encoding: UTF-8, then save.
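The same re-save can be scripted. This is a minimal sketch, assuming the subtitle file was originally written in Windows-1250 (a common legacy encoding for Romanian text, but only an assumption here; check your own file). The function name `resave_as_utf8` is hypothetical.

```python
from pathlib import Path


def resave_as_utf8(path, source_encoding="cp1250"):
    """Re-encode a text file to UTF-8 in place and return the decoded text.

    source_encoding is an assumption: cp1250 (Windows-1250) is a common
    legacy encoding for Romanian subtitles, but your file may use another.
    """
    raw = Path(path).read_bytes()        # read the bytes untouched
    text = raw.decode(source_encoding)   # interpret them in the old encoding
    Path(path).write_bytes(text.encode("utf-8"))
    return text
```

Windows-1250 only has the cedilla forms ş and ţ, so a file converted this way will contain those rather than the comma-below ș and ț.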

You can use this code for a list:

def strip_accents(strings):
    d = 'ĂăÂâÎîȘșȚț'
    n = 'AaAaIiSsTt'
    new_list = []
    for string in strings:
        # Replace each accented character with its plain counterpart
        for accented, plain in zip(d, n):
            string = string.replace(accented, plain)
        new_list.append(string)
    return new_list

print(strip_accents(['Ștefan', 'țară']))
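An alternative that avoids keeping a character list by hand is Unicode decomposition from the standard library. This is a minimal sketch, assuming the text has already been decoded correctly (it will not repair characters such as º that come from reading the file with the wrong encoding). The name `strip_diacritics` is hypothetical.

```python
import unicodedata


def strip_diacritics(text):
    # NFD splits each accented letter into a base letter plus combining marks
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep everything else
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("ĂăÂâÎîȘșȚț"))  # AaAaIiSsTt
```

This handles both the comma-below letters (ș, ț) and the older cedilla forms (ş, ţ), since both decompose to a plain base letter.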

0
votes

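So, thanks to a repository hosted on GitHub, I found a way to do exactly what I wanted. I simply replaced the odd characters º, ª with the correct diacritics ș, Ș, and everything works the way I want. I didn't need to strip the diacritics to get the subtitles working.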

https://github.com/pckltr/corecteaza-subtitrari/

Python code:

def fix_accents(s):
    char_dict = { "º": "ș", "ª": "Ș", "ş": "ș", "Ş": "Ș", "ţ": "ț", "Ţ": "Ț", "þ": "ț", "Þ": "Ț", "ã": "ă"  }
    for k,v in char_dict.items():
        s = s.replace(k, v)
    return s
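The stray characters in that table are consistent with a file written in Windows-1250 (or ISO-8859-2) being read as Latin-1: the same byte means ş in one encoding and º in the other. The sketch below only illustrates that explanation and is not a diagnosis of any particular file. Both function names are hypothetical.

```python
def misread_as_latin1(text):
    # Encode as Windows-1250, then decode the same bytes as Latin-1
    return text.encode("cp1250").decode("latin-1")


def undo_latin1_misread(text):
    # Reverse the mistake: recover the bytes, then decode them as Windows-1250
    return text.encode("latin-1").decode("cp1250")


for intended in "şŞţŢă":
    print(intended, "->", misread_as_latin1(intended))
```

`undo_latin1_misread` raises an error if the text contains characters outside Latin-1, or bytes that Windows-1250 leaves undefined, so the explicit replacement table is the more forgiving option for mixed input. After the repair the letters are still the cedilla forms ş and ţ, and the "ş": "ș" style entries in the table convert them to the comma-below forms.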

0
votes

I know this is an old post, but for anyone who wants a script like this that can process several srt files at once, I wrote a Python script that does it using os and pysrt.

First, run pip install pysrt in a terminal. Then create a Python file and put it in the folder that contains the srt files you want to change. Then just run it, and it works!

import pysrt
import os


def change_charset(file_name):
    # open the file with the encoding because sometimes you get an error
    subs = pysrt.open(file_name, encoding='iso-8859-1')

    # dictionary with the characters that need to be changed
    dictionar = {"º": "s", "ª": "S", "ş": "s", "Ş": "S",
                 "ţ": "t", "Ţ": "T", "þ": "t", "Þ": "T", "ã": "a", "Ã": "A",
                 "õ": "o", "Õ": "O", "ç": "c", "Ç": "C", "ñ": "n", "Ñ": "N",
                 "á": "a", "Á": "A", "é": "e", "É": "E", "í": "i", "Í": "I",
                 "ó": "o", "Ó": "O", "ú": "u", "Ú": "U", "ý": "y", "Ý": "Y",
                 "à": "a", "À": "A", "è": "e", "È": "E",
                 "Î": "I", "î": "i", "Â": "A", "â": "a"}

    for sub in subs:  # loop through the subtitles
        for key, value in dictionar.items():  # loop through the dictionary
            # replace the characters with the new ones
            sub.text = sub.text.replace(key, value)
    # save the file with the new encoding
    subs.save(file_name, encoding="utf-8")
    # print the name of the file that was saved
    print(f"Done saving file {file_name}")


def main():
    for items in os.listdir():  # loop through the files in the directory
        if items.endswith(".srt"):  # check if the file is a subtitle file
            change_charset(items)  # call the function to change the charset


if __name__ == "__main__":
    main()

I did it this way for ease of use, but you can specify the path yourself and keep the files in a single directory.
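Specifying the path yourself could look roughly like this. It is a standard-library sketch, and `process_folder` and its `handler` parameter are hypothetical names that are not part of pysrt.

```python
from pathlib import Path


def process_folder(directory, handler):
    """Call handler(path) for every .srt file directly inside directory.

    Returns the list of file paths (as strings) that were processed.
    """
    processed = []
    for path in sorted(Path(directory).iterdir()):
        # Match the extension case-insensitively so .SRT files are included
        if path.is_file() and path.suffix.lower() == ".srt":
            handler(str(path))
            processed.append(str(path))
    return processed
```

With the script above, this would be called as process_folder("/path/to/subtitles", change_charset), where the path is a placeholder for your own folder.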


0
votes

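I tried the script posted above by iCooKie. Apparently it converts the odd characters into plain letters such as s, t, i. Does anyone know what I need to change in the script so that it converts them to diacritics like ĂăâÎîşşşşş instead of AaAaIiSsTt?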
