How can I replace the diacritics in an srt file with normalized characters?
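I have a Romanian srt file and I'm trying to stream a movie with Jellyfin, but the app doesn't support special characters such as
ĂăÂâÎîȘșȚț
so I'm trying to get rid of them.
I tried unidecode, but the characters were replaced in odd ways:
ț -> th, ș -> o
I also tried replacing the characters with sed, but some of them (such as ș) show up as º, so the following function doesn't replace them: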
import re

def strip_accents(s):
    # Replace each Romanian diacritic with the plain letter at the same index.
    d = 'ĂăÂâÎîȘșȚț'
    n = 'AaAaIiSsTt'
    for accented, plain in zip(d, n):
        s = re.sub(accented, plain, s)
    return s
For the subtitles, open the file in Notepad, choose Save As, and in that dialog set
Encoding: UTF-8
then save.
You can use this code for a list:
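If you would rather script that step than click through Notepad, the same re-encoding can be sketched in Python. This assumes the file is currently stored as ISO-8859-2, a common legacy encoding for Romanian text. Both that source encoding and the function name `reencode_to_utf8` are assumptions of mine, so adjust them to your file.

```python
from pathlib import Path


def reencode_to_utf8(path, source_encoding="iso-8859-2"):
    """Rewrite a text file as UTF-8 (hypothetical helper).

    source_encoding is an assumption: it must match how the file was
    actually saved, otherwise the characters come out wrong.
    """
    path = Path(path)
    # Decode the raw bytes with the encoding the file was saved in...
    text = path.read_bytes().decode(source_encoding)
    # ...and write the same characters back out as UTF-8.
    path.write_bytes(text.encode("utf-8"))
    return text
```

Working on bytes keeps the line endings exactly as they were. Keep a copy of the original file until you have checked the result.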
def strip_accents():
    d = 'ĂăÂâÎîȘșȚț'
    n = 'AaAaIiSsTt'
    dl = list(d)
    # print(dl)
    nl = list(n)
    # print(nl)
    new_list = []
    for i, string in enumerate(dl):
        # Swap each accented character for the plain letter at the same index.
        new_string = string.replace(dl[i], nl[i])
        new_list.append(new_string)
        # print(new_list)
    return new_list

print(strip_accents())
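For whole strings rather than a list of characters, one alternative is to let the standard library do the mapping. This is a minimal sketch, assuming the text has already been decoded correctly; the function name `strip_diacritics` is mine, not from the posts above. It decomposes each letter into a base letter plus combining marks (NFD) and then drops the marks.

```python
import unicodedata


def strip_diacritics(s):
    """Remove combining marks so that e.g. 'ș' becomes 's' (hypothetical helper)."""
    # NFD splits an accented letter into its base letter plus combining mark(s).
    decomposed = unicodedata.normalize("NFD", s)
    # Keep every character except the combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("ĂăÂâÎîȘșȚț"))  # AaAaIiSsTt
```

This only helps once the file is decoded correctly: mis-decoded characters such as º or þ are not accented letters, so they pass through unchanged.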
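So, thanks to a repository hosted on GitHub, I found a way to do exactly what I wanted. I simply replaced the odd characters º and ª with the correct diacritics ș and Ș, and everything works the way I want. I didn't need to strip the diacritics for the subtitles to work.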
https://github.com/pckltr/corecteaza-subtitrari/
Python code:
def fix_accents(s):
    # Map mis-decoded and cedilla characters to the correct Romanian diacritics.
    char_dict = {
        "º": "ș", "ª": "Ș", "ş": "ș", "Ş": "Ș",
        "ţ": "ț", "Ţ": "Ț", "þ": "ț", "Þ": "Ț", "ã": "ă",
    }
    for k, v in char_dict.items():
        s = s.replace(k, v)
    return s
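The odd characters in that table are consistent with a file saved as ISO-8859-2 being read as ISO-8859-1 (Latin-1). That is my assumption about where they come from, not something the repository states. The snippet below only shows the byte-level correspondence, and `misread_as_latin1` is a hypothetical name.

```python
def misread_as_latin1(text):
    """Encode as ISO-8859-2, then decode the same bytes as Latin-1."""
    # This mimics a reader that assumes the wrong legacy encoding.
    return text.encode("iso-8859-2").decode("latin-1")


# s-cedilla, S-cedilla, t-cedilla, T-cedilla, a-breve
for ch in "\u015f\u015e\u0163\u0162\u0103":
    print(ch, "->", misread_as_latin1(ch))
# ş -> º
# Ş -> ª
# ţ -> þ
# Ţ -> Þ
# ă -> ã
```

If that is what happened to the file in the question, it would also explain the unidecode output there: it was transliterating þ and º rather than ț and ș.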
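I know this is an old post, but for anyone who wants a script like this that can process several srt files at once, I wrote a Python script that does it using os and pysrt.
First, run this in a terminal:
pip install pysrt
Then create a Python file and put it in the folder containing the srt files you want to change.
Then just run it and it works!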
import pysrt
import os

def change_charset(file_name):
    # open the file with the encoding because sometimes you get an error
    subs = pysrt.open(file_name, encoding='iso-8859-1')
    # dictionary with the characters that need to be changed
    dictionar = {"º": "s", "ª": "S", "ş": "s", "Ş": "S",
                 "ţ": "t", "Ţ": "T", "þ": "t", "Þ": "T", "ã": "a", "Ã": "A",
                 "õ": "o", "Õ": "O", "ç": "c", "Ç": "C", "ñ": "n", "Ñ": "N",
                 "á": "a", "Á": "A", "é": "e", "É": "E", "í": "i", "Í": "I",
                 "ó": "o", "Ó": "O", "ú": "u", "Ú": "U", "ý": "y", "Ý": "Y",
                 "à": "a", "À": "A", "è": "e", "È": "E",
                 "Î": "I", "î": "i", "Â": "A", "â": "a"}
    for sub in subs:  # loop through the subtitles
        for key, value in dictionar.items():  # loop through the dictionary
            # replace the characters with the new ones
            sub.text = sub.text.replace(key, value)
    # save the file with the new encoding
    subs.save(file_name, encoding="utf-8")
    # print the name of the file that was saved
    print(f"Done saving file {file_name}")

def main():
    for items in os.listdir():  # loop through the files in the directory
        if items.endswith(".srt"):  # check if the file is a subtitle file
            change_charset(items)  # call the function to change the charset

if __name__ == "__main__":
    main()
I did it this way for ease of use, but you can specify a path yourself and keep the files in a single directory.
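If you would rather point the script at a folder than copy it next to the subtitles, the file discovery could be sketched like this. The function name `find_srt_files` and its `folder` parameter are hypothetical, not part of the script above.

```python
from pathlib import Path


def find_srt_files(folder="."):
    """Return the .srt files directly inside folder, sorted (hypothetical helper)."""
    return sorted(
        p for p in Path(folder).iterdir()
        # Skip subdirectories and accept the extension in any letter case.
        if p.is_file() and p.suffix.lower() == ".srt"
    )
```

Each returned path can then be handed to the conversion function, for example by looping over `find_srt_files(some_folder)` and calling `change_charset(str(path))` for every `path`.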
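I tried the script posted above by iCooKie. As far as I can tell, it converts the odd characters into plain letters such as s, t, i. Does anyone know what I need to change in the script so that it converts them to the diacritics ĂăâÎîşşşşş instead of AaAaIiSsTt?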