Removing Arabic diacritics with Python


I want to filter my text by removing Arabic diacritics with Python.

For example:

Before filtering: اللَّهمَّ اغْفِرْ لنَا ولوالدِينَا
After filtering: اللهم اغفر لنا ولوالدينا

I found that this can be done with CAMeL Tools, but I don't know how to do it.


3 Answers

12 votes

With pyArabic:

    import pyarabic.araby as araby

    before_filter = "اللَّهمَّ اغْفِرْ لنَا ولوالدِينَا"
    after_filter = araby.strip_diacritics(before_filter)
    print(after_filter)
    # will print: اللهم اغفر لنا ولوالدينا

You can also try the other strip filters:

    araby.strip_harakat(before_filter)      # 'اللّهمّ اغفر لنا ولوالدينا' (keeps shadda)
    araby.strip_lastharaka(before_filter)   # 'اللَّهمَّ اغْفِرْ لنَا ولوالدِينَا'
    araby.strip_shadda(before_filter)       # 'اللَهمَ اغْفِرْ لنَا ولوالدِينَا'
    araby.strip_small(before_filter)        # 'اللَّهمَّ اغْفِرْ لنَا ولوالدِينَا' (no small marks in this text)
    araby.strip_tashkeel(before_filter)     # 'اللهم اغفر لنا ولوالدينا' (harakat and shadda)
    araby.strip_tatweel(before_filter)      # 'اللَّهمَّ اغْفِرْ لنَا ولوالدِينَا' (no tatweel in this text)
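To see what separates `strip_harakat` from `strip_tashkeel` without installing pyArabic, here is a minimal stdlib sketch. It assumes the harakat are the seven vowel and tanwin marks U+064B–U+0650 and U+0652, and that "tashkeel" means those plus shadda (U+0651). The function names are hypothetical, and this is an approximation, not pyArabic's implementation; pyArabic's own mark lists may differ.

```python
# Hypothetical helpers approximating pyArabic's strip_harakat (keeps shadda)
# versus strip_tashkeel (also removes shadda). The code-point sets are assumptions.
HARAKAT = {
    "\u064b",  # fathatan
    "\u064c",  # dammatan
    "\u064d",  # kasratan
    "\u064e",  # fatha
    "\u064f",  # damma
    "\u0650",  # kasra
    "\u0652",  # sukun
}
SHADDA = "\u0651"


def strip_vowels_keep_shadda(text: str) -> str:
    """Remove harakat and tanwin, but leave shadda in place."""
    return "".join(ch for ch in text if ch not in HARAKAT)


def strip_all_tashkeel(text: str) -> str:
    """Remove harakat, tanwin and shadda."""
    marks = HARAKAT | {SHADDA}
    return "".join(ch for ch in text if ch not in marks)


sample = "اللَّهمَّ اغْفِرْ لنَا ولوالدِينَا"
print(strip_vowels_keep_shadda(sample))
print(strip_all_tashkeel(sample))
```

Keeping shadda can matter when the doubled consonant carries meaning you still want to distinguish after removing short vowels.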

0 votes

    # Replace this placeholder with your Arabic text
    text = 'text with Arabic Diacritics to be removed'
    diacritics = ['ِ', 'ُ', 'ٓ', 'ٰ', 'ْ', 'ٌ', 'ٍ', 'ً', 'ّ', 'َ']
    text = ''.join([t for t in text if t not in diacritics])
    print(text)
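The same list can be applied with `str.translate`, which deletes every listed mark in one pass instead of testing list membership character by character. A minimal sketch: the `DIACRITICS` string is assumed to be the ten marks from the answer above, written as escapes so they are readable, and `remove_diacritics` is a hypothetical helper name.

```python
# The ten marks from the answer above, as explicit code points (assumed mapping).
DIACRITICS = (
    "\u0650"  # kasra
    "\u064f"  # damma
    "\u0653"  # maddah above
    "\u0670"  # superscript alef
    "\u0652"  # sukun
    "\u064c"  # dammatan
    "\u064d"  # kasratan
    "\u064b"  # fathatan
    "\u0651"  # shadda
    "\u064e"  # fatha
)

# Build the translation table once; the third argument lists characters to delete.
REMOVE_DIACRITICS = str.maketrans("", "", DIACRITICS)


def remove_diacritics(text: str) -> str:
    """Delete every listed diacritic from text."""
    return text.translate(REMOVE_DIACRITICS)


print(remove_diacritics("اللَّهمَّ اغْفِرْ لنَا ولوالدِينَا"))
```

Building the table once at module level means repeated calls on many lines of text reuse it.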

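If you want the full list of Arabic diacritics, you can also get it from pyArabic. Here is a standalone example that builds the list with unicodedata instead: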

    import unicodedata

    # Python 3: collect every nonspacing mark (category "Mn") in the
    # Arabic block U+0600-U+06FF once, instead of once per character
    arabic_marks = {
        chr(cp) for cp in range(0x0600, 0x0700)
        if unicodedata.category(chr(cp)) == "Mn"
    }

    text = 'اللَّهمَّ اغْفِرْ لنَا ولوالدِينَا '
    text = ''.join(t for t in text if t not in arabic_marks)
    print(text)

0 votes

    import re

    text = 'اللَّهمَّ اغْفِرْ لنَا ولوالدِينَا '
    output = re.sub(u'[\u064e\u064f\u0650\u0651\u0652\u064c\u064b\u064d\u0640\ufc62]', '', text)
    print(output)
    # اللهم اغفر لنا ولوالدينا
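The character class above lists individual marks, so it misses others such as superscript alef (U+0670) or maddah above (U+0653). A minimal sketch that widens it to the contiguous combining-mark range U+064B–U+065F plus U+0670, keeps the original's tatweel (U+0640) and U+FC62 entries, and compiles the pattern once. The exact set of code points is an assumption; adjust it to your corpus, and `strip_arabic_marks` is a hypothetical name.

```python
import re

# U+064B-U+065F: tanwin, harakat, shadda, sukun and related combining marks
# U+0670: superscript (dagger) alef
# U+0640: tatweel (kashida), a stretching character rather than a mark
# U+FC62: shadda-with-kasra presentation-form ligature, as in the original pattern
ARABIC_MARKS = re.compile("[\u064b-\u065f\u0670\u0640\ufc62]")


def strip_arabic_marks(text: str) -> str:
    """Remove the assumed set of Arabic marks and tatweel from text."""
    return ARABIC_MARKS.sub("", text)


print(strip_arabic_marks("اللَّهمَّ اغْفِرْ لنَا ولوالدِينَا"))
```

Using a range keeps the pattern short and catches rarer marks in that block without listing each one.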
