Remove all characters except Chinese characters using a regular expression?

Question

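I have a sentence string written in Chinese.

It contains Chinese characters along with other filler such as spaces, commas, and exclamation marks, all encoded in UTF-8.

With latin1 strings, I can clean the text and strip the filler using preg_replace and [a-zA-Z].

How can I keep only the Chinese "alphabet" characters in a Chinese string while removing all the filler?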

Answer 1

According to this document, these are the Unicode ranges for Chinese characters:

Table 12-2. Blocks Containing Han Ideographs

Block                                     Range         Comment
CJK Unified Ideographs                    4E00–9FFF     Common
CJK Unified Ideographs Extension A        3400–4DBF     Rare
CJK Unified Ideographs Extension B        20000–2A6DF   Rare, historic
CJK Unified Ideographs Extension C        2A700–2B73F   Rare, historic
CJK Unified Ideographs Extension D        2B740–2B81F   Uncommon, some in current use
CJK Compatibility Ideographs              F900–FAFF     Duplicates, unifiable variants, corporate characters
CJK Compatibility Ideographs Supplement   2F800–2FA1F   Unifiable variants

You can use it like this:

preg_replace('/[^\x{4E00}-\x{9FFF}]+/u', '', $string);

or

preg_replace('/\P{Han}+/u', '', $string);

where \P is the negation of \p.

See here for the full list of Unicode scripts.
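
To make the two approaches easier to compare, here is a minimal sketch that wraps each one in a function. The function names keepHanOnly and keepCjkBlocksOnly and the sample sentence are hypothetical and not part of the original answer. The explicit-range version writes code points in PCRE's \x{...} form and covers every block in the table above. The two patterns are not equivalent: \p{Han} is based on the Unicode script property rather than on blocks, so it may also keep characters that fall outside the listed ranges, such as CJK radicals.

```php
<?php
// Hypothetical helper: keep only characters whose Unicode script is Han.
function keepHanOnly(string $text): string
{
    // \P{Han} matches any code point that is NOT in the Han script;
    // the /u modifier makes PCRE treat pattern and subject as UTF-8.
    $result = preg_replace('/\P{Han}+/u', '', $text);
    if ($result === null) {
        // preg_replace() returns null on error, e.g. malformed UTF-8 input.
        throw new InvalidArgumentException('preg error code: ' . preg_last_error());
    }
    return $result;
}

// Hypothetical helper: keep only code points inside the blocks from the table.
function keepCjkBlocksOnly(string $text): string
{
    $blocks = '\x{4E00}-\x{9FFF}'      // CJK Unified Ideographs
            . '\x{3400}-\x{4DBF}'      // Extension A
            . '\x{20000}-\x{2A6DF}'    // Extension B
            . '\x{2A700}-\x{2B73F}'    // Extension C
            . '\x{2B740}-\x{2B81F}'    // Extension D
            . '\x{F900}-\x{FAFF}'      // CJK Compatibility Ideographs
            . '\x{2F800}-\x{2FA1F}';   // CJK Compatibility Ideographs Supplement
    $result = preg_replace('/[^' . $blocks . ']+/u', '', $text);
    if ($result === null) {
        throw new InvalidArgumentException('preg error code: ' . preg_last_error());
    }
    return $result;
}

$sample = '你好, world! 这是 测试。';
echo keepHanOnly($sample), PHP_EOL;        // 你好这是测试
echo keepCjkBlocksOnly($sample), PHP_EOL;  // 你好这是测试
```

Both calls strip the ASCII text, the spaces, and the ideographic full stop, because none of those belong to the Han script or to the listed blocks. The script-property version is shorter and does not need updating when new ideograph blocks are added to Unicode, while the explicit ranges give precise control over which blocks are kept.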
