What regex pattern matches every character in the CJK character set (Chinese, Japanese and Korean) except special characters, plus the digits 0-9?
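This information was gathered with the UCD (Unicode Character Database) interface.
It reflects the latest data available, Unicode 10.
The output is 88,964 characters.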
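From the interface:
Search for CJK using the Block property and add the matching blocks to the custom page,
together with a filter requiring that each character be a letter or a number and occupy an assigned code point (slot).
The resulting regex: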
# All CJK blocks
[\p{Block=CJK_Compatibility}\p{Block=CJK_Compatibility_Forms}\p{Block=CJK_Compatibility_Ideographs}\p{Block=CJK_Compatibility_Ideographs_Supplement}\p{Block=CJK_Radicals_Supplement}\p{Block=CJK_Strokes}\p{Block=CJK_Symbols_And_Punctuation}\p{Block=CJK_Unified_Ideographs}\p{Block=CJK_Unified_Ideographs_Extension_A}\p{Block=CJK_Unified_Ideographs_Extension_B}\p{Block=CJK_Unified_Ideographs_Extension_C}\p{Block=CJK_Unified_Ideographs_Extension_D}\p{Block=CJK_Unified_Ideographs_Extension_E}\p{Block=CJK_Unified_Ideographs_Extension_F}\p{Block=Enclosed_CJK_Letters_And_Months}]
# Must be letters or numbers
(?<= [\p{L}\p{N}] )
# Leave out the unassigned slots
(?<! \p{General_Category=Unassigned} )
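The two lookbehinds keep only characters whose General_Category is a letter (L*) or number (N*) and drop unassigned code points. Python's `re` module has no `\p{Block=...}` support, so the following is a minimal sketch of the same filtering done with the standard `unicodedata` module. The block boundaries are typed in by hand (an assumption worth checking against the UCD's Blocks.txt), and the result depends on the Unicode version bundled with your Python, so the count will generally not be 88,964.

```python
import unicodedata

# (first, last) code points of the CJK-named blocks listed in the regex above.
# Boundaries are hand-copied (assumption: they match Blocks.txt).
CJK_BLOCKS = [
    (0x2E80, 0x2EFF),    # CJK Radicals Supplement
    (0x3000, 0x303F),    # CJK Symbols and Punctuation
    (0x31C0, 0x31EF),    # CJK Strokes
    (0x3200, 0x32FF),    # Enclosed CJK Letters and Months
    (0x3300, 0x33FF),    # CJK Compatibility
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0xFE30, 0xFE4F),    # CJK Compatibility Forms
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
    (0x2A700, 0x2B73F),  # CJK Unified Ideographs Extension C
    (0x2B740, 0x2B81F),  # CJK Unified Ideographs Extension D
    (0x2B820, 0x2CEAF),  # CJK Unified Ideographs Extension E
    (0x2CEB0, 0x2EBEF),  # CJK Unified Ideographs Extension F
    (0x2F800, 0x2FA1F),  # CJK Compatibility Ideographs Supplement
]


def cjk_letters_and_numbers():
    """Return code points in the CJK blocks whose category is L* or N*.

    Unassigned code points have category 'Cn', so the letter/number test
    also excludes them, mirroring the (?<! ...Unassigned ) lookbehind.
    """
    result = []
    for first, last in CJK_BLOCKS:
        for cp in range(first, last + 1):
            if unicodedata.category(chr(cp))[0] in "LN":
                result.append(cp)
    return result


chars = cjk_letters_and_numbers()
print(len(chars), "characters, Unicode", unicodedata.unidata_version)
```

Running it against a newer Unicode version simply picks up the ideographs added since Unicode 10.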
The output, converted to UTF-8/32 form (one class of code points):
(?:
[\x{3005}-\x{3007}\x{3021}-\x{3029}\x{3031}-\x{3035}\x{3038}-\x{303C}\x{3220}-\x{3229}\x{3248}-\x{324F}\x{3251}-\x{325F}\x{3280}-\x{3289}\x{32B1}-\x{32BF}\x{3400}-\x{4DB5}\x{4E00}-\x{9FEA}\x{F900}-\x{FA6D}\x{FA70}-\x{FAD9}\x{20000}-\x{2A6D6}\x{2A700}-\x{2B734}\x{2B740}-\x{2B81D}\x{2B820}-\x{2CEA1}\x{2CEB0}-\x{2EBE0}\x{2F800}-\x{2FA1D}]
)
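As a usage sketch, the class above can be dropped into Python's `re`, which does not accept the `\x{...}` syntax, so the ranges are rewritten with `\uXXXX` and `\UXXXXXXXX` escapes. Adding the ASCII digits 0-9 is an assumption taken from the question; the answer's class does not contain them. Note also that these ranges come only from the CJK-named blocks, so Hiragana, Katakana and Hangul are not matched.

```python
import re

# Ranges copied from the UTF-8/32 class above, rewritten with \u and \U
# escapes because Python's re module has no \x{...} form.
CJK_RANGES = (
    r"\u3005-\u3007\u3021-\u3029\u3031-\u3035\u3038-\u303C"
    r"\u3220-\u3229\u3248-\u324F\u3251-\u325F\u3280-\u3289\u32B1-\u32BF"
    r"\u3400-\u4DB5\u4E00-\u9FEA\uF900-\uFA6D\uFA70-\uFAD9"
    r"\U00020000-\U0002A6D6\U0002A700-\U0002B734\U0002B740-\U0002B81D"
    r"\U0002B820-\U0002CEA1\U0002CEB0-\U0002EBE0\U0002F800-\U0002FA1D"
)

# Assumption: ASCII digits 0-9 are added, as the question asks.
CJK_OR_DIGIT = re.compile("[0-9" + CJK_RANGES + "]+")


def is_cjk_or_digit(text: str) -> bool:
    """True if text is non-empty and every character is in the class."""
    return CJK_OR_DIGIT.fullmatch(text) is not None


print(is_cjk_or_digit("中文123"))  # True
print(is_cjk_or_digit("中文!"))    # False: '!' is not in the class
print(is_cjk_or_digit("한"))       # False: Hangul is outside these ranges
```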
The output, converted to UTF-16 form (supplementary characters as surrogate pairs):
(?:
[\x{3005}-\x{3007}\x{3021}-\x{3029}\x{3031}-\x{3035}\x{3038}-\x{303C}\x{3220}-\x{3229}\x{3248}-\x{324F}\x{3251}-\x{325F}\x{3280}-\x{3289}\x{32B1}-\x{32BF}\x{3400}-\x{4DB5}\x{4E00}-\x{9FEA}\x{F900}-\x{FA6D}\x{FA70}-\x{FAD9}]
|
(?:
[\x{D840}-\x{D868}] [\x{DC00}-\x{DFFF}]
| \x{D869} [\x{DC00}-\x{DED6}\x{DF00}-\x{DFFF}]
| [\x{D86A}-\x{D86C}] [\x{DC00}-\x{DFFF}]
| \x{D86D} [\x{DC00}-\x{DF34}\x{DF40}-\x{DFFF}]
| \x{D86E} [\x{DC00}-\x{DC1D}\x{DC20}-\x{DFFF}]
| [\x{D86F}-\x{D872}] [\x{DC00}-\x{DFFF}]
| \x{D873} [\x{DC00}-\x{DEA1}\x{DEB0}-\x{DFFF}]
| [\x{D874}-\x{D879}] [\x{DC00}-\x{DFFF}]
| \x{D87A} [\x{DC00}-\x{DFE0}]
| \x{D87E} [\x{DC00}-\x{DE1D}]
)
)
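The surrogate alternatives follow from the standard UTF-16 encoding: subtract 0x10000, put the top 10 bits on 0xD800 and the low 10 bits on 0xDC00. The sketch below (the helper names are hypothetical) splits a supplementary range into one (high, low range) row per high surrogate. Unlike the regex above, it does not merge consecutive full rows such as `[\x{D86A}-\x{D86C}]`, nor combine rows from adjacent ranges that share a high surrogate.

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def surrogate_ranges(first: int, last: int) -> list[tuple[int, int, int]]:
    """Split [first, last] into (high, low_first, low_last) rows,
    one row per high surrogate."""
    rows = []
    cp = first
    while cp <= last:
        # Code points sharing a high surrogate form 1024-aligned runs,
        # so cp | 0x3FF is the last code point of the current run.
        run_end = min(last, cp | 0x3FF)
        high, low_first = to_surrogates(cp)
        _, low_last = to_surrogates(run_end)
        rows.append((high, low_first, low_last))
        cp = run_end + 1
    return rows


# Rows for Extension C (U+2A700..U+2B734), printed in the regex's notation.
for high, low_first, low_last in surrogate_ranges(0x2A700, 0x2B734):
    print(f"\\x{{{high:X}}} [\\x{{{low_first:X}}}-\\x{{{low_last:X}}}]")
```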