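I am trying to split a string in C# using a regular expression. I want to split on every non-alphanumeric character, but words that contain a contraction suffix such as 'd, 's or 't should be kept as whole words. For example, given:
"Steve's dog is mine 'not yours' I know you'd like'it"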
I would like to get the following tokens:
steve's, dog, is, mine, not, yours, i, know, you'd, like, it
Currently I am using:
Regex.Split(str.ToLower(), @"[^a-zA-Z0-9_']").Where(s => s != String.Empty).ToArray<string>();
It returns:
steve's, dog, is, mine, 'not, yours', i, know, you'd, like'it
Here is a half-regex, half-LINQ solution:
string s = "Steve's dog is mine 'not yours' I know you'd like'it";
string[] result = Regex.Matches(s, "\\w+('(s|d|t|ve|m))?")
.Cast<Match>().Select(x => x.Value).ToArray();
I match everything you want to keep, rather than the delimiters you want to split on. Then I just Select the Value of each match and turn the results into an array.
\w+(?:'(?![aeiou])\w+)?
\w+ // 1 or more word chars
(?: // optional uncaptured group
' // apostrophe
(?![aeiou]) // negative lookahead: the next character must not be a vowel
\w+ // 1 or more word chars
)? // end of optional group
should've, i'm, 'tis, rock 'n' roll
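To make the pattern above concrete, here is a minimal sketch of how it could be applied in C#. The class name ContractionTokenizer and the Tokenize helper are hypothetical, introduced only for illustration. The sketch assumes the input is lower-cased first, as in the question, which also matters because the lookahead only lists lower-case vowels.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ContractionTokenizer
{
    // Pattern from the answer above: a word, optionally followed by an
    // apostrophe and more word characters, unless the apostrophe is
    // immediately followed by a vowel.
    private static readonly Regex Token =
        new Regex(@"\w+(?:'(?![aeiou])\w+)?");

    // Hypothetical helper (not from the original post): lower-cases the
    // input, as the question does, then collects the value of every match.
    public static string[] Tokenize(string input)
    {
        return Token.Matches(input.ToLowerInvariant())
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string s = "Steve's dog is mine 'not yours' I know you'd like'it";
        // Prints: steve's, dog, is, mine, not, yours, i, know, you'd, like, it
        Console.WriteLine(string.Join(", ", Tokenize(s)));
    }
}
```

Because every match has to start with a word character, a leading apostrophe is never part of a token: with this sketch 'tis comes out as tis, and rock 'n' roll comes out as rock, n, roll.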
The solution I could come up with looks like this:
var txt = "Steve's dog is mine 'not yours' I know you'd like'it, the Hundred Years' War, I'm - they're - don't - o'clock - we've 'the Hundred Years' War of yours'";
// Find the valid `'`s and temporarily replace them with a placeholder such as `_replaceMe_`
// Then replace every remaining `'` with a blank space ` `
var osTxt = Regex.Replace(txt.ToLower(),
@"(?<=[^a-z]i)'(?=m[^a-z])|(?<=[a-z])'(?=([rv]e|[ds])[^a-z])|(?<=[a-z]n)'(?=t[^a-z])|(?<=[^a-z]o)'(?=(clock)?[^a-z])",
"_replaceMe_")
.Replace("\'"," ");
// Now extract the words from the sentence and turn `_replaceMe_` back into `'`
var words = Regex.Matches(osTxt, @"\w+")
.OfType<Match>()
.Select(c=> c.Value.Replace("_replaceMe_", "\'"))
.ToList();
However, this does not keep the trailing ' of Years' in a phrase like the Hundred Years' War.
// also covers: I've I'm She'll you're you've
var sen = "Steve's dog is mine 'not yours' I know you'd like'it";
StringBuilder builder = new StringBuilder();
// \w+ before the apostrophe also allows one-letter words such as I (I've, I'm)
foreach (Match m in Regex.Matches(sen, @"\w+'([dstm]|ll|ve|re)|\w+"))
{
builder.Append(m.Value).Append(",");
}
Console.WriteLine(builder);
//Steve's,dog,is,mine,not,yours,I,know,you'd,like,it,
This is how I did it:
string phrase = "Steve's dog is mine 'not yours' I know you'd like'it";
var splittedString = Regex.Split(phrase, "[^a-zA-Z0-9']+")
.Select(word => word.Trim('\''))
.Where(word => word.Any())
.ToArray();