Regex to split on non-alphanumeric characters, with special handling for words containing apostrophes (contractions)

Question (votes: 0, answers: 5)

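I'm trying to split a string in C# using a regular expression. I want to split it on every non-alphanumeric character, but when the string contains contractions I want words with an apostrophe to be kept as whole words, for example: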

'd
's
't

An example should make clear what I'm trying to achieve. Given a sentence such as:

"Steve's dog is mine 'not yours' I know you'd like'it"

I'd like to get the following tokens:

steve's, dog, is, mine, not, yours, i, know, you'd, like, it

Currently I'm using:

Regex.Split(str.ToLower(), @"[^a-zA-Z0-9_']").Where(s => s != String.Empty).ToArray<string>();

which returns:

steve's, dog, is, mine, 'not, yours', i, know, you'd, like'it
c# regex text split
5 answers
1 vote

Here is a half-regex, half-LINQ solution:

using System.Linq;
using System.Text.RegularExpressions;

string s = "Steve's dog is mine 'not yours' I know you'd like'it";
string[] result = Regex.Matches(s, @"\w+('(s|d|t|ve|m))?")
    .Cast<Match>().Select(x => x.Value).ToArray();

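Instead of describing the separators you want to split on, I match everything you want to keep. Then I just Select the Value of each Match and turn them all into an array.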
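One possible refinement of this matching approach, sketched under assumptions not in the answer: the input is lower-cased (as in the question), 'll and 're are added to the suffix list, and a \b after the suffix requires it to end the word. Without the \b, an input like like'stuff would come out as like's and tuff. The suffix list itself is an assumption to adjust as needed.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Match the tokens rather than the separators. The optional group accepts an
// apostrophe only when a whitelisted suffix follows AND ends the word (\b).
// The suffix list (s, d, t, m, ll, re, ve) is an assumption.
var token = new Regex(@"\w+(?:'(?:s|d|t|m|ll|re|ve)\b)?");

string[] Tokenize(string text) =>
    token.Matches(text.ToLowerInvariant())
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

Console.WriteLine(string.Join(", ", Tokenize("Steve's dog is mine 'not yours' I know you'd like'it")));
// steve's, dog, is, mine, not, yours, i, know, you'd, like, it
Console.WriteLine(string.Join(", ", Tokenize("like'stuff, we'll see")));
// like, stuff, we'll, see
```

Because the suffix must end the word, the list stays short and explicit. Anything not on it (o'clock, rock 'n' roll) is split at the apostrophe.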


0 votes

\w+(?:'(?![aeiou])\w+)?

\w+         // 1 or more word chars
(?:         // optional uncaptured group
'           // apostrophe
(?![aeiou]) // look ahead and assert the character class doesn't match
\w+         // 1 or more word chars
)?          // end of optional group
  • Catches:
    should've
    i'm
    'tis
  • Doesn't catch:
    rock 'n' roll

Demo
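A minimal C# sketch that runs this pattern against the question's sentence and two of the cases above. RegexOptions.IgnoreCase is an addition, not part of the answer. It makes [aeiou] also reject uppercase vowels, so an input like like'It would still be split.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// \w+ followed, optionally, by an apostrophe that is NOT followed by a vowel,
// plus the rest of the word. IgnoreCase is an assumption added here.
var token = new Regex(@"\w+(?:'(?![aeiou])\w+)?", RegexOptions.IgnoreCase);

string[] Tokenize(string text) =>
    token.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();

foreach (var input in new[]
{
    "Steve's dog is mine 'not yours' I know you'd like'it",
    "I should've known",
    "rock 'n' roll",
})
{
    Console.WriteLine(string.Join(", ", Tokenize(input)));
}
// Steve's, dog, is, mine, not, yours, I, know, you'd, like, it
// I, should've, known
// rock, n, roll
```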


0 votes

The solution I could come up with is this:

using System.Linq;
using System.Text.RegularExpressions;

var txt = "Steve's dog is mine 'not yours' I know you'd like'it, the Hundred Years' War, I'm - they're - don't - o'clock - we've 'the Hundred Years' War of yours'";

// Find the valid `'`s and temporarily replace them with a placeholder such as `_replaceMe_`,
// then replace every remaining `'` with a space ` `
var osTxt = Regex.Replace(txt.ToLower(), 
    @"(?<=[^a-z]i)'(?=m[^a-z])|(?<=[a-z])'(?=([rv]e|[ds])[^a-z])|(?<=[a-z]n)'(?=t[^a-z])|(?<=[^a-z]o)'(?=(clock)?[^a-z])", 
    "_replaceMe_")
    .Replace("\'"," ");

// Now extract the words from the sentence and turn `_replaceMe_` back into `'`
var words = Regex.Matches(osTxt, @"\w+")
    .OfType<Match>()
    .Select(c=> c.Value.Replace("_replaceMe_", "\'"))
    .ToList();

But this won't keep the ' in a word like Years', as in the Hundred Years' War, and there are some other valid cases it misses too ;).
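A possible variation on this idea, sketched under its own assumptions: instead of marking valid apostrophes with a placeholder and restoring them afterwards, similar lookaround rules can go straight into Regex.Split. With this rule, an apostrophe is a separator unless a letter precedes it and a whitelisted suffix (d, s, t, m, ll, re, ve, an assumed list) completes the word. Like the answer above, it still drops the apostrophe in Years'. Unlike it, it splits o'clock, because clock is not in the list.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Split points (input is lower-cased first, as in the question):
//  1. any run of characters other than a-z, 0-9, '_' and '\''
//  2. an apostrophe NOT followed by a whitelisted suffix that ends the word
//  3. an apostrophe NOT preceded by a letter (e.g. an opening quote)
// The suffix whitelist (d, s, t, m, ll, re, ve) is an assumption.
var separator = new Regex(@"[^a-z0-9_']+|'(?!(?:d|s|t|m|ll|re|ve)\b)|(?<![a-z])'");

string[] Tokenize(string text) =>
    separator.Split(text.ToLowerInvariant())
             .Where(s => s.Length > 0)   // adjacent separators leave empty strings
             .ToArray();

var tokens = Tokenize("Steve's dog is mine 'not yours' I know you'd like'it");
Console.WriteLine(string.Join(", ", tokens));
// steve's, dog, is, mine, not, yours, i, know, you'd, like, it
```

The pattern uses only non-capturing groups and lookarounds. Regex.Split would otherwise insert any captured text into the result array.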


0 votes
// Also covers: I've, I'm, She'll, you're, you've
using System;
using System.Text;
using System.Text.RegularExpressions;

var sen = "Steve's dog is mine 'not yours' I know you'd like'it";

StringBuilder builder = new StringBuilder();

// A word followed by a contraction suffix ('d, 's, 't, 'm, 'll, 've, 're), or a plain word
foreach (Match m in Regex.Matches(sen, @"\w+'([dstm]|ll|ve|re)|\w+"))
{
    builder.Append(m.Value).Append(",");
}

Console.WriteLine(builder);

// Steve's,dog,is,mine,not,yours,I,know,you'd,like,it,

0 votes

This is how I did it:

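using System.Linq;
using System.Text.RegularExpressions;

string phrase = "Steve's dog is mine 'not yours' I know you'd like'it";
var splittedString = Regex.Split(phrase, "[^a-zA-Z0-9']+")
    .Select(word => word.Trim('\''))   // strip leading/trailing apostrophes
    .Where(word => word.Length > 0)    // drop empty entries
    .ToArray();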
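Trim only removes apostrophes at the ends of a token, so like'it still comes through as a single token. A possible second pass, sketched here as an addition to the answer, breaks each trimmed word at any inner apostrophe that is not followed by a whitelisted suffix ending the word. The suffix list and the case-insensitive matching are assumptions.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Second pass: split at an apostrophe unless a whitelisted suffix runs to the
// end of the (already trimmed) word. The suffix list is an assumption.
var innerApostrophe = new Regex(@"'(?!(?:s|d|t|m|ll|re|ve)$)", RegexOptions.IgnoreCase);

string phrase = "Steve's dog is mine 'not yours' I know you'd like'it";
var words = Regex.Split(phrase, "[^a-zA-Z0-9']+")
    .Select(word => word.Trim('\''))                  // drop quotes around words
    .SelectMany(word => innerApostrophe.Split(word))  // like'it -> like, it
    .Where(word => word.Length > 0)
    .ToArray();

Console.WriteLine(string.Join(", ", words));
// Steve's, dog, is, mine, not, yours, I, know, you'd, like, it
```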
© www.soinside.com 2019 - 2024. All rights reserved.