LINQ: Select rows where any word in a string starts with a certain character

Question  Votes: 0  Answers: 3

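I want to extract all rows from a table where a column (a string) contains at least one word that starts with a specified character. Example: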

Row 1: 'this is the first row'
Row 2: 'this is th second row'
Row 3: 'this is the third row'

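If the specified character is T -> I would extract all 3 rows.
If the specified character is S -> I would extract only the second row...

Please help me.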

linq
3 Answers
0
votes

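Assuming that by "word" you mean "a sequence of characters delimited by spaces, or running from the start of the string to a space, or from a space to the end", you can split on the delimiter and test the pieces for a match: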

var src = new[] {
    "this is the first row",
    "this is th second row",
    "this is the third row"
};

var findChar = 'S';
// char has no instance ToLower(); use the static char.ToLower instead
var lowerFindChar = char.ToLower(findChar);
var matches = src.Where(s => s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Any(w => char.ToLower(w[0]) == lowerFindChar));

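The LINQ Enumerable.Any method tests a sequence to see whether any element matches, so you can split each string into a sequence of words and check whether any word starts with the desired letter, compensating for case.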
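
The split-and-test idea can also be packaged as a reusable helper. The following is only a sketch under stated assumptions, not part of the answer above: the class name `WordStartFilter` and both method names are hypothetical, and it swaps `ToLower` for `StringComparison.OrdinalIgnoreCase` so that the comparison does not depend on the current culture.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordStartFilter
{
    // Hypothetical helper: true if any space-separated word in `text`
    // starts with `findChar`, ignoring case.
    public static bool HasWordStartingWith(string text, char findChar)
    {
        string prefix = findChar.ToString();
        return text
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Any(w => w.StartsWith(prefix, StringComparison.OrdinalIgnoreCase));
    }

    // Keeps only the rows that contain at least one matching word.
    public static IEnumerable<string> RowsWithWordStartingWith(
        IEnumerable<string> rows, char findChar)
    {
        return rows.Where(r => HasWordStartingWith(r, findChar));
    }
}
```

With the three sample rows from the question, `WordStartFilter.RowsWithWordStartingWith(src, 'S')` keeps only the second row, while passing `'T'` keeps all three.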


0
votes

Try this:

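// requires: using System.Text.RegularExpressions;
// (^| ) also matches a word at the very start of the row, not only after a space
rows.Where(r => Regex.IsMatch(r, "(^| )[Tt]"))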

You can replace Tt with Ss (assuming you want to match either upper or lower case).
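
If the character is only known at run time, the pattern can be built from a variable instead of a hard-coded character class. This is a minimal sketch, assuming whitespace-delimited words; `RegexWordStart`, `BuildPattern`, and `Filter` are hypothetical names introduced for illustration. It uses `Regex.Escape` so that characters such as `.` or `$` are treated literally, and `RegexOptions.IgnoreCase` in place of listing both cases.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexWordStart
{
    // Hypothetical helper: builds a pattern that matches `c` at the start of
    // the string or immediately after a whitespace character.
    public static Regex BuildPattern(char c)
    {
        // Regex.Escape guards against characters that have a special
        // meaning inside a pattern, such as '.' or '$'.
        string escaped = Regex.Escape(c.ToString());
        return new Regex(@"(^|\s)" + escaped, RegexOptions.IgnoreCase);
    }

    // Keeps only the rows in which some word starts with `c`.
    public static IEnumerable<string> Filter(IEnumerable<string> rows, char c)
    {
        Regex pattern = BuildPattern(c);
        return rows.Where(r => pattern.IsMatch(r));
    }
}
```

Building the `Regex` once and reusing it for every row avoids re-parsing the pattern on each call.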


0
votes

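The question is, of course: what is a "word"?

By your definition, is the character sequence "word" in the sentence above a word? It does not start after a space, nor after any other whitespace.

A definition of a word could be:

Define wordCharacter: something like A-Z, a-z.
Define word:
- a non-empty sequence of wordCharacters at the beginning of the string, followed by a non-wordCharacter
- or a non-empty sequence of wordCharacters at the end of the string, preceded by a non-wordCharacter
- or any non-empty sequence of wordCharacters in the string that is preceded and followed by a non-wordCharacter
Define start of word: the first character of the word.

String: "Some strange characters: 'A', 9, äll, B9 C$ X?"
- Words: Some, strange, characters, A
- Not words: 9, äll, B9, C$ X?

So you first have to specify exactly what you mean by a word; only then can you define the functions.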

I would write it as an extension method on string that returns an IEnumerable<string>. Usage will look similar to LINQ. See Extension Methods Demystified

using System;
using System.Collections.Generic;

public static class StringExtensions
{
    // TODO: replace with your own definition of a word character.
    // Placeholder assumption: only the ASCII letters A-Z and a-z count.
    private static bool IsWordCharacter(char c)
    {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    public static IEnumerable<string> SplitIntoWords(this string text)
    {
        // Because this is an iterator, the exception is thrown when enumeration starts.
        if (text == null) throw new ArgumentNullException(nameof(text));

        int startIndex = 0;
        while (startIndex != text.Length)
        {   // not at end of string. Find the beginning of the next word:
            while (startIndex < text.Length && !IsWordCharacter(text[startIndex]))
            {
                ++startIndex;
            }

            // now startIndex points to the first character of the next word
            // or to the end of the text

            if (startIndex != text.Length)
            {   // found the beginning of a word.
                // the first character after the word is either the first non-word character,
                // or the end of the string

                int indexAfterWord = startIndex + 1;
                while (indexAfterWord < text.Length && IsWordCharacter(text[indexAfterWord]))
                {
                    ++indexAfterWord;
                }

                // all characters from startIndex to indexAfterWord-1 are word characters,
                // so together they form a word
                int wordLength = indexAfterWord - startIndex;
                yield return text.Substring(startIndex, wordLength);

                // continue scanning after this word; without this the loop never ends
                startIndex = indexAfterWord;
            }
        }
    }
}

Now that you have a procedure that splits any string into words according to your definition, your query will be simple:

IEnumerable<string> texts = ...
char specifiedChar = 'T';

// keep only those texts that have at least one word that starts with specifiedChar:
var textsWithWordThatStartsWithSpecifiedChar = texts
    // split the text into words
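    // keep only the words that start with specifiedChar, ignoring case
    // (the question expects 'T' to match "this")
    // if there is such a word: keep the text
    .Where(text => text.SplitIntoWords()
                   .Where(word => word.Length > 0
                       && char.ToUpperInvariant(word[0]) == char.ToUpperInvariant(specifiedChar))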
                   .Any());
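
As a cross-check on the hand-written splitter, the same word definition (maximal runs of A-Z and a-z) can be expressed with a regular expression. This is a swapped-in technique, not the approach described above, and only a sketch: `RegexWordSplitter`, `Words`, and `TextsWithWordStartingWith` are hypothetical names, and the case-insensitive comparison is an assumption based on the question's example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexWordSplitter
{
    // Assumption: a word is a maximal run of ASCII letters.
    private static readonly Regex WordPattern = new Regex("[A-Za-z]+");

    // Returns every word in `text` according to the assumption above.
    public static IEnumerable<string> Words(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        return WordPattern.Matches(text).Cast<Match>().Select(m => m.Value);
    }

    // Keeps only the texts that contain a word starting with `specifiedChar`,
    // ignoring case.
    public static IEnumerable<string> TextsWithWordStartingWith(
        IEnumerable<string> texts, char specifiedChar)
    {
        char upper = char.ToUpperInvariant(specifiedChar);
        return texts.Where(text =>
            Words(text).Any(word => char.ToUpperInvariant(word[0]) == upper));
    }
}
```

Every match of `[A-Za-z]+` is non-empty, so indexing `word[0]` is safe here without a length check.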