Pattern to match all text between numbers in a string

Question

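I'm trying to use a regular expression to create a group for each verse in a chapter of the Bible. For example, suppose the text is as follows: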

1 This is the genealogy of Jesus the Messiah the son of David, the son of Abraham:
2 Abraham was the father of Isaac,
Isaac the father of Jacob,
Jacob the father of Judah and his brothers,
3 Judah the father of Perez and Zerah, whose mother was Tamar,
Perez the father of Hezron,
Hezron the father of Ram,

The result I'm looking for is the following 3 groups:

Group 1:

1 This is the genealogy of Jesus the Messiah the son of David, the son of Abraham:

Group 2:

2 Abraham was the father of Isaac,
Isaac the father of Jacob,
Jacob the father of Judah and his brothers,

Group 3:

3 Judah the father of Perez and Zerah, whose mother was Tamar,
Perez the father of Hezron,
Hezron the father of Ram,

... (bonus points if any line breaks inside each group's string are replaced with spaces)

I've tried many different patterns without success. The last pattern I tried was

@"([^0-9][\s\S^0-9]*[^0-9])"

which produced two groups, each containing all of the text except the first digit.

I'm using C#.

What pattern would give me the result I want?

Thanks!
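A quick way to see what the attempted pattern does is to run it on the sample text. Inside a character class, `^` only means negation when it is the first character, so `[\s\S^0-9]` matches any character at all; the pattern therefore takes everything from the first non-digit (the space after `1`) to the last non-digit in one greedy match. The "two groups" are most likely `Groups[0]` (the whole match) and `Groups[1]` (the parenthesized capture), which hold the same text. This is a minimal sketch using C# top-level statements, assuming the sample text uses plain `\n` line breaks.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample text from the question, with explicit "\n" line breaks (an assumption).
string text = "1 This is the genealogy of Jesus the Messiah the son of David, the son of Abraham:\n"
            + "2 Abraham was the father of Isaac,\n"
            + "Isaac the father of Jacob,\n"
            + "Jacob the father of Judah and his brothers,\n"
            + "3 Judah the father of Perez and Zerah, whose mother was Tamar,\n"
            + "Perez the father of Hezron,\n"
            + "Hezron the father of Ram,";

// The pattern from the question. '^' is a literal inside [...] unless it comes first,
// so [\s\S^0-9] means "any character" and the * runs greedily to the end of the text.
Match attempt = Regex.Match(text, @"([^0-9][\s\S^0-9]*[^0-9])");

// Groups[0] is the whole match, Groups[1] is the parenthesized capture.
Console.WriteLine(attempt.Groups.Count);                         // 2
Console.WriteLine(attempt.Groups[0].Value == text.Substring(1)); // True: everything but the leading "1"
Console.WriteLine(attempt.Groups[1].Value == attempt.Value);     // True: same text as the whole match
```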

Answer 1

Well, that was a fun one to think about. Here is how the pattern breaks down:

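\d+: matches one or more digits (the verse number).
[\s\S]*?: matches any character, including line breaks, but as few as possible (lazy).
(?=(\n\d+)|$): positive lookahead that ends the match just before the next line that starts with a verse number, or at the end of the text.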

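using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        string text = @"1 This is the genealogy of Jesus the Messiah the son of David, the son of Abraham:
2 Abraham was the father of Isaac,
Isaac the father of Jacob,
Jacob the father of Judah and his brothers,
3 Judah the father of Perez and Zerah, whose mother was Tamar,
Perez the father of Hezron,
Hezron the father of Ram,";

        // Each match starts at a verse number and lazily extends until just before
        // the next line that begins with a number, or the end of the text.
        string pattern = @"(\d+[\s\S]*?)(?=(\n\d+)|$)";
        MatchCollection matches = Regex.Matches(text, pattern);

        int groupCount = 1;
        foreach (Match match in matches)
        {
            // The verbatim string's line breaks are \n or \r\n depending on how the
            // source file was saved, so replace both with a space, then Trim() drops
            // any trailing \r left just before the next verse.
            string verse = Regex.Replace(match.Value, @"\r?\n", " ").Trim();
            Console.WriteLine($"Group #{groupCount}:\n\n{verse}\n");
            groupCount++;
        }
    }
}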
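A slightly more explicit variation, assuming every verse number sits at the very start of a line (true for the sample, but not something every Bible text source guarantees): anchor the number with `^` under `RegexOptions.Multiline`, let `.` cross line breaks with `RegexOptions.Singleline`, use `\z` for the true end of the text, and collapse every run of whitespace so Windows `\r\n` breaks also become single spaces. The sample below deliberately uses `\r\n` to show that case; it is a sketch using C# top-level statements, not the only way to do this.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sample text from the question, with Windows-style "\r\n" line breaks.
string text = "1 This is the genealogy of Jesus the Messiah the son of David, the son of Abraham:\r\n"
            + "2 Abraham was the father of Isaac,\r\n"
            + "Isaac the father of Jacob,\r\n"
            + "Jacob the father of Judah and his brothers,\r\n"
            + "3 Judah the father of Perez and Zerah, whose mother was Tamar,\r\n"
            + "Perez the father of Hezron,\r\n"
            + "Hezron the father of Ram,";

// Multiline: ^ matches at the start of every line.
// Singleline: . also matches line breaks, so one verse can span several lines.
// A verse is a line-leading number followed by everything up to the next
// line-leading number (lookahead) or the very end of the text (\z).
var verseRegex = new Regex(@"^\d+\b.*?(?=^\d+\b|\z)",
                           RegexOptions.Multiline | RegexOptions.Singleline);

var verses = new List<string>();
foreach (Match m in verseRegex.Matches(text))
{
    // Collapse every run of whitespace (including "\r\n") into one space.
    verses.Add(Regex.Replace(m.Value, @"\s+", " ").Trim());
}

for (int i = 0; i < verses.Count; i++)
{
    Console.WriteLine($"Group {i + 1}: {verses[i]}");
}
```

Like the pattern above, this would still start a new group if a wrapped continuation line happened to begin with a number, since it cannot tell a verse number from any other line-leading digits.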
