Pattern to match all text between numbers in a string

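I'm trying to use a regular expression to create a group for each verse in a chapter of the Bible. For example, suppose the text is as follows: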

1 This is the genealogy of Jesus the Messiah the son of David, the son of Abraham:
2 Abraham was the father of Isaac,
Isaac the father of Jacob,
Jacob the father of Judah and his brothers,
3 Judah the father of Perez and Zerah, whose mother was Tamar,
Perez the father of Hezron,
Hezron the father of Ram,

The result I'm looking for is the following 3 groups:

Group 1:

1 This is the genealogy of Jesus the Messiah the son of David, the son of Abraham:

Group 2:

2 Abraham was the father of Isaac,
Isaac the father of Jacob,
Jacob the father of Judah and his brothers,

Group 3:

3 Judah the father of Perez and Zerah, whose mother was Tamar,
Perez the father of Hezron,
Hezron the father of Ram,

...(bonus points if any line breaks inside each group's string are replaced with a space)

I've tried many different patterns without success. The last pattern I tried was

@"([^0-9][\s\S^0-9]*[^0-9])"
, which produced two groups, each containing all of the text except the first digit.
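Why that attempt swallows everything: inside a character class, `^` means "not" only when it is the first character; anywhere else it is a literal caret. So `[\s\S^0-9]` matches any character at all, the `*` runs to the end of the text, and the final `[^0-9]` simply backtracks to the last non-digit. The match starts at the space after "1" because the leading `[^0-9]` cannot match that digit. The "two groups" are most likely `Groups[0]` (the whole match) and `Groups[1]` (the parenthesized capture), which hold the same text. A small sketch that checks this (the class name `CaretDemo` and the sample string are only for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaretDemo
{
    public static void Main()
    {
        // '^' is not first inside [...], so it is a literal caret and the
        // class accepts whitespace, non-whitespace, '^' and digits: any character.
        string attemptedClass = @"[\s\S^0-9]";
        Console.WriteLine(Regex.IsMatch("7", attemptedClass)); // True: digits still match
        Console.WriteLine(Regex.IsMatch("^", attemptedClass)); // True: the caret is literal

        // A leading '^' does negate the class, so this rejects digits.
        Console.WriteLine(Regex.IsMatch("7", @"[^0-9]")); // False

        // The full attempted pattern: one match spanning almost the whole input.
        Match m = Regex.Match("1 First verse,\n2 second verse", @"([^0-9][\s\S^0-9]*[^0-9])");
        Console.WriteLine(m.Groups.Count);                // 2: group 0 and group 1
        Console.WriteLine(m.Value == m.Groups[1].Value);  // True: both hold the same text
        Console.WriteLine(m.Value.StartsWith(" First"));  // True: only the leading "1" is skipped
    }
}
```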

I'm using C#.

What pattern would give me the result I want?

Thanks!

c# regex regex-group
1 Answer

Well, this was an interesting one to think about.

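  • \d+: matches one or more digits (the verse number).
  • [\s\S]*?: matches any character, including line breaks, but as few as possible (lazy).
  • (?=(\n\d+)|$): a positive lookahead that ends the match just before the next line that starts with a verse number, or at the end of the text.
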
using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        string text = @"1 This is the genealogy of Jesus the Messiah the son of David, the son of Abraham:
2 Abraham was the father of Isaac,
Isaac the father of Jacob,
Jacob the father of Judah and his brothers,
3 Judah the father of Perez and Zerah, whose mother was Tamar,
Perez the father of Hezron,
Hezron the father of Ram,";

        string pattern = @"(\d+[\s\S]*?)(?=(\n\d+)|$)";
        MatchCollection matches = Regex.Matches(text, pattern);

        int groupCount = 1;
        foreach (Match match in matches)
        {
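            // Trim the trailing CR (present with Windows line endings), then turn each line break into a space
            string verse = Regex.Replace(match.Value.Trim(), @"\r?\n", " ");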
            Console.WriteLine($"Group #{groupCount}:\n\n{verse}\n");
            groupCount++;
        }
    }
}
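A variant that may be easier to extend: named groups separate the verse number from its text, `RegexOptions.Multiline` lets `^` tie the next verse to the start of a line, and `\z` marks the end of the input. This is a sketch under the assumption that every verse begins at the start of a line with its number followed by whitespace, and that no continuation line begins with a digit; the names `VerseSplitter`, `VersePattern`, `Split`, `num` and `text` are illustrative, not from the question. The sample uses CRLF line endings because a verbatim string in a source file saved on Windows would contain them.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class VerseSplitter
{
    // Assumption: a verse starts at a line start with "<number><whitespace>",
    // and runs lazily until the next such line start or the end of the input.
    public static readonly Regex VersePattern = new Regex(
        @"^(?<num>\d+)\s+(?<text>[\s\S]*?)(?=^\d+\s|\z)",
        RegexOptions.Multiline);

    // Returns one string per verse, with internal line breaks collapsed to single spaces.
    public static List<string> Split(string chapter)
    {
        var verses = new List<string>();
        foreach (Match m in VersePattern.Matches(chapter))
        {
            // Trim drops the line break that precedes the next verse number;
            // each remaining CR/LF (plus surrounding spaces) becomes one space.
            string body = Regex.Replace(m.Groups["text"].Value.Trim(), @"\s*\r?\n\s*", " ");
            verses.Add($"{m.Groups["num"].Value} {body}");
        }
        return verses;
    }

    public static void Main()
    {
        string chapter =
            "1 This is the genealogy of Jesus the Messiah the son of David, the son of Abraham:\r\n" +
            "2 Abraham was the father of Isaac,\r\n" +
            "Isaac the father of Jacob,\r\n" +
            "Jacob the father of Judah and his brothers,\r\n" +
            "3 Judah the father of Perez and Zerah, whose mother was Tamar,\r\n" +
            "Perez the father of Hezron,\r\n" +
            "Hezron the father of Ram,";

        int group = 1;
        foreach (string verse in Split(chapter))
        {
            Console.WriteLine($"Group {group++}: {verse}");
        }
    }
}
```

Because the lookahead requires a line start, a number in the middle of a line does not begin a new group; a continuation line that happens to start with a number still would.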