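I'm trying to use a regular expression to create one group for each verse in a chapter of the Bible. For example, suppose the text is as follows: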
1 This is the genealogy of Jesus the Messiah the son of David, the son of Abraham:
2 Abraham was the father of Isaac,
Isaac the father of Jacob,
Jacob the father of Judah and his brothers,
3 Judah the father of Perez and Zerah, whose mother was Tamar,
Perez the father of Hezron,
Hezron the father of Ram,
The result I'm looking for is the following 3 groups:
Group 1:
1 This is the genealogy of Jesus the Messiah the son of David, the son of Abraham:
Group 2:
2 Abraham was the father of Isaac,
Isaac the father of Jacob,
Jacob the father of Judah and his brothers,
Group 3:
3 Judah the father of Perez and Zerah, whose mother was Tamar,
Perez the father of Hezron,
Hezron the father of Ram,
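...(as a bonus, it would be nice to replace any line breaks inside each group's string with a space)
I have tried many different patterns without success. The last pattern I tried was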
@"([^0-9][\s\S^0-9]*[^0-9])"
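, which produced two groups, each containing all of the text except the first number.
What pattern would give me the result I want?
Thanks!
Well, this was a fun one to think about.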
using System;
using System.Text.RegularExpressions;
public class Program
{
    public static void Main()
    {
        string text = @"1 This is the genealogy of Jesus the Messiah the son of David, the son of Abraham:
2 Abraham was the father of Isaac,
Isaac the father of Jacob,
Jacob the father of Judah and his brothers,
3 Judah the father of Perez and Zerah, whose mother was Tamar,
Perez the father of Hezron,
Hezron the father of Ram,";
        // A verse starts at a number and extends lazily until the next line
        // that begins with a number, or until the end of the input.
        // \r?\n allows for both Windows and Unix line endings.
        string pattern = @"\d+[\s\S]*?(?=\r?\n\d+|$)";
        MatchCollection matches = Regex.Matches(text, pattern);
        int groupCount = 1;
        foreach (Match match in matches)
        {
            // Bonus: join the lines of each verse with single spaces.
            string verse = Regex.Replace(match.Value, @"\r?\n", " ");
            Console.WriteLine($"Group #{groupCount}:\n\n{verse}\n");
            groupCount++;
        }
    }
}
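If you would rather split the chapter than match each verse, something along these lines might also work. This is only a sketch: `VerseSplitter` and `SplitVerses` are names made up for illustration, and it assumes a verse number only ever appears at the start of a line and is followed by whitespace, so a continuation line that happens to begin with a number would be treated as a new verse.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class VerseSplitter
{
    public static string[] SplitVerses(string chapter)
    {
        // (?m) lets ^ match at the start of every line. The lookahead is
        // zero-width, so the text is cut just before each verse number
        // without consuming any characters.
        return Regex.Split(chapter, @"(?m)(?=^\d+\s)")
            // Trim each piece and collapse its line breaks into single spaces.
            .Select(v => Regex.Replace(v.Trim(), @"\s*\r?\n\s*", " "))
            // The split point at the very start of the text yields an empty piece.
            .Where(v => v.Length > 0)
            .ToArray();
    }
}

public class SplitDemo
{
    public static void Main()
    {
        string text = "1 This is the genealogy of Jesus the Messiah the son of David, the son of Abraham:\n"
            + "2 Abraham was the father of Isaac,\n"
            + "Isaac the father of Jacob,\n"
            + "Jacob the father of Judah and his brothers,\n"
            + "3 Judah the father of Perez and Zerah, whose mother was Tamar,\n"
            + "Perez the father of Hezron,\n"
            + "Hezron the father of Ram,";

        foreach (string verse in VerseSplitter.SplitVerses(text))
        {
            Console.WriteLine(verse);
        }
    }
}
```

For the sample text in the question this should give three strings, one per verse, each on a single line.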