Extracting key-value pairs from multiline text in Java

Question

Consider the following multiline string:

This is multiline text that needs to be correctly parsed into key-value pairs, excluding all other information.

 Section One:
    First key = Value One
    Second key = Value Two

 Section Two:   
    Third key = Value Three
    Fourth key = Value Four
    Fifth key = Value Five

 Section Three:
    Sixth key = Value Six
    Seventh key = Value Seven
    Eighth key = Value Eight

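In other words, the text consists of an "introduction" (a few phrases) followed by multiple lines organized into sections. Each section has a "header" (for example, Section One) and several key-value pairs separated by =.

A key can contain any character except a newline and =, and a value can contain any character except a newline.

Occasionally, other unrelated lines may appear in the text.

I need a regular expression that makes matched.find() return all key-value pair groups, and only those, skipping the introduction, the section headers, and any other line that does not contain a key-value pair.

Ideally, no other pre-processing or post-processing of the text should be needed.

Reading the text line by line and processing it accordingly is not an option in this use case.

A pattern like (?:\r|\n)(\s*[^=\.]+)\s*=\s*(.+) comes close, but it still matches more than required.

Any ideas?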

Answer 1

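You are almost there. Just change \s* to <space>*, because \s also matches newline characters.

(?:\r|\n) *([^\n=\.]+)(?<=\S) *= *(.+)

If the text may contain tabs, change <space>* above to [ \t]*. (?<=\S) is a positive lookbehind asserting that the character just before that position is not whitespace, so the captured key ends on a non-space character instead of carrying trailing spaces.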
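To make the effect of \s concrete, here is a minimal sketch that runs the question's pattern and the tab-tolerant pattern above against a shortened sample. The class name WhitespaceClassDemo, the helper firstKey, and the shortened text are illustrative assumptions, not part of the original post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhitespaceClassDemo {
    // Hypothetical helper: returns group 1 of the first match, or null if nothing matches.
    static String firstKey(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Shortened sample (an assumption for illustration).
        String text = "Intro line.\n\n Section One:\n    First key = Value One\n";

        // Pattern from the question: \s* and [^=\.]+ can both run across line breaks.
        String loose = "(?:\\r|\\n)(\\s*[^=\\.]+)\\s*=\\s*(.+)";
        // Pattern from the answer: only spaces/tabs are skipped, and the key excludes \n.
        String strict = "(?:\\r|\\n)[\\t ]*([^\\n=\\.]+)(?<=\\S)[\\t ]*=[\\t ]*(.+)";

        // Brackets make leading/trailing whitespace in the captured key visible.
        System.out.println("loose : [" + firstKey(loose, text) + "]");
        System.out.println("strict: [" + firstKey(strict, text) + "]");
    }
}
```

With the question's pattern, the first captured key spans several lines and swallows the section header. With the answer's pattern, the first captured key is just First key.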

DEMO

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Demo {
    public static void main(String[] args) {
        String s = "This is multiline text that needs to be correctly parsed into key-value pairs, excluding all other information.\n" +
                "\n" +
                " Section One:\n" +
                "    First key = Value One\n" +
                "    Second key = Value Two\n" +
                "\n" +
                " Section Two:   \n" +
                "    Third key = Value Three\n" +
                "    Fourth key = Value Four\n" +
                "    Fifth key = Value Five\n" +
                "\n" +
                " Section Three:\n" +
                "    Sixth key = Value Six\n" +
                "    Seventh key = Value Seven\n" +
                "    Eighth key = Value Eight";

        // group(1) is the key (no trailing whitespace), group(2) is the rest of the line after '='.
        Matcher m = Pattern.compile("(?:\\r|\\n)[\\t ]*([^\\n=\\.]+)(?<=\\S)[\\t ]*=[\\t ]*(.+)").matcher(s);
        while (m.find()) {
            System.out.println("Key : " + m.group(1) + " => Value : " + m.group(2));
        }
    }
}

Output:

Key : First key => Value : Value One
Key : Second key => Value : Value Two
Key : Third key => Value : Value Three
Key : Fourth key => Value : Value Four
Key : Fifth key => Value : Value Five
Key : Sixth key => Value : Value Six
Key : Seventh key => Value : Value Seven
Key : Eighth key => Value : Value Eight
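If the pairs are needed as a data structure rather than printed, the same loop can fill a map. The following is a minimal sketch under stated assumptions: the class name KeyValueCollector, the method parse, and the shortened sample text are hypothetical, and the trim() call on the value is an optional extra that the original answer does not use.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyValueCollector {
    // Same pattern as the demo above (tab-tolerant variant).
    private static final Pattern PAIR = Pattern.compile(
            "(?:\\r|\\n)[\\t ]*([^\\n=\\.]+)(?<=\\S)[\\t ]*=[\\t ]*(.+)");

    // Collects every matched pair in encounter order; a repeated key keeps its last value.
    static Map<String, String> parse(String text) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(text);
        while (m.find()) {
            // trim() is optional: (.+) also captures trailing spaces on the line.
            pairs.put(m.group(1), m.group(2).trim());
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Shortened sample (an assumption for illustration).
        String text = "Intro.\n\n Section One:\n    First key = Value One\n\n"
                + " Section Two:   \n    Third key = Value Three  \n";
        System.out.println(parse(text));
    }
}
```

Two limits of the pattern are worth keeping in mind. Every match must be preceded by \r or \n, so a pair on the very first line of the text is not found. The key class [^\n=\.] also excludes the period, so a line whose key contains a dot is skipped. Using [^\n=] instead is one possible adjustment if such keys are expected.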
