preg_split on regex line start


I'm trying to format the following file:

[30-05-2013 15:45:54] A A
[26-06-2013 14:44:44] B A
[26-06-2013 14:44:44] C A
[26-06-2013 14:43:16] Some lines are so large, they take multiple lines, so explode('\n') won't work because
I need the complete message
[26-06-2013 14:44:44] E A
[26-06-2013 14:44:44] F A
[26-06-2013 14:44:44] G A

Expected output:

Array
(
    [0] => [30-05-2013 15:45:54] A A
    [1] => [26-06-2013 14:44:44] B A
    [2] => [26-06-2013 14:44:44] C A
    [3] => [26-06-2013 14:43:16] Some lines are so large, they take multiple lines, so 
            explode('\n') won't work because
            I need the complete message
    [4] => [26-06-2013 14:44:44] E A
    ...
)


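Based on "How to include the split delimiter in results for preg_split()?", I tried to use a positive lookbehind to keep the timestamp in the results, and came up with this regex (Regex101):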
(?<=\[)(.+)(?<=\])(.+)

It is used in the following PHP code:

#!/usr/bin/env php
<?php

    class Chat {

        private $f;   // raw contents of the chat file

        function __construct() {

            // Read chat file
            $this->f = file_get_contents(__DIR__ . '/testchat.txt');

            // Split on '[\d]'
            $r = "/(?<=\[)(.+)(?<=\])(.+)/";
            $l = preg_split($r, $this->f, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

            var_dump(count($l));
            var_dump($l);
        }
    }
$c = new Chat();

This gives me the following output (truncated):

array(22) {
  [0]=>
  string(1) "["
  [1]=>
  string(20) "30-05-2013 15:45:54]"
  [2]=>
  string(4) " A A"
  [3]=>
  string(2) "
["
  [4]=>
  string(20) "26-06-2013 14:44:44]"
  [5]=>
  string(4) " B A"
  [6]=>
  string(2) "
["
  [7]=>
  string(20) "26-06-2013 14:44:44]"
  [8]=>
  string(4) " C A"
  [9]=>
  string(2) "
["
  [10]=>
  string(20) "26-06-2013 14:43:16]"
  [11]=>
  string(87) " Some lines are so large, they take multiple lines, so explode('\n') won't work because"
  [12]=>
  string(30) "
I need the complete message
["

Questions

  1. Why is the first [ ignored?
  2. How should I change the regex to get the desired output?
  3. Why are there so many empty strings, even though I use PREG_SPLIT_NO_EMPTY?
php regex preg-split
Answer (2 votes)

With preg_split, you can use

'~\R+(?=\[\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}])~'

See the regex demo.

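Details:

  • \R+ - one or more line breaks
  • (?=\[\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}]) - a positive lookahead (it checks the text without consuming it, so the timestamp stays at the start of each resulting element) that requires, immediately to the right of the current location:
    • \[ - a [ char
    • \d{2}-\d{2}-\d{4} - a date-like pattern: two digits, a hyphen, two digits, a hyphen and four digits
    • (space) - a literal space
    • \d{2}:\d{2}:\d{2}] - a time-like pattern: two digits, :, two digits, :, two digits, followed by a ] char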
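Because \R matches any line-break sequence (\n, \r\n, \r and a few others), the same split should also work for a log saved with Windows line endings. The sketch below is only a quick check of that on a small hypothetical sample; the variable names are illustrative.

```php
<?php
// Hypothetical sample with Windows-style (CRLF) line endings.
$crlfLog = "[30-05-2013 15:45:54] A A\r\n"
         . "[26-06-2013 14:43:16] first line of a long message\r\n"
         . "second line of the same message\r\n"
         . "[26-06-2013 14:44:44] E A";

// \R+ consumes one or more line breaks (\n, \r\n, \r, ...), but only where
// the lookahead sees a "[dd-mm-yyyy hh:mm:ss]" timestamp right after them.
$pattern = '~\R+(?=\[\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}])~';
$entries = preg_split($pattern, $crlfLog);

print_r($entries);
```

The line break inside the second message is not followed by a timestamp, so it stays part of that entry.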

PHP demo:

// Nowdoc (<<<'EOT') keeps the literal \n in the sample text as-is
$text = <<<'EOT'
[30-05-2013 15:45:54] A A
[26-06-2013 14:44:44] B A
[26-06-2013 14:44:44] C A
[26-06-2013 14:43:16] Some lines are so large, they take multiple lines, so explode('\n') won't work because
I need the complete message
[26-06-2013 14:44:44] E A
[26-06-2013 14:44:44] F A
[26-06-2013 14:44:44] G A
EOT;

// Split on line breaks that are directly followed by a "[dd-mm-yyyy hh:mm:ss]" timestamp
print_r(preg_split('~\R+(?=\[\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}])~', $text));

Output:

Array
(
    [0] => [30-05-2013 15:45:54] A A
    [1] => [26-06-2013 14:44:44] B A
    [2] => [26-06-2013 14:44:44] C A
    [3] => [26-06-2013 14:43:16] Some lines are so large, they take multiple lines, so explode('\n') won't work because
I need the complete message
    [4] => [26-06-2013 14:44:44] E A
    [5] => [26-06-2013 14:44:44] F A
    [6] => [26-06-2013 14:44:44] G A
)
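To plug the split into the question's setup, where the log is read with file_get_contents(), one option is a small helper. This is a sketch under assumptions: splitChatLog() is a hypothetical name, trim() is added to drop a trailing newline at the end of the file, and a temporary file stands in for testchat.txt so the snippet runs on its own.

```php
<?php
// Hypothetical helper: split a chat log into one string per timestamped entry.
function splitChatLog(string $text): array
{
    $pattern = '~\R+(?=\[\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}])~';
    // trim() removes leading/trailing whitespace, e.g. the file's final newline
    return preg_split($pattern, trim($text));
}

// Stand-in for testchat.txt: write a small sample to a temporary file.
$path = tempnam(sys_get_temp_dir(), 'chat');
file_put_contents($path, "[30-05-2013 15:45:54] A A\n"
    . "[26-06-2013 14:43:16] Long message\ncontinued here\n"
    . "[26-06-2013 14:44:44] E A\n");

$entries = splitChatLog(file_get_contents($path));
unlink($path);

foreach ($entries as $i => $entry) {
    echo $i, ': ', $entry, PHP_EOL;
}
```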

In case you need more details than just the split entries, you can use a matching approach:

'~^\[(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})]\s*+(.*?)(?=\s*^\[(?1)]|\z)~ms'

See the regex demo. Use it in PHP as:

preg_match_all('~^\[(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})]\s*+(.*?)(?=\s*^\[(?1)]|\z)~ms', $text, $matches);

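It matches:

  • ^ - start of a line (the m modifier makes ^ match at every line start)
  • \[(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})] - the datetime inside square brackets (captured into Group 1)
  • \s*+ - zero or more whitespace chars (possessive)
  • (.*?) - Group 2: any zero or more chars (the s modifier lets . match line breaks too), as few as possible, up to the first location that matches
  • (?=\s*^\[(?1)]|\z) - a positive lookahead matching a location that is immediately followed by:
    • \s* - zero or more whitespace chars
    • ^ - the start of a line
    • \[(?1)] - a [, the Group 1 pattern (reused via (?1)), and a ]
    • | - or
    • \z - the very end of the string.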
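After preg_match_all() runs, Group 1 holds the timestamp and Group 2 the message text, so the two can be paired up directly. The sketch below uses the PREG_SET_ORDER flag (one array per match), which is this sketch's choice rather than part of the answer; the sample text is shortened and hypothetical.

```php
<?php
$text = "[30-05-2013 15:45:54] A A\n"
      . "[26-06-2013 14:43:16] Some lines are so large\n"
      . "I need the complete message\n"
      . "[26-06-2013 14:44:44] E A";

$pattern = '~^\[(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})]\s*+(.*?)(?=\s*^\[(?1)]|\z)~ms';

// PREG_SET_ORDER: each $matches[$i] is [full match, group 1 (timestamp), group 2 (message)]
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

$entries = [];
foreach ($matches as [, $timestamp, $message]) {
    $entries[] = ['time' => $timestamp, 'message' => $message];
}

print_r($entries);
```

Because Group 2 stops right before the whitespace that precedes the next timestamp, the messages come out without trailing line breaks.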

Answer (0 votes)

A late answer, but you can also use:

$text = file_get_contents("testchat.txt");

// Group 1: a "[...]" timestamp; Group 2: the text up to the next "["
preg_match_all('/(\[.*?\])([^\[]+)/', $text, $matches, PREG_PATTERN_ORDER);
foreach ($matches[1] as $i => $date) {
    $line = $matches[2][$i];
    print("$date $line");
}
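One caveat worth keeping in mind: ([^\[]+) stops at the first [ it sees, so a message that itself contains a [ gets split into a separate match, whereas the lookahead-based split from the first answer keeps it in one piece. The sample log line below is hypothetical.

```php
<?php
// Hypothetical log whose first message contains a "[" character.
$text = "[26-06-2013 14:44:44] see [1] for details\n[26-06-2013 14:45:00] next";

// Pattern from this answer (the i/m flags are dropped; they do not affect it).
preg_match_all('/(\[.*?\])([^\[]+)/', $text, $m);

// Lookahead-based split from the first answer, for comparison.
$split = preg_split('~\R+(?=\[\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}])~', $text);

print_r($m[2]);   // the first message is cut off at "[1]"
print_r($split);  // the first message stays intact
```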