PHP: count and separate text from a text file

Question

Suppose the text file contains the following:

<Amanda> Hi there, how are you?
<Jack> Hi, im fine 
.
.
.
.
<Jack> see you later

I want to count the words each user said. The output should look like this:

Amanda: 50
Jack: 40

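First, I don't want to count <Amanda> or <Jack>.

Then I want to count every word each of them said and store the totals in variables for Amanda and Jack.

This is what I have done so far: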

    $usercount1 = 0;
    $usercount2 = 0;  

    //Opens a file in read mode  
    $file = fopen("logfile.txt", "r");  
    //Gets each line till end of file is reached  
    while (($line = fgets($file)) !== false) {  
        //Splits each line into words
        $words = explode(" ", $line);  
        $words = explode("<Amanda>", $line);  
        //Counts each word  
        $usercount1 = $usercount1 + count($words);  
    }

    while (($line = fgets($file)) !== false) {  
        //Splits each line into words  
        $words = explode(" ", $line);
        //Counts each word  
        $usercount2 = $usercount2 + count($words);  
    }

Answer 1

As I understand the question, this could be one possible solution.


// Input
$input = "<Amanda> Hi there, how are you?\n<Jack> Hi, im fine \n <Jack> see you later";

// Initialize counters
$amandaCount = 0;
$jackCount = 0;

// Split input by lines
$lines = explode("\n", $input);

// Loop over lines
foreach ($lines as $line) {
  // Remove user tags
  $cleanLine = preg_replace("/<.+?>/", "", $line);
  
  // Split line into words
  $words = str_word_count($cleanLine, 1);
  
  // Count words per user
  if (strpos($line, "<Amanda>") !== false) {
    $amandaCount += count($words);
  } elseif (strpos($line, "<Jack>") !== false) {
    $jackCount += count($words);
  }
}

// Output
echo "Amanda: $amandaCount\n";
echo "Jack: $jackCount\n";
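
The snippet above works on a hard-coded string, while the question reads from a file. A minimal sketch of the same per-line idea applied to a file with fgets() might look like the following. The function name countWordsInLog and the temporary file standing in for logfile.txt are assumptions for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): counts the words each
// listed user said by reading the log file line by line.
function countWordsInLog(string $path, array $users): array
{
    $counts = array_fill_keys($users, 0);

    $file = fopen($path, 'r');
    if ($file === false) {
        return $counts; // the file could not be opened
    }

    while (($line = fgets($file)) !== false) {
        $line = ltrim($line);
        foreach ($users as $user) {
            $tag = '<' . $user . '>';
            // Only lines that start with this user's tag count for that user
            if (strpos($line, $tag) === 0) {
                // Drop the tag itself so it is not counted as a word
                $text = substr($line, strlen($tag));
                $counts[$user] += str_word_count($text);
                break;
            }
        }
    }
    fclose($file);

    return $counts;
}

// Usage: a temporary file stands in for logfile.txt
$path = tempnam(sys_get_temp_dir(), 'log');
file_put_contents($path, "<Amanda> Hi there, how are you?\n<Jack> Hi, im fine \n<Jack> see you later\n");
print_r(countWordsInLog($path, ['Amanda', 'Jack'])); // Amanda => 5, Jack => 6
unlink($path);
```

Reading the file once and deciding per line which counter to update avoids the second while loop from the question, which never runs because the file pointer is already at the end of the file.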

Answer 2

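I would take a more general approach, so that you can analyze all users. Use a blacklist to exclude the users you do not want.

First, match the username and the text on every line.
Then rebuild the data structure by iterating over the matches, skipping blacklisted users, and counting the words.

The blacklist uses names as keys because looking up a key is faster than searching for a value.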

$input = <<<'_TEXT'
<Amanda> Hi there, how are you?
<Jack> Hi, im fine
<Jack> see you later
<John> Hello World, my friends!
<Daniel> Foo!
_TEXT;
preg_match_all('/^<([^>]+)>(.*?)$/m', $input, $matches);

$blacklist = ['Amanda' => 1, 'Jack' => 1];
$words = [];
foreach ($matches[2] as $index => $match) {
    $user = $matches[1][$index];
    if (isset($blacklist[$user])) {
        continue;
    }
    $words[$user] = ($words[$user] ?? 0) + str_word_count($match);
}
print_r($words);

Array
(
    [John] => 4
    [Daniel] => 1
)
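
The remark about the blacklist format can be illustrated with a small sketch. It assumes the excluded names start out as a plain list (the variable $names is hypothetical) and compares a key lookup with isset() against a value search with in_array().

```php
<?php
// A plain list of names to exclude (values taken from the answer above)
$names = ['Amanda', 'Jack'];

// array_fill_keys() builds the key-based format: ['Amanda' => 1, 'Jack' => 1]
$blacklist = array_fill_keys($names, 1);

// Key lookup: a hash lookup whose cost does not grow with the list size
var_dump(isset($blacklist['Jack'])); // bool(true)
var_dump(isset($blacklist['John'])); // bool(false)

// Value search: scans the list element by element
var_dump(in_array('Jack', $names, true)); // bool(true)
```

For a two-name list the difference is negligible; the key-based format mainly pays off when the blacklist or the number of chat lines grows.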

Answer 3

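I would build the blacklisted names into the regular expression to filter them out as early as possible.

The negative lookahead (?!Amanda>|Jack>) ensures that Amanda and Jack are excluded.

The m pattern modifier changes the meaning of ^ (the "start of string" anchor) to a "start of line" anchor.

The parentheses around the name subpattern create capture group 1 (accessible as element [1]).

\K restarts the fullstring match, so the space-delimited words substring is accessible via [0].

Destructuring syntax is used in foreach() to get convenient variables.

Code: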

preg_match_all(
    '/^<((?!Amanda>|Jack>)[^>]+)> \K.+/m',
    $chat,
    $matches,
    PREG_SET_ORDER
);
$result = [];
foreach ($matches as [$words, $name]) {
    $result[$name] = ($result[$name] ?? 0) + str_word_count($words);
}
var_export($result);
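
To see what the pattern returns, here is a small self-contained sketch. It assumes $chat holds the same sample text used in Answer 2; the snippet above does not define it. It prints the first match row so the effect of \K and of capture group 1 is visible.

```php
<?php
// Assumed sample input, reused from Answer 2
$chat = <<<'_TEXT'
<Amanda> Hi there, how are you?
<Jack> Hi, im fine
<Jack> see you later
<John> Hello World, my friends!
<Daniel> Foo!
_TEXT;

preg_match_all(
    '/^<((?!Amanda>|Jack>)[^>]+)> \K.+/m',
    $chat,
    $matches,
    PREG_SET_ORDER
);

// With PREG_SET_ORDER each row is one line of the chat:
// [0] is the text after \K, [1] is the captured name
var_export($matches[0]);
// array (
//   0 => 'Hello World, my friends!',
//   1 => 'John',
// )
```

Because the lines from Amanda and Jack never match, only the rows for John and Daniel reach the foreach() loop.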
