Suppose a text file contains the following:
<Amanda> Hi there, how are you?
<Jack> Hi, im fine
.
.
.
.
<Jack> see you later
I want to count the words each user said. The output should look like this:
Amanda: 50
Jack: 40
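First, I don't want to count the <Amanda> or <Jack> tags themselves.
Then I want to count every word each of them said and store the totals in the variables for Amanda and Jack.
This is what I have so far: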
$usercount1 = 0;
$usercount2 = 0;
//Opens a file in read mode
$file = fopen("logfile.txt", "r");
//Gets each line till end of file is reached
while (($line = fgets($file)) !== false) {
//Splits each line into words
$words = explode(" ", $line);
$words = explode("<Amanda>", $line);
//Counts each word
$usercount1 = $usercount1 + count($words);
}
while (($line = fgets($file)) !== false) {
//Splits each line into words
$words = explode(" ", $line);
//Counts each word
$usercount2 = $usercount2 + count($words);
}
As I understand the problem, this is one possible solution.
// Input
$input = "<Amanda> Hi there, how are you?\n<Jack> Hi, im fine \n <Jack> see you later";
// Initialize counters
$amandaCount = 0;
$jackCount = 0;
// Split input by lines
$lines = explode("\n", $input);
// Loop over lines
foreach ($lines as $line) {
    // Remove user tags
    $cleanLine = preg_replace("/<.+?>/", "", $line);
    // Split line into words
    $words = str_word_count($cleanLine, 1);
    // Count words per user
    if (strpos($line, "<Amanda>") !== false) {
        $amandaCount += count($words);
    } elseif (strpos($line, "<Jack>") !== false) {
        $jackCount += count($words);
    }
}
// Output
echo "Amanda: $amandaCount\n";
echo "Jack: $jackCount\n";
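Since the question reads from a file rather than a string, the same idea can be sketched against a file handle. This is only a sketch under assumptions: `countWordsPerUser()` is a hypothetical helper name, every message line is assumed to start with a `<Name>` tag, and the demo writes the sample chat to a temporary file instead of the asker's `logfile.txt`. It collects one counter per name in an array rather than one variable per user.

```php
<?php
// Hypothetical helper: counts words per speaker in a chat log file.
// Assumes every message line starts with a "<Name>" tag.
function countWordsPerUser(string $path): array
{
    $counts = [];
    $file = fopen($path, 'r');
    if ($file === false) {
        return $counts;
    }
    while (($line = fgets($file)) !== false) {
        // Capture the speaker name and the text that follows the tag
        if (preg_match('/^\s*<([^>]+)>(.*)$/', $line, $m)) {
            $counts[$m[1]] = ($counts[$m[1]] ?? 0) + str_word_count($m[2]);
        }
    }
    fclose($file);
    return $counts;
}

// Demo: write the sample chat to a temporary file, then count
$path = tempnam(sys_get_temp_dir(), 'chat');
file_put_contents(
    $path,
    "<Amanda> Hi there, how are you?\n<Jack> Hi, im fine\n<Jack> see you later\n"
);
$counts = countWordsPerUser($path);
unlink($path);

foreach ($counts as $user => $count) {
    echo "$user: $count\n";
}
```

For the three sample lines this should print `Amanda: 5` and `Jack: 6`. Lines that do not start with a tag are skipped rather than attributed to the previous speaker.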
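I would take a more general approach so that you can analyze all users. Use a blacklist to exclude the ones you don't want.
The blacklist stores the names as array keys because looking up a key is faster than searching for a value.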
$input = <<<'_TEXT'
<Amanda> Hi there, how are you?
<Jack> Hi, im fine
<Jack> see you later
<John> Hello World, my friends!
<Daniel> Foo!
_TEXT;
preg_match_all('/^<([^>]+)>(.*?)$/m', $input, $matches);
$blacklist = ['Amanda' => 1, 'Jack' => 1];
$words = [];
foreach ($matches[2] as $index => $match) {
    $user = $matches[1][$index];
    if (isset($blacklist[$user])) {
        continue;
    }
    $words[$user] = ($words[$user] ?? 0) + str_word_count($match);
}
print_r($words);
Array
(
[John] => 4
[Daniel] => 1
)
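The remark about keys versus values can be illustrated with a small sketch. The `$excluded` list below is a hypothetical input; `array_flip()` turns it into the key-based form used above, so membership can be tested with `isset()` instead of `in_array()`.

```php
<?php
// Names to exclude, written as a plain list (hypothetical input)
$excluded = ['Amanda', 'Jack'];

// Flip the list so the names become keys: ['Amanda' => 0, 'Jack' => 1]
$blacklist = array_flip($excluded);

// isset() is a hash lookup on the key; in_array() scans the values one by one
$isJackExcluded = isset($blacklist['Jack']);
$isJohnExcluded = isset($blacklist['John']);
$sameAnswer     = in_array('Jack', $excluded, true);

var_dump($isJackExcluded, $isJohnExcluded, $sameAnswer);
```

For a two-name list the difference is negligible; the key-based form mainly pays off when the blacklist or the log grows large.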
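I would build the blacklisted names into the regular expression so they are filtered out as early as possible.
The negative lookahead (?!Amanda>|Jack>) ensures that Amanda and Jack are excluded.
The m pattern modifier changes the meaning of ^ from a "start of string" anchor to a "start of line" anchor.
The parentheses around the name subpattern create capture group 1 (accessible as element [1]).
\K restarts the full-string match, so the substring of space-separated words is accessible as element [0].
Use array destructuring syntax in foreach() to get convenient variables.
Code: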
preg_match_all(
    '/^<((?!Amanda>|Jack>)[^>]+)> \K.+/m',
    $chat,
    $matches,
    PREG_SET_ORDER
);
$result = [];
foreach ($matches as [$words, $name]) {
    $result[$name] = ($result[$name] ?? 0) + str_word_count($words);
}
var_export($result);
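The snippet above relies on a `$chat` variable that is not defined in it. A self-contained sketch, assuming `$chat` holds the same sample text as the previous answer and that the excluded names come from a hypothetical `$excluded` list, could build the lookahead dynamically. Each name is escaped with `preg_quote()` and followed by `>` so that a longer name such as `Jackson` is not excluded by `Jack`. The list is assumed to be non-empty, since an empty lookahead `(?!)` never matches.

```php
<?php
// Assumed sample input, reused from the previous answer
$chat = <<<'_TEXT'
<Amanda> Hi there, how are you?
<Jack> Hi, im fine
<Jack> see you later
<John> Hello World, my friends!
<Daniel> Foo!
_TEXT;

// Hypothetical list of names to skip (must not be empty)
$excluded = ['Amanda', 'Jack'];

// Escape each name and append ">" to anchor the end of the name
$alternatives = implode('|', array_map(
    fn(string $name): string => preg_quote($name, '/') . '>',
    $excluded
));
$pattern = '/^<((?!' . $alternatives . ')[^>]+)> \K.+/m';

preg_match_all($pattern, $chat, $matches, PREG_SET_ORDER);

// Each match is [0 => words after the tag, 1 => name]
$result = [];
foreach ($matches as [$words, $name]) {
    $result[$name] = ($result[$name] ?? 0) + str_word_count($words);
}
var_export($result);
```

With the sample text this should yield `John => 4` and `Daniel => 1`, matching the output of the blacklist-array answer. The arrow function requires PHP 7.4 or later.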