Extract substrings from a text that start with one of an array of keywords, where a substring must not contain another keyword

Question · Votes: 0 · Answers: 2

I want to write a function that accepts two parameters, $text and $keys, where $keys is an array of keys.

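The output should be an array whose keys are the keys passed to the function (those found in the text) and whose values are the text that follows each key, up to the next key or the end of the text. If a key occurs more than once in the text, only the last value should be written to the array.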

For example:

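Visualized text: Lorem Ipsum is simply one dummy text of the printing and two typesetting industry. Lorem Ipsum has been the industry's one standard dummy text ever since the three 1500s.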

$text = "Lorem Ipsum is simply one dummy text of the printing and  two typesetting industry. Lorem Ipsum has been the industry's one standard dummy text ever since the three 1500s.";

$keys = ['one', 'two', 'three'];

Desired output:

[
    'one' => 'standard dummy text ever since the',
    'two' => "typesetting industry. Lorem Ipsum has been the industry's",
    'three' => '1500s.'
]

I tried to write a regular expression for this task, but without success.

My last attempt:

function getKeyedSections($text, $keys) {
    $keysArray = explode(',', $keys);
    $pattern = '/(?:' . implode('|', array_map('preg_quote', $keysArray)) . '):\s*(.*?)(?=\s*(?:' . implode('|', array_map('preg_quote', $keysArray)) . '):\s*|\z)/s';
    preg_match_all($pattern, $text, $matches);

    $keyedSections = [];
    foreach ($keysArray as $key) {
        foreach ($matches[1] as $index => $value) {
            if (stripos($matches[0][$index], $key) !== false) {
                $keyedSections[trim($key)] = trim($value);
                break;
            }
        }
    }

    return $keyedSections;
}
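The attempt above has two problems that keep it from matching anything: $keys is already an array, so explode(',', $keys) fails (in PHP 8 it throws a TypeError), and the pattern requires a colon after each key (one:), which never appears in the sample text. A minimal sketch of the same preg_match_all() idea with both issues fixed, assuming keys should match only as whole words and case-sensitively:

```php
<?php

function getKeyedSections(string $text, array $keys): array
{
    // $keys is already an array: quote each key for use inside the '/' delimited pattern
    $alternation = implode('|', array_map(fn($key) => preg_quote($key, '/'), $keys));

    // Group 1: a whole-word key (no trailing ':' required).
    // Group 2: the shortest text after it that is followed by another key or the end of the text.
    $pattern = '/\b(' . $alternation . ')\b\s*(.*?)(?=\s*\b(?:' . $alternation . ')\b|\s*\z)/s';
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $sections = [];
    foreach ($matches as $match) {
        // A later occurrence of the same key overwrites the earlier value
        $sections[$match[1]] = $match[2];
    }
    return $sections;
}

$text = "Lorem Ipsum is simply one dummy text of the printing and  two typesetting industry. Lorem Ipsum has been the industry's one standard dummy text ever since the three 1500s.";
var_export(getKeyedSections($text, ['one', 'two', 'three']));
```

Keys that never occur in the text are simply absent from the returned array.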
php arrays string preg-match-all text-extraction
2 answers
0 votes

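Because the end of any segment can be marked by any of the search strings ($keys), a direct preg_match() pattern may be a little too noisy (though not impossible).

Perhaps just split the string before every $keys value, then iterate over the segments and push the ones that qualify.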

Code:

$text = "Lorem Ipsum is simply one dummy text of the printing and  two typesetting industry. Lorem Ipsum has been the industry's one standard dummy text ever since the three 1500s.";

$keys = ['one', 'two', 'three'];
$escaped = implode('|', array_map('preg_quote', $keys));
// Split immediately before every whole-word occurrence of a key
$segments = preg_split('#\b(?=(?:' . $escaped . ')\b)#', $text);

$result = [];
foreach ($segments as $segment) {
    foreach ($keys as $key) {
        if (str_starts_with($segment, $key)) {   // str_starts_with() needs PHP 8.0+
            // Drop the key itself; a later occurrence of the same key overwrites the earlier one
            $result[$key] = trim(substr($segment, strlen($key)));
            break;
        }
    }
}
var_export($result);

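Note that the result of the script above does not include search strings that were not found in the text; you didn't say what the result should look like in that scenario.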
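If every requested key should appear in the result, one possible convention (an assumption, since the question doesn't specify one) is to pre-fill all keys with null and let the values that were found replace them. A sketch, where 'four' is a hypothetical key that does not occur in the text and $result holds sample values:

```php
<?php

// Hypothetical input: 'four' never occurs in the text
$keys = ['one', 'two', 'three', 'four'];

// Sample values standing in for the output of the extraction step
$result = [
    'one' => 'standard dummy text ever since the',
    'two' => "typesetting industry. Lorem Ipsum has been the industry's",
    'three' => '1500s.',
];

// array_fill_keys() builds ['one' => null, ..., 'four' => null];
// array_replace() overwrites those nulls with the found values, keeping the order of $keys
$complete = array_replace(array_fill_keys($keys, null), $result);

var_export($complete);
```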


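Here is an alternative using preg_match_all(). It captures each key as group 1 together with the text that follows it, up to the next key or the end of the string; \K discards the key and the whitespace after it from the full match, so $m[n][0] holds only the value. The bodiless foreach() simply lets later matches overwrite earlier ones and builds the desired associative result.

$escaped = implode('|', array_map('preg_quote', $keys));

preg_match_all('#\b(' . $escaped . ')\b\s*\K.*?(?=\s*(?:$|\b(?:' . $escaped . ')\b))#', $text, $m, PREG_SET_ORDER);

// Element 1 (the key) is assigned first, then element 0 (the value) is stored under that key
foreach ($m as [1 => $key, 0 => $result[$key]]);

var_export($result ?? []);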
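The bodiless foreach() relies on keyed array destructuring assigning from left to right (PHP 7.1+), so $key is already set by the time $result[$key] is written. A small sketch using hypothetical placeholder matches in the PREG_SET_ORDER shape ([0 => value, 1 => key]) to show that it behaves like an ordinary loop:

```php
<?php

// Hypothetical matches in PREG_SET_ORDER shape: [0 => full match (the value), 1 => captured key]
$m = [
    ['A', 'one'],
    ['B', 'two'],
    ['C', 'one'],
    ['D', 'three'],
];

// Bodiless form: element 1 goes into $key first, then element 0 into $viaDestructuring[$key]
$viaDestructuring = [];
foreach ($m as [1 => $key, 0 => $viaDestructuring[$key]]);

// Equivalent explicit loop
$viaLoop = [];
foreach ($m as $match) {
    $viaLoop[$match[1]] = $match[0];
}

var_export($viaDestructuring === $viaLoop); // true
echo PHP_EOL;
var_export($viaLoop); // 'one' => 'C', 'two' => 'B', 'three' => 'D'
```

The explicit loop is longer but may be easier to read for anyone unfamiliar with the destructuring trick.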

-1 votes

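Do you need to pass in the keys at all? Here is a version that takes the keys from where they appear in the text (marked with ** in this example):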

<?php 


$text = "Lorem Ipsum is simply **one** dummy text of the printing and  **two** typesetting industry. Lorem Ipsum has been the industry's  **one** standard dummy text ever since the **three** 1500s.";

// Group 1 is a **key** marker; the match then runs up to the next '*' (or the end of the text),
// so punctuation such as '.' stays part of the value
$matches = [];
preg_match_all("/(\*\*\w+\*\*)[^*]*/", $text, $matches);

$actualMatches = $matches[0];
$keys = $matches[1];
$index = 0;

$results = array_reduce($actualMatches, function($carry, $item) use ($keys, &$index) {
    $key = $keys[$index];
    // Remove the ** markers from the key and the marker itself from the value;
    // a repeated key overwrites the earlier value
    $carry[str_replace("*", "", $key)] = trim(substr($item, strlen($key)));
    $index++;
    return $carry;
}, []);

var_dump($results);

?>
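This approach assumes the keys are already wrapped in ** markers, whereas the question's $text is plain. One way to bridge that, sketched here as an assumption rather than as part of the answer, is to add the markers first with preg_replace(); it only works if the text itself contains no * characters:

```php
<?php

$text = "Lorem Ipsum is simply one dummy text of the printing and  two typesetting industry. Lorem Ipsum has been the industry's one standard dummy text ever since the three 1500s.";
$keys = ['one', 'two', 'three'];

// Quote each key for the '/' delimited pattern and join them into an alternation
$alternation = implode('|', array_map(fn($key) => preg_quote($key, '/'), $keys));

// Wrap every whole-word occurrence of a key in ** so the marker-based regex can find it
$marked = preg_replace('/\b(' . $alternation . ')\b/', '**$1**', $text);

echo $marked, PHP_EOL;
```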

If you only need specific keys, here is an alternative:

<?php 


$text = "Lorem Ipsum is simply **one** dummy text of the printing and  **two** typesetting industry. Lorem Ipsum has been the industry's  **one** standard dummy text ever since the **three** 1500s.";

// Group 1 is a **key** marker; the match then runs up to the next '*' (or the end of the text)
$matches = [];
preg_match_all("/(\*\*\w+\*\*)[^*]*/", $text, $matches);

$actualMatches = $matches[0];
$keys = $matches[1];
$index = 0;

$targetKeys = ['one', 'three'];
$results = array_reduce($actualMatches, function($carry, $item) use ($keys, &$index, $targetKeys) {
    $key = $keys[$index];
    $cleanedKey = str_replace("*", "", $key);
    // Only keep the keys that were asked for
    if (in_array($cleanedKey, $targetKeys, true)) {
        $carry[$cleanedKey] = trim(substr($item, strlen($key)));
    }
    $index++;
    return $carry;
}, []);

var_dump($results);
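Alternatively, the filtering can happen after the extraction instead of inside the callback. A sketch using array_intersect_key() with array_flip(); the $results array below is hypothetical sample data standing in for the output of the first snippet:

```php
<?php

// Hypothetical sample: every key found in the text, key => value
$results = [
    'one' => 'standard dummy text ever since the',
    'two' => "typesetting industry. Lorem Ipsum has been the industry's",
    'three' => '1500s.',
];
$targetKeys = ['one', 'three'];

// array_flip() turns ['one', 'three'] into ['one' => 0, 'three' => 1];
// array_intersect_key() keeps only the entries of $results whose keys appear there
$filtered = array_intersect_key($results, array_flip($targetKeys));

var_dump($filtered);
```

This keeps the callback free of filtering logic, at the cost of building entries that are discarded afterwards.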
© www.soinside.com 2019 - 2024. All rights reserved.