Word-wrap text while ignoring ANSI escape codes when calculating line length


I'm building a CLI application in PHP that has a method for outputting text:

$out->line('Morbi leo risus, porta ac consectetur ac, vestibulum at eros. Aenean lacinia bibendum nulla sed consectetur. Nullam id dolor id nibh ultricies vehicula ut id elit. Aenean lacinia bibendum nulla sed consectetur. Curabitur blandit tempus porttitor.');

Inside line(), I limit the output to 80 characters per line with:

public function line(string $text): void
{
  $this->rawLine(wordwrap($text, 80, PHP_EOL));
}

This prints the output across multiple lines:

Morbi leo risus, porta ac consectetur ac, vestibulum at eros. Aenean lacinia
bibendum nulla sed consectetur. Nullam id dolor id nibh ultricies vehicula ut id
elit. Aenean lacinia bibendum nulla sed consectetur. Curabitur blandit tempus
porttitor.

Now, I can also style parts of the text using ANSI escape codes:

$out->line('Morbi leo risus, ' . Style::inline('porta ac consectetur', ['color' => 'blue', 'attribute' => 'bold']) . ' ac, vestibulum at eros. Aenean lacinia bibendum nulla sed consectetur. Nullam id dolor id nibh ultricies vehicula ut id elit. Aenean lacinia bibendum nulla sed consectetur. Curabitur blandit tempus porttitor.');

This is converted to:

Morbi leo risus, \x1b[34;1mporta ac consectetur\x1b[39;22m ac, vestibulum at
eros. Aenean lacinia bibendum nulla sed consectetur. Nullam id dolor id nibh
ultricies vehicula ut id elit. Aenean lacinia bibendum nulla sed consectetur.
Curabitur blandit tempus porttitor.

When passed to line(), it prints like this:

Morbi leo risus, porta ac consectetur ac, vestibulum at eros.
Aenean lacinia bibendum nulla sed consectetur. Nullam id dolor id nibh ultricies
vehicula ut id elit. Aenean lacinia bibendum nulla sed consectetur. Curabitur
blandit tempus porttitor.

"porta ac consectetur" is blue and bold, but notice that the first line is shorter than before and no longer breaks in the same place.

Even though these are non-printing characters, wordwrap() (and strlen()) count them toward the length, so the computed line length is wrong.

Without the ANSI escape codes, the first line was originally 76 characters:

Morbi leo risus, porta ac consectetur ac, vestibulum at eros. Aenean lacinia

But with the styling added, it becomes 97 characters:

Morbi leo risus, \x1b[34;1mporta ac consectetur\x1b[39;22m ac, vestibulum at eros. Aenean lacinia
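The arithmetic can be checked with a short script. This is only a sketch: it assumes the styled string contains real ESC bytes (a double-quoted PHP string) and that only colour/attribute sequences ending in "m" are present. With real ESC bytes the two sequences add 7 + 8 = 15 bytes, for 91 in total; the 97 figure corresponds to counting the sequences in their written-out \x1b form (10 + 11 characters).

```php
<?php
$plain  = 'Morbi leo risus, porta ac consectetur ac, vestibulum at eros. Aenean lacinia';
$styled = "Morbi leo risus, \x1b[34;1mporta ac consectetur\x1b[39;22m ac, vestibulum at eros. Aenean lacinia";

// Remove SGR sequences (ESC [ parameters m) before measuring.
$stripped = preg_replace('/\x1b\[[0-9;]*m/', '', $styled);

echo strlen($plain), PHP_EOL;    // 76: what the terminal displays
echo strlen($styled), PHP_EOL;   // 91: strlen() also counts the escape bytes
echo strlen($stripped), PHP_EOL; // 76: visible length after stripping
```

This also matches the output shown earlier: the styled first line breaks after "eros." because those 61 visible characters plus 15 escape bytes already total 76, and adding " Aenean" would exceed 80.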

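In other parts of the application, such as tables, I "solved" this by using one method to set a column's value and a separate method to set that column's style. That way, I can reliably get the length and still output the text in the defined style.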

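I could pass both an unstyled and a styled version of the text, but that feels wrong. It also doesn't solve the problem of then splitting the styled version accurately.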

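To solve this for line(), I've considered stripping the ANSI escape codes to get the actual length, adding PHP_EOL breaks where needed, and then re-injecting the styles. That doesn't feel like the right solution, though, and it seems complicated. How would I even do that?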

So my question is: how can I reliably split text that contains ANSI escape codes based on the length of the text itself?

php string newline word-wrap ansi-colors
2 Answers
0 votes

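Building on the approach I used to truncate text in another answer (Truncate a multibyte String to n chars), computing a segment's length only requires ignoring ANSI sequences while counting characters.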

To get clean breaks in the text, the snippet below only replaces spaces with newlines (it is not designed to break on hyphens).

Code: (Demo) (Regex101 Demo)

function ansiSafeWrapper(string $string, int $max = 80) {
    return preg_replace(
        "~(?=(?:(?:\\\\x1b\[[0-9;]+m)?.){{$max}})(?:(?:\\\\x1b\[[0-9;]+m)?.){0,$max}\K ~u",
        PHP_EOL,
        str_replace(PHP_EOL, ' ', $string)
    );
}

$test = <<<'ANSI'
Morbi leo risus, \x1b[34;1mporta ac consectetur\x1b[39;22m ac, vestibulum at
eros. Aenean lacinia bibendum nulla sed consectetur. Nullam id dolor id nibh
ultricies vehicula ut id elit. Aenean lacinia bibendum nulla sed consectetur.
Curabitur blandit tempus porttitor.
ANSI;

echo ansiSafeWrapper($test);

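In effect, the script replaces all newlines with spaces and then injects new newlines where it deems appropriate. For clarity, I've appended the character count to the end of each line.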

Morbi leo risus, \x1b[34;1mporta ac consectetur\x1b[39;22m ac, vestibulum at eros. Aenean lacinia  (97 char)
bibendum nulla sed consectetur. Nullam id dolor id nibh ultricies vehicula ut id  (80 char)
elit. Aenean lacinia bibendum nulla sed consectetur. Curabitur blandit tempus  (77 char)
porttitor. (10 char)

Without the ANSI sequences, this would render visually as:

Morbi leo risus, porta ac consectetur ac, vestibulum at eros. Aenean lacinia  (76 char)
bibendum nulla sed consectetur. Nullam id dolor id nibh ultricies vehicula ut id  (80 char)
elit. Aenean lacinia bibendum nulla sed consectetur. Curabitur blandit tempus  (77 char)
porttitor. (10 char)

Pattern breakdown:

~                                   #starting pattern delimiter
(?=                                 #start of lookahead
   (?:(?:\\\\x1b\[[0-9;]+m)?.){80}  #consume potential whole ansi code before each single character; match 80 (non-ansi) characters
)                                   #end of lookahead
(?:(?:\\\\x1b\[[0-9;]+m)?.){0,80}   #consume potential whole ansi code before each single character; match up to 80 (non-ansi) characters
\K                                  #forget any characters matched up to this point, then match a literal space
~                                   #ending pattern delimiter
u                                   #unicode pattern flag for multibyte safety
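One detail of the pattern deserves a quick check: written as "\\\\x1b" inside a double-quoted PHP string, it reaches the regex engine as \\x1b, which matches a literal backslash followed by "x1b". That fits the nowdoc test string, where the sequences appear as written-out text. A string holding real ESC bytes (0x1B) would need \x1b in the regex instead. The sketch below illustrates the difference; the variable names are made up for the example.

```php
<?php
// Single-quoted: the four characters backslash, x, 1, b (as in the nowdoc test).
$literal = '\x1b[34;1mporta\x1b[39;22m';
// Double-quoted: PHP converts \x1b into the real ESC byte a terminal expects.
$real = "\x1b[34;1mporta\x1b[39;22m";

// Reaches PCRE as \\x1b\[[0-9;]+m : a literal backslash, then "x1b".
$literalPattern = "~\\\\x1b\[[0-9;]+m~";
// Reaches PCRE as \x1b\[[0-9;]+m : the ESC byte itself.
$realPattern = '~\x1b\[[0-9;]+m~';

var_dump(preg_replace($literalPattern, '', $literal) === 'porta'); // bool(true)
var_dump(preg_replace($literalPattern, '', $real) === $real);      // bool(true): nothing matched
var_dump(preg_replace($realPattern, '', $real) === 'porta');       // bool(true)
var_dump(preg_replace($realPattern, '', $literal) === $literal);   // bool(true): nothing matched
```

So when the text comes from something like Style::inline() and already contains ESC bytes, the \\\\x1b parts of the pattern would presumably have to become \\x1b (in a double-quoted string) for the sequences to be skipped.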

0 votes

Here is the input:

$styledText = "Morbi leo risus, \x1b[34;1mporta ac consectetur\x1b[39;22m ac, vestibulum at eros. Aenean lacinia bibendum nulla sed consectetur. Nullam id dolor id nibh ultricies vehicula ut id elit. Aenean lacinia bibendum nulla sed consectetur. Curabitur blandit tempus porttitor.";

The following approach strips the escape codes from the styled text and keeps the result as a clean copy.

The clean text is passed through wordwrap to add line breaks at the desired column width.

It then loops over the styled text and inserts a line break after each word that PHP followed with a line break in the clean text.

function wrap(string $styledText) {

  // Strip ANSI escape codes from text
  $cleanText = preg_replace('/\\x1b\[[0-9;]+m/', '', $styledText);

  // Add PHP_EOL to ensure text does not exceed line width
  $cleanWrappedText = wordwrap($cleanText, 80, PHP_EOL . ' ');

  // Split the styled text and the wrapped clean text on each space
  $styledTextArray = explode(' ', $styledText);
  $cleanTextArray = explode(' ', $cleanWrappedText);

  // Fused text will comprise styled text w/ line breaks from clean text
  $fusedText = '';

  // Loop over each segment (likely a word)
  foreach ($styledTextArray as $index => $segment) {

    // Append word (incl. ANSI escape codes)
    $fusedText .= $segment;

    // If word has line break in clean version then
    // end line, add line break, and start another line
    // (str_ends_with() requires PHP 8.0+)
    if (str_ends_with($cleanTextArray[$index], PHP_EOL)) {
        $fusedText .= PHP_EOL;
        continue;
    }

    // If word does not have line break in clean version,
    // but there is another word coming, then add space between words
    if (isset($cleanTextArray[$index+1])) {
        $fusedText .= ' ';
    }
  }

  return $fusedText;
}

Note that this isn't easy to test on the web, because the escape codes only style the text properly when used via the CLI.
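The line breaks themselves can still be checked without a terminal: strip the escape codes from the wrapped result and compare it with what wordwrap() produces for the clean text. The sketch below does this with a compact variant of the same idea that measures each word by its visible length. The helper names visibleLength() and ansiWordwrap() are hypothetical, and the code assumes real ESC bytes, single spaces between words, and only sequences ending in "m".

```php
<?php
/** Length of $text as displayed, i.e. with SGR sequences such as ESC[34;1m removed. */
function visibleLength(string $text): int
{
    $plain = preg_replace('/\x1b\[[0-9;]*m/', '', $text) ?? $text;
    // Count code points with PCRE so the mbstring extension is not required.
    return (int) preg_match_all('/./su', $plain);
}

/** Greedy word wrap that ignores escape sequences when measuring a line. */
function ansiWordwrap(string $text, int $width = 80, string $break = PHP_EOL): string
{
    $lines = [];
    $line = [];       // words on the current line, escape codes included
    $lineLength = 0;  // visible length of the current line

    foreach (explode(' ', $text) as $word) {
        $wordLength = visibleLength($word);
        // Start a new line when this word (plus a space) would not fit.
        if ($line !== [] && $lineLength + 1 + $wordLength > $width) {
            $lines[] = implode(' ', $line);
            $line = [];
            $lineLength = 0;
        }
        $lineLength += ($line === [] ? 0 : 1) + $wordLength;
        $line[] = $word;
    }
    $lines[] = implode(' ', $line);

    return implode($break, $lines);
}

$styled = "Morbi leo risus, \x1b[34;1mporta ac consectetur\x1b[39;22m ac, vestibulum at eros. Aenean lacinia bibendum nulla sed consectetur. Nullam id dolor id nibh ultricies vehicula ut id elit. Aenean lacinia bibendum nulla sed consectetur. Curabitur blandit tempus porttitor.";
$clean = preg_replace('/\x1b\[[0-9;]*m/', '', $styled);

$wrapped = ansiWordwrap($styled, 80, "\n");
// Stripping the wrapped result should give the same breaks as wordwrap() on clean text.
var_dump(preg_replace('/\x1b\[[0-9;]*m/', '', $wrapped) === wordwrap($clean, 80, "\n")); // bool(true)
```

A styled span that happens to straddle a line break is left as is in this sketch, so the style simply continues on the next line.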
