Perl + Curses:期望从getchar()获得UTF-8编码的多字节字符,但什么也没得到

问题描述 投票:2回答:1

我正在尝试将Bryan Henderson的Perl接口连接到ncurses库:Curses

为了简单练习,我尝试获取在屏幕上键入的单个字符。这是直接基于NCURSES Programming HOWTO进行修改的。

[当我调用Perl库的getchar()时,我希望收到一个字符,可能是多字节的字符(如this part of the library manpage中所述,这有点复杂,因为必须处理功能键的特殊情况,而无需输入,但这是只是通常的冰壶)。

这是下面代码中的子例程read1ch()

这对ASCII字符有效,但对0x7F以上的字符无效。例如,当击中è(Unicode 0x00E8,UTF-8:0xC3,0xA8)时,实际上我得到的是代码0xE8,而不是UTF-8编码的代码。将其打印到LANG=en_GB.UTF-8无法正常工作的终端上,无论如何我都期待0xC3A8。

我需要进行哪些更改才能使其正常工作,即以适当的字符或Perl字符串的形式获取è

getchar()的C代码为here btw。也许只是没有使用C_GET_WCH设置进行编译?如何找出?

附录

附录1

尝试使用binmode设置

binmode STDERR, ':encoding(UTF-8)';
binmode STDOUT, ':encoding(UTF-8)';

这应该解决任何编码问题,因为终端需要并发送UTF-8,但这没有帮助。

也尝试用use open设置流编码(对此和上面的方法之间的区别不太清楚,但这也无济于事]

use open qw(:std :encoding(UTF-8));

附录2

Perl Curses shim的手册页上说:

如果wget_wch()不可用(即Curses库不理解宽字符),这会调用wgetch() [获取1个字节的字符从curses窗口],但返回尽管如此,上述值。这可能是个问题,因为多字节字符编码(如UTF-8),您将收到两个两字节字符的一个字符字符串(例如,“Ô和“¤”表示“一个”)。

这里可能是这种情况,但此系统上确实存在wget_wch()

附录3

[试图查看C代码的作用,然后将fprintf直接添加到curses/Curses-1.36/CursesFunWide.c的多字节处理代码中,重新编译后,没有设法通过Curses.so用我自己的系统覆盖LD_LIBRARY_PATH(为什么不?为什么一切都只能在一半时间内工作?),因此直接替换了系统库(使用THAT!)。

#ifdef C_GET_WCH
    wint_t wch;
    int ret = wget_wch(win, &wch);
    if (ret == OK) {
        ST(0) = sv_newmortal();
        c_wchar2sv(ST(0), wch);
        fprintf(stderr,"Obtained win_t 0x%04lx\n", wch);
        XSRETURN(1);
    } else if (ret == KEY_CODE_YES) {
        XST_mUNDEF(0);
        ST(1) = sv_newmortal();
        sv_setiv(ST(1), (IV)wch);
        XSRETURN(2);
    } else {
        XSRETURN_UNDEF;
    }
#else

这只是一个胖的NOPE,按下ü时会看到:

Obtained win_t 0x00fc

因此,将运行[[正确的代码],但是数据是ISO-8859-1,而不是UTF-8。因此,wget_wch的行为不佳。因此,这是一个curses配置问题。呵呵。

附录4

[令我震惊的是,ncurses可能采用的是默认语言环境,即C。为了使ncurses使用宽字符,必须“初始化语言环境”,这可能意味着将状态从“未设置”(从而使ncurses退回到C)移动到“设置为系统所用指示”(应该是LANG环境变量中的内容)。 ncurses的手册页说:

该库使用调用程序已初始化的语言环境。通常用setlocale完成:

setlocale(LC_ALL,“”);

如果未初始化语言环境,则库将假定字符可以按照ISO-8859-1进行打印,以与某些旧版程序配合使用。您应该初始化语言环境,而不要依赖于未设置语言环境的库。

这也没有用,但我认为解决方案正朝那条路走。

使用perl-Curses的代码

    尝试过perl-Curses-1.36-9.fc30.x86_64
  • 尝试过perl-Curses-1.36-11.fc31.x86_64

  • 如果尝试,请按BACKSPACE退出循环,因为不再解释CTRL-C。

    下面有很多代码,但是关键区域用----- Testing标记:

    #!/usr/bin/perl # pmap -p PID # shows the per process using # /usr/lib64/libncursesw.so.6.1 # /usr/lib64/perl5/vendor_perl/auto/Curses/Curses.so # Trying https://metacpan.org/release/Curses use warnings; use strict; use utf8; # Meaning "This lexical scope (i.e. file) contains utf8" use Curses; # On Fedora: dnf install perl-Curses # This didn't fix it # https://perldoc.perl.org/open.html use open qw(:std :encoding(UTF-8)); # https://perldoc.perl.org/perllocale.html#The-setlocale-function use POSIX (); my $loc = POSIX::setlocale(&POSIX::LC_ALL, ""); # --- # Surrounds the actual program # --- sub setup() { initscr(); raw(); keypad(1); noecho(); } sub teardown { endwin(); } # --- # Mainly for prettyprinting # --- my $special_keys = setup_special_keys(); # --- # Error printing # --- sub mt { return sprintf("%i: ",time()); } sub ae { my ($x,$fname) = @_; if ($x == ERR) { printw mt(); printw "Got error code from '$fname': $x\n" } } # --- # Where the action is # --- sub announce { my $res = printw "Type any character to see it in bold! (or backspace to exit)\n"; ae($res, "printw"); return { refresh => 1 } } sub read1ch { # Read a next character, waiting until it is there. # Use the wide-character aware functions unless you want to deal with # collating individual bytes yourself! # Readings: # https://metacpan.org/pod/Curses#Wide-Character-Aware-Functions # https://perldoc.perl.org/perlunicode.html#Unicode-Character-Properties # https://www.ahinea.com/en/tech/perl-unicode-struggle.html # https://hexdump.wordpress.com/2009/06/19/character-encoding-issues-part-ii-perl/ my ($ch, $key) = getchar(); if (defined $key) { # it's a function key printw "Function key pressed: $key"; printw " with known alias '" . $$special_keys{$key} . "'" if (exists $$special_keys{$key}); printw "\n"; # done if backspace was hit return { done => ($key == KEY_BACKSPACE()) } } elsif (defined $ch) { # "$ch" should be a String of 1 character # ----- Testing printw "Locale: $loc\n"; printw "Multibyte output test: öüäéèà периоду\n"; printw sprintf("Received string '%s' of length %i with ordinal 0x%x\n", $ch, length($ch), ord($ch)); { # https://perldoc.perl.org/bytes.html use bytes; printw sprintf("... length is %i\n" , length($ch)); printw sprintf("... contents are %vd\n" , $ch); } # ----- Testing return { ch => $ch } } else { # it's an error printw "getchar() failed\n"; return {} } } sub feedback { my ($ch) = @_; printw "The pressed key is: "; attron(A_BOLD); printw("%s\n","$ch"); # do not print $txt directly to make sure escape sequences are not interpreted! attroff(A_BOLD); return { refresh => 1 } # should refresh } sub do_curses_run { setup; my $done = 0; while (!$done) { my $bubl; $bubl = announce(); refresh() if $$bubl{refresh}; $bubl = read1ch(); $done = $$bubl{done}; if (defined $$bubl{ch}) { $bubl = feedback($$bubl{ch}); refresh() if $$bubl{refresh}; } } teardown; } # --- # main # --- do_curses_run(); sub setup_special_keys { # the key codes on the left must be called once to resolve to a numeric constant! my $res = { KEY_BREAK() => "Break key", KEY_DOWN() => "Arrow down", KEY_UP() => "Arrow up", KEY_LEFT() => "Arrow left", KEY_RIGHT() => "Arrow right", KEY_HOME() => "Home key", KEY_BACKSPACE() => "Backspace", KEY_DL() => "Delete line", KEY_IL() => "Insert line", KEY_DC() => "Delete character", KEY_IC() => "Insert char or enter insert mode", KEY_EIC() => "Exit insert char mode", KEY_CLEAR() => "Clear screen", KEY_EOS() => "Clear to end of screen", KEY_EOL() => "Clear to end of line", KEY_SF() => "Scroll 1 line forward", KEY_SR() => "Scroll 1 line backward (reverse)", KEY_NPAGE() => "Next page", KEY_PPAGE() => "Previous page", KEY_STAB() => "Set tab", KEY_CTAB() => "Clear tab", KEY_CATAB() => "Clear all tabs", KEY_ENTER() => "Enter or send", KEY_SRESET() => "Soft (partial) reset", KEY_RESET() => "Reset or hard reset", KEY_PRINT() => "Print or copy", KEY_LL() => "Home down or bottom (lower left)", KEY_A1() => "Upper left of keypad", KEY_A3() => "Upper right of keypad", KEY_B2() => "Center of keypad", KEY_C1() => "Lower left of keypad", KEY_C3 () => "Lower right of keypad", KEY_BTAB() => "Back tab key", KEY_BEG() => "Beg(inning) key", KEY_CANCEL() => "Cancel key", KEY_CLOSE() => "Close key", KEY_COMMAND() => "Cmd (command) key", KEY_COPY() => "Copy key", KEY_CREATE() => "Create key", KEY_END() => "End key", KEY_EXIT() => "Exit key", KEY_FIND() => "Find key", KEY_HELP() => "Help key", KEY_MARK() => "Mark key", KEY_MESSAGE() => "Message key", KEY_MOUSE() => "Mouse event read", KEY_MOVE() => "Move key", KEY_NEXT() => "Next object key", KEY_OPEN() => "Open key", KEY_OPTIONS() => "Options key", KEY_PREVIOUS() => "Previous object key", KEY_REDO() => "Redo key", KEY_REFERENCE() => "Ref(erence) key", KEY_REFRESH() => "Refresh key", KEY_REPLACE() => "Replace key", KEY_RESIZE() => "Screen resized", KEY_RESTART() => "Restart key", KEY_RESUME() => "Resume key", KEY_SAVE() => "Save key", KEY_SBEG() => "Shifted beginning key", KEY_SCANCEL() => "Shifted cancel key", KEY_SCOMMAND() => "Shifted command key", KEY_SCOPY() => "Shifted copy key", KEY_SCREATE() => "Shifted create key", KEY_SDC() => "Shifted delete char key", KEY_SDL() => "Shifted delete line key", KEY_SELECT() => "Select key", KEY_SEND() => "Shifted end key", KEY_SEOL() => "Shifted clear line key", KEY_SEXIT() => "Shifted exit key", KEY_SFIND() => "Shifted find key", KEY_SHELP() => "Shifted help key", KEY_SHOME() => "Shifted home key", KEY_SIC() => "Shifted input key", KEY_SLEFT() => "Shifted left arrow key", KEY_SMESSAGE() => "Shifted message key", KEY_SMOVE() => "Shifted move key", KEY_SNEXT() => "Shifted next key", KEY_SOPTIONS() => "Shifted options key", KEY_SPREVIOUS() => "Shifted prev key", KEY_SPRINT() => "Shifted print key", KEY_SREDO() => "Shifted redo key", KEY_SREPLACE() => "Shifted replace key", KEY_SRIGHT() => "Shifted right arrow", KEY_SRSUME() => "Shifted resume key", KEY_SSAVE() => "Shifted save key", KEY_SSUSPEND() => "Shifted suspend key", KEY_SUNDO() => "Shifted undo key", KEY_SUSPEND() => "Suspend key", KEY_UNDO() => "Undo key" }; for (my $f = 1; $f <= 64; $f++) { $$res{KEY_F($f)} = "KEY_F($f)" } return $res }

  • perl encoding utf-8 locale ncurses
    1个回答
    1
    投票
    实际上看起来是正确的。

    使用

    strace运行脚本可以帮助...我这样做是为了查看系统调用:

    strace -fo strace.out -s 1024 ./foo
    并且可以看到读取内容,消息等。可以使用调试库对ncurses进行类似的跟踪,尽管打包程序对于提供启用跟踪的方法并不一致。UTF-8中的

    [ü\303\274(八进制),其Unicode值为252(十进制)或0xfc(十六进制)。问题的这一部分似乎错过了这一点:

    这只是一个胖的NOPE,按ü会看到:

    Obtained win_t 0x00fc

    因此将运行正确的代码,但是数据是ISO-8859-1,而不是UTF-8。所以是wget_wch表现不佳。因此,这是一个curses配置问题。呵呵。

    wget_wch返回(出于实际目的)一个Unicode值(不是UTF-8字节序列)。 ISO-8859-1代码160-255碰巧(并非巧合地)匹配Unicode代码点,尽管后者在中肯定是不同的[[encoded

    [wget_wch将返回UTF-8字节,但是Perl脚本仅将其用作后备(因为这将导致Perl脚本将UTF-8字符串转换为Unicode值)。

    © www.soinside.com 2019 - 2024. All rights reserved.