Perl: splitting a string that contains double quotes and spaces

Question

I have a string like this:

"abc" "cd - e"

I need to split it into the following two strings:

```
"abc"
```
```
"cd - e"
```

I have tried several approaches in Perl, but none of them gives me what I need. Can someone point me in the right direction? Thanks.

Answer 1

You can split on whitespace that is preceded by " and followed by ":

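use strict;
use warnings;

my $s = '"abc" "cd - e"';
# Split only on whitespace that sits between a closing and an opening quote.
my @matches = split /(?<=")\s+(?=")/, $s;
print "$_\n" for @matches;
# "abc"
# "cd - e"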
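
As a quick check of what this split does and does not handle, the sketch below wraps the same regex in a helper (the name split_on_quote_gaps is made up for this illustration). It assumes fields never contain quotes. Note that a field consisting only of whitespace, such as " ", is itself whitespace between two quotes, so it is split as well.

```perl
use strict;
use warnings;

# Hypothetical helper name, used only for this illustration.
sub split_on_quote_gaps {
    my ($s) = @_;
    return split /(?<=")\s+(?=")/, $s;
}

# Spaces inside the quotes are untouched, so this yields three fields.
my @ok = split_on_quote_gaps('"abc" "cd - e" "f"');
print scalar(@ok), "\n";     # prints 3

# The space inside " " also sits between two quotes, so it is split too.
my @bad = split_on_quote_gaps('"a" " " "b"');
print scalar(@bad), "\n";    # prints 4, not the 3 fields one might expect
```

If whitespace-only fields can occur in the input, matching the fields themselves rather than the gaps between them avoids this problem.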

Answer 2

my @strings = $input =~ /"[^"]*"/g;

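This assumes the input is valid. Basically, you can use a regex match to either validate or extract, but doing both at the same time is very hard.
It also assumes that quoted fields cannot contain quotes, since you did not mention an escaping mechanism.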
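
One way to get both validation and extraction is to do them as two separate steps. The following is a minimal sketch, assuming fields are separated by whitespace and never contain quotes; extract_quoted is a hypothetical name, not something from the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original answer.
# Returns the quoted fields (quotes included), or an empty list when the
# input is not a whitespace-separated sequence of "..." fields.
sub extract_quoted {
    my ($input) = @_;

    # Step 1: validate the line as a whole.
    return () unless $input =~ /\A\s*"[^"]*"(?:\s+"[^"]*")*\s*\z/;

    # Step 2: extract each field; in list context /g returns all matches.
    my @fields = $input =~ /"[^"]*"/g;
    return @fields;
}

my @good = extract_quoted('"abc" "cd - e"');
print join('|', @good), "\n";    # "abc"|"cd - e"

my @bad = extract_quoted('"abc" cd - e"');
print scalar(@bad), "\n";        # 0
```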

Answer 3

If your input specifically contains the two strings you describe (rather than an arbitrary number n of strings), then this should work:

my $s = '"abc" "cd - e"';

# List assignment captures both fields; both are undef if the match fails.
my ($s1, $s2) = $s =~ /(".*") (".*")/;

Alternatively, you can make it safer by replacing . with "not a quote", i.e. [^"]:

# Assign the captures directly; both are undef if the match fails.
($s1, $s2) = $s =~ /("[^"]*") ("[^"]*")/;
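
To see why the negated character class is safer, here is a small sketch that feeds both patterns an input with three fields instead of two. The input string and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $s = '"a" "b" "c"';    # three fields instead of the expected two

# Greedy .* runs across quote boundaries.
my ($g1, $g2) = $s =~ /(".*") (".*")/;
print "$g1 | $g2\n";      # "a" "b" | "c"

# [^"]* cannot cross a quote, so each capture is exactly one field.
my ($n1, $n2) = $s =~ /("[^"]*") ("[^"]*")/;
print "$n1 | $n2\n";      # "a" | "b"

# Anchoring additionally rejects input that is not exactly two fields.
my @strict = $s =~ /\A("[^"]*") ("[^"]*")\z/;
print scalar(@strict), "\n";    # 0
```

Without anchors the second pattern silently ignores the third field, so anchoring is worth considering when the input really must contain exactly two strings.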

Answer 4

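Here is a non-trivial implementation of a split_line function that handles quotes and escaped spaces. Features:

Single and double quotes can be mixed: {'"tutu truc" blah' "'blah tutu' truc"} => {"tutu truc" blah}{'blah tutu' truc}
Quotes can appear in the middle of a word: {h ij'k l' m} => {h}{ijk l}{m}
Spaces can be escaped anywhere: {\ a b\ c d\ e} => { a}{b c}{d }{e}
Quotes and space escapes can be mixed: {'a b'c\ d} => {a bc d}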

sub split_line {
  my ($string) = @_;

  my @result = ();
  my @quote = (); # stack of quotes we've seen so far, to handle nested single and double quotes
  my $accumulated = ""; # the token we are building
  my $seen_quotes = 0; # Whether we've seen at least a quote or whether we should pass a chunk of only spaces (to pass separating spaces)

  # eat the string piece by piece, separating a quote, some spaces, or a chunk w/o quotes and spaces. 
  while ($string =~ s/('|"|\s+|[^"'\s]+)(.*)$/$2/) {
      my $token = $1;

  #        print "Seen $token (last quote: ".(scalar @quote>0?$quote[-1]:"none")."; accumulated:$accumulated) -- ";
      if ($token eq "'" || $token eq '"') {
          $seen_quotes = 1;
          if (scalar @quote>0) {
              if ($quote[-1] eq $token) {
                  pop @quote;
              } else {
                  $accumulated .= $token;
              }
          } else {
              push @quote, $token;
          }
      } elsif ($token =~ /\\$/) {
          if ($string ne '') { # substr() on an empty string returns '', which is defined, so test the string itself
              my $next_char = substr($string, 0, 1);
             #                print " steal '$next_char' ";
              $token =~ s/.$/$next_char/; 
              $string = substr($string, 1);
          } else {
              die "Error: backslash escape at end of string\n";
          }
          $accumulated .= $token;
      } elsif ($token =~ /^\s*$/) {
          if (scalar @quote>0) {
              $accumulated .= $token;
          }
      } else {
          $accumulated .= $token;
      }
     #        print " (last quote: ".(scalar @quote>0?$quote[-1]:"none")."; accumulated:$accumulated)";

      # Push the string if we're out of quotes, and if the coming chunk is a separating space
      if (scalar @quote == 0  && ($string eq '' || substr($string, 0, 1) eq ' ')) {
          #            print " Push\n";
          # Skip separator chunks, i.e. whitespace-only text that never involved a quote
          if ($accumulated ne '' && $seen_quotes || $accumulated =~ /\S/) {
              push @result, $accumulated;            
          }
          $accumulated = '';
          $seen_quotes = 0;
          #        } else {
          #            print " Pass\n";
      }
  }

  if (scalar @quote > 0) {
      die "Error: unclosed quote at end of string\n";
  }

  return @result;

}

Here is a usage example:

my $testline = "a  'b c   c d  ' 'titi' \"toto\" ' ' \" \" '\"' \"'\" '\"tutu truc\" blah' \"'blah tutu' truc\" e\\ f  g\\   h ij'k l'  m\"n \" 'l ' "
                ."\\\" \\' '\\'' \"'\\''\" BHA\\RO \\ n \"o p\"q\\ r";
print "$testline\n"; 
print join "#", split_line($testline);
print "#\n";

This prints the following:

a  'b c   c d  ' 'titi' "toto" ' ' " " '"' "'" '"tutu truc" blah' "'blah tutu' truc" e\ f  g\   h ij'k l'  m"n " 'l ' \" \' '\'' "'\''" BHA\RO \ n "o p"q\ r
a#b c   c d  #titi#toto# # #"#'#"tutu truc" blah#'blah tutu' truc#e f#g #h#ijk l#mn #l #"#'#'#'''#BHA\RO# n#o pq r#

This implementation is not perfect. Here are some remaining issues:

Quotes cannot be mixed: >>"da 'd d'<< is invalid
Empty elements are ignored: >>\ a<< results in "a" not " a"
Quotes and space escapes cannot be mixed: >>"a b"c\ d<< is invalid

For the record, I implemented this function for my project, which is available here. The code on GitHub may evolve in the future to fix these issues.
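
For comparison, Perl's core module Text::ParseWords covers much of the same ground (quotes, quotes inside words, backslash-escaped spaces). This is a different technique from the split_line function above, shown only as a sketch; its handling of edge cases is not guaranteed to match split_line.

```perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords quotewords);

# shellwords splits on whitespace and strips quotes and backslashes.
my @a = shellwords(q{h ij'k l' m});
print join('#', @a), "\n";    # h#ijk l#m

my @b = shellwords(q{b\ c d});
print join('#', @b), "\n";    # b c#d

# quotewords with a true second argument keeps the quotes,
# which is the output the question asks for.
my @c = quotewords('\s+', 1, q{"abc" "cd - e"});
print join('#', @c), "\n";    # "abc"#"cd - e"
```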
