Perl: split a string containing double quotes and spaces

Question (votes: 0, answers: 4)

I have a string like this:

"abc" "cd - e"

I need to split it into the following two strings:

  1. "abc"
  2. "cd - e"

I have tried several approaches in Perl, but none of them gives me the result I need. Can anyone point me in the right direction? Thanks.

regex string perl parsing
4 answers

2 votes

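You can split on any run of whitespace that has a double-quote character (") immediately before it and another one immediately after it: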

use strict;
use warnings; 

my $s = '"abc" "cd - e"';
my @matches = split /(?<=")\s+(?=")/, $s;
# "abc"
# "cd - e"
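The same lookaround split is not limited to two fields. A minimal sketch, assuming no quoted field contains a double quote of its own (the wrapper name split_quoted is hypothetical, not from the answer):

```perl
use strict;
use warnings;

# Hypothetical wrapper around the answer's split: break on whitespace that
# has a double quote directly before it and another directly after it.
# Assumes no quoted field contains a double quote of its own.
sub split_quoted {
    my ($s) = @_;
    return split /(?<=")\s+(?=")/, $s;
}

my @fields = split_quoted('"abc" "cd - e" "f"');
print "$_\n" for @fields;   # one field per line: "abc" / "cd - e" / "f"
```

The spaces inside "cd - e" survive because they are not flanked by quote characters on both sides.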

2 votes
my @strings = $input =~ /"[^"]*"/g;
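  • This assumes the input is valid. You can use a regex match to validate or to extract, but doing both in a single match is very difficult.
  • This assumes quoted fields cannot contain quotes, since you didn't mention an escaping mechanism.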
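Since validating and extracting are easier as two separate steps, here is a minimal sketch that does exactly that, under the same assumptions (whitespace-separated fields, no escaping, no quotes inside fields). The function name extract_quoted is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: validate the whole line first, then extract.
# Assumes whitespace-separated "..." fields with no embedded double quotes.
sub extract_quoted {
    my ($input) = @_;
    # Pass 1 (validate): only quoted fields, each followed by whitespace
    # or the end of the string, are allowed.
    die "invalid input: $input\n"
        unless $input =~ /\A\s*(?:"[^"]*"(?:\s+|\z))*\z/;
    # Pass 2 (extract): list-context //g returns every quoted field.
    return $input =~ /"[^"]*"/g;
}

my @strings = extract_quoted('"abc" "cd - e"');
print "$_\n" for @strings;
```

Malformed input such as '"abc" cd' or an unclosed quote makes extract_quoted die instead of silently returning a partial list.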

0 votes

If your input always contains exactly the two strings you showed (rather than an arbitrary number n of strings), this should work:

my $s = '"abc" "cd - e"';

my ($s1, $s2);
if ($s =~ /(".*") (".*")/) {
    ($s1, $s2) = ($1, $2);
}

Or you can make it safer by replacing . (any character) with a "non-quote" character class, i.e. [^"]:

my ($s1, $s2);
if ($s =~ /("[^"]*") ("[^"]*")/) {
    ($s1, $s2) = ($1, $2);
}
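To see why [^"] is safer, compare both patterns on a hypothetical three-field input: the greedy .* can run across a field boundary, while [^"]* stops at the first closing quote.

```perl
use strict;
use warnings;

my $three = '"a" "b" "c"';

# Greedy ".*" backtracks only as far as needed, so $1 swallows two fields.
my @greedy = $three =~ /(".*") (".*")/;
# "[^"]*" cannot cross a quote, so each capture is exactly one field.
my @safe = $three =~ /("[^"]*") ("[^"]*")/;

print 'greedy: ', join('|', @greedy), "\n";   # greedy: "a" "b"|"c"
print 'safe:   ', join('|', @safe), "\n";     # safe:   "a"|"b"
```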

0 votes

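Here is a non-trivial implementation of a split_line function that handles quotes and escaped spaces. Features:

  • Single and double quotes can be mixed: {'"tutu truc" blah' "'blah tutu' truc"} => {"tutu truc" blah}{'blah tutu' truc}
  • Quotes can appear in the middle of a word: {h ij'k l' m} => {h}{ijk l}{m}
  • Spaces can be escaped anywhere: {\ a b\ c d\ e} => { a}{b c}{d }{e}
  • Quotes and escaped spaces can be combined: {'a b'c\ d} => {a bc d}
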
sub split_line {
  my ($string) = @_;

  my @result = ();
  my @quote = (); # stack of quotes we've seen so far, to handle nested single and double quotes
  my $accumulated = ""; # the token we are building
  my $seen_quotes = 0; # Whether we've seen at least a quote or whether we should pass a chunk of only spaces (to pass separating spaces)

  # eat the string piece by piece, separating a quote, some spaces, or a chunk w/o quotes and spaces. 
  while ($string =~ s/('|"|\s+|[^"'\s]+)(.*)$/$2/s) {
      my $token = $1;

      if ($token eq "'" || $token eq '"') {
          $seen_quotes = 1;
          if (scalar @quote>0) {
              if ($quote[-1] eq $token) {
                  pop @quote;
              } else {
                  $accumulated .= $token;
              }
          } else {
              push @quote, $token;
          }
      } elsif ($token =~ /\\$/) {
          # A trailing backslash escapes the next character (space, quote, ...).
          # substr() returns '' (not undef) on an empty string, so test length.
          if (length $string) {
              my $next_char = substr($string, 0, 1);
              $token =~ s/.$/$next_char/;
              $string = substr($string, 1);
          } else {
              die "Error: backslash escape at end of string\n";
          }
          $accumulated .= $token;
      } elsif ($token =~ /^\s*$/) {
          if (scalar @quote>0) {
              $accumulated .= $token;
          }
      } else {
          $accumulated .= $token;
      }

      # Push the token if we're outside quotes and the next chunk is separating whitespace
      if (!@quote && ($string eq '' || $string =~ /^\s/)) {
          # Skip pure-whitespace separators; keep a whitespace-only token
          # only if it came from quotes (e.g. ' ' or " ")
          if (($accumulated ne '' && $seen_quotes) || $accumulated =~ /\S/) {
              push @result, $accumulated;
          }
          $accumulated = '';
          $seen_quotes = 0;
      }
  }

  if (scalar @quote > 0) {
      die "Error: unclosed quote at end of string\n";
  }

  return @result;

}

Here is a usage example:

my $testline = "a  'b c   c d  ' 'titi' \"toto\" ' ' \" \" '\"' \"'\" '\"tutu truc\" blah' \"'blah tutu' truc\" e\\ f  g\\   h ij'k l'  m\"n \" 'l ' "
                ."\\\" \\' '\\'' \"'\\''\" BHA\\RO \\ n \"o p\"q\\ r";
print "$testline\n"; 
print join "#", split_line($testline);
print "#\n";

This prints the following:

a  'b c   c d  ' 'titi' "toto" ' ' " " '"' "'" '"tutu truc" blah' "'blah tutu' truc" e\ f  g\   h ij'k l'  m"n " 'l ' \" \' '\'' "'\''" BHA\RO \ n "o p"q\ r
a#b c   c d  #titi#toto# # #"#'#"tutu truc" blah#'blah tutu' truc#e f#g #h#ijk l#mn #l #"#'#'#'''#BHA\RO# n#o pq r#

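This implementation is not perfect. Some known issues remain:

  • Quotes cannot be mixed: >>"da 'd d'<< is invalid
  • Empty elements are ignored: >>\ a<< results in "a", not " a"
  • Quotes and escaped spaces cannot be combined: >>"a b"c\ d<< is invalid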

For the record, I implemented this for my project, which is available here. The code on GitHub may evolve in the future to address these issues.
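Not part of the answer above, but an alternative to a hand-written parser: Perl's core module Text::ParseWords provides parse_line($delimiter, $keep, $line), which understands single quotes, double quotes and backslash escapes. A minimal sketch on the question's input (the quoting rules are the module's own, not necessarily identical to split_line's):

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $s = '"abc" "cd - e"';

# $keep = 1: split on whitespace, but keep quotes and backslashes in tokens.
my @with_quotes = parse_line('\s+', 1, $s);
# $keep = 0: surrounding quotes are stripped and escapes are processed.
my @bare = parse_line('\s+', 0, $s);

print join('|', @with_quotes), "\n";   # "abc"|"cd - e"
print join('|', @bare), "\n";          # abc|cd - e
```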

© www.soinside.com 2019 - 2024. All rights reserved.