I have a string like this:
"abc" "cd - e"
I need to split it into the following two strings:
"abc"
"cd - e"
I have tried several approaches in Perl, but none of them gives me what I need. Can someone point me in the right direction? Thanks.
You can split on whitespace that has a double quote (") immediately before it and another one immediately after it:
use strict;
use warnings;
my $s = '"abc" "cd - e"';
my @matches = split /(?<=")\s+(?=")/, $s;
# "abc"
# "cd - e"
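Because the split point is defined purely by lookarounds, the same call is not limited to two strings. A quick check on a hypothetical input with a third quoted string (one that also contains a space) shows that spaces inside the quotes are left alone, since they do not have a quote on both sides. The approach does assume every item is quoted: an unquoted word such as b in '"a" b "c"' would not be separated from its neighbours.

```perl
use strict;
use warnings;

# Hypothetical input with three quoted strings; the spaces inside
# "cd - e" and "x y" are not surrounded by quotes, so they are not split points.
my $s = '"abc" "cd - e" "x y"';
my @parts = split /(?<=")\s+(?=")/, $s;

print "$_\n" for @parts;
# "abc"
# "cd - e"
# "x y"
```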
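Instead of splitting, you can also match each quoted string directly:
my $input = '"abc" "cd - e"';
my @strings = $input =~ /"[^"]*"/g;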
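If the surrounding quotes are not wanted, putting a capture group inside the same pattern is enough: a m//g match in list context returns only the captured text when the pattern has groups. A minimal sketch:

```perl
use strict;
use warnings;

my $input = '"abc" "cd - e"';

# With a capture group, m//g in list context returns only what the group
# captured, so the quotes themselves are dropped.
my @strings = $input =~ /"([^"]*)"/g;

print "$_\n" for @strings;
# abc
# cd - e
```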
If your input always contains exactly two strings, as in your example (rather than an arbitrary number n of them), this should work:
$s = '"abc" "cd - e"';
$s =~ /(".*") (".*")/;
$s1 = $1;
$s2 = $2;
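One caveat with copying $1 and $2 after the match: if the match fails, those variables are not reset and may still hold values from an earlier successful match. A slightly more defensive sketch assigns the captures in list context and checks the result; a list assignment used as a condition yields the number of captured values, so the branch runs only on a real match.

```perl
use strict;
use warnings;

my $s = '"abc" "cd - e"';
my ($s1, $s2);

# The list assignment evaluates to 2 on success and 0 on failure.
if (($s1, $s2) = $s =~ /("[^"]*") ("[^"]*")/) {
    print "$s1\n$s2\n";
} else {
    warn "input did not match\n";
}

# A failed match in list context simply returns an empty list.
my @none = 'no quotes here' =~ /("[^"]*") ("[^"]*")/;
```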
Or you can make it safer by replacing . with the "anything but a quote" character class, [^"]:
$s =~ /("[^"]*") ("[^"]*")/;
$s1 = $1;
$s2 = $2;
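To see why [^"] is safer, compare both patterns on a hypothetical input with three quoted strings. The greedy .* is free to run across quote boundaries, so the first capture swallows two strings; [^"]* cannot cross a quote, so each capture is exactly one quoted string.

```perl
use strict;
use warnings;

my $s = '"a" "b" "c"';

# Greedy .* backtracks to the last '" "' it can find.
my ($g1, $g2) = $s =~ /(".*") (".*")/;
# $g1 is '"a" "b"', $g2 is '"c"'

# [^"]* stops at the next quote, so each capture is one quoted string.
my ($n1, $n2) = $s =~ /("[^"]*") ("[^"]*")/;
# $n1 is '"a"', $n2 is '"b"'

print "greedy:    [$g1] [$g2]\n";
print "non-quote: [$n1] [$n2]\n";
```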
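Here is a non-trivial implementation of a split_line function that handles quotes and escaped spaces. Features: nested single and double quotes, backslash escapes, and an error on an unclosed quote or a trailing backslash.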
sub split_line {
    my ($string) = @_;
    my @result = ();
    my @quote = (); # stack of quotes we've seen so far, to handle nested single and double quotes
    my $accumulated = ""; # the token we are building
    my $seen_quotes = 0; # Whether we've seen at least a quote or whether we should pass a chunk of only spaces (to pass separating spaces)
    # eat the string piece by piece, separating a quote, some spaces, or a chunk w/o quotes and spaces.
    while ($string =~ s/('|"|\s+|[^"'\s]+)(.*)$/$2/) {
        my $token = $1;
        # print "Seen $token (last quote: ".(scalar @quote>0?$quote[-1]:"none")."; accumulated:$accumulated) -- ";
        if ($token eq "'" || $token eq '"') {
            $seen_quotes = 1;
            if (scalar @quote>0) {
                if ($quote[-1] eq $token) {
                    pop @quote;
                } else {
                    $accumulated .= $token;
                }
            } else {
                push @quote, $token;
            }
        } elsif ($token =~ /\\$/) {
            # A trailing backslash escapes the next character. substr() returns ""
            # (not undef) at the end of the string, so test for emptiness directly.
            if ($string ne '') {
                my $next_char = substr($string, 0, 1);
                # print " steal '$next_char' ";
                $token =~ s/.$/$next_char/;
                $string = substr($string, 1);
            } else {
                die "Error: backslash escape at end of string\n";
            }
            $accumulated .= $token;
        } elsif ($token =~ /^\s*$/) {
            # Whitespace is kept only inside quotes
            if (scalar @quote>0) {
                $accumulated .= $token;
            }
        } else {
            $accumulated .= $token;
        }
        # print " (last quote: ".(scalar @quote>0?$quote[-1]:"none")."; accumulated:$accumulated)";
        # Push the string if we're out of quotes, and if the coming chunk is a separating space
        if (scalar @quote == 0 && ($string eq '' || substr($string, 0, 1) eq ' ')) {
            # print " Push\n";
            # Skip separating chunks that are whitespace only and never contained a quote
            if (($accumulated ne '' && $seen_quotes) || $accumulated =~ /\S/) {
                push @result, $accumulated;
            }
            $accumulated = '';
            $seen_quotes = 0;
        # } else {
        #     print " Pass\n";
        }
    }
    if (scalar @quote > 0) {
        die "Error: unclosed quote at end of string\n";
    }
    return @result;
}
Here is a usage example:
my $testline = "a 'b c c d ' 'titi' \"toto\" ' ' \" \" '\"' \"'\" '\"tutu truc\" blah' \"'blah tutu' truc\" e\\ f g\\ h ij'k l' m\"n \" 'l ' "
."\\\" \\' '\\'' \"'\\''\" BHA\\RO \\ n \"o p\"q\\ r";
print "$testline\n";
print join "#", split_line($testline);
print "#\n";
This prints the following:
a 'b c c d ' 'titi' "toto" ' ' " " '"' "'" '"tutu truc" blah' "'blah tutu' truc" e\ f g\ h ij'k l' m"n " 'l ' \" \' '\'' "'\''" BHA\RO \ n "o p"q\ r
a#b c c d #titi#toto# # #"#'#"tutu truc" blah#'blah tutu' truc#e f#g #h#ijk l#mn #l #"#'#'#'''#BHA\RO# n#o pq r#
This implementation is not perfect. Here are some remaining issues:
For the record, I implemented this for my project, which is available here. The code on GitHub may evolve in the future to address these issues.
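If you would rather not maintain a hand-written tokenizer, Perl ships with the core module Text::ParseWords, whose parse_line function does shell-like splitting on a delimiter regex while respecting quotes and backslash escapes. A minimal sketch follows; its handling of edge cases may not match split_line exactly, so check it against your own inputs. With the $keep argument set to 1 it preserves the quotes, which matches the output asked for in the question.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $s = '"abc" "cd - e"';

# $keep = 1: quotes and backslashes are preserved.
my @kept = parse_line('\s+', 1, $s);

# $keep = 0: quotes are removed and backslash escapes are processed.
my @bare = parse_line('\s+', 0, $s);

# Single quotes and a backslash-escaped space are handled as well.
my @words = parse_line('\s+', 0, q{a 'b c' e\ f});

print join('|', @kept), "\n";
print join('|', @bare), "\n";
print join('|', @words), "\n";
```

The related shellwords and quotewords functions from the same module wrap parse_line for common cases.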