I am trying to parse a CSV file, but I run into a problem with an escaped comma when parsing the following line.
<?php
$str = "19018216307,Public,\,k]'=system1-system2,20230914143505.5,1-050000,No";
$data = str_getcsv($str);
print_r($data);
?>
Output:
<?php
Array
(
[0] => 19018216307
[1] => Public
[2] => \
[3] => k]'=system1-system2
[4] => 20230914143505.5
[5] => 1-050000
[6] => No
)
?>
Consider the column value \,k]'=system1-system2. I expect it to be parsed as ,k]'=system1-system2. However, when processing the CSV, PHP treats it as two columns, yielding \ and k]'=system1-system2.
Expected output:
<?php
Array
(
[0] => 19018216307
[1] => Public
[2] => ,k]'=system1-system2
[3] => 20230914143505.5
[4] => 1-050000
[5] => No
)
?>
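NOTE: The CSV file is raw data generated by an external website, so I cannot change its contents (for example, by wrapping column values in double quotes).
Thanks in advance!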
A way to deal with this odd "CSV format":
$str = "19018216307,Public,\,k]'=system1-system2,20230914143505.5,1-050000,No";
$pattern = <<<'REGEX'
~(?nxx)
(?# modifiers:
- inline n: parentheses act as non-capturing groups
- inline xx: spaces are ignored even in character classes
- global A: all the matches have to be contiguous
)
# pattern
( (?!\A) , \K | \A ) # a comma when not at the start of the string, or nothing at the start
[^ , \\ ]* ( \\ . [^ , \\ ]* )* # field content: anything but a comma or backslash, plus backslash-escaped characters
# check
( \z (*:END) )? # define a marker if the end of the string is reached
~A
REGEX;
if (preg_match_all($pattern, $str, $m) && isset($m['MARK'])) {
$result = array_map(fn($s) => strtr($s, ['\\,' => ',']), $m[0]);
print_r($result);
}
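If the regex is hard to maintain, the same splitting rule can also be written as a plain character scan. The following is only a sketch, not part of the pattern above: the function name splitEscapedCsvLine and its parameters are made up for illustration, and it assumes that a backslash escapes the single character that follows it and that fields are never quoted.

```php
<?php
// Hypothetical helper (name and signature are assumptions, not an existing PHP function).
// Splits one line on the delimiter, treating "\," as a literal comma inside a field.
function splitEscapedCsvLine(string $line, string $delimiter = ',', string $escape = '\\'): array
{
    $fields = [];
    $current = '';
    $length = strlen($line);

    for ($i = 0; $i < $length; $i++) {
        $char = $line[$i];

        if ($char === $escape && $i + 1 < $length) {
            $next = $line[$i + 1];
            // Drop the backslash only in front of a delimiter;
            // any other escaped pair is copied unchanged.
            $current .= ($next === $delimiter) ? $next : $char . $next;
            $i++; // skip the escaped character
        } elseif ($char === $delimiter) {
            // Unescaped delimiter: the current field is complete.
            $fields[] = $current;
            $current = '';
        } else {
            // Ordinary character (a lone trailing backslash also lands here).
            $current .= $char;
        }
    }

    $fields[] = $current; // last field; an empty line yields ['']
    return $fields;
}

$str = "19018216307,Public,\\,k]'=system1-system2,20230914143505.5,1-050000,No";
print_r(splitEscapedCsvLine($str));
```

For the sample line this gives six fields, the third being ,k]'=system1-system2. One difference from the regex version: a lone backslash at the very end of the line is kept as a literal character here, whereas the pattern above would not reach the END marker and would produce no result.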