Suppose we have the following malformed input:
(Line 1) "1","TEST","{"K1":"V1","K2":"V2"}","Normal Line","Contents"
(Line 2) ,"","{}","TEST LINE","Contents"
(Line 3) "1","TEST","{"K1":"V1","K2":"V2"}","LINE
(Line 4) SPLIT HERE","Contents"
How can we fix it so that it looks like this:
(Line 1) "1","TEST","{"K1":"V1","K2":"V2"}","Normal Line","Contents"
(Line 2) ,"","{}","TEST LINE","Contents"
(Line 3) "1","TEST","{"K1":"V1","K2":"V2"}","LINE SPLIT HERE","Contents"
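We have many such lines in a single file exported from Google Analytics, and they need to be fixed before the file is loaded into a Hive table.
Thanks in advance.
BR, Sean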
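With GNU sed, append the next line whenever a line does not end in a double quote, then delete the embedded newline: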
sed '/[^"]$/{N;s/\n//}' file
Output to stdout:
"1","TEST","{"K1":"V1","K2":"V2"}","Normal Line","Contents"
,"","{}","TEST LINE","Contents"
"1","TEST","{"K1":"V1","K2":"V2"}","LINE SPLIT HERE","Contents"
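The command above joins at most one following line per record, and it deletes the newline without inserting anything, so the space in LINE SPLIT HERE only appears if the first half already ends with a trailing space. If a field can be broken across more than two lines, or the break replaced a space, a looping variant may help. This is only a sketch under those assumptions: the function name join_split_records is made up for illustration, and it still relies on every complete record ending in a double quote.

```shell
# Hypothetical helper: keep appending lines until the record ends with a
# double quote. Assumes each line break inside a field stands for one space.
join_split_records() {
  # :a         loop label
  # /[^"]$/    the pattern space does not end with a double quote
  # N          append the next input line
  # s/\n/ /    replace the embedded newline with a space
  # ta         the substitution succeeded, so jump back and test again
  sed -e ':a' -e '/[^"]$/{' -e 'N' -e 's/\n/ /' -e 'ta' -e '}' "$1"
}
```

It can be called as join_split_records file > file.fixed, which leaves the original file untouched so the result can be checked before loading it into Hive.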