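I have a question about this raw JSON, which was generated from a script in an e-commerce web page.
I need to parse it and extract the product information.
The same code works for other pages of the site, but some products produce an error.
Here is one of the problematic JSON strings: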
"\n\tvar products = [];\n\n\tvar _getAvailabilityText = function(available) {\n\t\tif(available != 'outOfStock')\n\t\t\treturn 'si';\n\t\telse\n\t\t\treturn 'no';\n\t};\n\n\tvar _getAvailabilityBinary = function(available) {\n\t\tif(available != 'outOfStock')\n\t\t\treturn 1;\n\t\telse\n\t\t\treturn 0;\n\t};\n\n\tproducts.push({\n\t\tid \t\t\t: 'IME5401' || '',\n\t\tprice \t\t: '22.99' || '',\n\t\tcurrency \t: 'EUR',\n\t\tname \t\t: 'Imetec Living Air umidificatore Vapore 0,4 L 700 W Blu' || '',\n\t\tcategory \t: 'Riscaldamento' || '',\n\t\tcategoryId\t: 'C7301' || '',\n\t\tgroup\t\t: 'Riscaldamento' || '',\n\t\ttdId\t\t: '',\n\t\tweight\t\t: '',\n\t\tbrand \t\t: 'Imetec' || '',\n\t\tvariant \t: '',\n\t\tdimension55 : _getAvailabilityText('lowStock'),\n\t\tmetric5 \t: _getAvailabilityBinary('lowStock'),\n\t\tdimension63\t: '',\n\t\tmetric12\t: '',\n\t\tdimension10 : 'Piccoli e Grandi Elettrodomestici',\n\t\tdimension11 : 'Trattamento Aria',\n\t\tdimension66 : '',\n\t\tdimension62 : '' || 'no-promo'\n\t});\n\n\twindow.dataLayer.push({\n\t\t'products'\t\t: products\n\t});\n\n\t/*window.dataLayer.push({\n\t\t'event' : 'detail',\n\t\t'ecommerce' : {\n\t\t\t'currencyCode': 'EUR',\n\t\t\t'detail' : {\n\t\t\t\t'products' : products\n\t\t\t}\n\t\t}\n\t});*/\n\n\twindow.dataLayer.push({\n\t\t'event': 'productDetail',\n\t\t'ecommerce' : {\n\t\t\t'currencyCode': 'EUR',\n\t\t\t'detail': {\n\t\t\t\t'products' : [{\n\t\t\t\t\t'name': 'Imetec Living Air umidificatore Vapore 0,4 L 700 W Blu' || '',\n\t\t\t\t\t'id': 'IME5401' || '',\n\t\t\t\t\t'price': '22.99' || '',\n\t\t\t\t\t'brand': 'Imetec' || '',\n\t\t\t\t\t'category': 'Riscaldamento' || '',\n\t\t\t\t}]\n\t\t\t}\n\t\t}\n\t});\n"
Here is my code:
raw_json_embed <- json_data %>%
  str_remove_all("\\n|\\t") %>%
  str_extract("(?<=products\\.push\\()(\\{.*?\\})(?=\\);)") %>%
  str_replace_all("'", '"') %>%
  str_replace_all(' : ', ':')
ex_parsed_json <- jsonlite::parse_json(raw_json_embed)
At this point I get this error:
Error: lexical error: invalid char in json text.
{id:"IME5401" || "",price:"22.99
(right here) ------^
I have tried other solutions, such as:
raw_json_embed <- json_data %>%
  str_remove_all("\\n|\\t") %>%
  str_replace(".*(\\[\\{)", "\\1") %>%
  str_replace("(\\}\\]).*", "\\1")
raw_json_embed <- gsub("'", '"', raw_json_embed)
But I still get the error.
If I paste the whole raw JSON into a JSON validator, it finds no problems at all, and I'm at a loss.
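As noted in the comments, that string is not JSON but JavaScript. My guess is that you pasted it into the validator as a quoted string, and a quoted string on its own is valid JSON, which is why the validator reported no problems.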
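To illustrate the validator point, here is a minimal sketch using jsonlite::validate(). The two sample strings are made up for illustration and are not taken from the page:

```r
library(jsonlite)

# A JavaScript snippet wrapped in double quotes is a single JSON string value,
# so a validator accepts it even though its content is not JSON.
quoted_js <- "\"var products = [];\""
quoted_ok <- validate(quoted_js)

# The object literal itself is JavaScript: unquoted keys, single quotes and
# the || operator are all outside the JSON grammar.
js_object <- "{id: 'IME5401' || ''}"
object_ok <- validate(js_object)

quoted_ok
object_ok
```

The first check passes only because the whole script is treated as one string; the second shows what the parser sees once the quotes are stripped.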
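With a few tricks, this particular example can be evaluated in the V8 JS engine. Keep in mind, though, that running arbitrary code is generally a BadIdea(tm), and this approach may not scale to your actual task.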
library(V8)
#> Using V8 engine 9.1.269.38
library(dplyr)
ct <- v8()
# V8 does not provide a window object, but the script only uses
# window.dataLayer.push(), so we can mock it with our own object holding an array:
ct$eval("var window = {dataLayer : []};")
# evaluate the js string, script pushes product details to window.dataLayer
ct$eval(js)
#> [1] "2"
# serialise our fake window.dataLayer to a JSON string and parse it into an R list
products_json <- ct$eval("JSON.stringify(window.dataLayer)") %>%
  jsonlite::parse_json()
# the 2 objects that the js script pushed to window.dataLayer:
products_json[[1]][["products"]][[1]] %>%
  as_tibble() %>%
  glimpse()
#> Rows: 1
#> Columns: 19
#> $ id <chr> "IME5401"
#> $ price <chr> "22.99"
#> $ currency <chr> "EUR"
#> $ name <chr> "Imetec Living Air umidificatore Vapore 0,4 L 700 W Blu"
#> $ category <chr> "Riscaldamento"
#> $ categoryId <chr> "C7301"
#> $ group <chr> "Riscaldamento"
#> $ tdId <chr> ""
#> $ weight <chr> ""
#> $ brand <chr> "Imetec"
#> $ variant <chr> ""
#> $ dimension55 <chr> "si"
#> $ metric5 <int> 1
#> $ dimension63 <chr> ""
#> $ metric12 <chr> ""
#> $ dimension10 <chr> "Piccoli e Grandi Elettrodomestici"
#> $ dimension11 <chr> "Trattamento Aria"
#> $ dimension66 <chr> ""
#> $ dimension62 <chr> "no-promo"
products_json[[2]][["ecommerce"]][["detail"]][["products"]][[1]] %>%
  as_tibble() %>%
  glimpse()
#> Rows: 1
#> Columns: 5
#> $ name <chr> "Imetec Living Air umidificatore Vapore 0,4 L 700 W Blu"
#> $ id <chr> "IME5401"
#> $ price <chr> "22.99"
#> $ brand <chr> "Imetec"
#> $ category <chr> "Riscaldamento"
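Since the question mentions running the same extraction over several pages, the steps above could be wrapped in a small helper that builds a fresh V8 context for each script, so no state leaks from one page to the next. This is only a sketch: extract_data_layer() and js_demo are hypothetical names, and it assumes every script touches nothing on window except dataLayer.

```r
library(V8)

# Hypothetical helper: evaluate one scraped script in its own context and
# return whatever it pushed to the mocked window.dataLayer as an R list.
extract_data_layer <- function(js) {
  ct <- v8()
  # same mock as above: a plain object with an empty dataLayer array
  ct$eval("var window = {dataLayer : []};")
  ct$eval(js)
  jsonlite::parse_json(ct$eval("JSON.stringify(window.dataLayer)"))
}

# Made-up miniature script with the same shape as the real one
js_demo <- "
var products = [];
products.push({ id: 'A1' || '', price: '9.99' || '' });
window.dataLayer.push({ 'products': products });
"

demo <- extract_data_layer(js_demo)
demo[[1]][["products"]][[1]][["id"]]
```

A script that references other browser globals (document, navigator and so on) would still fail here and would need further mocking, which is part of why this may not scale.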
The input js string, defined as:
js <- r"(
var products = [];
var _getAvailabilityText = function (available) {
    if (available != 'outOfStock')
        return 'si';
    else
        return 'no';
};
var _getAvailabilityBinary = function (available) {
    if (available != 'outOfStock')
        return 1;
    else
        return 0;
};
products.push({
    id: 'IME5401' || '',
    price: '22.99' || '',
    currency: 'EUR',
    name: 'Imetec Living Air umidificatore Vapore 0,4 L 700 W Blu' || '',
    category: 'Riscaldamento' || '',
    categoryId: 'C7301' || '',
    group: 'Riscaldamento' || '',
    tdId: '',
    weight: '',
    brand: 'Imetec' || '',
    variant: '',
    dimension55: _getAvailabilityText('lowStock'),
    metric5: _getAvailabilityBinary('lowStock'),
    dimension63: '',
    metric12: '',
    dimension10: 'Piccoli e Grandi Elettrodomestici',
    dimension11: 'Trattamento Aria',
    dimension66: '',
    dimension62: '' || 'no-promo'
});
window.dataLayer.push({
    'products': products
});
window.dataLayer.push({
    'event': 'productDetail',
    'ecommerce': {
        'currencyCode': 'EUR',
        'detail': {
            'products': [{
                'name': 'Imetec Living Air umidificatore Vapore 0,4 L 700 W Blu' || '',
                'id': 'IME5401' || '',
                'price': '22.99' || '',
                'brand': 'Imetec' || '',
                'category': 'Riscaldamento' || '',
            }]
        }
    }
});
)"
Created on 2023-05-10 with reprex v2.0.2
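If evaluating scraped JavaScript is not acceptable, a regex-only sketch can pull out the plain key: 'value' pairs without running anything. It rests on several assumptions: values are single-quoted literals with no escaped quotes inside, fields computed by function calls (such as dimension55) are skipped, and only the first literal of an expression like '' || 'no-promo' is captured, so dimension62 comes back empty. js_snippet is a shortened, hypothetical stand-in for the real script.

```r
library(stringr)

# Shortened stand-in for the scraped script (hypothetical sample)
js_snippet <- paste0(
  "var products = [];\n",
  "products.push({\n",
  "\tid \t\t\t: 'IME5401' || '',\n",
  "\tprice \t\t: '22.99' || '',\n",
  "\tdimension55 : _getAvailabilityText('lowStock'),\n",
  "\tdimension62 : '' || 'no-promo'\n",
  "});\n"
)

# Body of the object literal passed to products.push(); (?s) lets . match newlines
block <- str_extract(js_snippet, "(?s)(?<=products\\.push\\(\\{).*?(?=\\}\\);)")

# Capture each key followed by a single-quoted literal: key : 'value'
pairs <- str_match_all(block, "(\\w+)\\s*:\\s*'([^']*)'")[[1]]

# Named character vector: names are the keys, values are the first literals
fields <- setNames(pairs[, 3], pairs[, 2])
fields
```

This trades completeness for safety: nothing is executed, but anything the script computes at run time is lost.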