How to fix "lexical error: invalid char in json text" when trying to parse from JSON


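I have a problem with this raw JSON, which is generated by a script on an e-commerce web page.

I need to parse it and extract the product information.

The same code works for other pages of the site, but some products produce an error.

Here is one of the problematic JSON strings: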

"\n\tvar products = [];\n\n\tvar _getAvailabilityText = function(available) {\n\t\tif(available != 'outOfStock')\n\t\t\treturn 'si';\n\t\telse\n\t\t\treturn 'no';\n\t};\n\n\tvar _getAvailabilityBinary = function(available) {\n\t\tif(available != 'outOfStock')\n\t\t\treturn 1;\n\t\telse\n\t\t\treturn 0;\n\t};\n\n\tproducts.push({\n\t\tid \t\t\t: 'IME5401' || '',\n\t\tprice \t\t: '22.99' || '',\n\t\tcurrency \t: 'EUR',\n\t\tname \t\t: 'Imetec Living Air umidificatore Vapore 0,4 L 700 W Blu' || '',\n\t\tcategory \t: 'Riscaldamento' || '',\n\t\tcategoryId\t: 'C7301' || '',\n\t\tgroup\t\t: 'Riscaldamento' || '',\n\t\ttdId\t\t: '',\n\t\tweight\t\t: '',\n\t\tbrand \t\t: 'Imetec' || '',\n\t\tvariant \t: '',\n\t\tdimension55 : _getAvailabilityText('lowStock'),\n\t\tmetric5 \t: _getAvailabilityBinary('lowStock'),\n\t\tdimension63\t: '',\n\t\tmetric12\t: '',\n\t\tdimension10 : 'Piccoli e Grandi Elettrodomestici',\n\t\tdimension11 : 'Trattamento Aria',\n\t\tdimension66 : '',\n\t\tdimension62 : '' || 'no-promo'\n\t});\n\n\twindow.dataLayer.push({\n\t\t'products'\t\t: products\n\t});\n\n\t/*window.dataLayer.push({\n\t\t'event' : 'detail',\n\t\t'ecommerce' : {\n\t\t\t'currencyCode': 'EUR',\n\t\t\t'detail' : {\n\t\t\t\t'products' : products\n\t\t\t}\n\t\t}\n\t});*/\n\n\twindow.dataLayer.push({\n\t\t'event': 'productDetail',\n\t\t'ecommerce' : {\n\t\t\t'currencyCode': 'EUR',\n\t\t\t'detail': {\n\t\t\t\t'products' : [{\n\t\t\t\t\t'name': 'Imetec Living Air umidificatore Vapore 0,4 L 700 W Blu' || '',\n\t\t\t\t\t'id': 'IME5401' || '',\n\t\t\t\t\t'price': '22.99' || '',\n\t\t\t\t\t'brand': 'Imetec' || '',\n\t\t\t\t\t'category': 'Riscaldamento' || '',\n\t\t\t\t}]\n\t\t\t}\n\t\t}\n\t});\n"

Here is my code:

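library(stringr)  # str_* functions; stringr also re-exports %>%

raw_json_embed <- json_data %>%
  str_remove_all("\\n|\\t") %>%                                  # drop newlines and tabs
  str_extract("(?<=products\\.push\\()(\\{.*?\\})(?=\\);)") %>%  # keep only the object passed to products.push()
  str_replace_all("'", '"') %>%                                  # single quotes -> double quotes
  str_replace_all(' : ', ':')
ex_parsed_json <- jsonlite::parse_json(raw_json_embed)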

At this point I get this error:

Error: lexical error: invalid char in json text.
                                      {id:"IME5401" || "",price:"22.99
                     (right here) ------^

I have tried other solutions, for example:

  raw_json_embed <- json_data %>%
    str_remove_all("\\n|\\t") %>%
    str_replace(".*(\\[\\{)", "\\1") %>%
    str_replace("(\\}\\]).*", "\\1")
  
  raw_json_embed <- gsub("'", '"', raw_json_embed)

But I still get the error.

If I paste the whole raw JSON into a JSON validator, it finds no problems at all, and I'm out of ideas.

r json web-scraping error-handling jsonparser

1 Answer

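As noted in the comments, that string is not JSON but JavaScript: the object keys are unquoted and the values are expressions such as 'IME5401' || '', neither of which JSON allows. My guess is that you pasted the quoted string into the validator. A quoted string on its own is a valid JSON value, which would explain why the validator reported no problems.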
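To illustrate the difference, here is a minimal sketch using jsonlite. Both inputs are shortened, hypothetical stand-ins modelled on the question, and parses_as_json() is a helper introduced only for this illustration.

```r
library(jsonlite)

# Hypothetical shortened inputs, modelled on the question.
# 1) JavaScript source wrapped in double quotes: a single JSON string value.
quoted_js <- "\"products.push({id : 'IME5401' || ''});\""
# 2) The object literal as it looks after the question's string replacements.
object_literal <- "{id:\"IME5401\" || \"\"}"

# Illustrative helper: TRUE if jsonlite can parse the text, FALSE otherwise.
parses_as_json <- function(txt) {
  tryCatch({
    parse_json(txt)
    TRUE
  }, error = function(e) FALSE)
}

print(parses_as_json(quoted_js))       # the quoted string as a whole
print(parses_as_json(object_literal))  # the extracted object literal
```

The first input should be accepted because it is just one string, whatever that string contains, while the second should be rejected at the unquoted key, as in the error message from the question.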

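With a few tricks, this particular example can be evaluated in the V8 JS engine. Keep in mind, though, that running arbitrary code is generally a BadIdea(tm), and the approach may not scale to your actual task.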
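If evaluating scraped code is not acceptable, one possible alternative is to pull the key/value pairs out with regular expressions. This is only a rough base R sketch, not part of the V8 approach. It assumes every value of interest is a single-quoted literal with an optional || fallback literal, so values produced by function calls (such as dimension55 in the question) are skipped. The input js_snippet is a hypothetical, trimmed-down version of the script.

```r
# Hypothetical, trimmed-down version of the script from the question.
js_snippet <- "products.push({\n\t\tid \t\t\t: 'IME5401' || '',\n\t\tprice \t\t: '22.99' || '',\n\t\tcurrency \t: 'EUR',\n\t\tdimension62 : '' || 'no-promo'\n\t});"

# Isolate the first products.push({...}) call.
block <- regmatches(
  js_snippet,
  regexpr("(?s)products\\.push\\(\\{.*?\\}\\)", js_snippet, perl = TRUE)
)

# Find every  key : 'literal'  pair, optionally followed by  || 'fallback'.
pair_pattern <- "\\w+\\s*:\\s*'[^']*'(\\s*\\|\\|\\s*'[^']*')?"
pairs <- regmatches(block, gregexpr(pair_pattern, block, perl = TRUE))[[1]]

# Split each pair into key, first literal and fallback literal.
keys <- sub("\\s*:.*$", "", pairs, perl = TRUE)
first <- sub("^\\w+\\s*:\\s*'([^']*)'.*$", "\\1", pairs, perl = TRUE)
fallback <- ifelse(
  grepl("||", pairs, fixed = TRUE),
  sub("^.*\\|\\|\\s*'([^']*)'$", "\\1", pairs, perl = TRUE),
  ""
)

# In JavaScript, a || b yields b when a is the empty string.
values <- ifelse(nzchar(first), first, fallback)
product <- setNames(as.list(values), keys)
str(product)
```

This avoids running any JavaScript, at the cost of breaking as soon as the page changes its formatting or a value contains an apostrophe.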

library(V8)
#> Using V8 engine 9.1.269.38
library(dplyr)

ct <- v8()
# v8 does not provide Window, though script only uses window.dataLayer.push()
# and we can easily mock it with our own window object and array in it:
ct$eval("var window = {dataLayer : []};")
# evaluate the js string, script pushes product details to window.dataLayer 
ct$eval(js)
#> [1] "2"

# turn our fake window.dataLayer to json string
products_json <- ct$eval("JSON.stringify(window.dataLayer)") %>% 
  jsonlite::parse_json()

# 2 objects that js script was pushing to window.dataLayer:
products_json[[1]][["products"]][[1]] %>% 
  as_tibble() %>% 
  glimpse()
#> Rows: 1
#> Columns: 19
#> $ id          <chr> "IME5401"
#> $ price       <chr> "22.99"
#> $ currency    <chr> "EUR"
#> $ name        <chr> "Imetec Living Air umidificatore Vapore 0,4 L 700 W Blu"
#> $ category    <chr> "Riscaldamento"
#> $ categoryId  <chr> "C7301"
#> $ group       <chr> "Riscaldamento"
#> $ tdId        <chr> ""
#> $ weight      <chr> ""
#> $ brand       <chr> "Imetec"
#> $ variant     <chr> ""
#> $ dimension55 <chr> "si"
#> $ metric5     <int> 1
#> $ dimension63 <chr> ""
#> $ metric12    <chr> ""
#> $ dimension10 <chr> "Piccoli e Grandi Elettrodomestici"
#> $ dimension11 <chr> "Trattamento Aria"
#> $ dimension66 <chr> ""
#> $ dimension62 <chr> "no-promo"

products_json[[2]][["ecommerce"]][["detail"]][["products"]][[1]] %>% 
  as_tibble() %>% 
  glimpse()
#> Rows: 1
#> Columns: 5
#> $ name     <chr> "Imetec Living Air umidificatore Vapore 0,4 L 700 W Blu"
#> $ id       <chr> "IME5401"
#> $ price    <chr> "22.99"
#> $ brand    <chr> "Imetec"
#> $ category <chr> "Riscaldamento"

The input js string, reformatted:

js <- r"(
var products = [];

var _getAvailabilityText = function (available) {
    if (available != 'outOfStock')
        return 'si';
    else
        return 'no';
};

var _getAvailabilityBinary = function (available) {
    if (available != 'outOfStock')
        return 1;
    else
        return 0;
};

products.push({
    id: 'IME5401' || '',
    price: '22.99' || '',
    currency: 'EUR',
    name: 'Imetec Living Air umidificatore Vapore 0,4 L 700 W Blu' || '',
    category: 'Riscaldamento' || '',
    categoryId: 'C7301' || '',
    group: 'Riscaldamento' || '',
    tdId: '',
    weight: '',
    brand: 'Imetec' || '',
    variant: '',
    dimension55: _getAvailabilityText('lowStock'),
    metric5: _getAvailabilityBinary('lowStock'),
    dimension63: '',
    metric12: '',
    dimension10: 'Piccoli e Grandi Elettrodomestici',
    dimension11: 'Trattamento Aria',
    dimension66: '',
    dimension62: '' || 'no-promo'
});

window.dataLayer.push({
    'products': products
});


window.dataLayer.push({
    'event': 'productDetail',
    'ecommerce': {
        'currencyCode': 'EUR',
        'detail': {
            'products': [{
                'name': 'Imetec Living Air umidificatore Vapore 0,4 L 700 W Blu' || '',
                'id': 'IME5401' || '',
                'price': '22.99' || '',
                'brand': 'Imetec' || '',
                'category': 'Riscaldamento' || '',
            }]
        }
    }
});

)"

Created on 2023-05-10 with reprex v2.0.2
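To apply the steps above to many pages, they could be wrapped in a function along these lines. This is a sketch under assumptions: extract_data_layer() is a hypothetical helper name, and it presumes that each script only needs window.dataLayer, as in this example. It uses only the V8 and jsonlite calls already shown above.

```r
library(V8)
library(jsonlite)

# Hypothetical helper: evaluate one scraped script and return whatever it
# pushed to window.dataLayer, or NULL if the script fails to run.
extract_data_layer <- function(js_code) {
  ct <- v8()  # fresh context per script, so globals do not leak between pages
  ct$eval("var window = {dataLayer : []};")  # mock the browser's window object
  tryCatch({
    ct$eval(js_code)
    parse_json(ct$eval("JSON.stringify(window.dataLayer)"))
  }, error = function(e) {
    warning("Could not evaluate script: ", conditionMessage(e))
    NULL
  })
}

# Usage with a tiny stand-in script:
res <- extract_data_layer("window.dataLayer.push({event: 'productDetail'});")
str(res)
```

Creating a new context for every script keeps one page's variables from affecting the next, and the tryCatch() lets a batch run continue when a page needs browser objects other than window.dataLayer.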
