JsonParseException: Unrecognized token 'http': was expecting ('true', 'false' or 'null')

Question  Votes: 5  Answers: 3

We have the following string, which is valid JSON, written to a file on HDFS.

{  
  "id":"tag:search.twitter.com,2005:564407444843950080",
  "objectType":"activity",
  "actor":{  
    "objectType":"person",
    "id":"id:twitter.com:2302910022",
    "link":"http%3A%2F%2Fwww.twitter.com%2Fme7me4610012",
    "displayName":"",
    "postedTime":"2014-01-21T11:06:06.000Z",
    "image":"https%3A%2F%2Fpbs.twimg.com%2Fprofile_images%2F563125491159162881%2FfypkHK3M_normal.jpeg",
    "summary":"‏‏‏‏‏‏‏‏ضًـأّيِّعٌهّ أّروٌأّحًنِأّ تٌـشُـتٌـهّـيِّ مًنِ يِّفُـهّـمًهّـأّ فُـقُط  حسابي بالإنستقرام lloooo_20",
    "links":[  
      {  
        "href":null,
        "rel":"me"
      }
    ],
    "friendsCount":10503,
    "followersCount":10325,
    "listedCount":12,
    "statusesCount":84957,
    "twitterTimeZone":null,
    "verified":false,
    "utcOffset":null,
    "preferredUsername":"me7me4610012",
    "languages":[  
      "ar"
    ],
    "favoritesCount":17
  },
  "verb":"share",
  "postedTime":"2015-02-08T12:56:35.000Z",
  "generator":{  
    "displayName":"Twitter for Android",
    "link":"http%3A%2F%2Ftwitter.com%2Fdownload%2Fandroid"
  },
  "provider":{  
    "objectType":"service",
    "displayName":"Twitter",
    "link":"http%3A%2F%2Fwww.twitter.com"
  },
  "link":"http%3A%2F%2Ftwitter.com%2Fme7me4610012%2Fstatuses%2F564407444843950080",
  "body":"RT @sckud1: فيديو: إمام يرفض بغضب الصلاة على أحد قتلى حزب الله في سوريا بسبب إطلاق النار: ماعاد  http%3A%2F%2Ft.co%2FC55SaQKmUV http%3A%2F%2Ft.co%2Ft5TjIln…",
  "object":{  
    "id":"tag:search.twitter.com,2005:564407126526013440",
    "objectType":"activity",
    "actor":{  
      "objectType":"person",
      "id":"id:twitter.com:462268717",
      "link":"http%3A%2F%2Fwww.twitter.com/sckud1",
      "displayName":"صفق الهوى",
      "postedTime":"2012-01-12T19:24:17.000Z",
      "image":"https%3A%2F%2Fpbs.twimg.com%2Fprofile_images%2F508424482885615616%2FmPBGZBPx_normal.jpeg",
      "summary":"اعلانك في سوق الخليج يحقق لك الوصول الى اكثر من مليون متابع خليجي  http%3A%2F%2Fmarketgulf.com",
      "links":[  
        {  
          "href":"http%3A%2F%2Fmarketgulf.com",
          "rel":"me"
        }
      ],
      "friendsCount":435237,
      "followersCount":464951,
      "listedCount":708,
      "statusesCount":1071685,
      "twitterTimeZone":"Riyadh",
      "verified":false,
      "utcOffset":"10800",
      "preferredUsername":"sckud1",
      "languages":[  
        "ar"
      ],
      "location":{  
        "objectType":"place",
        "displayName":"Made in K S A"
      },
      "favoritesCount":77
    },
    "verb":"post",
    "postedTime":"2015-02-08T12:55:19.000Z",
    "generator":{  
      "displayName":"Tweet Old Post",
      "link":"http%3A%2F%2Fwww.ajaymatharu.com%2F"
    },
    "provider":{  
      "objectType":"service",
      "displayName":"Twitter",
      "link":"http%3A%2F%2Fwww.twitter.com"
    },
    "link":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatuses%2F564407126526013440",
    "body":"فيديو: إمام يرفض بغضب الصلاة على أحد قتلى حزب الله في سوريا بسبب إطلاق النار: ماعاد  http%3A%2F%2Ft.co%2FC55SaQKmUV http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
    "object":{  
      "objectType":"note",
      "id":"object:search.twitter.com,2005:564407126526013440",
      "summary":"فيديو: إمام يرفض بغضب الصلاة على أحد قتلى حزب الله في سوريا بسبب إطلاق النار: ماعاد  http%3A%2F%2Ft.co%2FC55SaQKmUV http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
      "link":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatuses%2F564407126526013440",
      "postedTime":"2015-02-08T12:55:19.000Z"
    },
    "favoritesCount":0,
    "twitter_entities":{  
      "hashtags":[  

      ],
      "trends":[  

      ],
      "urls":[  
        {  
          "url":"http%3A%2F%2Ft.co%2FC55SaQKmUV",
          "expanded_url":"http%3A%2F%2Fwww.hasterya.com%2Farchives%2F34688utm_source%3DReviveOldPost%26utm_medium%3Dsocial%26utm_campaign%3DReviveOldPost",
          "display_url":"hasterya.com/archives/34688…",
          "indices":[  
            85,
            107
          ]
        }
      ],
      "user_mentions":[  

      ],
      "symbols":[  

      ],
      "media":[  
        {  
          "id":564407126341468160,
          "id_str":"564407126341468160",
          "indices":[  
            108,
            130
          ],
          "media_url":"http%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
          "media_url_https":"https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
          "url":"http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
          "display_url":"pic.twitter.com/t5TjIlnZgN",
          "expanded_url":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatus%2F564407126526013440%2Fphoto%2F1",
          "type":"photo",
          "sizes":{  
            "large":{  
              "w":320,
              "h":180,
              "resize":"fit"
            },
            "thumb":{  
              "w":150,
              "h":150,
              "resize":"crop"
            },
            "small":{  
              "w":320,
              "h":180,
              "resize":"fit"
            },
            "medium":{  
              "w":320,
              "h":180,
              "resize":"fit"
            }
          }
        }
      ]
    },
    "twitter_extended_entities":{  
      "media":[  
        {  
          "id":564407126341468160,
          "id_str":"564407126341468160",
          "indices":[  
            108,
            130
          ],
          "media_url":"http%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
          "media_url_https":"https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
          "url":"http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
          "display_url":"pic.twitter.com/t5TjIlnZgN",
          "expanded_url":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatus%2F564407126526013440%2Fphoto%2F1",
          "type":"photo",
          "sizes":{  
            "large":{  
              "w":320,
              "h":180,
              "resize":"fit"
            },
            "thumb":{  
              "w":150,
              "h":150,
              "resize":"crop"
            },
            "small":{  
              "w":320,
              "h":180,
              "resize":"fit"
            },
            "medium":{  
              "w":320,
              "h":180,
              "resize":"fit"
            }
          }
        }
      ]
    },
    "twitter_filter_level":"low",
    "twitter_lang":"ar"
  },
  "favoritesCount":0,
  "twitter_entities":{  
    "hashtags":[  

    ],
    "trends":[  

    ],
    "urls":[  
      {  
        "url":"http%3A%2F%2Ft.co%2FC55SaQKmUV",
        "expanded_url":"http%3A%2F%2Fwww.hasterya.com%2Farchives%2F34688utm_source%3DReviveOldPost%26utm_medium%3Dsocial%26utm_campaign%3DReviveOldPost",
        "display_url":"hasterya.com/archives/34688…",
        "indices":[  
          97,
          119
        ]
      }
    ],
    "user_mentions":[  
      {  
        "screen_name":"sckud1",
        "name":"صفق الهوى",
        "id":462268717,
        "id_str":"462268717",
        "indices":[  
          3,
          10
        ]
      }
    ],
    "symbols":[  

    ],
    "media":[  
      {  
        "id":564407126341468160,
        "id_str":"564407126341468160",
        "indices":[  
          139,
          140
        ],
        "media_url":"http%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
        "media_url_https":"https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
        "url":"http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
        "display_url":"pic.twitter.com/t5TjIlnZgN",
        "expanded_url":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatus%2F564407126526013440%2Fphoto%2F1",
        "type":"photo",
        "sizes":{  
          "large":{  
            "w":320,
            "h":180,
            "resize":"fit"
          },
          "thumb":{  
            "w":150,
            "h":150,
            "resize":"crop"
          },
          "small":{  
            "w":320,
            "h":180,
            "resize":"fit"
          },
          "medium":{  
            "w":320,
            "h":180,
            "resize":"fit"
          }
        },
        "source_status_id":564407126526013440,
        "source_status_id_str":"564407126526013440"
      }
    ]
  },
  "twitter_extended_entities":{  
    "media":[  
      {  
        "id":564407126341468160,
        "id_str":"564407126341468160",
        "indices":[  
          139,
          140
        ],
        "media_url":"http%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
        "media_url_https":"https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
        "url":"http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
        "display_url":"pic.twitter.com/t5TjIlnZgN",
        "expanded_url":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatus%2F564407126526013440%2Fphoto%2F1",
        "type":"photo",
        "sizes":{  
          "large":{  
            "w":320,
            "h":180,
            "resize":"fit"
          },
          "thumb":{  
            "w":150,
            "h":150,
            "resize":"crop"
          },
          "small":{  
            "w":320,
            "h":180,
            "resize":"fit"
          },
          "medium":{  
            "w":320,
            "h":180,
            "resize":"fit"
          }
        },
        "source_status_id":564407126526013440,
        "source_status_id_str":"564407126526013440"
      }
    ]
  },
  "twitter_filter_level":"low",
  "twitter_lang":"ar",
  "retweetCount":1,
  "gnip":{  
    "matching_rules":[  
      {  
        "tag":"ISIS66"
      }
    ],
    "urls":[  
      {  
        "url":"http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
        "expanded_url":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatus%2F564407126526013440%2Fphoto%2F1",
        "expanded_status":200
      },
      {  
        "url":"http%3A%2F%2Ft.co%2FC55SaQKmUV",
        "expanded_url":"http%3A%2F%2Fwww.hasterya.com%2Farchives%2F34688utm_source%3DReviveOldPost%26utm_medium%3Dsocial%26utm_campaign%3DReviveOldPost",
        "expanded_status":200
      }
    ],
    "klout_score":50,
    "language":{  
      "value":"ar"
    }
  }
}

Edit

We configured a Flume agent that reads data from this file and passes it to a Solr sink, but unfortunately the exception in the title is thrown.

Here is the stack trace:

org.kitesdk.morphline.api.MorphlineRuntimeException: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'http': was expecting ('true', 'false' or 'null')
 at [Source: java.io.ByteArrayInputStream@20d7aa52; line: 1, column: 9]
    at org.kitesdk.morphline.stdio.AbstractParser.doProcess(AbstractParser.java:98)
    at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
    at org.kitesdk.morphline.stdlib.TryRulesBuilder$TryRules.doProcess(TryRulesBuilder.java:120)
    at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
    at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
    at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
    at org.apache.flume.sink.solr.morphline.MorphlineHandlerImpl.process(MorphlineHandlerImpl.java:128)
    at org.apache.flume.sink.solr.morphline.MorphlineSink.process(MorphlineSink.java:141)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
    at java.lang.Thread.run(Thread.java:744)
Caused by: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'http': was expecting ('true', 'false' or 'null')
 at [Source: java.io.ByteArrayInputStream@20d7aa52; line: 1, column: 9]
    at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1524)
    at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:557)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidToken(UTF8StreamJsonParser.java:3095)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2340)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:818)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:698)
    at com.fasterxml.jackson.databind.MappingIterator.hasNextValue(MappingIterator.java:159)
    at org.kitesdk.morphline.json.ReadJsonBuilder$ReadJson.doProcess(ReadJsonBuilder.java:109)
    at org.kitesdk.morphline.stdio.AbstractParser.doProcess(AbstractParser.java:96)
    ... 10 more
java jackson cloudera
3 Answers
7
votes

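"We have the following string, which is valid JSON ..."

Clearly, the JSON parser disagrees!

However, the exception says the error is at "line: 1, column: 9", and there is no "http" token near the beginning of the JSON. So I suspect that the parser was trying to parse something other than this string when the error occurred.

You need to find out what JSON is actually being parsed. Run the application in a debugger, set a breakpoint on the relevant constructor of JsonParseException ... then find out what is in the ByteArrayInputStream it is trying to parse.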


1
vote

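I faced this exception for a long time and could not pinpoint the problem. The exception points to line 1, column 9. My mistake was looking at the first line of the file that Flume was processing.

Apache Flume processes the contents of a file in batches. So when Flume throws this exception and says line 1, it means the first line of the current batch.

If your Flume agent is configured with batch size = 100 and (for example) the file contains 400 lines, then the exception was thrown at one of the lines 1, 101, 201, 301.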
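
To make the arithmetic concrete, here is a minimal sketch that lists the file lines at which each batch starts. It only follows the reasoning in this answer; the class and method names are made up for illustration and are not part of Flume.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flume: lists the 1-based file line
// numbers at which each batch begins.
final class BatchStarts {
    static List<Integer> firstLineOfEachBatch(int totalLines, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Integer> starts = new ArrayList<>();
        // Every batch starts batchSize lines after the previous one.
        for (int line = 1; line <= totalLines; line += batchSize) {
            starts.add(line);
        }
        return starts;
    }

    public static void main(String[] args) {
        // 400 lines with batch size 100: prints [1, 101, 201, 301]
        System.out.println(firstLineOfEachBatch(400, 100));
    }
}
```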

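How do you find the line that causes the problem?

There are three ways to do this.

1- Pull the source code and run the agent in debug mode. If you are an average developer like me and don't know how to do that, look at the other two options.

2- Split the file according to the batch size and run the Flume agent again. If you split the file into 4 files and the invalid JSON lies between lines 301 and 400, the Flume agent will process the first 3 files and stop at the fourth. Take the fourth file and split it again into smaller files. Continue until you reach a file with a single line on which Flume fails.

3- Reduce the batch size of the Flume agent to one and compare against the number of processed events in the output of the sink you are using. For example, in my case I am using a Solr sink. The file contains 400 lines, and the Flume agent is configured with batch size = 100. When I run the Flume agent, it fails at some point and throws the exception. At that point, check how many documents have been ingested into Solr. If the invalid JSON is at line 346, the number of documents indexed into Solr will be 345, so the next line is the one causing the problem.

In my case I followed the third option and, fortunately, identified the line causing the problem.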
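
Options 2 and 3 can be sketched in plain Java as below. This is only an illustration under the assumptions of this answer: the names are hypothetical, the file is assumed to be already read into a list of lines, and nothing here calls Flume or Solr.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers for narrowing down the bad line; not a Flume API.
final class ChunkSplitter {
    // Option 2: cut the lines into consecutive chunks of at most chunkSize,
    // so each chunk can be written to its own file and fed to the agent.
    static List<List<String>> split(List<String> lines, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, lines.size());
            chunks.add(new ArrayList<>(lines.subList(start, end)));
        }
        return chunks;
    }

    // Option 3: with batch size 1, the suspect is the line right after
    // the last document that reached the sink.
    static int suspectLine(int documentsIndexed) {
        return documentsIndexed + 1;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= 400; i++) {
            lines.add("line " + i);
        }
        System.out.println(split(lines, 100).size()); // prints 4
        System.out.println(suspectLine(345));         // prints 346
    }
}
```

Re-running `split` on the failing chunk with a smaller `chunkSize` repeats the narrowing step described in option 2.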

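This is a long answer, but it still doesn't actually resolve the exception. How did I overcome it?

I don't know why the Jackson library complains when parsing a JSON string that contains the escape characters \n \r \t. I think (but I am not sure) that the Jackson parser unescapes these characters by default, which splits the JSON string into two lines (in the case of \n), and then it treats each line as a separate JSON string.

In my case we used a custom interceptor to remove these characters before the Flume agent processes them. That is how we solved the problem.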
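
The interceptor itself is not shown in this answer. As a rough sketch of the cleanup step only, assuming the event body has already been decoded to a String, something like the following would do; the class name is hypothetical, the Flume interceptor wiring is omitted, and replacing the characters with a space (rather than deleting them) is my own choice so that neighbouring words are not fused together.

```java
// Hypothetical cleanup step; the Flume Interceptor wiring is omitted.
final class ControlCharCleaner {
    // Replaces each run of raw CR, LF and TAB characters with one space,
    // so a record can no longer be broken across several lines.
    static String clean(String body) {
        return body.replaceAll("[\\r\\n\\t]+", " ");
    }

    public static void main(String[] args) {
        String raw = "{\"body\":\"first part\nsecond part\"}";
        // prints {"body":"first part second part"}
        System.out.println(clean(raw));
    }
}
```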


0
votes

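This may be obvious, but make sure you pass the parser a URL object rather than a String containing a web address. A String argument is treated as the JSON content itself. This does not work:

    ObjectMapper mapper = new ObjectMapper();
    String www = "www.sample.pl";
    // The String itself is parsed as JSON, so this throws JsonParseException
    Weather weather = mapper.readValue(www, Weather.class);

But this does:

    ObjectMapper mapper = new ObjectMapper();
    URL www = new URL("http://www.oracle.com/");
    // The content is fetched from the URL and then parsed
    Weather weather = mapper.readValue(www, Weather.class);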