我想使用第三方框架从“没有”提取html文档中的JSON字符串。我正在尝试创建iOS框架,我不想在其中使用第三方框架。
示例网址:http://www.nicovideo.jp/watch/sm33786214
在那个html中,有一行:
我需要提取:JSON_String_I_want_to提取并将其转换为JSON对象。
使用第三方框架“Kanna”,它是这样的:
if let doc = Kanna.HTML(html: html, encoding: String.Encoding.utf8) {
if let descNode = doc.css("#js-initial-watch-data[data-api-data]").first {
let dataApiData = descNode["data-api-data"]
if let data = dataApiData?.data(using: .utf8) {
if let json = try? JSON(data: data, options: JSONSerialization.ReadingOptions.mutableContainers) {
我在网上搜索了类似的问题,但无法申请我的案例:(我需要承认我不太遵循正则表达式)
if let html = String(data:data, encoding:.utf8) {
let pattern = "data-api-data=\"(.*?)\".*?>"
let regex = try! NSRegularExpression(pattern: pattern, options: .caseInsensitive)
let matches = regex.matches(in: html, options: [], range: NSMakeRange(0, html.count))
var results: [String] = []
matches.forEach { (match) -> () in
results.append( (html as NSString).substring(with: match.rangeAt(1)) )
}
if let stringJSON = results.first {
let d = stringJSON.data(using: String.Encoding.utf8)
if let json = try? JSONSerialization.jsonObject(with: d!, options: []) as? Any {
// it does not get here...
}
有谁从html中提取并将其转换为JSON?
谢谢。
您的pattern
似乎并不坏,只是HTML Elements的属性值可能正在使用字符实体。
在将String解析为JSON之前,您需要将它们替换为实际字符。
if let html = String(data:data, encoding: .utf8) {
let pattern = "data-api-data=\"([^\"]*)\""
let regex = try! NSRegularExpression(pattern: pattern, options: .caseInsensitive)
let matches = regex.matches(in: html, range: NSRange(0..<html.utf16.count)) //<-USE html.utf16.count, NOT html.count
var results: [String] = []
matches.forEach {match in
let propValue = html[Range(match.range(at: 1), in: html)!]
//### You need to replace character entities into actual characters
.replacingOccurrences(of: """, with: "\"")
.replacingOccurrences(of: "'", with: "'")
.replacingOccurrences(of: ">", with: ">")
.replacingOccurrences(of: "<", with: "<")
.replacingOccurrences(of: "&", with: "&")
results.append(propValue)
}
if let stringJSON = results.first {
let dataJSON = stringJSON.data(using: .utf8)!
do {
let json = try JSONSerialization.jsonObject(with: dataJSON)
print(json)
} catch {
print(error) //You should not ignore errors silently...
}
} else {
print("NO result")
}
}