Error: "Expecting property name enclosed in double quotes: line 2 column 1 (char 2)"

Problem description (votes: 0, answers: 1)

So I'm building a chatbot trained on a month of Reddit comments. The script I'm currently working on creates a database and loads it with some data from a JSON file.

When I run the code, it does actually manage to create the sqlite3 database, but it prints an error:

Expecting property name enclosed in double quotes: line 2 column 1 (char 2)
Extra data: line 1 column 16 (char 15)
Extra data: line 1 column 8 (char 7)
Extra data: line 1 column 11 (char 10)
Extra data: line 1 column 8 (char 7)
Extra data: line 1 column 9 (char 8)
Extra data: line 1 column 15 (char 14)
Extra data: line 1 column 9 (char 8)
Extra data: line 1 column 10 (char 9)
Extra data: line 1 column 17 (char 16)
Extra data: line 1 column 6 (char 5)
Extra data: line 1 column 12 (char 11)
Extra data: line 1 column 13 (char 12)
Extra data: line 1 column 13 (char 12)
Extra data: line 1 column 26 (char 25)
Extra data: line 1 column 21 (char 20)
Extra data: line 1 column 10 (char 9)
Extra data: line 1 column 16 (char 15)
Extra data: line 1 column 7 (char 6)
Extra data: line 1 column 20 (char 19)
Extra data: line 1 column 16 (char 15)
Extra data: line 1 column 10 (char 9)
Expecting value: line 1 column 1 (char 0)

Can anyone tell me what I can do to fix this?

BTW, here's the full code:

import sqlite3
import json
from datetime import datetime
import time
import ast

timeframe = '2015-01'
sql_transaction = []
start_row = 0
cleanup = 1000000

connection = sqlite3.connect('{}.db'.format(timeframe))
c = connection.cursor()


def create_table():
    c.execute("CREATE TABLE IF NOT EXISTS parent_reply(parent_id TEXT PRIMARY KEY, comment_id TEXT UNIQUE, parent TEXT, comment TEXT, subreddit TEXT, unix INT, score INT)")


def format_data(data):
    data = data.replace('\n', ' newlinechar ').replace('\r', ' newlinechar ').replace('"', "'")
    return data


def transaction_bldr(sql):
    global sql_transaction
    sql_transaction.append(sql)
    if len(sql_transaction) > 1000:
        c.execute('BEGIN TRANSACTION')
        for s in sql_transaction:
            try:
                c.execute(s)
            except:
                pass
        connection.commit()
        sql_transaction = []


def sql_insert_replace_comment(commentid, parentid, parent, comment, subreddit, time, score):
    try:
        sql = """UPDATE parent_reply SET parent_id = ?, comment_id = ?, parent = ?, comment = ?, subreddit = ?, unix = ?, score = ? WHERE parent_id =?;""".format(
            parentid, commentid, parent, comment, subreddit, int(time), score, parentid)
        transaction_bldr(sql)
    except Exception as e:
        print('s0 insertion', str(e))


def sql_insert_has_parent(commentid, parentid, parent, comment, subreddit, time, score):
    try:
        sql = """INSERT INTO parent_reply (parent_id, comment_id, parent, comment, subreddit, unix, score) VALUES ("{}","{}","{}","{}","{}",{},{});""".format(
            parentid, commentid, parent, comment, subreddit, int(time), score)
        transaction_bldr(sql)
    except Exception as e:
        print('s0 insertion', str(e))


def sql_insert_no_parent(commentid, parentid, comment, subreddit, time, score):
    try:
        sql = """INSERT INTO parent_reply (parent_id, comment_id, comment, subreddit, unix, score) VALUES ("{}","{}","{}","{}",{},{});""".format(
            parentid, commentid, comment, subreddit, int(time), score)
        transaction_bldr(sql)
    except Exception as e:
        print('s0 insertion', str(e))


def acceptable(data):
    if len(data.split(' ')) > 1000 or len(data) < 1:
        return False
    elif len(data) > 32000:
        return False
    elif data == '[deleted]':
        return False
    elif data == '[removed]':
        return False
    else:
        return True


def find_parent(pid):
    try:
        sql = "SELECT comment FROM parent_reply WHERE comment_id = '{}' LIMIT 1".format(pid)
        c.execute(sql)
        result = c.fetchone()
        if result != None:
            return result[0]
        else:
            return False
    except Exception as e:
        # print(str(e))
        return False


def find_existing_score(pid):
    try:
        sql = "SELECT score FROM parent_reply WHERE parent_id = '{}' LIMIT 1".format(pid)
        c.execute(sql)
        result = c.fetchone()
        if result != None:
            return result[0]
        else:
            return False
    except Exception as e:
        # print(str(e))
        return False


if __name__ == '__main__':
    create_table()
    row_counter = 0
    paired_rows = 0

    with open(r'C:\Users\hermans\Desktop\RedditBot\RC_2015-01.json', buffering=1000) as f:
        for row in f:
            # print(row)
            # time.sleep(555)
            row_counter += 1

            if row_counter > start_row:
                try:
                    row = json.loads(row)
                    parent_id = row['parent_id'].split('_')[1]
                    body = format_data(row['body'])
                    created_utc = row['created_utc']
                    score = row['score']

                    comment_id = row['id']

                    subreddit = row['subreddit']
                    parent_data = find_parent(parent_id)

                    existing_comment_score = find_existing_score(parent_id)
                    if existing_comment_score:
                        if score > existing_comment_score:
                            if acceptable(body):
                                sql_insert_replace_comment(comment_id, parent_id, parent_data, body, subreddit, created_utc, score)

                    else:
                        if acceptable(body):
                            if parent_data:
                                if score >= 2:
                                    sql_insert_has_parent(comment_id, parent_id, parent_data, body, subreddit, created_utc, score)
                                    paired_rows += 1
                            else:
                                sql_insert_no_parent(comment_id, parent_id, body, subreddit, created_utc, score)
                except Exception as e:
                    print(str(e))

            if row_counter % 100000 == 0:
                print('Total Rows Read: {}, Paired Rows: {}, Time: {}'.format(row_counter, paired_rows, str(datetime.now())))

            #if row_counter > start_row:
            #    if row_counter % cleanup == 0:
            #        print("Cleanin up!")
            #        sql = "DELETE FROM parent_reply WHERE parent IS NULL"
            #        c.execute(sql)
            #        connection.commit()
            #        c.execute("VACUUM")
            #        connection.commit()

And the JSON file (it contains way more comments than this, but at 200,000 lines I don't want to paste it all...):

{
    "score_hidden": false,
    "name": "t1_cnas8zv",
    "link_id": "t3_2qyr1a",
    "body": "Most of us have some family members like this. *Most* of my family is like this. ",
    "downs": 0,
    "created_utc": "1420070400",
    "score": 14,
    "author": "YoungModern",
    "distinguished": null,
    "id": "cnas8zv",
    "archived": false,
    "parent_id": "t3_2qyr1a",
    "subreddit": "exmormon",
    "author_flair_css_class": null,
    "author_flair_text": null,
    "gilded": 0,
    "retrieved_on": 1425124282,
    "ups": 14,
    "controversiality": 0,
    "subreddit_id": "t5_2r0gj",
    "edited": false
} {
    "distinguished": null,
    "id": "cnas8zw",
    "archived": false,
    "author": "RedCoatsForever",
    "score": 3,
    "created_utc": "1420070400",
    "downs": 0,
    "body": "But Mill's career was way better. Bentham is like, the Joseph Smith to Mill's Brigham Young.",
    "link_id": "t3_2qv6c6",
    "name": "t1_cnas8zw",
    "score_hidden": false,
    "controversiality": 0,
    "subreddit_id": "t5_2s4gt",
    "edited": false,
    "retrieved_on": 1425124282,
    "ups": 3,
    "author_flair_css_class": "on",
    "gilded": 0,
    "author_flair_text": "Ontario",
    "subreddit": "CanadaPolitics",
    "parent_id": "t1_cnas2b6"
}

EDIT: I have now tried removing the try:/except:, but now I'm running into a new error that I don't understand, and which I actually encountered earlier:

Traceback (most recent call last):
  File "C:\Users\hermans\Desktop\RedditBot\Current_Create_DB.py", line 121, in <module>
    row = json.loads(row)
  File "C:\Program Files (x86)\Python 3.5\lib\json\__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "C:\Program Files (x86)\Python 3.5\lib\json\decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Program Files (x86)\Python 3.5\lib\json\decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 1 (char 2)
python json python-3.x
1 Answer

2 votes

"And the JSON file (it contains way more comments than this, but at 200,000 lines I don't want to paste it all...):"

What you have shown is invalid JSON. Snipping out a bunch of the data lines, we can see the general problem:

{
    "score_hidden": false,
} {
    "distinguished": null,
}

The `} {` appears because your data contains multiple JSON objects (as the JSON standard calls them) one after another, rather than nesting them inside another layer (presumably a JSON array, again in the standard's terminology). Instead, it should look like:

[
    {
        "score_hidden": false
    }, {
        "distinguished": null
    }
]
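If rewriting the source file into a proper JSON array is impractical (the full dump is huge), one workaround, not part of the original answer, is to parse the back-to-back objects one at a time with the standard library's `json.JSONDecoder.raw_decode`, which decodes one value and reports where it ended. A minimal sketch, with an illustrative data string:

```python
import json


def iter_concatenated_json(text):
    """Yield each top-level object from a string that contains
    back-to-back JSON documents with no wrapping array."""
    decoder = json.JSONDecoder()
    idx = 0
    while idx < len(text):
        # Skip whitespace between objects.
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx >= len(text):
            break
        # raw_decode returns the parsed object and the index
        # just past it, so we can resume from there.
        obj, idx = decoder.raw_decode(text, idx)
        yield obj


data = '{"id": "cnas8zv", "score": 14} {"id": "cnas8zw", "score": 3}'
ids = [obj["id"] for obj in iter_concatenated_json(data)]
print(ids)  # ['cnas8zv', 'cnas8zw']
```

Note that the real `RC_2015-01` Reddit dumps are newline-delimited (one complete object per line), in which case a plain `json.loads(line)` per line works and this helper is unnecessary; it only matters if the objects are pretty-printed across multiple lines as shown in the question.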

The errors you are getting are the JSON parser's details about its failure to interpret the invalid JSON (because it is invalid). This becomes clear when you look at the exception traceback, that is, when you read the error message properly. However, the way your code is written prevents you from doing that, by printing only the exception message and then carrying on as if nothing bad had happened:

try:
    row = json.loads(row)
    # lots more code not relevant to the reported error                    
except Exception as e:
    print(str(e))

Don't do this. You are only making things harder for yourself. The way to fix problems in your code is to write less code at a time, and make sure it works before continuing. This kind of exception handling does the opposite, and it caused you to post a large amount of code that is irrelevant to the question, because you had lost the relevant diagnostics :)

If you had left this try/except block out, your code would have bailed out immediately on the first error, but it would have told you something much more useful. It would have pointed at the `row = json.loads(row)` line, and it would have labelled the error as a `json.decoder.JSONDecodeError`, which is a big hint. More importantly, code that keeps running after something goes wrong, without a real attempt to fix the problem (or at least a proper decision that it can safely be ignored), has a chance to mess up your data even further. In the long run that leads to pain and suffering, so this is me trying to shake you out of that habit now :)
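One way to follow that advice while still handling failures you genuinely expect is to catch only the specific exception class and preserve its context, rather than a bare `except Exception` that prints and moves on. A hedged sketch (the `load_rows` helper is invented here for illustration, not from the question's code):

```python
import json


def load_rows(lines):
    """Parse each line as JSON, failing loudly with context
    instead of silently swallowing every exception."""
    rows = []
    for lineno, line in enumerate(lines, start=1):
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError as e:
            # Catch only the error we expect, and re-raise with
            # enough context to find the offending input line.
            raise ValueError(
                "bad JSON on input line {}: {}".format(lineno, e)
            ) from e
    return rows


rows = load_rows(['{"id": "cnas8zv"}', '{"id": "cnas8zw"}'])
print(len(rows))  # 2
```

With this shape, a malformed line stops the run immediately and the message tells you exactly which line to inspect, instead of leaving you with thousands of one-line error prints and a half-populated database.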
