Creating a nested dictionary in a for loop without overwriting values (Python)


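I am trying to build a web scraper that extracts certain information based on HTML tags and puts it into a dictionary.

My first function scrapes a website and returns a dictionary like this: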

{"Url": "www.test1.de", "Document Title": "test1", "Releaes Date": "January 1, 2020",...}

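My second function takes a list of links as input. It should loop over those links, call the first function on each one, and collect the resulting dictionaries in one big dictionary.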

def create_dict(link_list):
    all_data_dict = {}
    count = 0
    for link in link_list:
        # scrape_doc_info returns the dictionary shown above;
        # tag_list and selector_dict are not parameters, so they must exist in an enclosing scope
        all_data_dict[count] = scrape_doc_info(link, tag_list, selector_dict)
        print(all_data_dict)
        count += 1

    return all_data_dict

I expect the following result:

all_data_dict = {0: {"Url": "www.test1.de", "Document Title": "test1", "Releaes Date": "January 1, 2020",...},
1: {"Url": "www.test2.de", "Document Title": "test2", "Releaes Date": "January 2, 2022",...},..., 20: {"Url": "www.test20.de", "Document Title": "test20", "Releaes Date": "January 20, 2200",...}}

However, my code always overwrites the value of every key with the values from the last link. So if I loop over 20 links, every key ends up holding the last link's values:

all_data_dict = {0: {"Url": "www.test20.de", "Document Title": "test20", "Releaes Date": "January 20, 2200",...},
1: {"Url": "www.test20.de", "Document Title": "test20", "Releaes Date": "January 20, 2200",...},..., 20: {"Url": "www.test20.de", "Document Title": "test20", "Releaes Date": "January 20, 2200",...}}

The console output of the print statement is as follows:

First iteration:

all_data_dict = {0: {"Url": "www.test1.de", "Document Title": "test1", "Releaes Date": "January 1, 2020",...}}

Second iteration:

all_data_dict = {0: {"Url": "www.test2.de", "Document Title": "test2", "Releaes Date": "January 2, 2022",...},
1: {"Url": "www.test2.de", "Document Title": "test2", "Releaes Date": "January 2, 2022",...}}

20th iteration:

all_data_dict = {0: {"Url": "www.test20.de", "Document Title": "test20", "Releaes Date": "January 20, 2200",...},
1: {"Url": "www.test20.de", "Document Title": "test20", "Releaes Date": "January 20, 2200",...},..., 20: {"Url": "www.test20.de", "Document Title": "test20", "Releaes Date": "January 20, 2200",...}}
python loops dictionary nested overwrite
1 Answer

There must be something wrong with your scrape_doc_info function (I'm not sure what it might be).
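One common cause of exactly this symptom, offered as a guess because scrape_doc_info is not shown, is that the function fills in and returns the same dictionary object on every call. Each key in all_data_dict then points at that one object, so every entry appears to change whenever the next link is scraped. The sketch below reproduces the effect; scrape_shared, scrape_fresh, and shared_result are hypothetical names invented for illustration and are not part of the original code.

```python
# Hypothetical stand-ins for scrape_doc_info; the real function is not shown.
shared_result = {}  # created once, outside the function


def scrape_shared(link):
    # Mutates and returns the SAME dict object on every call (the suspected bug).
    shared_result["Url"] = link
    return shared_result


def scrape_fresh(link):
    # Builds a NEW dict on every call, so earlier results are left untouched.
    return {"Url": link}


links = ["www.test1.de", "www.test2.de"]

aliased = {i: scrape_shared(link) for i, link in enumerate(links)}
independent = {i: scrape_fresh(link) for i, link in enumerate(links)}

print(aliased)                   # every key shows the last link
print(aliased[0] is aliased[1])  # True: both keys refer to one object
print(independent)               # each key keeps its own link
```

If a check such as `all_data_dict[0] is all_data_dict[1]` prints True in the real program, the fix would be to create the result dictionary inside scrape_doc_info rather than reusing one defined outside it.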

The following code produces your expected result:

dict1 = {"Url": "www.test1.de", "Document Title": "test1", "Releaes Date": "January 1, 2020"}
dict2 = {"Url": "www.test20.de", "Document Title": "test20", "Releaes Date": "January 20, 2020"}
list_of_dicts = [dict1, dict2]

def create_dict(link_list):
    all_data_dict = {}
    count = 0
    for link in link_list:
        # each list item is already a dictionary here, standing in for the scrape_doc_info result
        all_data_dict[count] = link
        count += 1

    return all_data_dict

my_dict = create_dict(list_of_dicts)
print(my_dict)

Console output:

{0: {'Url': 'www.test1.de', 'Document Title': 'test1', 'Releaes Date': 'January 1, 2020'}, 1: {'Url': 'www.test20.de', 'Document Title': 'test20', 'Releaes Date': 'January 20, 2020'}}