Creating a nested dictionary in a for loop without overwriting values (Python)

Question

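I am trying to build a web scraper that extracts certain information based on HTML tags and puts it into a dictionary.

I have a first function that scrapes a website and returns a dictionary like this: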

{"Url": "www.test1.de", "Document Title": "test1", "Releaes Date": "January 1, 2020",...}

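My second function takes a list of links as input. It should loop over those links, call the first function on each one, and collect the resulting dictionaries in one big dictionary.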

def create_dict(link_list):
    all_data_dict = {}
    count = 0
    for link in link_list:

        all_data_dict[count] = scrape_doc_info(link, tag_list, selector_dict)  # this function returns the dictionary mentioned above
        print(all_data_dict)
        count +=1
        
    return(all_data_dict)

I expected to get the following:

all_data_dict = {0: {"Url": "www.test1.de", "Document Title": "test1", "Releaes Date": "January 1, 2020",...},
    1: {"Url": "www.test2.de", "Document Title": "test2", "Releaes Date": "January 2, 2022",...},..., 20: {"Url": "www.test20.de", "Document Title": "test20", "Releaes Date": "January 20, 2200",...}}

But my code always overwrites the value of every key with the values from the last link. So if I loop over 20 links, every key ends up holding the values of the last link:

all_data_dict = {0: {"Url": "www.test20.de", "Document Title": "test20", "Releaes Date": "January 20, 2200",...},
    1: {"Url": "www.test20.de", "Document Title": "test20", "Releaes Date": "January 20, 2200",...},..., 20: {"Url": "www.test20.de", "Document Title": "test20", "Releaes Date": "January 20, 2200",...}}

The console output of the print statement looks like this:

First iteration:

all_data_dict = {0: {"Url": "www.test1.de", "Document Title": "test1", "Releaes Date": "January 1, 2020",...}}

Second iteration:

all_data_dict = {0: {"Url": "www.test2.de", "Document Title": "test2", "Releaes Date": "January 2, 2022",...},
    1: {"Url": "www.test2.de", "Document Title": "test2", "Releaes Date": "January 2, 2022",...}}

20th iteration:

all_data_dict = {0: {"Url": "www.test20.de", "Document Title": "test20", "Releaes Date": "January 20, 2200",...},
    1: {"Url": "www.test20.de", "Document Title": "test20", "Releaes Date": "January 20, 2200",...},..., 20: {"Url": "www.test20.de", "Document Title": "test20", "Releaes Date": "January 20, 2200",...}}

Answer 1

There must be something wrong with your scrape_doc_info function (I'm not sure what it might be).

The following code produces the result you expect:

dict1 = {"Url": "www.test1.de", "Document Title": "test1", "Releaes Date": "January 1, 2020"}
dict2 = {"Url": "www.test20.de", "Document Title": "test20", "Releaes Date": "January 20, 2020"}
list_of_dicts = [dict1, dict2]

def create_dict(link_list):
    all_data_dict = {}
    count = 0
    for link in link_list:

        all_data_dict[count] = link  # each "link" here is already a dictionary, standing in for the result of scrape_doc_info
        count +=1
        
    return(all_data_dict)
    
my_dict = create_dict(list_of_dicts)
print(my_dict)

Console output:

{0: {'Url': 'www.test1.de', 'Document Title': 'test1', 'Releaes Date': 'January 1, 2020'}, 1: {'Url': 'www.test20.de', 'Document Title': 'test20', 'Releaes Date': 'January 20, 2020'}}
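
Since scrape_doc_info itself is not shown, the real cause can only be guessed. One common cause of exactly this symptom (earlier keys changing on every iteration) is a function that fills and returns the same dictionary object on each call, so every key in all_data_dict ends up pointing at one shared object. The sketch below reproduces that under this assumption; scrape_doc_info_buggy, scrape_doc_info_fixed and _shared_result are hypothetical stand-ins, not your actual code.

```python
import copy

# Hypothetical module-level dict that the buggy function reuses on every call
_shared_result = {}

def scrape_doc_info_buggy(link):
    # Assumed bug: mutates and returns the SAME dict object each time
    _shared_result["Url"] = link
    _shared_result["Document Title"] = link.split(".")[1]
    return _shared_result

def scrape_doc_info_fixed(link):
    # Fix: build and return a fresh dict on every call
    return {"Url": link, "Document Title": link.split(".")[1]}

def create_dict(link_list, scrape):
    # enumerate replaces the manual counter from the question
    return {i: scrape(link) for i, link in enumerate(link_list)}

links = ["www.test1.de", "www.test2.de"]

buggy = create_dict(links, scrape_doc_info_buggy)
fixed = create_dict(links, scrape_doc_info_fixed)

# Workaround if the scraping function cannot be changed: store a copy
copied = {i: copy.deepcopy(scrape_doc_info_buggy(link))
          for i, link in enumerate(links)}

print(buggy[0] is buggy[1])  # True: both keys refer to one object
print(buggy[0]["Url"])       # www.test2.de
print(fixed[0]["Url"])       # www.test1.de
print(copied[0]["Url"])      # www.test1.de
```

If this matches your situation, the cleaner fix is to create the result dictionary inside scrape_doc_info on every call; copying at the call site only hides the shared state.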
