JSON dump error during web scraping in Python

Question

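I'm trying to download thumbnails from the Digital Commonwealth website for visualization in ImageJ. Everything is supposed to be written to a JSON dump file. A friend wrote code for me that downloads the images, but I need a JSON file of the URLs before I can continue. At the end, it gives the error "Object of type Tag is not JSON serializable".

Sorry about the spacing. I'm new to Stack Overflow, and it got messed up when I copied and pasted from Sublime.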

from bs4 import BeautifulSoup
import requests
import re
import json

all_my_data = []

url = "https://www.digitalcommonwealth.org/search?f%5Bcollection_name_ssim%5D%5B%5D=Produce+Crate+Labels&f%5Binstitution_name_ssim%5D%5B%5D=Boston+Public+Library&per_page=50"
results_page = requests.get(url)
page_html = results_page.text
soup = BeautifulSoup(page_html, "html.parser")

all_labels = soup.find_all("div", attrs = {'class': 'document'})

for items in all_labels:
    my_data = {
        "caption": None,
        "url": None,
        "image url": None,
    }
    item_link = items.find('a')
    abs_url = "https://www.digitalcommonwealth.org/search?f%5Bcollection_name_ssim%5D%5B%5D=Produce+Crate+Labels&f%5Binstitution_name_ssim%5D%5B%5D=Boston+Public+Library&per_page=50" + item_link["href"]
    my_data["url"] = abs_url

    #print(abs_url)

    item_request = requests.get(abs_url)
    item_html = item_request.text
    item_soup = BeautifulSoup(item_html, "html.parser")

    all_field_divs = item_soup.find_all("div", attrs={'class': 'caption'})

    for field in all_field_divs:
        caption = field.find("a")
        cpation = caption.text
        my_data["caption"] = caption
        #print(caption)

    all_photo_urls = item_soup.find_all("div", attrs={'class': 'thumbnail'})

    for photo_url in all_photo_urls:
        photo = photo_url.find('img')
        photo_abs_url = "https://www.digitalcommonwealth.org/search?f%5Bcollection_name_ssim%5D%5B%5D=Produce+Crate+Labels&f%5Binstitution_name_ssim%5D%5B%5D=Boston+Public+Library&per_page=50" + photo['src']
        my_data['image url'] = photo_abs_url
        #print(photo_abs_url)

    all_my_data.append(my_data)

#print(all_my_data)


with open('fruit_crate_labels.json', 'w') as file_object:
    json.dump(all_my_data, file_object, indent=2)
    print('Your file is now ready')

It prints this:

Traceback (most recent call last):
  File "dh.py", line 54, in <module>
    json.dump(all_my_data, file_object, indent=2)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py", line 179, in dump
    for chunk in iterable:
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 429, in _iterencode
    yield from _iterencode_list(o, _current_indent_level)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Tag is not JSON serializable

Thanks for your help!

Answer 1

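The following code on line 35:

cpation = caption.text

should be:

caption = caption.text

After that, your code seems to work as expected. Because of the typo, the text is assigned to a new variable, cpation, while caption still holds the BeautifulSoup Tag object returned by field.find("a"). That Tag is what gets stored in my_data["caption"], and json.dump cannot serialize it.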
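To see the difference in isolation, here is a minimal sketch that needs no network access. The HTML snippet, its caption text, and the variable names are made up for illustration and are not taken from the Digital Commonwealth pages; only the bs4 and json calls mirror the ones in the question.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical stand-in for one item page; the real markup may differ.
html = '<div class="caption"><a href="/search/example">Sample label</a></div>'
soup = BeautifulSoup(html, "html.parser")

field = soup.find("div", attrs={"class": "caption"})
link = field.find("a")  # a bs4 Tag object, not a string

# Storing the Tag itself reproduces the error from the question.
try:
    json.dumps({"caption": link})
    error_message = None
except TypeError as err:
    error_message = str(err)

# Storing the Tag's text (a plain str) serializes without trouble.
serialized = json.dumps({"caption": link.text})

print(error_message)
print(serialized)
```

The json module only knows how to encode basic Python types such as dict, list, str, numbers, booleans, and None, so anything pulled out of the soup should be reduced to one of those (for example with .text or by indexing an attribute like ["href"]) before it goes into the data that is dumped.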
