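I am trying to scrape a website. I can get the product details, but I get an error when I try to collect all the category and subcategory links so that I can visit every page. The error says the link is a string, yet when I open the link manually in a browser I can reach the site. I have added the code and the error below.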
import requests
from tqdm import tqdm
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import *
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import json
import pandas as pd
from unidecode import unidecode
from webdriver_manager.chrome import ChromeDriverManager
browser = webdriver.Chrome(ChromeDriverManager().install())
URL = 'https://www.tapwarehouse.com/'
def get_category_links(URL):
    category_links = []
    browser.get(URL)
    html = browser.page_source
    soup = BeautifulSoup(html,'html5lib')
    cat=soup.find_all("ul",{"class":"c-nav__list"})[0].find_all('a')
    for i in cat:
        try:
            link=i["href"]
            if link=='javascript:void(0)':
                pass
            else:
                category_links.append("https://www.tapwarehouse.com"+i["href"])
        except:
            pass
    return category_links
def get_sub_category_links(URL):
    sub_category_links=[]
    browser.get(URL)
    html = browser.page_source
    soup = BeautifulSoup(html,'html5lib')
    for link in soup.find_all('a', {'class': "m-categories__menu__link"}):
        sub_category_links.append("https://www.tapwarehouse.com/"+link["href"])
    return sub_category_links
response = []
sublist=[]
urllist=[]
for cat_link in get_category_links(URL = URL):
    for subcat_obj in get_sub_category_links(URL = cat_link):
        try:
            get_sub_category_links = subcat_obj
            print(f'sub category is {get_sub_category_links}')
            sublist.append(get_sub_category_links)
            sublist = list(set(sublist))
        except:
            pass
TypeError Traceback (most recent call last)
<ipython-input-22-8873cbd974a9> in <module>
4 for cat_link in get_category_links(URL = URL):
----> 5 for subcat_obj in get_sub_category_links(URL = cat_link):
TypeError: 'str' object is not callable
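You have defined the name get_sub_category_links twice: once as a function, and once as a variable inside the try/except (get_sub_category_links = subcat_obj). Once the loop body has run, that name refers to a string rather than the function, so the next call get_sub_category_links(URL = cat_link), made for the following category, raises TypeError: 'str' object is not callable. Use a different name for the variable in the loop, for example sub_category_link. So replace the code inside the try/except with: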
sub_category_link = subcat_obj
print(f'sub category is {sub_category_link}')
sublist.append(sub_category_link)
sublist = list(set(sublist))
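
To see the problem in isolation, here is a minimal sketch that reproduces the error without a browser and then shows the renamed version. The names fetch_links, get_links and the example.com URLs are hypothetical stand-ins, not part of the original scraper; fetch_links plays the role of get_sub_category_links and simply returns made-up links.

```python
def fetch_links(url):
    # Hypothetical stand-in for get_sub_category_links: no browser, fake links.
    return [url + "/sub-a", url + "/sub-b"]


# Made-up category URLs, used only for illustration.
categories = ["https://example.com/taps", "https://example.com/showers"]

# Buggy pattern: the loop body rebinds the same name that the loop header calls.
get_links = fetch_links
error_message = None
try:
    for cat_link in categories:
        # Works for the first category; by the second one get_links is a str.
        for subcat_obj in get_links(cat_link):
            get_links = subcat_obj
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Fixed pattern: the loop variable has its own name, so the function survives.
unique_links = set()
for cat_link in categories:
    for sub_category_link in fetch_links(cat_link):
        unique_links.add(sub_category_link)

sublist = sorted(unique_links)
print(sublist)
```

The sketch also collects the links in a set and converts it once at the end, which is one possible alternative to rebuilding list(set(sublist)) on every iteration. Note that the bare except: pass in the original loop would not have hidden this error, because the failing call sits in the for header, outside the try block.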