Find the number of URLs on a website whose path contains "/complete/" when the URLs do not appear in any HTML code

Question. Votes: 0, Answers: 0

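My goal is to track every unique URL with "/complete/" in its path that is created each day, for example https://example.com/complete/yyrh38/. The constraint is that these URLs do not appear anywhere in the site's HTML, and they are generated dynamically at various times throughout the day. I have no access to the server logs or to the website's server.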

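One idea I had was to set up a proxy server and send a request every x seconds, but I'm not sure whether that is feasible. See the example request below.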

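import random
import requests

# useragent(), input_url and proxies are assumed to be defined elsewhere.

# Randomly select a user agent
user_agent = random.choice(useragent())

# Set headers
headers = {'User-Agent': user_agent}

# Send the request to the website through the proxy server
response = requests.get(input_url, headers=headers, proxies=proxies, timeout=10)

# Collect the response lines that mention a /complete/ URL on this site
new_urls = []
for line in response.text.splitlines():
    if input_url in line and '/complete/' in line:
        new_urls.append(line.strip())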
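
To make the "request every x seconds" idea more concrete, here is a rough sketch of a polling loop that pulls /complete/ URLs out of each response and remembers when each one was first seen. It is only an illustration under assumptions: the names `extract_complete_urls`, `poll` and `fetch_page` are made up for this example, the regex for the ID segment is guessed from the sample URL, and the approach can only find URLs that actually show up in some response body the client can fetch. If they never appear in any response, polling will not discover them.

```python
import re
import time
from datetime import datetime, timezone

# Matches absolute URLs whose path contains a /complete/<id> segment.
# The character class for <id> is an assumption based on the sample URL.
COMPLETE_URL_RE = re.compile(r"https?://[^\s\"'<>]+/complete/[A-Za-z0-9_-]+/?")


def extract_complete_urls(text, base_url):
    """Return the set of /complete/ URLs in `text` that start with `base_url`."""
    return {u for u in COMPLETE_URL_RE.findall(text) if u.startswith(base_url)}


def poll(fetch, base_url, interval_s, rounds, seen=None, sleep=time.sleep):
    """Call `fetch()` `rounds` times and record each new URL with its first-seen UTC time."""
    seen = {} if seen is None else seen
    for i in range(rounds):
        for url in extract_complete_urls(fetch(), base_url):
            # setdefault keeps the timestamp of the first sighting only
            seen.setdefault(url, datetime.now(timezone.utc))
        if i < rounds - 1:
            sleep(interval_s)
    return seen


def fetch_page(url, headers=None, proxies=None):
    """Fetch `url` and return the body text (needs the `requests` package)."""
    import requests  # imported lazily so the helpers above work without it

    response = requests.get(url, headers=headers, proxies=proxies, timeout=10)
    response.raise_for_status()
    return response.text
```

Something like `poll(lambda: fetch_page(input_url, headers, proxies), input_url, 30, 100)` would then check every 30 seconds for 100 rounds, and `len(seen)` gives the count of unique URLs seen so far. Passing `fetch` in as a callable keeps the parsing and de-duplication testable without touching the network.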

python

python-requests

proxy

web-crawler
