Hello Stack Overflow community, I'm currently working on a project that involves web scraping with Python and BeautifulSoup. The code I have works fine for smaller websites, but it doesn't scale to larger sites with thousands of pages, resulting in long processing times. Here is a simplified version of the single-threaded code I'm currently using:
import requests
from bs4 import BeautifulSoup

def scrape_website(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    # ... (scraping code here) ...

urls = ['http://example1.com', 'http://example2.com', 'http://example3.com']
for url in urls:
    scrape_website(url)

I attempted to run the scraper sequentially over multiple URLs with a for loop, hoping it would finish quickly, but it takes far too long on larger sites. How can I speed this up?
To speed up the process, you can use multithreading or multiprocessing to scrape multiple URLs concurrently. Here is your code adapted to use the standard library's threading module:
import requests
from bs4 import BeautifulSoup
import threading

def scrape_website(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    # ... (scraping code here) ...

def scrape_multiple_websites(urls):
    threads = []
    for url in urls:
        # Start one thread per URL so the HTTP requests overlap
        thread = threading.Thread(target=scrape_website, args=(url,))
        threads.append(thread)
        thread.start()
    # Wait for all threads to finish
    for thread in threads:
        thread.join()

if __name__ == "__main__":
    urls = ['http://example1.com', 'http://example2.com', 'http://example3.com']
    scrape_multiple_websites(urls)
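
One caveat: this starts one thread per URL, which won't scale to the thousands of pages you mention. A more robust pattern is to bound the concurrency with concurrent.futures.ThreadPoolExecutor from the standard library. Here is a minimal sketch; the max_workers value and the request timeout are assumptions you should tune for your target sites:

import requests
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor, as_completed

def scrape_website(url):
    # The timeout is an assumed safeguard so one slow server can't stall a worker
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    # ... (scraping code here) ...

def scrape_multiple_websites(urls, max_workers=20):
    # A bounded pool reuses a fixed number of threads instead of
    # spawning one thread per URL
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(scrape_website, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                future.result()
            except Exception as exc:
                # One failed page shouldn't abort the whole crawl
                print(f"{url} failed: {exc}")

if __name__ == "__main__":
    urls = ['http://example1.com', 'http://example2.com', 'http://example3.com']
    scrape_multiple_websites(urls)

Since scraping is I/O-bound, threads (or asyncio) are usually the better fit; multiprocessing only pays off if the per-page parsing itself is CPU-heavy. If that is your case, here is the same idea sketched with a process pool (processes=4 is an assumed value; match it to your core count):

from multiprocessing import Pool

if __name__ == "__main__":
    urls = ['http://example1.com', 'http://example2.com', 'http://example3.com']
    # Each URL is handled in a separate worker process
    with Pool(processes=4) as pool:
        pool.map(scrape_website, urls)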