Python Multiprocessing Manager - List NameError?

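I'm trying to use a shared list that gets updated with information scraped by Selenium, so that I can later export it or use it however I choose. For some reason it gives me this error: NameError: name 'scrapedinfo' is not defined...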

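This is really strange to me, because I declared the list as global and then created it with multiprocessing.Manager(). I have double-checked the code many times, and it is not a case-sensitivity mistake. I also tried passing the list into the functions as a variable, but that caused other problems and did not work. Any help is greatly appreciated!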

from selenium import webdriver
from multiprocessing import Pool, Manager

def browser():  
    driver = webdriver.Chrome()
    return driver

def test_func(link):
    driver = browser()
    driver.get(link)

def scrape_stuff(driver):

    # Scrape things
    scrapedinfo.append("scraped stuff")  # placeholder for the scraped data

def multip():
    manager = Manager()

    #Declare list here

    global scrapedinfo
    scrapedinfo = manager.list()

    links = ["https://stackoverflow.com/", "https://signup.microsoft.com/", "www.example.com"]
    chunks = [links[i::3] for i in range(3)]
    pool = Pool(processes=3)
    pool.map(test_func, chunks)
    print(scrapedinfo)

multip()
Tags: python, list, python-multiprocessing

1 Answer

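On Windows, multiprocessing launches a fresh Python process and then pickles/unpickles only a limited view of the parent's state for that child. Global variables that are not passed in the map call are not included. scrapedinfo is never created in the child, so you get the error.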
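
The effect can be reproduced on any platform by asking for the spawn start method explicitly (it is the default on Windows). The following is only a minimal sketch: is_defined and check are hypothetical helper names, and the list is created only under the main guard, which a spawned worker does not execute when it re-imports the module.

```python
import multiprocessing as mp


def is_defined(_):
    # Runs in a worker: report whether this process has the global name.
    return "scrapedinfo" in globals()


def check(start_method):
    # "spawn" starts a fresh interpreter for each worker, as on Windows.
    ctx = mp.get_context(start_method)
    with ctx.Pool(processes=2) as pool:
        return pool.map(is_defined, range(2))


if __name__ == "__main__":
    # Created in the parent only, like the list built inside multip().
    scrapedinfo = []
    print("spawn:", check("spawn"))
```

Run as a script, this should report False for each worker. With the fork start method on POSIX systems the children inherit the parent's memory, which is why the same code can appear to work there.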

One solution is to pass scrapedinfo in the map call. Pared down to a simple example:

from multiprocessing import Pool, Manager

def test_func(param):
    scrapedinfo, link = param
    scrapedinfo.append("i scraped stuff from " + str(link))

def multip():
    manager = Manager()

    global scrapedinfo
    scrapedinfo = manager.list()

    links = ["https://stackoverflow.com/", "https://signup.microsoft.com/", "www.example.com"]
    chunks = [links[i::3] for i in range(3)]
    pool = Pool(processes=3)
    pool.map(test_func, [(scrapedinfo, chunk) for chunk in chunks])
    print(scrapedinfo)

if __name__=="__main__":
    multip()
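
If the workers should still reach the list through a global name, as in the question, one possible variation is Pool's initializer argument, which runs a function once in every worker process. This is a sketch under assumptions rather than the asker's actual code: init_worker and scrape_link are hypothetical names, and the append stands in for the real Selenium work.

```python
from multiprocessing import Manager, Pool

# Module-level name; each worker gets its own binding through init_worker.
scrapedinfo = None


def init_worker(shared_list):
    # Runs once in every worker process: bind the shared list to the global.
    global scrapedinfo
    scrapedinfo = shared_list


def scrape_link(link):
    # Placeholder for the real scraping; appends through the global name.
    scrapedinfo.append("i scraped stuff from " + link)


def multip(links):
    with Manager() as manager:
        shared = manager.list()
        with Pool(processes=3, initializer=init_worker, initargs=(shared,)) as pool:
            pool.map(scrape_link, links)
        # Copy out of the proxy before the manager shuts down.
        return list(shared)


if __name__ == "__main__":
    links = ["https://stackoverflow.com/", "https://signup.microsoft.com/", "www.example.com"]
    print(multip(links))
```

The order of the entries depends on which worker appends first, so it may differ between runs.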

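But you are doing more work with the Manager than you need to. map passes the workers' return values back to the parent process (and handles chunking), so you could do: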

from multiprocessing import Pool

def test_func(link):
    return "i scraped stuff from " + link

def multip():
    links = ["https://stackoverflow.com/", "https://signup.microsoft.com/", "www.example.com"]
    pool = Pool(processes=3)
    scrapedinfo = pool.map(test_func, links)
    print(scrapedinfo)

if __name__=="__main__":
    multip()

and avoid the extra processing for the clunky list proxy.
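
A real scrape often yields several items per page. The return-value approach still applies: each worker can return its own list and the parent can merge them. A minimal sketch, assuming hypothetical helpers scrape and flatten, with placeholder strings standing in for whatever Selenium would extract:

```python
from itertools import chain
from multiprocessing import Pool


def scrape(link):
    # Placeholder for the Selenium work; returns several items per link.
    return ["title of " + link, "first heading of " + link]


def flatten(per_link_results):
    # pool.map returns one list per link, in input order; merge them.
    return list(chain.from_iterable(per_link_results))


def multip(links):
    with Pool(processes=3) as pool:
        return flatten(pool.map(scrape, links))


if __name__ == "__main__":
    links = ["https://stackoverflow.com/", "https://signup.microsoft.com/", "www.example.com"]
    for item in multip(links):
        print(item)
```

Because map preserves input order, the merged list keeps the items grouped by link in the order the links were given, with no shared state between processes.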
