Problem reading URLs from a CSV file and writing the output to a CSV file


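The following code is supposed to return the status code and model number for a few products from the website https://www.selexion.be/. It works fine when I put all the URLs in the urls array in the code, but when I read the URLs from a CSV file I get the error below.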

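I would also like to store each URL, status code, and model number in an array, and flush that array (.flush(), os.fsync()) to a CSV file once the status codes and model numbers of all links have been extracted. The output currently goes to the terminal, but I want it in a CSV file as well.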

Error:

PS C:\Users\Zandrio> & C:/Users/Zandrio/AppData/Local/Programs/Python/Python38/python.exe "c:/Users/Zandrio/Documents/Advanced Project/Selexion.py"
Traceback (most recent call last):
  File "c:/Users/Zandrio/Documents/Advanced Project/Selexion.py", line 49, in <module>
    asyncio.run(main())
  File "C:\Users\Zandrio\AppData\Local\Programs\Python\Python38\lib\asyncio\runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "C:\Users\Zandrio\AppData\Local\Programs\Python\Python38\lib\asyncio\base_events.py", line 612, in run_until_complete
    return future.result()
  File "c:/Users/Zandrio/Documents/Advanced Project/Selexion.py", line 41, in main
    await asyncio.gather(*(worker(f'w{index}', url, session)
  File "c:/Users/Zandrio/Documents/Advanced Project/Selexion.py", line 32, in worker
    response = await session.get(url, headers=header)
  File "C:\Users\Zandrio\AppData\Local\Programs\Python\Python38\lib\site-packages\aiohttp\client.py", line 380, in _request
    url = URL(str_or_url)
  File "C:\Users\Zandrio\AppData\Local\Programs\Python\Python38\lib\site-packages\yarl\__init__.py", line 149, in __new__
    raise TypeError("Constructor parameter should be str")
TypeError: Constructor parameter should be str

Code:

import asyncio
import csv
import aiohttp
import time
from bs4 import BeautifulSoup

urls = []

try:
    with open('C:\\Users\\Zandrio\\Documents\\Advanced Project\\input_links.csv', 'r', newline='') as csvIO:
        urls = list(csv.reader(csvIO))
except FileNotFoundError:
    pass


header = {
'Host': 'www.selexion.be',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.5',
'Accept-Encoding': 'gzip, deflate, br',
'Connection': 'keep-alive',
'Upgrade-Insecure-Requests': '1',
'Cache-Control': 'max-age=0',
'TE': 'Trailers'
}


async def worker(name, url, session):
    response = await session.get(url, headers=header)
    html = await response.read()
    soup = BeautifulSoup(html, features='lxml').select_one('.title-options span:first-of-type').text
    print(f'URL: {url} - {response.status} - {soup}')


async def main():
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(worker(f'w{index}', url, session)
                               for index, url in enumerate(urls)))


if __name__ == '__main__':
    start = time.perf_counter()
    asyncio.run(main())
    elapsed = time.perf_counter() - start
    print(f'Executed in {elapsed:0.2f} seconds')
python csv asynchronous beautifulsoup aiohttp
1 Answer

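The error message says this is a type error: a function expected an argument of one type (here str) but was passed a value of a different type.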

TypeError: Constructor parameter should be str

It is raised from the call on this line:

await asyncio.gather(*(worker(f'w{index}', url, session)

What is the type of url? You can find out with

type(url)

or by running a debugger.
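As a minimal sketch of that check, the snippet below feeds csv.reader a small in-memory file and prints the type of what comes back. The sample URLs are hypothetical placeholders, not taken from the question. csv.reader yields every row as a list of strings, so taking the first column of each row gives the plain str that session.get expects.

```python
import csv
import io

# Hypothetical stand-in for input_links.csv: one URL per line.
sample = io.StringIO(
    "https://example.com/product-1\n"
    "https://example.com/product-2\n"
)

rows = list(csv.reader(sample))
print(type(rows[0]))  # each row is a list, not a str

# Keep the first column of every non-empty row so each entry is a plain str.
urls = [row[0].strip() for row in rows if row]
print(type(urls[0]))
print(urls)
```

If the real file has a header row or several columns, the index and the filtering would need to be adjusted accordingly.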

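Also look at where url comes from in these two lines:

 await asyncio.gather(*(worker(f'w{index}', url, session)
                            for index, url in enumerate(urls)))

Each url is an element of urls, which the script builds with list(csv.reader(csvIO)). csv.reader yields every row as a list of strings, so each url here is a list containing the URL string rather than a str, and that is what the URL constructor rejects.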
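For the second part of the question (saving the output to a CSV file), one possible approach is to have each worker return its row instead of only printing it. asyncio.gather returns the results in the same order as its inputs, so they can be written in one go afterwards. The sketch below is only an illustration under stated assumptions: fake_worker is a hypothetical stand-in for the question's worker (it makes no HTTP request), and the output file name, column headers, and sample URLs are invented for the example.

```python
import asyncio
import csv
import os
import tempfile


async def fake_worker(url):
    # Hypothetical stand-in for the real worker: no network access here.
    # The real one would return (url, response.status, model_text).
    await asyncio.sleep(0)
    return (url, 200, "MODEL-PLACEHOLDER")


async def collect(urls):
    # gather returns results in the same order as the awaitables passed in.
    return await asyncio.gather(*(fake_worker(url) for url in urls))


def write_results(path, rows):
    # newline='' lets the csv module control line endings.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "status", "model"])
        writer.writerows(rows)
        # Optional: push the data to disk before the file is closed.
        f.flush()
        os.fsync(f.fileno())


urls = ["https://example.com/product-1", "https://example.com/product-2"]
rows = asyncio.run(collect(urls))
out_path = os.path.join(tempfile.gettempdir(), "output_links.csv")
write_results(out_path, rows)
print(rows)
```

Leaving the with block already flushes and closes the file, so the explicit flush() and os.fsync() calls only matter if the data must be on disk before the program continues.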
