I'm trying to iterate over all the HTML files in a directory, but I get this error:
NotImplementedError: Non-relative patterns are unsupported
The code I'm using is:
from bs4 import BeautifulSoup
import argparse
from pathlib import Path
parser = argparse.ArgumentParser(description = ("Script to scrape data from antismash html output"))
parser.add_argument("-p", "--path", help = "give path/to/directory containing antismash outputs", required = True)
args = parser.parse_args()
for file in Path(args.path).glob("/*.html"):
    def scraper(filename):
        soup = BeautifulSoup(open(filename), 'html.parser')
        soup.findAll('a') > os.path.basename(filename).txt
I've used the same approach before without any errors, so I'm not sure what's going on.
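When using pathlib, you don't need the leading / in the glob call. Path.glob only accepts patterns relative to the path it is called on, and a pattern starting with / is treated as absolute, which is what raises the error. The corrected code: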
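def scraper(filename):
    # Parse one report and write its <a> tags to <basename>.txt in the current directory
    with open(filename) as fh:
        soup = BeautifulSoup(fh, 'html.parser')
    Path(Path(filename).name + ".txt").write_text("\n".join(str(a) for a in soup.find_all('a')))

for file in Path(args.path).glob("*.html"):  # relative pattern: no leading "/"
    scraper(file)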
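
To see the difference between the two patterns without needing any antismash output, here is a minimal sketch that uses only the standard library. The helper name list_html and the file names are made up for illustration; the only thing taken from the question is the glob pattern itself.

```python
import tempfile
from pathlib import Path


def list_html(directory, pattern="*.html"):
    """Return the sorted names of files in `directory` matching a glob pattern.

    `list_html` is a hypothetical helper, not part of pathlib.
    """
    return sorted(p.name for p in Path(directory).glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    # Create a few throwaway files to search through.
    for name in ("a.html", "b.html", "notes.txt"):
        (Path(tmp) / name).write_text("<a href='#'>x</a>")

    # Relative pattern: matches the HTML files directly inside tmp.
    found = list_html(tmp)

    # Pattern with a leading "/": pathlib rejects it as non-relative.
    try:
        list_html(tmp, "/*.html")
        error = None
    except NotImplementedError as exc:
        error = str(exc)

print(found)
print(error)
```

The relative pattern returns only the two .html files, while the pattern with the leading slash raises the NotImplementedError from the question. If you need to search subdirectories as well, Path.rglob("*.html") does that with the same relative pattern.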