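I'm trying to scrape the authors from this URL: https://doi.org/10.1155/2021/2122095
It only returns 3 of the 4 authors; the remaining one is replaced by an ellipsis in the output.
Here is the code: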
import csv
import requests
from bs4 import BeautifulSoup
# URL
url = 'https://doi.org/10.1155/2021/2122095'
# Send a GET request to the URL
response = requests.get(url)
# Parse the HTML content
soup = BeautifulSoup(response.content, 'html.parser')
authors = soup.find("meta", {"name": "authors"})['content']
print(authors)
This is the output:
Datian Bi | Jingyuan Kong | ... | Junli Yang
Why is this happening? Thanks!
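BeautifulSoup prints the attribute value as-is, so it looks like the "authors" meta tag itself already holds the abbreviated string. Here is an example of how to get all 4 authors from the page by reading the author elements in the article header instead: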
import requests
from bs4 import BeautifulSoup
url = "https://doi.org/10.1155/2021/2122095"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
authors = []
for a in soup.select(".articleHeader__authors_author"):
    if a.strong:
        authors.append(a.strong.text)
    else:
        authors.append(a.find_next(string=True))
print(*authors, sep="\n")
Prints:
Datian Bi
Jingyuan Kong
Xue Zhang
Junli Yang
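The question imports csv but never uses it, so the end goal may be to save the names to a file. If so, a minimal sketch could look like the following. The helper name authors_to_csv and the "author" header are assumptions for illustration, not something from the original code, and the list is hard-coded here so the snippet runs without a network request.

```python
import csv
import io


def authors_to_csv(authors, out):
    """Write one author per row (with a header row) to a file-like object.

    Hypothetical helper: the name and the "author" header are illustrative.
    """
    writer = csv.writer(out)
    writer.writerow(["author"])
    for name in authors:
        # Strip stray whitespace that can come from HTML text nodes.
        writer.writerow([str(name).strip()])


# Stand-in for the list built by the scraping loop above.
authors = ["Datian Bi", "Jingyuan Kong", "Xue Zhang", "Junli Yang"]

buffer = io.StringIO()
authors_to_csv(authors, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with open("authors.csv", "w", newline="", encoding="utf-8"); the csv module documentation recommends newline="" so that row endings are not doubled on Windows.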