Output scraped from a website is truncated


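I am trying to scrape the authors from this URL: https://doi.org/10.1155/2021/2122095

Only 3 of the authors are scraped; the fourth is replaced by an ellipsis in the output.

Here is the code: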

import csv
import requests
from bs4 import BeautifulSoup

# URL
url = 'https://doi.org/10.1155/2021/2122095'

# Send a GET request to the URL
response = requests.get(url)

# Parse the HTML content
soup = BeautifulSoup(response.content, 'html.parser')

   
# Read the author list from the <meta name="authors"> tag
authors = soup.find("meta", {"name": "authors"})['content']

print(authors)

Here is the output:

Datian Bi | Jingyuan Kong | ... | Junli Yang

Why does this happen? Thanks!

python web-scraping
1 Answer

Here is an example of how to get all 4 authors from the page:

import requests
from bs4 import BeautifulSoup

url = "https://doi.org/10.1155/2021/2122095"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

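authors = []
# Each author in the article header is matched by this CSS class
for a in soup.select(".articleHeader__authors_author"):
    if a.strong:
        # Use the <strong> child's text when there is one
        authors.append(a.strong.text)
    else:
        # Otherwise take the first text node that follows the tag's opening
        authors.append(a.find_next(string=True))

# Print one author per line
print(*authors, sep="\n")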

Prints:

Datian Bi
Jingyuan Kong
Xue Zhang
Junli Yang
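
The question imports csv without using it, so presumably the authors are meant to end up in a CSV file. Below is a minimal sketch of two small helpers under that assumption. The names split_meta_authors and write_authors_csv are hypothetical, and the first helper assumes the meta tag's content is a pipe-separated string with a literal "..." placeholder, as in the output shown in the question. It only detects the abbreviation; the full list still has to come from the page body as in the answer above.

```python
import csv
import io


def split_meta_authors(content):
    """Split a pipe-separated author string (format assumed from the question's output).

    Returns the names that are present and a flag saying whether an
    ellipsis placeholder was found, i.e. whether the list is abbreviated.
    """
    placeholders = ("...", "…")
    parts = [p.strip() for p in content.split("|")]
    truncated = any(p in placeholders for p in parts)
    names = [p for p in parts if p and p not in placeholders]
    return names, truncated


def write_authors_csv(authors, fileobj):
    """Write one author per row, with a single header row."""
    writer = csv.writer(fileobj)
    writer.writerow(["author"])
    for name in authors:
        writer.writerow([name])


# The meta tag value from the question: one name is hidden behind "..."
names, truncated = split_meta_authors("Datian Bi | Jingyuan Kong | ... | Junli Yang")
print(names, truncated)

# The full list from the answer, written to an in-memory buffer
buffer = io.StringIO()
write_authors_csv(["Datian Bi", "Jingyuan Kong", "Xue Zhang", "Junli Yang"], buffer)
print(buffer.getvalue())
```

To write to disk instead of a buffer, pass a file opened with open("authors.csv", "w", newline="", encoding="utf-8"); the file name here is just an example.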