Python: BeautifulSoup, extracting all heading text from a div class

Question

import requests
from bs4 import BeautifulSoup

res = requests.get('http://aicd.companydirectors.com.au/events/events-calendar')
soup = BeautifulSoup(res.text,"lxml")


event_containers = soup.find_all('div', class_ = "col-xs-12 col-sm-6 col-md-8")

first_event = event_containers[0]  
print(first_event.h3.text)

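Using this code I can extract the name of the first event. How can I loop over the results and extract all event names and dates? I am also trying to extract the location information that only becomes visible after clicking the "read more" link.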

Answer 1

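event_containers is a bs4.element.ResultSet object, which is essentially a list of Tag objects. Simply loop over the tags in event_containers and select h3 for the title, div.date for the date, and a for the URL, for example: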

for tag in event_containers:
    print(tag.h3.text)
    print(tag.select_one('div.date').text)
    print(tag.a['href'])
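
The loop above raises an AttributeError or TypeError as soon as one container lacks an h3, a div.date, or a link. A minimal offline sketch of a more defensive version follows. The SAMPLE_HTML markup and the extract_events helper are hypothetical and only imitate the structure described in this answer; the real page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the containers described above (not the real page).
SAMPLE_HTML = """
<div class="col-xs-12 col-sm-6 col-md-8">
  <h3>Event One</h3>
  <div class="date">01 Jan 2020</div>
  <a href="/events/event-one">Read more</a>
</div>
<div class="col-xs-12 col-sm-6 col-md-8">
  <h3>Event Two</h3>
  <a href="/events/event-two">Read more</a>
</div>
"""


def extract_events(html):
    """Return one dict per event container, using None for missing parts."""
    soup = BeautifulSoup(html, "html.parser")
    events = []
    for tag in soup.find_all("div", class_="col-xs-12 col-sm-6 col-md-8"):
        title = tag.h3                      # first <h3> inside the container, or None
        date = tag.select_one("div.date")   # first matching <div class="date">, or None
        link = tag.a                        # first <a> inside the container, or None
        events.append({
            "title": title.get_text(strip=True) if title else None,
            "date": date.get_text(strip=True) if date else None,
            "href": link.get("href") if link else None,
        })
    return events


events = extract_events(SAMPLE_HTML)
for event in events:
    print(event)
```

The second sample container has no div.date on purpose, so its "date" value comes back as None instead of stopping the loop.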

Now, for the location information, you have to visit each event's URL and collect the text in div.event-add. Full code:

import requests
from bs4 import BeautifulSoup

res = requests.get('http://aicd.companydirectors.com.au/events/events-calendar')
soup = BeautifulSoup(res.text,"lxml")
event_containers = soup.find_all('div', class_ = "col-xs-12 col-sm-6 col-md-8")
base_url = 'http://aicd.companydirectors.com.au'

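for tag in event_containers:
    # Build the absolute URL of the event's detail page
    link = base_url + tag.a['href']
    # Fetch and parse the detail page (one extra request per event)
    detail_soup = BeautifulSoup(requests.get(link).text, "lxml")
    # Drop the first and last strings of the block and join the lines in between
    location = ', '.join(list(detail_soup.select_one('div.event-add').stripped_strings)[1:-1])
    print('Title:', tag.h3.text)
    print('Date:', tag.select_one('div.date').text)
    print('Link:', link)
    print('Location:', location)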
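
The location step can be tried offline before hitting the site. The sketch below assumes a detail-page fragment (DETAIL_HTML, hypothetical) in which the first string of div.event-add is a label and the last is a link caption, which is what the [1:-1] slice implies. The extract_location helper is also hypothetical, and urllib.parse.urljoin is swapped in for plain string concatenation so that absolute hrefs would be handled as well.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "http://aicd.companydirectors.com.au"

# Hypothetical detail-page fragment; the real div.event-add may be laid out differently.
DETAIL_HTML = """
<div class="event-add">
  <strong>Venue</strong>
  <span>Example Hall</span><br>
  <span>1 Example Street</span><br>
  <a href="#">View map</a>
</div>
"""


def extract_location(html):
    """Join the inner strings of div.event-add, or return None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    block = soup.select_one("div.event-add")
    if block is None:
        return None
    parts = list(block.stripped_strings)  # whitespace-trimmed, non-empty strings
    return ", ".join(parts[1:-1])         # skip the first and last strings


link = urljoin(BASE_URL, "/events/event-one")
location = extract_location(DETAIL_HTML)
print("Link:", link)
print("Location:", location)
```

Returning None when the block is missing lets the calling loop skip an event instead of failing on it.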

Answer 2

Try this to get all the events and dates you are after:

import requests
from bs4 import BeautifulSoup

res = requests.get('http://aicd.companydirectors.com.au/events/events-calendar')
soup = BeautifulSoup(res.text,"lxml")
for item in soup.find_all(class_='lead'):
    date = item.find_previous_sibling().text.split(" |")[0]
    print(item.text,date)
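
This approach relies on the date sitting in the element immediately before each .lead element, with the date and the rest of the line separated by " |". A small offline sketch of that assumption follows; the SAMPLE_HTML markup, the "meta" class name, and the titles_and_dates helper are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each title (class "lead") is preceded by a "date | place" line.
SAMPLE_HTML = """
<div class="event">
  <p class="meta">01 Jan 2020 | Sydney</p>
  <p class="lead">Event One</p>
</div>
<div class="event">
  <p class="lead">Event Two</p>
</div>
"""


def titles_and_dates(html):
    """Return (title, date) pairs; date is None when no preceding tag exists."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for item in soup.find_all(class_="lead"):
        # With no arguments, find_previous_sibling() returns the nearest
        # preceding sibling tag, skipping whitespace-only text nodes.
        previous = item.find_previous_sibling()
        date = previous.get_text(strip=True).split(" |")[0] if previous else None
        pairs.append((item.get_text(strip=True), date))
    return pairs


pairs = titles_and_dates(SAMPLE_HTML)
for title, date in pairs:
    print(title, date)
```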
