Python: scraping request returns only an empty result

Question (votes: -4, answers: 1)

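This is my first attempt at programming. I'm trying to scrape some words from a website using bs4, Selenium, etc. The site I'm using is http://oulim.kr

How do I scrape content inside a frameset?

Here is what I tried: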

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

url = 'http://oulim.kr/'

# Selenium 4 passes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service('./driver/chromedriver'))
driver.get(url)

html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')

a = soup.select("#divAlba > table:nth-child(3) > tbody > tr:nth-child(2) > td:nth-child(5) > a > font > b")
print(a)  # prints [] (empty result)

from requests_html import HTMLSession
session = HTMLSession()
r = session.get('http://oulim.kr')
print(r.html.find('tbody'))  # tag selector; '.tbody' would look for class="tbody"
Tags: python, html, web-scraping, frame
1 Answer (votes: 0)

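Selenium treats each frame as a separate page (because each frame has to be loaded separately), so it does not search inside frames. Likewise, page_source does not return the HTML from inside the frames.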

You have to find the <frame> elements and switch to the correct one with switch_to.frame(..) before you can search its content.

from selenium.webdriver.common.by import By

frames = driver.find_elements(By.TAG_NAME, 'frame')
driver.switch_to.frame(frames[0])
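The snippet above always takes frames[0]. If you are not sure which frame holds the element, a helper like the one below can try each top-level frame in turn. This is a minimal sketch, not part of the original answer: the name find_text_in_frames is hypothetical, it assumes the Selenium 4 find_elements(by, value) API, and it does not descend into nested framesets.

```python
def find_text_in_frames(driver, css_selector):
    """Try css_selector inside each top-level <frame> (hypothetical helper).

    Returns (frame_index, text) for the first frame that contains a match,
    or None if no frame does. "tag name" and "css selector" are the string
    values of Selenium's By.TAG_NAME and By.CSS_SELECTOR.
    """
    driver.switch_to.default_content()
    frame_count = len(driver.find_elements("tag name", "frame"))
    try:
        for index in range(frame_count):
            # Go back to the top-level document before entering the next frame
            driver.switch_to.default_content()
            # Re-locate the frames each time so the element reference is fresh
            frame = driver.find_elements("tag name", "frame")[index]
            driver.switch_to.frame(frame)
            matches = driver.find_elements("css selector", css_selector)
            if matches:
                # Read the text while the driver is still inside the frame
                return index, matches[0].text
        return None
    finally:
        # Leave the driver pointing at the top-level document again
        driver.switch_to.default_content()
```

Call it after driver.get(url) with the same CSS selector as in the code below. Because it switches back to the top-level document when it finishes, you still need switch_to.frame(..) before reading a frame's page_source.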

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

url = 'http://oulim.kr/'

driver = webdriver.Chrome(service=Service('./driver/chromedriver'))
driver.get(url)

# --- switch frame ---

frames = driver.find_elements(By.TAG_NAME, 'frame')
driver.switch_to.frame(frames[0])  # first <frame> in the frameset

# --- CSS without BeautifulSoup ---

selector = "#divAlba > table:nth-child(3) > tbody > tr:nth-child(2) > td:nth-child(5) > a > font > b"

a = driver.find_element(By.CSS_SELECTOR, selector)
print(a.text)

# --- CSS with BeautifulSoup ---

html = driver.page_source  # after switching, this is the HTML of the selected frame
soup = BeautifulSoup(html, 'html.parser')

a = soup.select(selector)
print(a[0].text)
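Because every frame is a separate page loaded from its own URL, a different approach (an alternative, not what the answer above does) is to read the src of each <frame> from the outer page and request that URL directly, for example with requests or the requests_html session from the question. The sketch below uses only the standard library's html.parser; the function name frame_urls and the sample markup are hypothetical and are not taken from the real source of oulim.kr. It only helps when the frame's content is served as plain HTML; content built by JavaScript still needs a browser.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSrcParser(HTMLParser):
    """Collects the src attribute of every <frame> and <iframe> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the frameset page's URL
                self.sources.append(urljoin(self.base_url, src))


def frame_urls(html, base_url):
    """Return the absolute URLs of all frames found in html (hypothetical helper)."""
    parser = FrameSrcParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.sources


# Hypothetical frameset markup, NOT the real source of oulim.kr
SAMPLE = """
<html>
  <frameset rows="100%,*">
    <frame src="main.php" name="main">
    <frame src="/blank.html">
  </frameset>
</html>
"""

print(frame_urls(SAMPLE, "http://oulim.kr/"))
# → ['http://oulim.kr/main.php', 'http://oulim.kr/blank.html']
```

Each URL returned this way can then be fetched and parsed on its own, since it is an ordinary HTML document.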