Chrome driver (Selenium) cannot get div tags by class name from Kayak flight search

Problem description

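I want to use the class name to identify the list of valid search results and then iterate over them to scrape the prices. However, the code still does not find the class. I know the page uses JavaScript, but I thought Selenium could find the tags once the page has been rendered. Which part am I getting wrong? Thanks.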

import time
import subprocess
import selenium
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

from bs4 import BeautifulSoup
import pandas as pd
#from playsound import playsound
import datetime
import threading
service = Service(executable_path='xxx')


option = webdriver.ChromeOptions()
option.add_argument("--headless=new")
option.add_argument('--ignore-certificate-errors')
option.add_argument("--no-sandbox")
option.add_argument('disable-notifications')
driver = webdriver.Chrome(service=service,options=option)

def search(dep,arr,date):
    print(f'''Input: Date:{date},Departure: {dep} - Arrival: {arr}''')
    temp_url = 'https://www.kayak.com/flights/'

    base_url = temp_url + dep+'-'+arr+'/'+str(date)+'?sort=price_a&fs=stops=0'

    
    df_record = pd.DataFrame(columns=['deptime','arrtime','dep','arr',
                                     'airline' ,'flightNum','price','ling'])  
    print("before webdriver.ChromeOptions()")
   
    my_url = base_url
    driver.get(my_url)
    print(my_url)
    time.sleep(3) # set the time to wait till web fully loaded
    
    # wait for the close button to be visible and click it
    try:
        close_button = driver.find_element(By.XPATH, '//*[@class="nrc6"]')
        close_button.click()
    except:
        print("close is not found.")

    
    elem = driver.find_element("xpath","//*")
    
    source_code = elem.get_attribute("outerHTML")
    #print(source_code)
    bs = BeautifulSoup(source_code, 'html.parser')
    #print(bs)
    #expand 
    drawing_url = bs.find_all('button', class_='nrc6')
    print(len(drawing_url)) # this shouldn't be zero
    if len(drawing_url)==0: return
    else: print(base_url)
python html selenium-webdriver web-scraping tags
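The code above relies on a single fixed `time.sleep(3)` before reading the page. By default Selenium only sees the DOM as it is at the moment of each call, so if Kayak's JavaScript has not finished inserting the results yet, any lookup comes back empty. Selenium's own remedy is an explicit wait, e.g. `WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "nrc6")))` (with `from selenium.webdriver.support.ui import WebDriverWait` and `from selenium.webdriver.support import expected_conditions as EC`). So that the idea can be run without a browser, the sketch below re-implements only the polling loop in plain Python; `wait_until` and `fake_find_cards` are hypothetical helpers, not Selenium APIs.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns something truthy or `timeout` seconds pass.

    Same idea as Selenium's WebDriverWait(...).until(...): instead of one fixed
    sleep, keep re-checking the page until the results actually exist.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:  # e.g. a non-empty list of result cards
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Hypothetical stand-in for driver.find_elements(By.CLASS_NAME, "nrc6"):
# returns an empty list until the page has "rendered" on the third call.
calls = {"n": 0}


def fake_find_cards():
    calls["n"] += 1
    return ["card", "card"] if calls["n"] >= 3 else []


cards = wait_until(fake_find_cards, timeout=5, poll=0.01)
print(len(cards), calls["n"])  # 2 3
```

Against a real browser the condition would be something like `lambda: driver.find_elements(By.CLASS_NAME, "nrc6")`, since `find_elements` returns an empty list (rather than raising) while nothing matches.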
1 Answer

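I'm not sure I fully understand your concern, but basically you want to get a list of prices from the website. Inspecting the page shows that all the prices use the same class, namely "f8F1-price-text".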

    import java.util.List;
    import org.openqa.selenium.By;
    import org.openqa.selenium.WebElement;

    // inside a method declared `throws InterruptedException`, with `driver` an initialised WebDriver
    driver.get("https://www.kayak.com/flights/SFO-TYO/2024-03-21/2024-03-28?sort=bestflight_a");

    Thread.sleep(5000); // crude wait for the JavaScript-rendered results
    // every search-result card is a <div class="nrc6">
    List<WebElement> elmTicketDetails = driver.findElements(By.xpath("//div[@class='nrc6']"));
    System.out.println("====================================================================================================================");
    for (int cnt = 1; cnt <= elmTicketDetails.size(); cnt++) {
        WebElement card = elmTicketDetails.get(cnt - 1);
        // a leading "." makes each XPath search only inside the current card
        WebElement elmFromDetail = card.findElement(By.xpath(".//li[@class='hJSA-item'][1]"));
        WebElement elmToDetail = card.findElement(By.xpath(".//li[@class='hJSA-item'][2]"));
        WebElement elmPrice = card.findElement(By.xpath(".//div[@class='f8F1-price-text']"));
        System.out.println("Details for Ticket # " + cnt);
        System.out.println("Flight From: " + elmFromDetail.getText());
        System.out.println("Flight To: " + elmToDetail.getText());
        System.out.println("Price: " + elmPrice.getText());
        System.out.println("====================================================================================================================");
    }

The output is:

====================================================================================================================
Details for Ticket # 1
Flight From: 12:20 pm – 7:20 pm
+2
EVA Air
1 stop
TPE
39h 00m
SFO
-
NRT
Flight To: 1:00 pm – 6:40 am
+1
EVA Air
1 stop
TPE
33h 40m
NRT
-
SFO
Price: $1,207
====================================================================================================================
Details for Ticket # 2
Flight From: 11:50 am – 3:10 pm
...

This is written in Java, but the logic should be the same in Python.
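Since the question's code already hands the rendered HTML to a parser, the same "find each card, then the price inside it" logic can be sketched in Python. Note that the question searches for `bs.find_all('button', class_='nrc6')`, whereas the answer locates the cards as `div` elements with that class; with BeautifulSoup the equivalent would be `bs.find_all('div', class_='nrc6')` followed by `card.find('div', class_='f8F1-price-text')` on each card. To stay self-contained, the sketch below uses the standard library's `html.parser` instead of BeautifulSoup and runs on a small hand-written sample; `CardPriceParser` and the sample markup are hypothetical, and the class names (`nrc6`, `f8F1-price-text`) are Kayak's generated names taken from the answer, which may change at any time.

```python
from html.parser import HTMLParser


class CardPriceParser(HTMLParser):
    """Collect the price text inside every <div> whose class list contains card_class."""

    def __init__(self, card_class="nrc6", price_class="f8F1-price-text"):
        super().__init__()
        self.card_class = card_class
        self.price_class = price_class
        self.div_depth = 0        # number of <div> elements currently open
        self.card_depth = None    # div_depth at which the current card opened
        self.price_depth = None   # div_depth at which the price <div> opened
        self.prices = []          # one entry per card; None if the card has no price
        self._buf = []            # text fragments of the price being read

    @staticmethod
    def _has_class(attrs, name):
        # Token match on the space-separated class list (like By.CLASS_NAME),
        # so class="nrc6 extra" still counts as a card.
        return name in (dict(attrs).get("class") or "").split()

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        self.div_depth += 1
        if self.card_depth is None:
            if self._has_class(attrs, self.card_class):
                self.card_depth = self.div_depth
                self.prices.append(None)
        elif self.price_depth is None and self._has_class(attrs, self.price_class):
            self.price_depth = self.div_depth
            self._buf = []

    def handle_data(self, data):
        if self.price_depth is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != "div":
            return
        if self.price_depth == self.div_depth:
            self.prices[-1] = "".join(self._buf).strip()
            self.price_depth = None
        if self.card_depth == self.div_depth:
            self.card_depth = None
        self.div_depth -= 1


# Hypothetical markup, loosely shaped like a results page; not Kayak's real HTML.
SAMPLE = """
<div class="results">
  <div class="nrc6 extra">
    <ul><li class="hJSA-item">SFO - NRT</li><li class="hJSA-item">NRT - SFO</li></ul>
    <div class="price-box"><div class="f8F1-price-text">$100</div></div>
  </div>
  <div class="nrc6"><div class="f8F1-price-text">$200</div></div>
  <div class="nrc6"><span>sold out</span></div>
</div>
"""

parser = CardPriceParser()
parser.feed(SAMPLE)
print(parser.prices)  # ['$100', '$200', None]
```

The token match is a deliberate choice: an XPath such as `//div[@class='nrc6']` only matches when the whole attribute is exactly `nrc6`, so it would skip the first sample card (`class="nrc6 extra"`), while `By.CLASS_NAME` and BeautifulSoup's `class_` both match a single class inside the list.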

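Update: I updated the example code. It prints each leg's details as one block, but you can declare multiple locators that point to the specific details.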

For example, if you only want the times:
