Speed up scraping in Selenium

Question  Votes: 1  Answers: 1

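I wrote a small application that fetches reserved-instance prices from the AWS site and prints each instance's name and its price (I only want the 3-year convertible term). The application works. However, it is very slow, probably because the list allElements contains 1925 elements and I then iterate over all of them. I want to filter the data as in the code below (say, only Linux instances whose names start with c5). How can I do this faster? Is there a way to speed up the filtering without putting everything from the site into the allElements list? Thanks in advance for your help!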

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
import time

caps = DesiredCapabilities().FIREFOX
#caps["pageLoadStrategy"] = "normal"  #  complete
#caps["pageLoadStrategy"] = "eager"  #  interactive
caps["pageLoadStrategy"] = "none"
browser = webdriver.Firefox(desired_capabilities=caps)
browser.get('https://aws.amazon.com/ec2/pricing/reserved-instances/pricing/')
delay=3

time.sleep(10)

#browser.find_element_by_link_text('Windows').click()

try:
    myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.CLASS_NAME, 'aws-plc-content')))
    print ("Page is ready!")
except TimeoutException:
    print ("Loading took too much time!")

time.sleep(2)

allElements=browser.find_elements_by_class_name("aws-pricing-table-wrapper")


for el in allElements:
    lista=el.text.split("\n")
    indeks=lista.index("CONVERTIBLE 3-YEAR TERM")
    prices=lista[indeks+2]
    if lista[0].startswith('c5'):
        print(lista[0])
        print(prices.split()[4])

Tags: python selenium web-scraping

1 Answer

Votes: 1

I tried this with the Chrome driver; hopefully you get the same result on Firefox.

You need to do a few things to get there.

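  1. Loop indefinitely, scrolling the page first on each pass.
  2. Use WebDriverWait to find all matching elements and append them to a list, checking that they are not already in the list.
  3. Once the bottom of the page is reached, break out of the loop.
  4. The XPath below gives you the output you want.
  5. Use element.get_attribute("textContent") rather than element.text to get the values; with element.text you may end up with some empty strings.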
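
Steps 1 and 3 amount to "scroll until the page height stops changing". Here is a minimal sketch of just that loop, with the browser calls passed in as plain functions so it can be tried without a driver. The names scroll_to_bottom, get_height, scroll_once and the max_rounds safety cap are illustrative assumptions, not part of Selenium or of the original script.

```python
import time
from typing import Callable


def scroll_to_bottom(get_height: Callable[[], int],
                     scroll_once: Callable[[], None],
                     pause: float = 1.0,
                     max_rounds: int = 100) -> int:
    """Scroll until the page height stops growing; return the number of scrolls."""
    last_height = get_height()
    for rounds in range(1, max_rounds + 1):
        scroll_once()
        time.sleep(pause)  # give lazily loaded content time to render
        new_height = get_height()
        if new_height == last_height:
            return rounds  # height unchanged: assume the bottom was reached
        last_height = new_height
    return max_rounds  # safety cap (an assumption) so the loop cannot run forever
```

With Selenium, get_height could wrap browser.execute_script("return document.body.scrollHeight") and scroll_once could send Keys.END to the body element.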

Try the following code.

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.keys import Keys  # needed for Keys.END below
import time


browser = webdriver.Chrome()
browser.get('https://aws.amazon.com/ec2/pricing/reserved-instances/pricing/')
delay=10

try:
    myElem = WebDriverWait(browser, delay).until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.aws-plc-content')))
    print ("Page is ready!")
except TimeoutException:
    print ("Loading took too much time!")

last_height = browser.execute_script("return document.body.scrollHeight")
items=[]
while True:
    browser.find_element(By.TAG_NAME, 'body').send_keys(Keys.END)
    time.sleep(1)

    allElements = WebDriverWait(browser, 20).until(EC.presence_of_all_elements_located(
        (By.XPATH, "//div[@class='aws-pricing-table-wrapper']/h2[starts-with(text(),'c5.')]")))
    print(len(allElements))
    for el in allElements:

        if el.text in items:
            continue

        items.append(el.get_attribute("textContent").strip())
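        # Price cell of the 'Convertible 3-Year Term' row, relative to the instance heading
        price_xpath = ("./following-sibling::table[4]//tr//th[contains(.,'Convertible 3-Year Term')]"
                       "/following::tbody[1]//tr[1]//td[4]")
        items.append(el.find_element(By.XPATH, price_xpath).get_attribute("textContent").strip())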

    new_height = browser.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

#Print all items and their price.
print(items)
#Get the length of the list #it should be 244X2
print(len(items))

Console output:

Page is ready!
9
88
['c5.large', '$0.041', 'c5.xlarge', '$0.081', 'c5.2xlarge', '$0.162', 'c5.4xlarge', '$0.324', 'c5.9xlarge', '$0.729', 'c5.12xlarge', '$0.985', 'c5.18xlarge', '$1.459', 'c5.24xlarge', '$1.970', 'c5.metal', '$1.970', 'c5.large', '$0.101', 'c5.xlarge', '$0.141', 'c5.2xlarge', '$0.292', 'c5.4xlarge', '$0.454', 'c5.9xlarge', '$0.859', 'c5.12xlarge', '$1.115', 'c5.18xlarge', '$1.589', 'c5.24xlarge', '$2.100', 'c5.metal', '$2.100', 'c5.large', '$0.074', 'c5.xlarge', '$0.114', 'c5.2xlarge', '$0.195', 'c5.4xlarge', '$0.357', 'c5.9xlarge', '$0.762', 'c5.12xlarge', '$1.018', 'c5.18xlarge', '$1.492', 'c5.24xlarge', '$2.003', 'c5.metal', '$2.003', 'c5.large', '$0.133', 'c5.xlarge', '$0.265', 'c5.2xlarge', '$0.530', 'c5.4xlarge', '$1.060', 'c5.9xlarge', '$2.385', 'c5.12xlarge', '$3.193', 'c5.18xlarge', '$4.771', 'c5.24xlarge', '$6.386', 'c5.metal', '$6.386', 'c5.large', '$0.613', 'c5.xlarge', '$0.745', 'c5.2xlarge', '$1.490', 'c5.4xlarge', '$2.980', 'c5.9xlarge', '$6.705', 'c5.12xlarge', '$8.953', 'c5.18xlarge', '$13.411', 'c5.24xlarge', '$17.906', 'c5.metal', '$17.906', 'c5.large', '$0.200', 'c5.xlarge', '$0.333', 'c5.2xlarge', '$0.665', 'c5.4xlarge', '$1.331', 'c5.9xlarge', '$2.994', 'c5.12xlarge', '$4.004', 'c5.18xlarge', '$5.988', 'c5.24xlarge', '$8.008', 'c5.metal', '$8.008', 'c5.xlarge', '$1.765', 'c5.2xlarge', '$3.530', 'c5.4xlarge', '$7.060', 'c5.9xlarge', '$15.885', 'c5.12xlarge', '$21.193', 'c5.18xlarge', '$31.771', 'c5.24xlarge', '$42.386', 'c5.metal', '$42.386', 'c5.large', '$0.521', 'c5.xlarge', '$0.561', 'c5.2xlarge', '$1.122', 'c5.4xlarge', '$2.244', 'c5.9xlarge', '$5.049', 'c5.12xlarge', '$6.745', 'c5.18xlarge', '$10.099', 'c5.24xlarge', '$13.490', 'c5.metal', '$13.490', 'c5.large', '$0.108', 'c5.xlarge', '$0.149', 'c5.2xlarge', '$0.297', 'c5.4xlarge', '$0.595', 'c5.9xlarge', '$1.338', 'c5.12xlarge', '$1.796', 'c5.18xlarge', '$2.676', 'c5.24xlarge', '$3.592', 'c5.metal', '$3.592', 'c5.xlarge', '$1.581', 'c5.2xlarge', '$3.162', 'c5.4xlarge', '$6.324', 
'c5.9xlarge', '$14.229', 'c5.12xlarge', '$18.985', 'c5.18xlarge', '$28.459', 'c5.24xlarge', '$37.970', 'c5.metal', '$37.970']
176
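
The items list above is flat: a name, then its price, then the next name, and so on. If name/price pairs are easier to work with, the list could be regrouped after scraping. This is a small sketch, assuming every price is a string like '$0.041'; pair_items is a hypothetical helper name, and the sample values are copied from the output above.

```python
from typing import List, Tuple


def pair_items(items: List[str]) -> List[Tuple[str, float]]:
    """Turn ['c5.large', '$0.041', ...] into [('c5.large', 0.041), ...]."""
    if len(items) % 2 != 0:
        raise ValueError("expected an even number of entries (name, price, ...)")
    names = items[0::2]   # entries at even positions are instance names
    prices = items[1::2]  # entries at odd positions are the matching prices
    return [(name, float(price.lstrip("$"))) for name, price in zip(names, prices)]


sample = ['c5.large', '$0.041', 'c5.xlarge', '$0.081', 'c5.2xlarge', '$0.162']
for name, price in pair_items(sample):
    print(f"{name}\t{price}")
```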