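While writing a script to automate browsing the Meta Ads Library, I'm hitting an error in the following while loop, which is responsible for actually traversing the page element by element. The idea is to extract URLs from button elements whose text matches a predefined list of keywords (e.g. "Learn More", "Shop Now", etc.).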
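The keyword match described above can be isolated into a small predicate and tested without a browser. A minimal sketch, assuming a hypothetical helper is_cta_text and a keyword set like the one held in meta_cta_buttons; collapsing whitespace before comparing is an added assumption, in case the rendered text contains line breaks or padding:

```python
# Hypothetical keyword set standing in for meta_cta_buttons.
META_CTA_BUTTONS = {"Learn More", "Shop Now"}


def is_cta_text(text, keywords=META_CTA_BUTTONS):
    """Return True if the element text matches one of the CTA keywords.

    Runs of whitespace are collapsed first, so "Learn\nMore " still matches
    (an assumption about how the rendered text might look).
    """
    if text is None:
        return False
    normalized = " ".join(text.split())
    return normalized in keywords


print(is_cta_text("Learn More"))
print(is_cta_text("  Shop\n Now "))
print(is_cta_text("Sponsored"))
```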
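The first problem I ran into is that, while browsing the ads, the main if statement should never be true for a starting_element that doesn't have the button role. Yet it randomly bypasses this rule and lands on a simple span element with basic text, which is where I suspect it goes down the rabbit hole and eventually tries to extract a URL from it (this is where I hit the error).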
HTML of the element in question:
<div class="_4ik4 _4ik5" style="line-height: 14px; max-height: 28px; -webkit-line-clamp: 2;" xpath="1">Learn More</div>
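Judging from this snippet alone, the div itself carries no role attribute, so on its own it should fail the role == 'button' check; the focused element that passes the check is more likely a wrapper around it. The sketch below only inspects the quoted markup with the standard library's html.parser (the class name FirstTagAttrs is hypothetical); it does not reproduce what Selenium sees in the live page:

```python
from html.parser import HTMLParser

# The element quoted above, copied verbatim.
SNIPPET = ('<div class="_4ik4 _4ik5" style="line-height: 14px; max-height: 28px; '
           '-webkit-line-clamp: 2;" xpath="1">Learn More</div>')


class FirstTagAttrs(HTMLParser):
    """Record the attributes of the first start tag encountered."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if self.attrs is None:
            self.attrs = dict(attrs)


parser = FirstTagAttrs()
parser.feed(SNIPPET)
parser.close()
# No "role" key on this element, so a role == 'button' check would not match it.
print(parser.attrs.get("role"))
```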
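The error message I get when it lands on this element, processes it, moves on to the actual button, and then moves to the next element before breaking: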
Message: invalid selector: The result of the xpath expression ".." is: [object HTMLDocument]. It should be an element.
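The message is consistent with an ancestor climb that never meets an <a>: each ".." step moves one level up, and one step above <html> the XPath parent is the document node, which Selenium refuses to return as an element. The offline sketch below uses the standard library's xml.dom.minidom as a stand-in for the browser DOM (an assumption; it is not Selenium, and the markup and the helper climb_to_link are hypothetical) to show the same structure, plus a loop that stops once it leaves the element tree. In Selenium itself, a single XPath such as ./ancestor::a[1], wrapped in a NoSuchElementException handler, expresses the same "nearest <a>, if any" lookup.

```python
from xml.dom import Node, minidom

# Hypothetical markup: a CTA button with no <a> anywhere above it.
DOC = minidom.parseString(
    '<html><body><div role="button">Learn More</div></body></html>'
)


def climb_to_link(node):
    """Walk parentNode links until an <a> element is found.

    Returns None once the walk leaves the element tree, i.e. when it
    reaches the Document node that sits above <html>.
    """
    while node is not None and node.nodeType == Node.ELEMENT_NODE:
        if node.tagName == "a":
            return node
        node = node.parentNode
    return None


button = DOC.getElementsByTagName("div")[0]
print(DOC.documentElement.parentNode.nodeType == Node.DOCUMENT_NODE)  # True
print(climb_to_link(button))  # None
```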
import time

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Assumes browser, starting_element, meta_cta_buttons and unique_store_urls are defined earlier in the script.
# Tabbing loop: press TAB from the current element, then reassign starting_element to whatever now has focus.
while True:
    starting_element.send_keys(Keys.TAB)
    WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.XPATH, "//body/div/div/div[@role='main']/div/div/div/div/div/div/div[4]/div[1]/div[1]/div[1]/div[1]/div[1]/div[2]/div[1]")))
    starting_element = browser.switch_to.active_element
    # When a CTA button has focus, check its text against the keyword list; on a match, extract the URL from the enclosing link
    if starting_element.get_attribute('role') == 'button':
        button_text = starting_element.text
        if button_text in meta_cta_buttons:
            parent_element = starting_element.find_element(By.XPATH, "..")
            while True:
                if parent_element.tag_name != 'a':
                    # move one step up the ancestor chain
                    parent_element = parent_element.find_element(By.XPATH, "..")
                else:
                    cta_url = parent_element.get_attribute('href')
                    # store links in a set
                    unique_store_urls.add(cta_url)
                    break
        else:
            continue
    else:
        continue
    try:
        # look for the loading spinner that appears during infinite scroll
        spinner_element = WebDriverWait(browser, 5).until(EC.visibility_of_element_located((By.XPATH, "//span[@role='progressbar']//*[name()='svg']")))
        end_of_page_element = browser.find_element(By.XPATH, "//a[contains(text(),'Ad Library API')]")
        if spinner_element:
            print("Spinner exists")
            time.sleep(3)
        # if it's at the footer, no more data was loaded in time
        elif starting_element == end_of_page_element:
            break
    except TimeoutException:
        print("Spinner doesn't exist")
        continue
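The extraction logic of the loop above can also be modelled offline, without a browser. This is a sketch under stated assumptions, not the Selenium code: it uses the standard library's html.parser on a made-up fragment, the class name CtaLinkCollector and the sample URL are hypothetical, and an element's text is approximated by concatenating the text nodes inside it. The key difference from the while True climb is that the ancestor search is bounded: when no <a> exists above a matching button, it records the miss and moves on instead of asking for the parent of the document.

```python
from html.parser import HTMLParser

# Hypothetical keywords standing in for meta_cta_buttons.
CTA_KEYWORDS = {"Learn More", "Shop Now"}
# Tags that never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class CtaLinkCollector(HTMLParser):
    """Collect hrefs of the nearest <a> around role="button" CTA elements."""

    def __init__(self, keywords):
        super().__init__()
        self.keywords = keywords
        self.stack = []  # open elements as (tag, attrs, text_parts)
        self.urls = set()
        self.orphans = 0  # matching CTA buttons with no <a> ancestor

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs), []))

    def handle_data(self, data):
        # Text counts toward every open element, roughly like WebElement.text.
        for _, _, parts in self.stack:
            parts.append(data)

    def handle_endtag(self, tag):
        # Find the most recent open element with this tag name.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                break
        else:
            return  # stray end tag: ignore it
        _, attrs, parts = self.stack[i]
        text = " ".join("".join(parts).split())
        if attrs.get("role") == "button" and text in self.keywords:
            # Search the ancestors, innermost first, for an <a>; stop at the root.
            link = next((anc_attrs for anc_tag, anc_attrs, _ in reversed(self.stack[:i])
                         if anc_tag == "a"), None)
            if link is None:
                self.orphans += 1  # no enclosing link: nothing to extract
            else:
                self.urls.add(link.get("href"))
        del self.stack[i:]


PAGE = """
<div>
  <a href="https://shop.example/one"><div role="button"><div>Shop Now</div></div></a>
  <div role="button"><div class="_4ik4 _4ik5">Learn More</div></div>
  <span>Learn More</span>
</div>
"""

collector = CtaLinkCollector(CTA_KEYWORDS)
collector.feed(PAGE)
collector.close()
print(collector.urls)     # {'https://shop.example/one'}
print(collector.orphans)  # 1
```

The plain span with "Learn More" is ignored because it has no button role, and the second button is counted as an orphan rather than triggering an unbounded climb.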
What I've tried so far is changing the main if statement in the loop to use get_attribute instead of aria_role, but it made no difference.