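I'm doing some web scraping, and my code has a problem.
What I'm trying to do is:
What I think is happening is:
I know it isn't working because when the program clicks "Send", the website says it did not recognize the cookies from hCaptcha or from the text input. When I run the same routine without Selenium (doing everything by hand), it works fine. I have already tried switching browsers.
What should I do? Please demonstrate using my code, shown below: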
import time
import pickle
from selenium.webdriver.common.by import By
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--disable-notifications")
chrome_options.add_argument("--disable-infobars")
driver = webdriver.Chrome(options=chrome_options)
driver.get("https://solucoes.receita.fazenda.gov.br/Servicos/cnpjreva/cnpjreva_solicitacao.asp")
# Waiting for the client to manually fill captcha
time.sleep(45)
cnpj_code = "33.224.254/0001-42"
# Then, it fills the text input
campo_texto = driver.find_element(By.XPATH, "//html//body//div[1]//div[1]//div//div//div//form//div[1]//div[1]//div//div//input")
campo_texto.send_keys(cnpj_code)
# Then, it clicks the button to GO
enter_button = driver.find_element(By.XPATH, "//html//body//div[1]//div[1]//div//div//div//form//div[3]//div//button[1]")
enter_button.click()
time.sleep(10)  # Should show the results, but it stays on the same page and the website reports a cookie error
driver.quit()
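I tried switching browsers. I expected it to show the same results as when I use the website manually (without Selenium). Instead, it resulted in a cookie error.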
You can solve the problem with undetected-chromedriver, which is easy to install:
pip install undetected-chromedriver
Here is how to use it, in a cleaner, improved version of your code:
# Import necessary libraries
import undetected_chromedriver as uc
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
import selenium.webdriver.support.expected_conditions as EC
# Initialize an undetected Chrome driver instance
driver = uc.Chrome()
# Create a WebDriverWait object with a timeout of 30 seconds
wait = WebDriverWait(driver, 30)
driver.get("https://solucoes.receita.fazenda.gov.br/Servicos/cnpjreva/cnpjreva_solicitacao.asp")
# Wait for the CAPTCHA iframe to appear and switch to it
frame = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'iframe[title="Widget contendo caixa de seleção para desafio de segurança hCaptcha"]')))
driver.switch_to.frame(frame)
# Wait for the CAPTCHA box to be checked (indicating manual completion by the user)
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'div[aria-checked="true"]')))
# Switch back to the default content (out of the CAPTCHA iframe)
driver.switch_to.default_content()
# Define the CNPJ code to be entered
cnpj_code = "33.224.254/0001-42"
# Locate the CNPJ input element, enter the CNPJ code, and proceed
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "input#cnpj"))).send_keys(cnpj_code)
# Click the 'Consultar' button
driver.find_element(By.CSS_SELECTOR, "button.btn.btn-primary").click()
# Wait for the results page to load
wait.until(EC.presence_of_element_located((By.ID, 'principal')))
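The code above replaces the fixed time.sleep(45) with explicit waits, so the script continues as soon as the CAPTCHA is solved instead of always pausing for the full delay. To make the idea concrete, here is a minimal, browser-free sketch of how such a polling wait behaves. It is a simplified illustration, not Selenium's actual implementation: the names wait_until, captcha_solved, and attempts are hypothetical, and the stand-in condition simply becomes true on its third check.

```python
import time


def wait_until(condition, timeout=30.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without success. (Hypothetical helper for illustration.)
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for "the user has ticked the CAPTCHA box": true on the third check
attempts = {"count": 0}


def captcha_solved():
    attempts["count"] += 1
    return attempts["count"] >= 3


# Returns as soon as the condition holds, well before the 5-second limit
wait_until(captcha_solved, timeout=5.0, poll_interval=0.01)
```

In the real script, WebDriverWait(driver, 30).until(...) plays the role of wait_until. If solving the CAPTCHA by hand tends to take longer than 30 seconds, raising that timeout is a reasonable adjustment.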