刮取动态网站是否有特定方法？

问题描述投票：0回答：1

我正在尝试抓取the following webpage。

搜索框（显示输入安全名称/代码/ ID）是我遇到的困难。我无法使用xpath进行抓取，我正在使用机械化库进行浏览器模拟，但是它似乎不起作用。

我遇到了这个问题Excel VBA Scrape Web Page，与我的问题非常相似，但是我不知道如何使用Python来实现它。

我尝试的代码：

import mechanize
from bs4 import BeautifulSoup
import requests
url = 'https://www.bseindia.com/corporates/corporate_act.aspx'
res = requests.get(url)
soup = BeautifulSoup(res.text, "html.parser")
br = mechanize.Browser()
br.open(url)
br.select_form(nr=0)
br['ContentPlaceHolder1_SmartSearch_smartSearch'] = input("enter the company name")
response = br.submit()

P.S .：我是网络爬虫的新手，我想知道大学项目的解决方案，因此非常感谢您的帮助。

python web-scraping beautifulsoup scrapy mechanize

1个回答

0
投票

我建议您使用scrapy框架。我认为它是更先进的工具之一，包括用于模拟与网站互动的工具。

最新问题

© www.soinside.com 2019 - 2024. All rights reserved.