How to scrape script data with BeautifulSoup?

Question (votes: 0, answers: 1)

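I'm trying to scrape data from a script tag. First I use soup.find_all, then convert the script with js2py, and finally print the data I need, but it doesn't work. How can I collect the soldNum information?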

Here is my code:

from bs4 import BeautifulSoup
import requests
import js2py

html_text = requests.get('https://www.daraz.com.bd/womens-shalwar-kameez/?spm=a2a0e.home.cate_1_1.2.735212f7tVtHu9&price=600-&from=filter').text
soup = BeautifulSoup(html_text, 'lxml')
# Take the fourth <script> tag on the page (index 3)
script_content = soup.find_all('script')[3]
for script in script_content:
    f = js2py.eval_js(script)
    print(f)
    soldNum = f['soldNum']
    print(soldNum)
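A possible reason the code above fails: `soup.find_all('script')[3]` is a single Tag, so the `for` loop walks that tag's children (usually one text node), and a page script typically assigns data to a variable rather than evaluating to it, so `js2py.eval_js` may raise an error or return something other than the expected object. Often there is no need to run JavaScript at all: if the script assigns a JSON literal, the standard `json` module can parse it. The sketch below assumes exactly that; the variable name `window.pageData`, the `mods`/`listItems` layout, and the sample script are hypothetical stand-ins to be checked against the real page source.

```python
import json
import re


def extract_assigned_json(script_text, var_name):
    """Return the JSON value assigned to `var_name` in a script body, or None.

    Assumes the right-hand side is a valid JSON literal, not arbitrary JavaScript.
    """
    # Locate "var_name =" allowing any whitespace around the equals sign
    match = re.search(re.escape(var_name) + r"\s*=\s*", script_text)
    if match is None:
        return None
    # raw_decode parses one JSON value from the start of the string and
    # ignores whatever follows it (the trailing ";" and later statements)
    value, _end = json.JSONDecoder().raw_decode(script_text[match.end():])
    return value


# Hypothetical script body standing in for the real <script> tag's text
sample_script = (
    'window.pageData = {"mods": {"listItems": '
    '[{"name": "Kameez A", "soldNum": 12}, {"name": "Kameez B", "soldNum": 3}]}};\n'
    'window.other = 1;'
)

data = extract_assigned_json(sample_script, "window.pageData")
for item in data["mods"]["listItems"]:
    print(item["name"], item["soldNum"])
```

On the real page, `sample_script` would be replaced by the `script.string` of the matching tag (skipping tags where it is `None`), found by checking each tag from `soup.find_all('script')` for the variable name instead of relying on the fixed index 3.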

javascript python beautifulsoup
1 Answer (0 votes)

We can use a regular expression (the re module) to search the script contents for the soldNum data:
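With a capturing group, `re.findall` returns just the digits, so each match does not need to be wrapped in braces and passed to `json.loads`. This sketch also tolerates whitespace around the colon and an optionally quoted number, in case the page serializes counts as strings; whether the real page does either is an assumption to verify against the actual script text, and the sample string is hypothetical.

```python
import re

# Capture only the digits; allow whitespace around ":" and an optional quote
SOLD_NUM_RE = re.compile(r'"soldNum"\s*:\s*"?(\d+)')


def find_sold_nums(script_text):
    """Return every soldNum value in the text as an int, in order of appearance."""
    return [int(n) for n in SOLD_NUM_RE.findall(script_text)]


# Hypothetical script text; the real page may use a different layout
sample = '{"items":[{"soldNum":12},{"soldNum": "3"},{"price":650}]}'
print(find_sold_nums(sample))  # [12, 3]
```

Inside a loop over `soup.find_all('script')`, calling `find_sold_nums(script.string)` for each tag whose `.string` is not `None` gives the same numbers as the `pattern.findall` plus `json.loads` combination.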

from bs4 import BeautifulSoup
import requests
import re
import json

# Fetch the page content
url = 'https://www.daraz.com.bd/womens-shalwar-kameez/?spm=a2a0e.home.cate_1_1.2.735212f7tVtHu9&price=600-&from=filter'
html_text = requests.get(url).text
soup = BeautifulSoup(html_text, 'lxml')

# Use regular expressions to find patterns that might hold the soldNum data
pattern = re.compile(r'"soldNum":\d+')

# Iterate through script tags and search for the pattern
for script in soup.find_all('script'):
    if script.string:  # Only proceed if the script tag contains text
        matches = pattern.findall(script.string)
        if matches:
            # Each match is a fragment such as "soldNum":12
            for match in matches:
                # Wrap the fragment in braces to form a valid JSON object, then read the number
                soldNum = json.loads('{' + match + '}')
                print(soldNum['soldNum'])

Note: this is a simplified example and may need adjusting to match the actual script content.
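If the script holds a complete JSON object (extracted from the script text and parsed with `json.loads`), walking the parsed structure is less brittle than matching raw text, because it does not depend on key order, spacing, or quoting. A minimal sketch, assuming the data has already been parsed into Python dicts and lists; the sample structure is hypothetical.

```python
import json


def find_key(obj, key):
    """Yield every value stored under `key` anywhere in nested dicts/lists."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            # Recurse into the value even when the key matched,
            # in case the same key also appears deeper inside it
            yield from find_key(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_key(item, key)


# Hypothetical JSON text standing in for data pulled from a <script> tag
raw = '{"mods": {"listItems": [{"soldNum": 12}, {"soldNum": 3, "sku": {"soldNum": 1}}]}}'
parsed = json.loads(raw)
print(list(find_key(parsed, "soldNum")))  # [12, 3, 1]
```

A generator keeps the search lazy, so `next(find_key(parsed, "soldNum"), None)` returns only the first value without walking the rest of a large page object.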

© www.soinside.com 2019 - 2024. All rights reserved.