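I'm trying to scrape data from a script tag. I first call soup.find_all, then convert the result with js2py, and finally print the data I need, but it doesn't work. How can I collect the soldNum values?
Here is my code: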
from bs4 import BeautifulSoup
import requests
import js2py
html_text = requests.get('https://www.daraz.com.bd/womens-shalwar-kameez/?spm=a2a0e.home.cate_1_1.2.735212f7tVtHu9&price=600-&from=filter').text
soup = BeautifulSoup(html_text, 'lxml')
# Take the fourth <script> tag on the page
script_content = soup.find_all('script')[3]
for script in script_content:
    f = js2py.eval_js(script)
    print(f)
    soldNum = f['soldNum']
    print(soldNum)
We can use regular expressions (the re module) to search the script contents for the soldNum data:
from bs4 import BeautifulSoup
import requests
import re
import json
# Fetch the page content
url = 'https://www.daraz.com.bd/womens-shalwar-kameez/?spm=a2a0e.home.cate_1_1.2.735212f7tVtHu9&price=600-&from=filter'
html_text = requests.get(url).text
soup = BeautifulSoup(html_text, 'lxml')
# Use a regular expression to find "soldNum":<digits> pairs
pattern = re.compile(r'\"soldNum\":\d+')
# Iterate through script tags and search for the pattern
for script in soup.find_all('script'):
    if script.string:  # Only proceed if the script tag contains text
        matches = pattern.findall(script.string)
        if matches:
            # Each match looks like "soldNum":123, so wrap it in braces and parse it as JSON
            for match in matches:
                # Extract the numerical value
                soldNum = json.loads('{' + match + '}')
                print(soldNum['soldNum'])
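A note on why the original attempt likely fails: soup.find_all('script')[3] returns a single tag, so looping over it yields that tag's child strings rather than a JavaScript object, and soldNum is probably nested inside a larger structure instead of sitting at the top level. If you want each soldNum together with its item rather than bare numbers, another option is to parse the whole embedded object with json.JSONDecoder.raw_decode. The sketch below runs on an inline sample instead of the live page. The variable name window.pageData, the mods.listItems layout, and the helper names are assumptions for illustration, so check them against the actual script text before relying on this.

```python
import json
import re

# Hypothetical stand-in for the text of one <script> tag. The variable name
# `window.pageData` and the `mods.listItems` layout are assumptions, not
# something confirmed for the real page.
SAMPLE_SCRIPT = '''
window.pageData = {"mods": {"listItems": [
    {"name": "Item A", "soldNum": 12},
    {"name": "Item B", "soldNum": 7},
    {"name": "Item C"}
]}};
'''


def extract_assigned_json(script_text, var_name):
    """Return the JSON value assigned to var_name in script_text, or None."""
    match = re.search(re.escape(var_name) + r'\s*=\s*', script_text)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (here the trailing ';').
    try:
        obj, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    except json.JSONDecodeError:
        return None
    return obj


def sold_numbers(page_data):
    """Map item name to soldNum for the items that carry that key."""
    items = page_data.get("mods", {}).get("listItems", [])
    return {item["name"]: item["soldNum"] for item in items if "soldNum" in item}


data = extract_assigned_json(SAMPLE_SCRIPT, "window.pageData")
result = sold_numbers(data) if data is not None else {}
print(result)
```

On the real page you would pass script.string from the loop above in place of SAMPLE_SCRIPT. This only works when the assigned value is strict JSON; if the script uses JavaScript-only syntax such as unquoted keys, the regex approach is the safer fallback.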