Automatic parsing in Python


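I'm new to Python programming. I'm trying to write a script that automatically parses product cards. I managed to handle a single page. How can I make the script move on to the next page automatically? I've seen several answers saying that Selenium helps, but I can't figure it out :( Here is the code: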

import random
import string
import csv 
import requests
from bs4 import BeautifulSoup

url = "https://game29.ru/products?category=926"

response = requests.get(url)

html = response.text

multi_class = {'class': ['row'], 'style': 'border: 2px solid #898989;border-radius: 7px;padding: 2px;margin-top: -2px;'}

soup = BeautifulSoup(html, "html.parser")

products = soup.find_all("div", {"class":"row"})

identifaer = "".join([random.choice(string.ascii_letters + string.digits) for n in range(32)])
ad_status = "Free"
category = "Игры, приставки и программы"
goods_type = "Игры для приставок"
ad_type = "Продаю своё"
adress = ""
discription = ""
condition = "Новое"
data_begin = "2024-04-03"
data_end = "2024-05-03" 
allow_email = "Нет"
contact_phone = ""
contact_method = "По телефону и в сообщениях"

all_products = []

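for product in products:
    # Only product cards carry exactly this class/style combination
    if product.attrs == multi_class:
        # Generate a fresh 32-character ID for every product instead of reusing one
        identifaer = "".join(random.choice(string.ascii_letters + string.digits) for n in range(32))
        src = product.find("img")["src"]
        image = "https://www.game29.ru" + src
        # Skip cards that only show the placeholder image
        if not src.endswith("zaglushka.png"):
            title = product.find("div", {"class":"cart-item-name"}).text
            price = product.find("div", {"class": "cart-item-price"}).text.strip().replace("руб.", "")
            all_products.append([identifaer, ad_status, category, goods_type, ad_type, adress, title, discription, condition, price, data_begin, data_end, allow_email, contact_phone, image, contact_method])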

# names = ["Id", "AdStatus", "Category", "GoodsType", "Adtype", "Adress", "Title", "Discription", "Condition", "Price", "DataBegin", "DataEnd", "AllowEmail", "ContactPhone","ImageUrls", "ContactMethod"]

# Append to data.csv; utf-8 keeps the Cyrillic fields intact
with open("data.csv", "a", newline='', encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file, delimiter=',')
    # writer.writerow(names)

    for product in all_products:
        writer.writerow(product)

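I really think Selenium would help me. I believe the answer is there, but unfortunately I don't understand it yet and I don't have much time. Dear experts, I would be very grateful for your help.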

python database parsing auto
1 answer
0 votes

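If you look at the site, you'll see that clicking any page number modifies the URL to include a page= parameter. For example, page 2 is available at https://game29.ru/products?page=2&category=926. So you don't need Selenium here: create a function that processes a single page, then call it from a loop that increments the page number. For example: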

def parser(url):
    # add the beautiful soup and parsing code here
    # return True or False to indicate whether the page was processed
    return False  # placeholder so the function is valid until the parsing code is added

# The main loop is something like
page_number = 1
while True:
    url = f'https://game29.ru/products?page={page_number}&category=926'
    if not parser(url):
        break  # stop processing
    page_number += 1  # move on to the next page
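
To make the idea above a bit more concrete, here is a minimal sketch of how the per-page function and the loop might fit together. It is not tested against the live site. The names `build_url`, `parse_page`, `scrape_all`, `PRODUCT_ATTRS` and the `max_pages` cap are hypothetical, and the stop condition assumes that a page past the last one contains no product cards, which you should verify for this site. The selectors are copied from the question's code.

```python
from urllib.parse import urlencode

BASE_URL = "https://game29.ru/products"

# Attributes of a product card, copied from the question's code
PRODUCT_ATTRS = {
    "class": ["row"],
    "style": "border: 2px solid #898989;border-radius: 7px;padding: 2px;margin-top: -2px;",
}


def build_url(page_number, category=926):
    """Build the listing URL for one page of a category."""
    return f"{BASE_URL}?{urlencode({'page': page_number, 'category': category})}"


def parse_page(url):
    """Fetch one listing page and return a list of [title, price] rows."""
    # Imported here so the pagination loop below can be tried without these libraries
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    rows = []
    for product in soup.find_all("div", {"class": "row"}):
        if product.attrs != PRODUCT_ATTRS:
            continue  # not a product card
        title = product.find("div", {"class": "cart-item-name"})
        price = product.find("div", {"class": "cart-item-price"})
        if title is None or price is None:
            continue  # card without a name or price, skip it
        rows.append([title.text.strip(), price.text.replace("руб.", "").strip()])
    return rows


def scrape_all(parse=parse_page, max_pages=50):
    """Collect rows page by page until a page yields nothing.

    Assumption: a page past the last one has no product cards.
    max_pages is a safety cap in case that assumption is wrong.
    """
    all_rows = []
    for page_number in range(1, max_pages + 1):
        rows = parse(build_url(page_number))
        if not rows:
            break  # empty page, assume we ran out of products
        all_rows.extend(rows)
    return all_rows
```

Calling `scrape_all()` with no arguments would fetch the live pages, and the rows it returns can then be written out with `csv.writer` as in the question. Passing a different `parse` function makes it possible to try the loop without touching the network.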