Scrapy output to MySQL ends up in a single row

Question  Votes: 0  Answers: 1

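I'm having a hard time getting Scrapy to write its results to multiple rows in the database. I know how to make it loop when I export to a JSON file, but since I'm a Scrapy noob I followed some tutorials for the database export syntax, and it puts all of the returned data on a single row. Here is my spider code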

import scrapy


from ..items import SandboxItem
class IndigoSpider(scrapy.Spider):
    name = 'Indigo'
    start_urls = ['https://www.chapters.indigo.ca/en-ca/books/?link-usage=Header%3A%20books&mc=Book&lu=Main']


    def parse(self, response):
        items = SandboxItem()
        Product_Name = str(response.css('.product-list__product-title-link--grid::text').getall()),
        Product_Author = str(response.css('.product-list__contributor::text').getall()),
        Product_Price = str(response.css('.product-list__price--orange::text').getall()),
        Product_Image = str(response.css('.product-image--lazy::attr(src)').getall())

        items['Product_Name'] = Product_Name
        items['Product_Author'] = Product_Author
        items['Product_Price'] = Product_Price
        items['Product_Image'] = Product_Image

        yield items

Here is my pipeline code

import mysql.connector

class SandboxPipeline(object):


    def __init__(self):
        self.create_connection()
        self.create_table()
        # pass

    def create_connection(self):
        self.conn = mysql.connector.connect(
            host='localhost',  
            user='root',
            passwd='test123',
            database='python',
            auth_plugin='mysql_native_password'
        )
        self.curr = self.conn.cursor()

    def create_table(self):
        self.curr.execute(""" DROP TABLE IF EXISTS indigo""")
        self.curr.execute(""" Create table indigo(
        Product_Name text,
        Product_Author text,
        Product_Price text,
        Product_Image text
        )""")

    def process_item(self, item, spider):
        self.store_db(item)
        #     print("pipelinexds:" + str(item['Product_Name']))
        #     print(str(item['Product_Name']))
        return item

    #
    def store_db(self, item):
        self.curr.execute("""Insert Into indigo values (%s,%s,%s,%s)""",
                          ((item['Product_Name'][0]),
                           (item['Product_Author'][0]),
                           (item['Product_Price'][0]),
                           (item['Product_Image'][0]),
                           )
                          )
        self.conn.commit()

Database output looks like this

scrapy mysql-python scrapy-pipeline
1 Answer
0 votes

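It looks like your item actually contains many products at once. You therefore need a for loop somewhere, which means changing the structure of your program so that it yields one item per product.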
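To see why everything lands in one row, here is a minimal sketch of what the spider's assignments produce. The `names` list is made-up sample data standing in for the result of `response.css(...).getall()`; it is not taken from the real page.

```python
# Hypothetical stand-in for response.css(...).getall(): a list of strings.
names = ["Book A", "Book B", "Book C"]

# Same shape as the spider's line: str(...) followed by a trailing comma.
# The trailing comma wraps the value in a one-element tuple.
product_name = str(names),

print(type(product_name))  # a tuple, because of the trailing comma
print(len(product_name))   # a single element
print(product_name[0])     # the whole list rendered as one string
```

So `item['Product_Name'][0]` in the pipeline is the entire list as a single string, and that string is what gets inserted. `Product_Image` has no trailing comma, so it is a plain string, and `[0]` on it yields only its first character.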

I would suggest something like:

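# Product_Name etc. must be the plain lists returned by .getall()
# (no str(...) wrapper and no trailing comma) for the indexing to work.
for i in range(len(Product_Name)):
    yield {
        'Product_Name': Product_Name[i],
        'Product_Author': Product_Author[i],
        'Product_Price': Product_Price[i],
        'Product_Image': Product_Image[i]
    }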

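That way each item holds a single value per field, and it should be handled correctly. Note that the pipeline's `[0]` indexing then has to go, because `item['Product_Name'][0]` on a plain string returns only its first character.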
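As a rough end-to-end sketch of the idea, the snippet below splits four parallel lists into one dict per product and inserts each as its own row. To keep it runnable without a server, it uses the standard-library `sqlite3` module as a stand-in for `mysql.connector`, so the placeholders are `?` rather than `%s`. The helper name `split_into_rows` and the sample data are hypothetical.

```python
import sqlite3


def split_into_rows(names, authors, prices, images):
    """Yield one dict per product from four parallel lists.

    zip() stops at the shortest list, so a product with a missing
    field would silently shorten the output.
    """
    for name, author, price, image in zip(names, authors, prices, images):
        yield {
            "Product_Name": name,
            "Product_Author": author,
            "Product_Price": price,
            "Product_Image": image,
        }


def store_db(conn, item):
    # No [0] indexing: each field is already a single string.
    conn.execute(
        "INSERT INTO indigo VALUES (?, ?, ?, ?)",
        (
            item["Product_Name"],
            item["Product_Author"],
            item["Product_Price"],
            item["Product_Image"],
        ),
    )
    conn.commit()


# In-memory SQLite database standing in for the MySQL one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE indigo ("
    "Product_Name text, Product_Author text, "
    "Product_Price text, Product_Image text)"
)

# Made-up sample data in the shape .getall() would return.
names = ["Book A", "Book B"]
authors = ["Author A", "Author B"]
prices = ["$10.00", "$12.50"]
images = ["a.jpg", "b.jpg"]

for item in split_into_rows(names, authors, prices, images):
    store_db(conn, item)

rows = conn.execute("SELECT * FROM indigo").fetchall()
print(len(rows))  # one row per product
```

In the spider, `parse` would do the equivalent by calling `.getall()` on each selector without `str(...)` and yielding each dict in turn. Selecting each product's container first and reading its fields from there may be more robust than zipping page-wide lists, but that depends on the page markup.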
