I'd like to know whether Scrapy pipelines genuinely don't support the context-manager protocol, or whether I just haven't set things up correctly. I'm running a Scrapy pipeline that should save the scraped data to Postgres.

Here is my connection pool:
```python
from psycopg2.pool import SimpleConnectionPool
from contextlib import contextmanager
from decouple import config

dbpool = SimpleConnectionPool(
    1, 20,
    host=config('DB_HOST'),
    port=config('DB_PORT'),
    dbname=config('DB_NAME'),
    user=config('DB_USER'),
    password=config('DB_PASSWORD'),
)

@contextmanager
def db_cursor():
    conn = dbpool.getconn()
    try:
        with conn.cursor() as cur:
            yield cur
        conn.commit()
    except:
        conn.rollback()
        raise
    finally:
        dbpool.putconn(conn)
```
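The commit-on-success / rollback-on-error shape of `db_cursor` can be exercised without Postgres or Scrapy. Here is a minimal sketch of the same pattern with an in-memory sqlite3 connection standing in for the pool (the names and the `movies` table here are illustrative, not my real code):

```python
import sqlite3
from contextlib import contextmanager

conn = sqlite3.connect(":memory:")  # stand-in for dbpool.getconn()

@contextmanager
def db_cursor():
    cur = conn.cursor()
    try:
        yield cur
        conn.commit()   # commit only if the with-body raised nothing
    except Exception:
        conn.rollback() # undo partial work on error
        raise
    finally:
        cur.close()     # stand-in for dbpool.putconn(conn)

with db_cursor() as cur:
    cur.execute("CREATE TABLE movies (title TEXT)")

with db_cursor() as cur:
    cur.execute("INSERT INTO movies VALUES (?)", ("Heat",))

# an exception inside the block triggers the rollback branch
try:
    with db_cursor() as cur:
        cur.execute("INSERT INTO movies VALUES (?)", ("Ronin",))
        raise RuntimeError("boom")
except RuntimeError:
    pass

with db_cursor() as cur:
    cur.execute("SELECT count(*) FROM movies")
    print(cur.fetchone()[0])  # 1 -- the failed insert was rolled back
```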
I then import the cursor helper so I can use it in the pipeline:
```python
from my_app.queries.pools import db_cursor
from my_app.scripts.movies.db_operations import create_table


class ImdbPipeline:
    def __init__(self):
        create_table('movies')

    def process_item(self, item, spider):
        with db_cursor as cur:
            cur.execute(
                """INSERT INTO movies (title, year, duration, genre, rating, movie_url)
                   VALUES (%s, %s, %s, %s, %s, %s);""",
                (
                    item["title"],
                    item["year"],
                    item["duration"],
                    item["genre"],
                    item["rating"],
                    item["movie_url"],
                ),
            )
        return item

    def close_spider(self, spider):
        pass
```
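Independent of Scrapy, the `@contextmanager` machinery can be sanity-checked with plain stdlib code: the decorated function object itself is not a context manager, only the object returned by *calling* it is. A minimal sketch (the `demo_cursor` name is hypothetical):

```python
from contextlib import contextmanager

@contextmanager
def demo_cursor():
    # stand-in for dbpool.getconn() / conn.cursor()
    yield "cursor"

# calling the decorated function returns a context manager...
cm = demo_cursor()
print(hasattr(cm, "__enter__"), hasattr(cm, "__exit__"))  # True True

# ...while the bare function object does not implement the protocol
print(hasattr(demo_cursor, "__enter__"))  # False
```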
For what it's worth, the context manager works fine when I use it from a different file. Note that the `create_table` function is itself called from pipeline.py; the only difference is that it is invoked during `__init__()` rather than during `process_item()`:
```python
from bapi_django.queries.pools import db_cursor


def create_table(table_name):
    with db_cursor() as cur:
        cur.execute('''CREATE TABLE %s (
            id SERIAL PRIMARY KEY,
            title VARCHAR(50) NOT NULL,
            year INT,
            duration VARCHAR(50),
            genre VARCHAR(50),
            rating DECIMAL(10,1),
            movie_url VARCHAR(250));
        ''' % table_name)
```