I've been building a Flask web app for learning/practice. It's a site that displays a list of GPU models and their prices, all scraped from a list of URLs. Users log in, enter a desired price in a GPU's form, and click subscribe; a checksubs script runs once a day to compare desired prices against current prices and then sends an email telling the user to buy. It uses a Postgres database with SQLAlchemy models/tables including: gpu, users, price, subscriptions. I got the app to a working state I was happy with, then moved on to packaging it as a docker-compose deployment, with flask-app, webscrape, postgres, and emailsubs all set up as services defined in docker-compose.yml.
My problems started when I added a Docker volume to docker-compose.yml. I want the Postgres data to persist across docker-compose up/down. Now, every time I run docker-compose up, it sends one extra email per subscription. For example, say there is one sub in the database matching desired_price >= price: if this is my second docker-compose up, it sends two emails for that sub. If I docker-compose down and then up again, it sends three emails for the sub. I'll also point out that I've already removed all multithreading from the code, since that was the first fix I thought to try.
Without the Docker volume, the problem doesn't occur. Also, if I add a new sub on the third docker-compose up, it sends three emails for the new sub too. It's also not a problem when I run the code against a local Postgres database. At first I thought my Flask app was initializing a "new" database on every `flask run`, so I added logic to check whether tables already exist in the database before db.create_all():
```python
import time
from sqlalchemy import text
from sqlalchemy.exc import OperationalError

# (this runs inside the app factory, hence the `return None` on failure)
max_retries = 10  # You can adjust this number as needed
retries = 0
while retries < max_retries:
    try:
        with app.app_context():
            # Directly query the information schema to check for table existence
            query = text("SELECT EXISTS (SELECT FROM information_schema.tables WHERE table_name = 'gpu')")
            result = db.session.execute(query)
            table_exists = result.scalar()
            if not table_exists:
                print("Database tables not found. Initializing...")
                db.create_all()
            else:
                print("Database tables already exist.")
        break  # Break out of the loop if database setup succeeds
    except OperationalError:
        print(f"Database connection failed. Retrying in 5 seconds ({retries}/{max_retries})")
        time.sleep(5)
        retries += 1
else:
    print("Failed to establish database connection after multiple retries. Exiting.")
    return None  # Return None to indicate failure
```
...but no luck, although it does correctly report that the tables already exist after the first fresh flask run.
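As an aside, SQLAlchemy's inspector can do this existence check without a raw information_schema query, and it works across backends. A minimal sketch, assuming SQLAlchemy 1.4+; the in-memory SQLite engine and the stripped-down GPU model here are stand-ins for illustration only:

```python
from sqlalchemy import create_engine, inspect, Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class GPU(Base):  # minimal stand-in for the real model
    __tablename__ = "gpu"
    id = Column(Integer, primary_key=True)
    model = Column(String)

engine = create_engine("sqlite://")  # in-memory DB for illustration

def ensure_tables(engine):
    """Create tables only if the 'gpu' table is missing."""
    if not inspect(engine).has_table("gpu"):
        print("Database tables not found. Initializing...")
        Base.metadata.create_all(engine)
    else:
        print("Database tables already exist.")

ensure_tables(engine)  # first call creates the table
ensure_tables(engine)  # second call is a no-op
```

With Flask-SQLAlchemy the same idea applies to `db.engine` inside an app context.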
Here is the docker-compose.yml (redacted version, ofc):

```yaml
version: '3'
services:
  flask-app:
    build: .
    ports:
      - "5000:5000"
    depends_on:
      - postgres
    environment:
      - DATABASE_URI=postgresql://...:...@postgres/...
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -h postgres -U dbuser"]
      interval: 10s
      timeout: 5s
      retries: 5
  postgres:
    image: postgres:13
    environment:
      POSTGRES_USER: ...
      POSTGRES_PASSWORD: ...
      POSTGRES_DB: ...
    ports:
      - "5432:5432"
    volumes:
      - postgres-data:/var/lib/postgresql/data
  webscraper:
    build:
      context: .
      dockerfile: Dockerfile.webscraper # Dockerfile for the webscraper service
    depends_on:
      - postgres
      - flask-app
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -h postgres -U dbuser"]
      interval: 10s
      timeout: 5s
      retries: 5
  checksubs:
    build:
      context: .
      dockerfile: Dockerfile.checksubs # Dockerfile for the checksubs service
    depends_on:
      - postgres
      - flask-app
      - webscraper
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -h postgres -U dbuser"]
      interval: 10s
      timeout: 5s
      retries: 5
volumes:
  postgres-data:
    driver: local
```

...but I'm pretty sure this is right, and only the postgres service gets the volume. Finally, I thought the problem might be in my email-sending code. But it works just fine against a localhost Postgres:
```python
import os
import json
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.header import Header

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
# ...plus the app's own model imports: Subscriptions, Price, Users, GPU

def main():
    # Load database URI from config.json
    config_path = 'config.json'  # Adjust the path to reach the config.json file
    env = os.environ.get('FLASK_ENV', 'development')
    with open(config_path, 'r') as config_file:
        config = json.load(config_file)
    db_uri = config[env]['DATABASE_URI']

    # Database connection setup
    engine = create_engine(db_uri)
    Session = sessionmaker(bind=engine)
    session = Session()

    # Establish SMTP connection
    smt = smtplib.SMTP('smtp.gmail.com', 587)
    smt.ehlo()
    smt.starttls()
    email = os.environ.get('SENDER_EMAIL')
    password = os.environ.get('SENDER_PASSWORD')
    smt.login(email, password)

    # Get subscription data using SQLAlchemy query
    subscriptions = (
        session.query(
            Subscriptions.user_id,
            Subscriptions.gpu_id,
            Price.price,
            Users.email,
            GPU.model,
            GPU.url
        )
        .join(GPU, Subscriptions.gpu_id == GPU.id)
        .join(Price, GPU.id == Price.gpu_id)
        .join(Users, Subscriptions.user_id == Users.id)
        .filter(Subscriptions.desired_price >= Price.price)
        .all()
    )

    for subscription in subscriptions:
        user_id, gpu_id, price, user_email, gpu_model, gpu_url = subscription
        # Create email
        subject = "Price notification"
        message = f"Hey, the price of {gpu_model} has dropped to ${price:.2f}. Buy it!\nLink: {gpu_url}"
        msg = MIMEMultipart()
        msg['From'] = '[email protected]'
        msg['To'] = user_email
        msg['Subject'] = Header(subject, 'utf-8')
        msg.attach(MIMEText(message, 'plain', 'utf-8'))
        # Print the email details for server log
        print(f"Sending email to: {user_email}")
        # Send email
        smt.sendmail('[email protected]', user_email, msg.as_string())

    # Clean up
    smt.quit()
    session.close()

if __name__ == "__main__":
    main()
```

...maybe/hopefully you can see something I'm not seeing. Thanks.
Figured it out. The problem was my query above. Because the price table adds/inserts new rows on every web scrape, each scrape leaves behind a fresh set of price rows, and with my flawed query above, a subscription matched every price row for its gpu_id. The new fixed query uses only the row with the max date value in the price table:
```python
# Get subscription data using SQLAlchemy query
latest_price_subquery = (
    session.query(Price.gpu_id, func.max(Price.date).label("latest_date"))
    .group_by(Price.gpu_id)
    .subquery()
)
subscriptions = (
    session.query(
        Subscriptions.user_id,
        Subscriptions.gpu_id,
        Price.price,
        Users.email,
        GPU.model,
        GPU.url
    )
    .join(GPU, Subscriptions.gpu_id == GPU.id)
    .join(Price, GPU.id == Price.gpu_id)
    .join(Users, Subscriptions.user_id == Users.id)
    .join(latest_price_subquery, Price.gpu_id == latest_price_subquery.c.gpu_id)
    .filter(Subscriptions.desired_price >= Price.price)
    .filter(Price.date == latest_price_subquery.c.latest_date)
    .all()
)
```