How do I save scraped content to a SQLite3 database?


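I am trying to scrape a website, say Stack Overflow. I wrote some code that scrapes the text as well as the images and URLs. I want to save this data to a SQLite database.

I have already set up the connection to the database. However, I get an error when I try to save the content to it.

Here is my code, scraper.py: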

from bs4 import BeautifulSoup, SoupStrainer
import requests
from urllib.request import urlopen
import re
import sqlite3
url = "http://stackoverflow.com/"

page = requests.get(url)
data = page.text
soup = BeautifulSoup(data, features='html.parser')
soup.prettify()
text_data = soup.find_all('p')
print(text_data) #This will return all Text data.
for link in soup.find_all('a'):
    print(link.get('href')) #This will return all urls

html = urlopen(url)
bs = BeautifulSoup(html, features='html.parser')
images = bs.find_all('img', {'src':re.compile('.jpg')})
for image in images:
    print(image['src']+'\n') #This will return all Image urls

conn  = sqlite3.connect('scraped.sqlite3',check_same_thread=False)
curs = conn.cursor()
#curs.execute("INSERT INTO scraped(data,link,img_url) values('text_data','link.get('href')','image['src']")
conn.commit()

After adding this line to the program:

curs.execute("INSERT INTO scraped(data,link,img_url) values('text_data','link.get('href')','image['src']")
it throws an error like sqlite3.OperationalError: near "href": syntax error
I tried to look it up but didn't understand it. Sorry if this is something very trivial.
python sqlite web-scraping beautifulsoup
1 Answer
curs.execute("INSERT INTO scraped(data, link, img_url) values('{text_data}','{href}','{image}')".format(text_data=text_data, href=link.get('href'), image=image['src']))

ref:https://docs.python.org/3/library/string.html#formatstrings
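
The original statement fails because the Python expressions sit inside the SQL string and are never evaluated, and the single quotes around 'href' end the SQL string literal early, so SQLite reports a syntax error near "href". Formatting the values into the string avoids that, but it can break again as soon as a scraped value contains a single quote. A possibly safer variant is to use sqlite3's "?" placeholders. The sketch below is only an illustration under some assumptions: the helper name save_scraped is hypothetical, the table is assumed to have three TEXT columns (the question does not show its schema), and one row is stored per link/image pair, which may not be the layout you want.

```python
import sqlite3
from itertools import zip_longest


def save_scraped(conn, text_data, links, image_urls):
    """Insert one row per (link, image URL) pair; the shorter list is padded with None."""
    # Assumed schema: the question does not show how the table was created.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scraped (data TEXT, link TEXT, img_url TEXT)"
    )
    rows = [
        (text_data, link, img_url)
        for link, img_url in zip_longest(links, image_urls)
    ]
    # "?" placeholders let sqlite3 bind the values itself, so quotes inside
    # the scraped text cannot break the statement.
    conn.executemany(
        "INSERT INTO scraped (data, link, img_url) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


# Stand-in values; in scraper.py they would come from BeautifulSoup, e.g.
# text_data = " ".join(p.get_text() for p in soup.find_all('p'))
conn = sqlite3.connect(":memory:")
count = save_scraped(
    conn,
    "It's a 'quoted' paragraph",
    ["/questions", "/tags"],
    ["logo.jpg"],
)
stored = conn.execute(
    "SELECT data, link, img_url FROM scraped ORDER BY rowid"
).fetchall()
```

In scraper.py you would pass the existing connection instead of the in-memory one, and build the link and image lists inside the two loops rather than only printing them. Note that text_data from find_all is a list of tags, so it needs to be turned into a string before it is bound to a placeholder.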
