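I am trying to scrape a website, say Stack Overflow. I wrote some code that scrapes the text as well as the image and link URLs. I want to save this data to a SQLite database.
I have already established a connection to the database, but I get an error when saving the content to it.
Here is my code, scraper.py: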
from bs4 import BeautifulSoup
import requests
from urllib.request import urlopen
import re
import sqlite3

url = "http://stackoverflow.com/"
page = requests.get(url)
data = page.text
soup = BeautifulSoup(data, features='html.parser')
soup.prettify()
text_data = soup.find_all('p')
print(text_data)  # This will return all text data.
for link in soup.find_all('a'):
    print(link.get('href'))  # This will return all URLs.
html = urlopen(url)
bs = BeautifulSoup(html, features='html.parser')
# Escape the dot so the pattern matches a literal ".jpg".
images = bs.find_all('img', {'src': re.compile(r'\.jpg')})
for image in images:
    print(image['src'] + '\n')  # This will return all image URLs.
conn = sqlite3.connect('scraped.sqlite3', check_same_thread=False)
curs = conn.cursor()
#curs.execute("INSERT INTO scraped(data,link,img_url) values('text_data','link.get('href')','image['src']")
conn.commit()
After adding this line to the program:
curs.execute("INSERT INTO scraped(data,link,img_url) values('text_data','link.get('href')','image['src']")
it throws an error like sqlite3.OperationalError: near "href": syntax error
I tried to look it up but didn't understand it. Sorry if this is something very trivial.
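The cause can be reproduced in isolation. The sketch below assumes a three-column text schema for the `scraped` table (the question does not show how the table was created) and uses a throwaway in-memory database. It shows that the Python expressions are never evaluated: they are just characters inside one string literal, and the quote in `link.get('href')` ends the SQL string early, leaving `href` as a stray token.

```python
import sqlite3

# The statement exactly as written in the question: one plain Python string.
sql = "INSERT INTO scraped(data,link,img_url) values('text_data','link.get('href')','image['src']"

conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
# Assumed schema; the real table definition is not shown in the question.
conn.execute("CREATE TABLE scraped (data TEXT, link TEXT, img_url TEXT)")

try:
    conn.execute(sql)
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

print(sql)            # link.get('href') appears verbatim; no value was substituted
print(error_message)  # the syntax error reported by SQLite
```

SQLite reads `'link.get('` as a complete string literal, so the word that follows it is unexpected at that point in the statement.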
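Build the statement with str.format so that the actual values are substituted into it, and close the values(...) list with a parenthesis:
curs.execute("INSERT INTO scraped(data, link, img_url) values('{text_data}','{href}','{image}')".format(text_data=text_data, href=link.get('href'), image=image['src']))
Note that this still fails if any of the values contains a single quote.
Reference: https://docs.python.org/3/library/string.html#formatstrings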
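An alternative worth considering is to let `sqlite3` bind the values through `?` placeholders instead of formatting them into the SQL text. The following is a minimal sketch, not the asker's actual setup: the table schema, the in-memory database, and the three stand-in values are all assumptions made for illustration.

```python
import sqlite3

# Stand-in values (hypothetical); in the scraper they would come from BeautifulSoup.
text_data = "It's a paragraph with 'quotes' in it"
href = "https://stackoverflow.com/questions"
img_src = "https://example.com/logo.jpg"

conn = sqlite3.connect(":memory:")  # in-memory database so the sketch leaves no file behind
curs = conn.cursor()
# Assumed schema; the question does not show how the table was created.
curs.execute("CREATE TABLE IF NOT EXISTS scraped (data TEXT, link TEXT, img_url TEXT)")

# sqlite3 binds each ? to the matching tuple item, so quotes inside
# the values never become part of the SQL syntax.
curs.execute(
    "INSERT INTO scraped (data, link, img_url) VALUES (?, ?, ?)",
    (text_data, href, img_src),
)
conn.commit()

rows = curs.execute("SELECT data, link, img_url FROM scraped").fetchall()
print(rows)
```

In the scraper itself, `text_data` is a list of tags rather than a string, so it would need converting first, for example by joining `p.get_text()` for each tag, before it can be bound to a placeholder.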