用BeautifulSoup刮擦Craiglist,并在每个帖子中获取第一张图片

问题描述 投票:0回答:1

我目前正在尝试从craigslist抓取航空数据。除了每个帖子的第一张图片外,我都可以获取我想要的所有信息。这是我的链接:

https://spokane.craigslist.org/search/avo?hasPic=1

由于此站点上的其他帖子,我能够获得所有图像,但是我很难弄清楚如何仅获得第一张图像。

我正在使用bs4并请求此脚本。这是到目前为止我得到的每张图像:

from bs4 import BeautifulSoup as bs
import requests

image_url = 'https://images.craigslist.org/{}_300x300.jpg'
r = requests.get('https://spokane.craigslist.org/search/avo?hasPic=1')
soup = bs(r.content, 'lxml')
ids = [item['data-ids'].replace('1:','') for item in soup.select('.result-image[data-ids]', limit = 10)] 
images = [image_url.format(j) for i in ids for j in i.split(',')]
print(images)

非常感谢您的帮助。

谢谢,

inzel

python image beautifulsoup screen-scraping craigslist
1个回答
0
投票
from bs4 import BeautifulSoup
import requests

r = requests.get("https://spokane.craigslist.org/search/avo?hasPic=1")
soup = BeautifulSoup(r.text, 'html.parser')

img = "https://images.craigslist.org/"


imgs = [f"{img}{item.get('data-ids').split(':')[1].split(',')[0]}_300x300.jpg"
        for item in soup.findAll("a", class_="result-image gallery")]

print(imgs)

输出:

['https://images.craigslist.org/00N0N_ci3cbcv5T58_300x300.jpg', 'https://images.craigslist.org/00101_5dLpBXXdDWJ_300x300.jpg', 'https://images.craigslist.org/00n0n_8zVXHONPkTH_300x300.jpg', 'https://images.craigslist.org/00l0l_jiNMe38avtl_300x300.jpg', 'https://images.craigslist.org/00q0q_l4hts9RPOuk_300x300.jpg', 'https://images.craigslist.org/00D0D_ibbWWn7uFCu_300x300.jpg', 'https://images.craigslist.org/00z0z_2ylVbmdVnPr_300x300.jpg', 'https://images.craigslist.org/00Q0Q_ha0o2IJwj4Q_300x300.jpg', 'https://images.craigslist.org/01212_5LoZU43xA7r_300x300.jpg', 'https://images.craigslist.org/00U0U_7CMAu8vAhDi_300x300.jpg', 'https://images.craigslist.org/00m0m_8c7azYhDR1Z_300x300.jpg', 'https://images.craigslist.org/00E0E_7k7cPL7zNnP_300x300.jpg', 'https://images.craigslist.org/00I0I_97AZy8UMt5V_300x300.jpg', 'https://images.craigslist.org/00G0G_iWw8AI8N8Kf_300x300.jpg', 'https://images.craigslist.org/00m0m_9BEEcvD0681_300x300.jpg', 'https://images.craigslist.org/01717_4Ut5FSIdoi3_300x300.jpg', 'https://images.craigslist.org/00h0h_jeAhtDXW2ST_300x300.jpg', 'https://images.craigslist.org/00T0T_hTogH4m9zTH_300x300.jpg', 'https://images.craigslist.org/01212_9x1EFI1CYHE_300x300.jpg', 'https://images.craigslist.org/00H0H_kiXLOtVgReA_300x300.jpg', 'https://images.craigslist.org/00P0P_ad77Eqvf1ul_300x300.jpg', 'https://images.craigslist.org/00909_jyBoTCNGmAJ_300x300.jpg', 'https://images.craigslist.org/00g0g_gFtJlANhi51_300x300.jpg', 'https://images.craigslist.org/00202_3LV7YERBssE_300x300.jpg', 'https://images.craigslist.org/00j0j_3zxT682nE2i_300x300.jpg', 'https://images.craigslist.org/00Y0Y_b6AXcApcSfl_300x300.jpg', 'https://images.craigslist.org/00M0M_6eTHo5E3Ee5_300x300.jpg', 'https://images.craigslist.org/00g0g_hvyvJKUejXY_300x300.jpg', 'https://images.craigslist.org/00I0I_d2WOWXtgQ8s_300x300.jpg', 'https://images.craigslist.org/00s0s_dAwJG0D6uce_300x300.jpg', 'https://images.craigslist.org/00g0g_TC2qvnD3AN_300x300.jpg', 'https://images.craigslist.org/00M0M_Dba39RfEkr_300x300.jpg', 'https://images.craigslist.org/00M0M_31drxF6c9vO_300x300.jpg', 'https://images.craigslist.org/00505_jOjMq3B8y0M_300x300.jpg', 'https://images.craigslist.org/00e0e_ixfV647qwLh_300x300.jpg', 'https://images.craigslist.org/00p0p_i2noTC4cADw_300x300.jpg', 'https://images.craigslist.org/00a0a_kywatxfm6Ud_300x300.jpg', 'https://images.craigslist.org/00808_1ZjIIX8PdaP_300x300.jpg', 'https://images.craigslist.org/01515_blEEDKbbyKD_300x300.jpg', 'https://images.craigslist.org/00b0b_brUn6sUxBzF_300x300.jpg', 'https://images.craigslist.org/00U0U_2ukBvcgvU99_300x300.jpg', 'https://images.craigslist.org/01212_dPTe5ZHM26A_300x300.jpg', 'https://images.craigslist.org/00B0B_1GsE81zVsr0_300x300.jpg', 'https://images.craigslist.org/00N0N_l8SXlBaI8lq_300x300.jpg', 'https://images.craigslist.org/00f0f_82qAzPq7cXd_300x300.jpg', 'https://images.craigslist.org/00w0w_lUrgFG9YOY0_300x300.jpg', 'https://images.craigslist.org/00C0C_kiZpgrFEnO8_300x300.jpg', 'https://images.craigslist.org/00T0T_g7IHvHMx14L_300x300.jpg', 'https://images.craigslist.org/00E0E_bzm9jRXpWVd_300x300.jpg', 'https://images.craigslist.org/00k0k_lOCRF1fgWCF_300x300.jpg', 'https://images.craigslist.org/00y0y_exwReppAi3L_300x300.jpg', 'https://images.craigslist.org/01515_7xyZ605hYcc_300x300.jpg', 'https://images.craigslist.org/00J0J_hqLMLvTCfXk_300x300.jpg', 'https://images.craigslist.org/00505_3P0xQrbeFY4_300x300.jpg', 'https://images.craigslist.org/00r0r_gj6dO6ZHO8L_300x300.jpg', 'https://images.craigslist.org/01717_cIVmzgKCWtP_300x300.jpg', 'https://images.craigslist.org/00w0w_6O59k6qlZQz_300x300.jpg', 'https://images.craigslist.org/00808_jd43ZthN1uB_300x300.jpg', 'https://images.craigslist.org/00m0m_1GJ41cKvv4Y_300x300.jpg']

该列表包含每个帖子的第一张图片。

© www.soinside.com 2019 - 2024. All rights reserved.