Pandas DataFrame - How to extract a string pattern containing hidden characters


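I'm scraping the names, prices and images from this website. There are 8 items in total, but in the DF I only want to keep the items whose name contains the pattern "Original Zaino Antifurto". When I try to apply bp_filter to the DF I get an error, possibly because of hidden characters.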

Does anyone know how to filter on this pattern without getting the error?

import requests
from bs4 import BeautifulSoup
import pandas as pd

url_xd = 'https://www.xd-design.com/it-it/catalogsearch/result/?q=Bobby+Original+Zaino+Antifurto'
req_xd = requests.get(url_xd)
pars_xd = BeautifulSoup(req_xd.content, 'html.parser')
con_xd = pars_xd.find_all('div', class_ = 'product details product-item-details')

names_xd = []
prices_xd = []
picts_xd = []

for container in con_xd:
    name = container.find("a", class_="product-item-link").text
    names_xd.append(name)

for container in con_xd:
    price = container.find("span", class_="price").text
    prices_xd.append(price)

for container in con_xd:
    pict = container.find("a").get("href")
    picts_xd.append(pict)

bp_xd = pd.DataFrame({'(XD-Design) Item_Name': names_xd,
                      'Item_Price_EUR': prices_xd,
                      'Link_to_Pict': picts_xd})

bp_xd['Item_Price_EUR'] = bp_xd['Item_Price_EUR'].str.replace('€','').str.replace(',','.').astype(float)
bp_xd['(XD-Design) Item_Name'] = bp_xd['(XD-Design) Item_Name'].str.strip()

bp_filter = bp_xd['(XD-Design) Item_Name'][bp_xd['(XD-Design) Item_Name'].str.contains('Original Zaino Antifurto')]

# bp_xd[bp_filter]
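The error can be reproduced without scraping anything. This is a minimal sketch that assumes a few made-up item names and prices in place of the scraped data. It shows that bp_filter ends up holding strings, which pandas treats as column labels, while a boolean mask selects rows as intended.

```python
import pandas as pd

# Hypothetical stand-in for the scraped data (made-up names and prices)
bp_xd = pd.DataFrame({
    '(XD-Design) Item_Name': [
        'Bobby Original Zaino Antifurto',
        'Bobby Hero Regular Zaino Antifurto',
        'Bobby Original Zaino Antifurto, Nero',
    ],
    'Item_Price_EUR': [99.95, 129.95, 99.95],
})
col = '(XD-Design) Item_Name'

# As in the question: this keeps the matching *names*, not True/False values
bp_filter = bp_xd[col][bp_xd[col].str.contains('Original Zaino Antifurto')]
print(bp_filter.tolist())

# Indexing a DataFrame with a list-like of strings selects columns,
# so the names are looked up as column labels and a KeyError is raised
error = None
try:
    bp_xd[bp_filter]
except KeyError as exc:
    error = exc
print(type(error).__name__)  # KeyError

# A boolean Series aligned with bp_xd's index selects rows instead
mask = bp_xd[col].str.contains('Original Zaino Antifurto', regex=False)
filtered = bp_xd[mask]
print(filtered)
```

Any boolean Series aligned on bp_xd's index works here, and bp_xd.loc[mask] is equivalent to bp_xd[mask].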
1 Answer

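The error most likely has nothing to do with hidden characters. bp_filter contains the matching names themselves rather than a True/False mask, so bp_xd[bp_filter] looks those names up as column labels and raises a KeyError. Filtering the rows with a boolean mask fixes it. Here is the fixed, working code: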

import requests
from bs4 import BeautifulSoup
import pandas as pd

url_xd = 'https://www.xd-design.com/it-it/catalogsearch/result/?q=Bobby+Original+Zaino+Antifurto'
req_xd = requests.get(url_xd)
pars_xd = BeautifulSoup(req_xd.content, 'html.parser')
con_xd = pars_xd.find_all('div', class_='product details product-item-details')

# Collect the name, price and link from each product container
names_xd = [c.find("a", class_="product-item-link").text for c in con_xd]
prices_xd = [c.find("span", class_="price").text for c in con_xd]
picts_xd = [c.find("a").get("href") for c in con_xd]

df = pd.DataFrame({'(XD-Design) Item_Name': names_xd,
                   'Item_Price_EUR': prices_xd,
                   'Link_to_Pict': picts_xd})

# Drop the euro sign and turn the decimal comma into a point before converting
df['Item_Price_EUR'] = (df['Item_Price_EUR']
                        .str.replace('€', '', regex=False)
                        .str.replace(',', '.', regex=False)
                        .astype(float))
df['(XD-Design) Item_Name'] = df['(XD-Design) Item_Name'].str.strip()

# Keep only the rows whose name contains the pattern: indexing df with a
# boolean mask selects rows, whereas indexing it with strings selects columns
df = df[df['(XD-Design) Item_Name'].str.contains('Original Zaino Antifurto', regex=False)]
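If the names really did contain invisible characters, such as a no-break space (U+00A0) or a zero-width space (U+200B), a plain substring test would silently miss them. This is a minimal sketch that assumes hypothetical item names with such characters inserted by hand, not data from the site. normalize_text and hidden_chars are illustrative helpers, not part of pandas. The sketch normalizes the names with Unicode NFKC before matching and lists any suspicious characters for inspection.

```python
import unicodedata

import pandas as pd


def normalize_text(s: str) -> str:
    """Hypothetical helper: fold compatibility characters, drop invisible
    format characters and collapse whitespace runs to single spaces."""
    # NFKC turns e.g. U+00A0 NO-BREAK SPACE into an ordinary space
    s = unicodedata.normalize('NFKC', s)
    # General category 'Cf' covers zero-width characters such as U+200B
    s = ''.join(ch for ch in s if unicodedata.category(ch) != 'Cf')
    # str.split() with no argument splits on any run of whitespace
    return ' '.join(s.split())


def hidden_chars(s: str) -> list:
    """Hypothetical helper: name every non-ASCII or non-printable character in s."""
    return [unicodedata.name(ch, hex(ord(ch)))
            for ch in s if not ch.isascii() or not ch.isprintable()]


col = '(XD-Design) Item_Name'
pattern = 'Original Zaino Antifurto'

# Made-up names with hidden characters inserted by hand
df = pd.DataFrame({col: [
    'Bobby Original\xa0Zaino Antifurto',     # no-break space
    'Bobby Original Zaino\u200b Antifurto',  # zero-width space
    'Bobby Hero Regular Zaino Antifurto',
]})

# Show which invisible characters each name contains
print(df[col].map(hidden_chars).tolist())

raw_mask = df[col].str.contains(pattern, regex=False)
clean_mask = df[col].map(normalize_text).str.contains(pattern, regex=False)
print(raw_mask.tolist())    # [False, False, False]
print(clean_mask.tolist())  # [True, True, False]

filtered = df[clean_mask]
print(filtered)
```

NFKC also folds other compatibility forms (for example full-width letters into ASCII), which is usually harmless for product names but is worth knowing before applying it to other text.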
© www.soinside.com 2019 - 2024. All rights reserved.