Unable to get an attribute value in Python

Question (votes: 1, answers: 1)

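I'm trying to write a scraper for a website. So far I can scrape the general information I need, but the specific attribute value I want from it comes back as None, even though a value is clearly present. Everything works until I call getattr on each container in containers to look up the value of data-id. There may be a better way to do this, but I'm struggling to understand why it can't find the value.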

Here is my code.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup as soup
from selenium.webdriver.common.action_chains import ActionChains

url = "http://csgo.exchange/id/76561197999004010#x"

driver = webdriver.Firefox()

driver.get(url)
import time
time.sleep(10)
html = driver.page_source
soup = soup(html, "html.parser")


containers = soup.findAll("div",{"class":"vItem"})
print(len(containers))

for container in containers:
    test = getattr(container, "data-id")

    print(str(test))


with open('scraped.txt', 'w', encoding="utf-8") as file:
    file.write(str(containers))

This is what each container looks like.

<div class="vItem Normal Container cItem" data-bestquality="0" data-category="Normal" data-collection="Spectrum Collection" data-currency="0" data-custom="" data-exterior="" data-hashname="Spectrum%20Case" data-id="15631554103"

python findall getattr
1 Answer
0 votes

Just change getattr() to container.attrs["data-id"]. That worked for me. In most of my attempts, though, the 10-second time.sleep was not long enough.
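As a rough illustration of why getattr() returns None here, the sketch below parses a static, shortened copy of one container instead of the live page. The HTML string is an assumption (the original tag is trimmed and closed so that it parses), and the data-missing attribute is a made-up name used only to show the .get() fallback.

```python
from bs4 import BeautifulSoup

# Shortened copy of one container from the question, closed so that it parses.
html = '<div class="vItem Normal Container cItem" data-category="Normal" data-id="15631554103"></div>'

soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", {"class": "vItem"})

# Attribute-style access on a Tag searches for a CHILD TAG with that name,
# so this behaves like container.find("data-id") and finds nothing.
via_getattr = getattr(container, "data-id")

# HTML attributes live in the tag's attrs dict; indexing the tag is a shortcut.
via_attrs = container.attrs["data-id"]
via_index = container["data-id"]

# .get() returns a default instead of raising KeyError when an attribute is absent.
# "data-missing" is a hypothetical attribute name.
missing = container.get("data-missing", "n/a")

print(via_getattr, via_attrs, via_index, missing)
```

If some containers might lack data-id, container.get("data-id") is probably the safer lookup, since container.attrs["data-id"] raises KeyError in that case.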

import time

from bs4 import BeautifulSoup
from selenium import webdriver

url = "http://csgo.exchange/id/76561197999004010#x"

driver = webdriver.Firefox()
driver.get(url)
time.sleep(10)  # fixed wait for the page's JavaScript; 10 seconds was often too short
html = driver.page_source
driver.quit()  # close the browser once the HTML has been captured
soup = BeautifulSoup(html, "html.parser")

containers = soup.find_all("div", {"class": "vItem"})
print(len(containers))
data_ids = []  # list to hold the data-id values

for container in containers:
    data_id = container.attrs["data-id"]  # read the HTML attribute from the tag's attrs dict
    data_ids.append(data_id)  # add the data-id to the list
    print(data_id)

# write every data-id on its own line
with open("scraped.txt", "w", encoding="utf-8") as file:
    for data_id in data_ids:
        file.write(data_id + "\n")
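
Since a fixed 10-second sleep was often too short, one option is to poll until the items actually appear. Selenium ships WebDriverWait (in selenium.webdriver.support.ui) for this purpose; the sketch below shows the same polling idea in plain Python so it can run without a browser. The wait_for helper and the fake_page_ready stand-in are hypothetical names introduced for illustration, not part of Selenium or the original answer.

```python
import time


def wait_for(condition, timeout=30.0, interval=0.5):
    """Call condition() until it returns a truthy value, then return that value.

    Raises TimeoutError if nothing truthy arrives within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Stand-in for a page whose items only show up on the third poll.
polls = {"count": 0}


def fake_page_ready():
    polls["count"] += 1
    return ["15631554103"] if polls["count"] >= 3 else []


items = wait_for(fake_page_ready, timeout=5, interval=0.01)
print(items, polls["count"])
```

In the scraper, the condition could be a function that re-parses driver.page_source and returns the list of div.vItem containers, so the script continues as soon as at least one item has loaded instead of always sleeping for a fixed time.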