Hi, I wrote the following code to extract property details.
Right now I'm trying to extract the area (square footage).
import requests
from bs4 import BeautifulSoup
#Loads the webpage
r = requests.get("https://www.century21.com/for-sale-homes/Westport-CT-20647c", headers={'User-agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'})
# Grab the content of the page
c = r.content
if "blocked" in r.text:
    print("we've been blocked")
# Parse the HTML
soup = BeautifulSoup(c, "html.parser")
# Find all the property cards listed
all = soup.find_all("div", {"class": "sr-card js-safe-link"})
x = all[0]
for li in x.find_all("li"):
    print(li)
The code above prints the following:
<li class="test-beds">6 beds</li>
<li class="test-baths">9 baths</li>
<li>8,511 sq ft</li>
<li>$370 / sq ft</li>
<li>On Site 2 days</li>
<li>Single Family Residence</li>
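My question is how to extract the "8,511 sq ft" value.
I tried print(li[2]), but unfortunately it didn't work.
Could someone point out where I went wrong and steer me in the right direction to fix it?
Thanks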
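You need to use .text to get the content without the tags. Note that inside your loop, li is a single <li> tag, so li[2] is treated as an attribute lookup on that tag rather than "the third item"; store the list returned by find_all and index into that instead. I also had it print both li[2] and li[2].text to show the difference: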
import requests
from bs4 import BeautifulSoup
#Loads the webpage
r = requests.get("https://www.century21.com/for-sale-homes/Westport-CT-20647c", headers={'User-agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'})
# Grab the content of the page
c = r.content
if "blocked" in r.text:
    print("we've been blocked")
# Parse the HTML
soup = BeautifulSoup(c, "html.parser")
# Find all the property cards listed
all = soup.find_all("div", {"class": "sr-card js-safe-link"})
x = all[0]
# Store all elements with tag <li> in li
li = x.find_all("li")
# Print the element in index position 2
print(li[2])
print(li[2].text)
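If you want to check the difference without hitting the site, here is a minimal offline sketch. It assumes the card markup looks like the output pasted in the question, wrapped in a made-up <ul>; the html string and the names card, items, area_tag, area_text and sq_ft are illustrative, not taken from the live page.

```python
from bs4 import BeautifulSoup

# Assumed markup: the <li> items from the question, wrapped in a made-up <ul>
html = """
<div class="sr-card js-safe-link">
<ul>
<li class="test-beds">6 beds</li>
<li class="test-baths">9 baths</li>
<li>8,511 sq ft</li>
<li>$370 / sq ft</li>
<li>On Site 2 days</li>
<li>Single Family Residence</li>
</ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
card = soup.find("div", {"class": "sr-card js-safe-link"})

# find_all returns a list-like result, so it can be indexed by position
items = card.find_all("li")
area_tag = items[2]          # the whole <li> element
area_text = area_tag.text    # only the text inside it

print(area_tag)
print(area_text)

# Optional: turn "8,511 sq ft" into a number (assumes this exact text format)
sq_ft = int(area_text.split()[0].replace(",", ""))
print(sq_ft)
```

Indexing by position only works as long as the area stays the third <li> in every card, so treat the index 2 as an assumption about the page layout.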
Just use a CSS selector to find it:
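data = r.text
soup = BeautifulSoup(data, "html.parser")
# No space between the two classes: both are on the same div
number_li = soup.select(".sr-card.js-safe-link ul li:nth-child(3)")
# select() returns a list, one matching <li> per card
print([li.text for li in number_li])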
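As a rough sketch of how the selector behaves, the snippet below runs it against inline HTML rather than the live page. The markup is assumed (the second card and its "2,000 sq ft" value are invented for illustration), and it relies on a Beautiful Soup version recent enough to support :nth-child in select().

```python
from bs4 import BeautifulSoup

# Assumed markup; the second card is invented purely for illustration
html = """
<div class="sr-card js-safe-link">
<ul><li>6 beds</li><li>9 baths</li><li>8,511 sq ft</li><li>$370 / sq ft</li></ul>
</div>
<div class="sr-card js-safe-link">
<ul><li>3 beds</li><li>2 baths</li><li>2,000 sq ft</li><li>$250 / sq ft</li></ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# ".a.b" means one element with both classes; ".a .b" means b nested inside a
selector = ".sr-card.js-safe-link ul li:nth-child(3)"

# select() returns every match, one per card here
areas = [li.get_text(strip=True) for li in soup.select(selector)]
print(areas)

# select_one() returns only the first match (or None if nothing matches)
first = soup.select_one(selector)
print(first.text)

# With a space between the classes nothing matches this markup
spaced = soup.select(".sr-card .js-safe-link ul li:nth-child(3)")
print(spaced)
```

The selector approach avoids the intermediate all[0] and find_all steps, but it carries the same assumption that the area is always the third <li> of its list.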